Waiting Room: Random Queueing and Customer Web/Mobile Apps
Presented by: Fabienne Semeria, Matthew Jacob, George Thomas, Tyler Caslin, Aditi Paul
Originally aired on January 21 @ 3:00 AM - 3:30 AM EST
Tune in to learn about Cloudflare's Waiting Room product!
Waiting Room allows organisations to seamlessly handle large influxes of customer traffic, ensuring their origin servers don't get overwhelmed.
Transcript (Beta)
Hi, everyone. Thank you for joining us today on Cloudflare TV. We are the waiting room team.
And we are here today to talk about what we've been up to recently. So, yeah, first, let's go through introductions.
I'm Fabienne, I'm the engineering manager for the waiting room team.
I've been an engineer for 11 years now, and I've been with Cloudflare since 2019.
And, yeah, I've been an engineering manager for one month now for this brand new team.
During my time at Cloudflare, I've worked on two products.
First is health checks and second is waiting room, which we are going to talk about today.
And I'd like to introduce George who has worked with me on both these products.
Hi, everyone. My name is George. I worked with Fabienne initially on health checks, and now on waiting room.
I joined three years ago, so I've been here a little longer than Fabienne, but not by much.
So around the same time. Both of us also worked on LB for a little bit before that.
And I've been working on waiting room for the past one and a half to two years, I would say.
So and I'm based out of San Francisco.
That's great. And we also have Matthew, who joined us when we started working on waiting room.
Hi, yeah, I'm Matthew. I'm an engineer.
I was on waiting room from May of last year, up until October of this year.
So I got to do a ton with this great team. Thanks, Matthew. And our newest recruit, Aditi.
Hi, everybody. So I worked on LB as an intern last summer, and then I returned in July and I started working on this great team.
And yeah, so I'm a part of waiting room as of now.
And I'm based out of San Francisco. Yeah, so LB stands for load balancing.
It's a team we all used to be on before splitting off into our own team for waiting room.
And last but not least, our guest, Tyler.
Hey, I'm Tyler. Today, I'm a guest, which is a bit strange. I was an intern over the last two summers and worked on waiting room this summer.
And the summer before I worked on CDNJS.
I'm from Massachusetts, but I'm currently studying in the UK and finishing up my last year.
I'm doing a master's. That's great.
Thank you, everyone. So that is us. Now on to what we work on.
Matthew, do you want to give us an introduction of what waiting room is? Yeah.
So waiting room is basically a service that allows us to manage the number of users on a website.
So if you have a lot of users flooding to your website, we put them into a waiting room and gradually allow them onto your website so that your website can handle all the traffic and it doesn't go down.
Like we see so often with ticket sales, PS5 sales and vaccine signups.
For vaccine signups specifically, we debuted the product at the beginning of this year, offering it for free to a variety of different organizations, governments, etc., to use waiting room to protect their vaccine signups.
So people could sign up for vaccines without worrying about the website crashing, or having to go through these complicated processes to get the system set up.
Some examples of organizations we got to work with were Luma Health and Cook County.
That's the county that includes the city of Chicago, Illinois.
So a lot of people who I know, growing up were actually able to use Cloudflare's waiting room to sign up for a vaccine.
The German state of, pardon my pronunciation, Thuringia; San Luis Obispo; and a lot of places around Japan through Classmethod.
And the country of Macedonia as well.
That's right. So if you work for an organization that is scheduling vaccinations, you can look up Cloudflare Project Fair Shot.
Maybe you're eligible to get our product for free.
George, do you want to tell us a bit about how it all works? Yeah, definitely. Before that, let me share my screen and show you where it lives in the dashboard.
Matthew gave an initial overview of what waiting room is and how it helps control the amount of traffic that goes to a website.
But I'll go through where it is configured in the dashboard first so you can actually visualize it.
So is my screen visible?
Yes, I see it. Okay, cool. It's found under the Traffic tab when you go to your zone settings.
So it's not under Network. Under Traffic, there is a Waiting Room tab.
In this case, I have already created a waiting room for convenience,
so let's try editing it. So what are the things that you can set for a waiting room?
A waiting room exists on a path. Right now, this is the hostname, ssl.wr.ctml, where the waiting room is hosted.
Whenever someone tries to access the URL, Cloudflare will route the traffic to the waiting room for this hostname.
And we have set the path to be /room. So all the traffic that goes to /room on that hostname goes to the waiting room.
What you can set are things like total active users, new users per minute and session duration.
These are the number of people you think your origin can support on this path.
And the session duration is how much time you expect one user to spend on the website.
So these are things that you can tune to customize your waiting room.
After that, you can select the queuing method.
There's FIFO, which is the traditional way, and random, which Tyler will get back to.
Yeah, he'll speak about it in more detail. So you can select all those, and then there's a next screen just to confirm, and you can actually preview the waiting room.
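To make these settings concrete, here is a minimal Python sketch of a waiting room configuration and the check for whether a request falls under it. The class name, the field names and the segment-aware prefix match are hypothetical illustrations, not Cloudflare's API, and `shop.example.com` is a placeholder hostname.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class WaitingRoomConfig:
    # Hypothetical field names mirroring the dashboard settings described above.
    host: str
    path: str                      # e.g. "/room"
    total_active_users: int        # users allowed on the origin at once
    new_users_per_minute: int      # users admitted per minute
    session_duration_minutes: int  # expected time one user spends on the site
    queueing_method: str = "fifo"  # "fifo" or "random"
    queue_all: bool = False        # send everyone to the waiting room

    def covers(self, url: str) -> bool:
        """Return True if the request URL falls under this waiting room.

        Assumption: a simple prefix match on whole path segments.
        """
        parts = urlsplit(url)
        if parts.hostname != self.host:
            return False
        prefix = self.path.rstrip("/")
        return parts.path == prefix or parts.path.startswith(prefix + "/")


room = WaitingRoomConfig("shop.example.com", "/room", 200, 100, 5)
print(room.covers("https://shop.example.com/room/checkout"))  # True
print(room.covers("https://shop.example.com/"))               # False: root path bypasses
```

Matching on whole path segments means `/roomy` is not caught by a `/room` waiting room; whether the product behaves exactly this way is an assumption of the sketch.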
Is the preview visible? Yes. Okay, so you can see the preview here.
From the dashboard, you can see how the page will look to people when they are queuing.
This is not the full-fledged advanced version; with the advanced version, you can actually edit the HTML that is shown to users.
So you can customize it further. There's also another option called Queue All, which is on right now.
Since Queue All is on, if I go to /room on that hostname, you will see the waiting room.
This is live right now.
But there's no waiting room if you go to the root path, because it bypasses the waiting room. So that's my quick overview of waiting room.
As for how this works: we wrote waiting room on Workers, which is Cloudflare's own product.
We have a habit of building on our own products. So whenever you hit a URL, your request goes to a Worker that Cloudflare is running,
and the waiting room logic runs there.
We check whether you should be queued based on these limits, and decide to send you either to the origin or to the waiting room.
The advantage of this approach is that we don't add much latency to your request, as there's no central location where all the requests go.
You can imagine it as multiple doors that open into a stadium, or something of that sort.
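The per-request decision can be sketched as a pure function. This is a deliberately simplified, single-location Python model written for illustration: it assumes the Worker already knows how many users are active on the origin and how many new users were admitted this minute. In the real product those counts have to be shared across many locations (the "multiple doors"), which this sketch does not model, and all names here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ORIGIN = "origin"
    WAITING_ROOM = "waiting_room"


@dataclass
class Limits:
    total_active_users: int
    new_users_per_minute: int
    queue_all: bool = False


def decide(limits: Limits, active_users: int, admitted_this_minute: int,
           has_active_session: bool, queue_nonempty: bool) -> Decision:
    """Decide where to send one request (hypothetical, simplified logic)."""
    if limits.queue_all:
        # "Queue All": in this sketch everyone waits, regardless of capacity.
        return Decision.WAITING_ROOM
    if has_active_session:
        # Users already on the origin keep going there.
        return Decision.ORIGIN
    if queue_nonempty:
        # Newcomers must not skip people who are already waiting.
        return Decision.WAITING_ROOM
    if (active_users < limits.total_active_users
            and admitted_this_minute < limits.new_users_per_minute):
        return Decision.ORIGIN
    return Decision.WAITING_ROOM
```

Keeping the decision free of I/O makes it easy to test; fetching and updating the shared counts would live in a separate layer.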
So that's my quick overview. Thank you for the overview, George.
Yeah, you can stop sharing your screen if you want. Yeah, thanks, George.
Now, Tyler, can you tell us a bit more about what you did during your internship?
Yeah, of course. I worked on a couple of different features over the summer.
And I wrote a blog post on some of the ones that got shipped.
And then there was another one called event scheduling that we'll get to as well.
The first one is called JSON response. Basically, when you configure a waiting room, like George said, if you're on the advanced plan, you can provide a static HTML template, which customizes the style of the waiting room page.
And so this is great for most purposes, you know, most people just don't need anything too fancy.
But if you want to display something dynamic, for example a video on the page, the static template alone won't be enough.
For example, the video will load and a user can click on it, but the page will refresh in 20 seconds or so and the video will start back from the beginning.
So for anything dynamic, we created this JSON response.
Very simply, if the waiting room sees a particular HTTP header, I think it's Accept: application/json, it will send back a JSON object with all of the fields relating to this user in the waiting room.
For example, it includes the wait time, the type of queuing method, how long until the page should be refreshed again, and so on.
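A custom client can consume this JSON response. The Python sketch below builds a request carrying the `Accept: application/json` header with the standard library and reads a few fields. The field names (`cfWaitingRoom`, `inWaitingRoom`, `waitTime`, `queueingMethod`, `refreshIntervalSeconds`) follow the shape we believe Cloudflare documents, but treat them as assumptions and check the current documentation; the URL is a placeholder.

```python
import json
import urllib.request

# Placeholder URL: replace with a path covered by your waiting room.
WAITING_ROOM_URL = "https://shop.example.com/room"


def build_request(url: str) -> urllib.request.Request:
    # The Accept header asks the waiting room for JSON instead of the HTML page.
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def parse_status(body: str) -> dict:
    """Extract the fields a custom UI might render (assumed field names)."""
    room = json.loads(body).get("cfWaitingRoom", {})
    return {
        "in_waiting_room": room.get("inWaitingRoom", False),
        "wait_minutes": room.get("waitTime"),
        "queueing_method": room.get("queueingMethod"),
        # Fall back to 20 s, roughly the refresh interval mentioned above.
        "refresh_seconds": room.get("refreshIntervalSeconds", 20),
    }


def poll_once(url: str = WAITING_ROOM_URL) -> dict:
    # Makes a real network request. A real client would also need to keep
    # the waiting room's session cookie between polls, which is not handled here.
    with urllib.request.urlopen(build_request(url)) as resp:
        return parse_status(resp.read().decode("utf-8"))
```

A mobile or web app would call `poll_once` on a timer, using `refresh_seconds` as the delay, and render the wait time however it likes without reloading the page.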
And these fields will help you power a custom application, which can be a mobile app or like another web app.
It basically gives you control over the interface, over how the waiting room looks.
So you can put ads in there, you can put videos, you can make it however you'd like it to be.
Yeah. So that's been a useful feature for our customers.
Another thing you've worked on is introducing new types of queuing methods.
So can you tell us a bit more about random queuing and how it's different from what we did before?
Right, so initially, we used a form of FIFO queuing, and FIFO just stands for first in, first out.
So the first user who gets to the waiting room should be the first user who gets to the origin website eventually.
But we actually group these users by the minute that they arrive in, the truncated minute.
So if you arrive at 5:05 PM and 30 seconds, it will be rounded down to 5:05 PM.
It's kind of complicated, so please read the blog post.
But basically, for FIFO, when we see that there's availability on the origin website, meaning we can let more people into the origin, we let in the people with the earliest arrival timestamps. These truncated timestamps are called bucket IDs.
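The bucket idea can be sketched in a few lines of Python. This is a single-process illustration under our own assumptions: users are kept as in-memory lists per bucket, and within one bucket they are taken in list order. We don't claim this is how waiting room actually stores or orders its state.

```python
from collections import defaultdict
from datetime import datetime


def bucket_id(arrival: datetime) -> datetime:
    """Truncate an arrival time to the start of its minute (the bucket ID)."""
    return arrival.replace(second=0, microsecond=0)


def admit_fifo(waiting: dict, slots: int) -> list:
    """Admit up to `slots` users, taking the earliest buckets first.

    `waiting` maps bucket ID -> list of user IDs. Users inside one bucket
    count as arriving together; taking them in list order is a
    simplification made for this sketch.
    """
    admitted = []
    for bucket in sorted(waiting):
        while waiting[bucket] and len(admitted) < slots:
            admitted.append(waiting[bucket].pop(0))
        if len(admitted) >= slots:
            break
    return admitted


queue = defaultdict(list)
arrivals = [
    ("a", datetime(2021, 7, 1, 17, 6, 10)),
    ("b", datetime(2021, 7, 1, 17, 5, 30)),  # 5:05:30 PM -> bucket 5:05 PM
    ("c", datetime(2021, 7, 1, 17, 5, 45)),
]
for user, ts in arrivals:
    queue[bucket_id(ts)].append(user)

print(admit_fifo(queue, 2))  # ['b', 'c']: the 5:05 bucket goes first
```

Grouping by minute means the scheduler only has to compare a small number of buckets rather than every individual arrival time.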
So basically, yeah, we are trying to let the people who tried first, go through first.
Yes, the people who arrived first should be let in before the people who arrived later, and so on.
The idea behind random queuing is that people are let into the origin randomly.
So it doesn't matter what time you arrive; you could be let in before or after someone who arrived before you. People are just let in randomly.
And we have customers who specifically asked for this feature.
Because they wanted everyone to get a chance, right?
Yeah, I think part of it is kind of like, it creates a lot of hype, because you don't know when you'll get in.
I mean, maybe people don't like waiting, but it could be exciting.
And also, for some limited-time sales or special events, some websites wanted to make it fair to people in different time zones, for example in case a bunch of people got up in the middle of the night and basically took all the spots in the waiting room.
Then if you go and try to buy something, you basically have no chance, or maybe it will take 10 hours.
And so it's just a different form of fairness that you can use.
George, I think you wanted to ask something about random queuing.
Oh, yeah. What were the challenges you faced while coming up with the design of random queuing?
Because we already had FIFO.
How did you come up with the design? I know you put a lot of work into it.
Yeah, so we built it on the infrastructure that powers FIFO.
And so we technically do keep track of when you arrive, in case we switch back to FIFO from random.
So if you join under FIFO and you're in, let's say, the fifth spot, and then we switch to random and people are let in randomly, and then we go back to FIFO, you'll still have the place that you started with.
So that's one cool thing about it. Initially, when we were designing it, we thought that to implement the random part, users could flip a virtual coin every time they try to get in.
Basically, we'd generate a random number between zero and one.
If it's less than the number of open slots divided by the number of people waiting, you'd be let in.
So if there are 100 people waiting and 30 slots, we divide 30 by 100,
and you have a 30% chance of getting in the next time you check in to the waiting room.
But then we realized we could basically just give each open slot to the first person who checks in to the waiting room, kind of like a big race condition, which sounds really weird.
But we ensure that people can only attempt to get in every so often, around every 20 seconds.
So even if you refresh the page a bunch of times, it won't make a difference.
You can only attempt to get into the website every so often.
And so it technically is fair. It's just kind of a strange way to put it.
And we do a couple of different things to offset when people check in, so that there is a nice distribution of user check-ins competing for the next spot.
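The initial "virtual coin" idea is easy to sketch in Python. This is only the design that was considered and then replaced, written under our own naming. One property worth noticing: because every waiting user flips independently, the number admitted in a round is itself random and can exceed the number of open slots.

```python
import random


def coin_flip_admit(open_slots: int, waiting: int, rng: random.Random) -> bool:
    """Admit one check-in with probability open_slots / waiting."""
    if waiting <= 0:
        return False
    p = min(1.0, open_slots / waiting)
    return rng.random() < p


rng = random.Random(42)  # seeded so the run is reproducible
trials = 100_000
hits = sum(coin_flip_admit(30, 100, rng) for _ in range(trials))
print(f"admitted in {hits / trials:.1%} of check-ins")  # close to 30%
```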
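The approach that replaced the coin flip can be sketched as "open slots go to whoever checks in first, but each user may only attempt once per refresh interval." The Python below is a single-process illustration with hypothetical names. Here the randomness comes from how check-in times are spread out, not from an explicit random number, which is why the offsets mentioned above matter; those offsets are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class RandomQueue:
    """Hypothetical sketch: free slots go to the first eligible check-in."""
    refresh_interval: float = 20.0   # seconds between allowed attempts
    open_slots: int = 0
    last_attempt: dict = field(default_factory=dict)

    def check_in(self, user: str, now: float) -> str:
        last = self.last_attempt.get(user)
        if last is not None and now - last < self.refresh_interval:
            # Refreshing early doesn't count as an extra attempt.
            return "too_early"
        self.last_attempt[user] = now
        if self.open_slots > 0:
            self.open_slots -= 1
            return "admitted"
        return "waiting"


q = RandomQueue()
print(q.check_in("alice", 0.0))   # waiting
q.open_slots = 1                  # a user left the origin
print(q.check_in("alice", 5.0))   # too_early: spamming refresh doesn't help
print(q.check_in("bob", 6.0))     # admitted
```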
But yes, please read the blog post that explains it much better than what I'm mentioning.
Yeah, speaking of distribution, you went to a lot of trouble to check what that distribution was, didn't you?
Yeah, there was a lot of thought put into making sure that it actually is random,
and that the estimated times are correct, or at least somewhat accurate. In an environment where people are selected basically like picking names out of a hat, you could technically not be selected many times in a row and get some extreme wait time, which is very unlikely, of course.
But it could happen. So how do you show an estimated wait time for that?
Instead, we display a probabilistic wait time.
So like 50% of people will experience a wait time of 15 minutes or less, for example.
So we can display probabilistic ranges of wait time, based on the rate at which we see people being let into the application and on the history of the previous few minutes.
Yeah. We already thought you would be busy this summer with random queuing on top of the JSON response, but you managed to squeeze in a third feature.
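To show why a probabilistic range makes sense, here is a Python sketch under a deliberately simple model of our own, not the estimator waiting room uses: the queue stays at roughly `waiting` people, and each minute `admitted_per_minute` of them are picked uniformly at random. Each user then gets in during a given minute with probability q = admitted_per_minute / waiting, so the wait is geometric, P(wait ≤ t) = 1 − (1 − q)^t, with a long but unlikely tail.

```python
import math


def wait_time_quantile(p: float, waiting: int, admitted_per_minute: float) -> int:
    """Minutes within which a fraction `p` of users get in (steady-state model)."""
    if p <= 0 or waiting <= 0:
        return 0
    q = admitted_per_minute / waiting
    if q <= 0:
        raise ValueError("nobody is being admitted")
    if q >= 1:
        return 1  # everyone waiting gets in within the first minute
    if p >= 1:
        raise ValueError("the geometric tail never reaches 100%")
    # Smallest whole minute t with 1 - (1 - q) ** t >= p.
    return math.ceil(math.log(1 - p) / math.log(1 - q))


for p in (0.25, 0.5, 0.75):
    print(f"{p:.0%} of users wait at most",
          wait_time_quantile(p, waiting=100, admitted_per_minute=5), "min")
# 25% -> 6 min, 50% -> 14 min, 75% -> 28 min
```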
Yes, I also worked on event scheduling.
It's not part of my blog post since it was only recently shipped.
But basically, it allows you to configure events for a waiting room.
An event is basically a specified period of time during which you can change the behavior of your waiting room.
For example, you can configure an event from 8 to 9 PM tomorrow that changes the template of your waiting room, or maybe changes the queuing method or the limits George mentioned in the dashboard, like the total active users on the origin.
And so these events can be really powerful.
They're great if you have a limited-time event or sale, and you can build very interesting use cases with them, especially with the new queuing methods we added, called passthrough and reject.
Passthrough basically lets everyone into the origin: they go right through the waiting room and into the origin right away. Reject blocks everyone at the waiting room,
so they never get to the origin.
They're a bit strange, and you can read about them in the documentation, but they're extremely powerful.
For example, with reject you can set up a maintenance page as your waiting room, which is a bit strange. Or you can have a waiting room that's events-only,
to have a site that is only active during your predetermined events.
Yeah, you were very, very busy.
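The event behavior described above, including the events-only site, can be sketched as "an active event overrides some settings, then the queuing method decides the route." This Python sketch uses hypothetical names and assumes that the first matching event wins and that an event's end time is exclusive; the actual product's rules for overlapping events may differ.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Settings:
    queueing_method: str        # "fifo", "random", "passthrough" or "reject"
    total_active_users: int
    template: str


@dataclass(frozen=True)
class Event:
    start: datetime
    end: datetime
    # None means "inherit the waiting room's base setting".
    queueing_method: Optional[str] = None
    total_active_users: Optional[int] = None
    template: Optional[str] = None


def effective_settings(base: Settings, events: list, now: datetime) -> Settings:
    """Return the base settings with the first active event's overrides applied."""
    for ev in events:
        if ev.start <= now < ev.end:
            overrides = {
                name: getattr(ev, name)
                for name in ("queueing_method", "total_active_users", "template")
                if getattr(ev, name) is not None
            }
            return replace(base, **overrides)
    return base


def route(settings: Settings, has_capacity: bool) -> str:
    if settings.queueing_method == "passthrough":
        return "origin"      # everyone goes straight through
    if settings.queueing_method == "reject":
        return "rejected"    # nobody reaches the origin
    return "origin" if has_capacity else "waiting_room"


# An "events-only" site: rejected by default, open only during the sale.
base = Settings("reject", 0, "maintenance.html")
sale = Event(datetime(2021, 11, 26, 20), datetime(2021, 11, 26, 21),
             queueing_method="fifo", total_active_users=500, template="sale.html")
print(route(effective_settings(base, [sale], datetime(2021, 11, 26, 20, 30)), True))  # origin
print(route(effective_settings(base, [sale], datetime(2021, 11, 26, 22)), True))      # rejected
```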
Like we said at the beginning, you've been an intern at Cloudflare twice now.
And you're coming back full time next year. And we're very excited about that.
Do you want to tell us a little bit about why you keep coming back to Cloudflare?
Yeah, sure. So the reason why I applied to Cloudflare in the first place is a bit of a fun story if you want to hear it.
So basically, like it was the first day of my second semester at school, in my third year, and I was really excited to go to this one networking class, because I knew that my favorite lecturer would be teaching it.
I won't mention him by name. But basically, when I showed up to this class, I learned that he wasn't teaching it and someone else was there.
And I was just like, kind of devastated. I'm like, Oh, is he just filling in for today?
But no, I learned that my teacher had left for a few years to work for Cloudflare.
And so Cloudflare kind of got in my head.
And after that, I was seeing it everywhere. And I saw a blog post saying that they were doubling their intern class.
And I said, oh, what the heck, I'll apply, maybe it'll work out.
And, and I'm so happy that I got in as a result.
In my experience, Cloudflare is just so warm.
It's an amazing, tight-knit place, full of talented people.
And as an intern, I feel like you're treated just like any other member of the team, minus the fun on call stuff.
You're also given lots of freedom on the projects that you work on.
I got to design my own projects, which is really fun.
And I feel like everyone was very interested in my development. I've just grown so much at Cloudflare,
especially thanks to all of you. And also, it's really cool:
as an intern, you can talk to the executives and ask them questions.
And like I said, it's extremely tight-knit. So it's really cool. Yeah. Aditi, you've also been an intern at Cloudflare, and you came back.
Yeah. So I was an intern last summer.
And it's been crazy because as an intern, it's surprising how much impact you can make.
And you have the freedom to write your own functional spec and design your own feature, which is what I did.
So I think that's really cool as an intern.
And then I got to work on multiple services across teams. It was really challenging and interesting at the same time.
I felt like interns are treated equally, and they can make a great impact.
So that's the best part. And also, everyone here is so helpful.
And like, I kept bugging people with the smallest of doubts, but like, everyone was so approachable.
And I think that's the best part. It's tight knit, and people are talented.
And you have the freedom to design your feature in whatever creative way you want.
So I think that's the best part.
And it continues even as a full-time engineer: I've already worked on two products, touched multiple repos, worked across teams, and worked with Go, JavaScript, Salt and more. I keep learning every day here.
And that's the best part.
So I hope to keep doing so. That's great. I'm very happy you've both had good experiences.
Aditi, do you want to tell us a little bit about your next project, what you're working on right now?
Yeah, so currently, I'm working on analytics for waiting room.
Basically, the analytics project would give customers a real-time representation of their traffic.
And it would give customers insight into how their traffic is behaving in their waiting room.
This is helpful because customers can fine-tune their waiting room configurations. Users will be able to see different graphs showing values such as total active users, new users per minute and the estimated wait times.
I'm also working on representing quantiles of the time users spend on the origin, the refresh interval times and the estimated waiting time.
So all of this would help customers understand their traffic, see the trends and fine-tune their waiting room configurations.
Also, we will expose GraphQL endpoints so that customers are able to query this data and get more information.
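Quantiles like these summarize a distribution better than an average does, because a few very long sessions don't drag the headline number around. Here is a small Python sketch using the nearest-rank method; the method choice and the sample data are our own assumptions, and the analytics pipeline may use a different estimator.

```python
import math


def nearest_rank_percentile(samples: list, pct: float) -> float:
    """Smallest sample such that at least pct% of samples are at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples: list) -> dict:
    return {f"p{p}": nearest_rank_percentile(samples, p) for p in (50, 90, 99)}


# Hypothetical minutes spent on the origin by ten users.
minutes_on_origin = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
print(summarize(minutes_on_origin))  # {'p50': 3, 'p90': 6, 'p99': 9}
```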
Thank you, Aditi. Yeah, that's going to be a very valuable addition, I think, to the product.
So yeah, I think it's a good time to say that the team is hiring.
So if anyone in the audience is interested, you've all heard the kind of things we are working on.
If that's something that appeals to you, please, please apply. One of the reasons we are hiring is that Matthew has sadly moved on to another team.
So we need to replace him. Matthew, do you want to tell us a little bit about what you're doing right now?
Yeah, that'd be awesome. So I recently moved to the distributed web team.
The distributed web team is taking over two products from the research team, our Cloudflare IPFS gateway and our Ethereum gateway.
So I will be working on making our IPFS gateway available from all of our edge colos as opposed to our core data centers.
So we can reduce the bandwidth across our network and reduce latency for IPFS.
And help the decentralized ecosystem as well. That's great.
That's one cool thing at Cloudflare: you can find new projects whenever you feel like a new challenge.
There are always new opportunities, right?
Oh, yeah. There's always cool stuff being developed: new features, new products, et cetera.
Just to add one thing, we should also give a shout-out to the Durable Objects team, which helps us have a scalable product, because we run on Durable Objects.
Yeah. Can you tell us a bit more?
What are Durable Objects, for people who don't know?
Yes, so Durable Objects is a Cloudflare product for storing state alongside Workers, and waiting room stores the queue state in Durable Objects.
It essentially gives you sharding for free, like in a database.
So you could say it's a pretty good product to build waiting room on, because that makes it scalable.
And it integrates with Workers.
Yeah, natively. So that's also one of the reasons we used it.
Yeah. Like George said earlier, we built most of our product on top of Workers.
So this has been a great experience. And we are very grateful to the workers team for all their support.
I think they're also hiring, so if that's something you're interested in, apply to them.
I see we have only a few seconds left.
So thank you, everyone, for being with us.
And thanks to everyone on the team for sharing their experience. Thanks, Fabienne.