Ending the Debate On Serverless Cold Starts

Name: Ending the Debate On Serverless Cold Starts
Uploaded: 2020-08-11T17:30:00.000Z
Duration: 30 min
Description: Learn about how Cloudflare Workers is the first serverless platform to offer out of the box support for 0 millisecond cold start times, compared with cold starts on other platforms that can take seconds and add unpredictable variability.

Presented by: Jen Vaccaro, Ashcon Partovi

Originally aired on July 30, 2020 @ 3:00 PM - 3:30 PM EDT


English

Serverless

Cloudflare Workers

Transcript (Beta)

All right, hey Ashcon, excited to have you here on Cloudflare TV on the, what is it, fourth day of Serverless Week, so we're very excited. This is your Cloudflare TV debut, so I thought we could spend a couple of seconds introducing ourselves. Sure, thanks Jen. I'm really excited to be here talking about Serverless Week. I'm Ashcon Partovi. I've been at Cloudflare for about a year on the Workers product team, and this week I've been mostly focused on performance as part of our Serverless Week theme, so I'm really excited to be talking about what we've been doing with cold starts and how they're no longer a problem. Great, very cool. I'll just quickly introduce myself. I'm Jen Vaccaro, the product marketing manager for Workers. I'm pretty new. I started about two months ago, in the middle of shelter in place, but I've been really enjoying it so far and I'm excited to have Serverless Week so early in my time here. So why don't you tell us a little bit about what you've been working on for Serverless Week? You just released a great blog post. Hopefully most people have read it, but if you haven't, check out our blog post on zero cold starts. What have you been working on behind the scenes? Sure, so I'll give a little bit of context on Serverless Week in general first, so we can understand how we got here and why it matters. With Serverless Week, we really wanted to highlight all the great work our team has been doing to make the Workers platform this prime serverless compute platform on the edge, where you write one piece of code and it goes out to over 200 data centers. It's really magical. Working on the team, sometimes we forget the magic that's there, but when we talk to customers, it's totally game-changing.
And so when we went through all the different themes, from Workers Unbound, which we talked about on Monday, where we're relaxing the CPU limits, to new language support, to the security aspects we've built into the platform, we really wanted to highlight performance and speed, because Workers is lightning fast. We've known this for a while: Workers has had cold starts under five milliseconds for the longest time, and we started thinking about how we could push the bounds of what's possible. Zero milliseconds seems like an impossible task; how could you race the speed of light and get to zero? But we set a big challenge for our teams, and we were really excited to see them deliver on it. Yeah, that is really exciting. And considering what you're saying about how Serverless Week came about, one of the big things, to build on your point, is that there are a lot of myths out there about what the traditional model of serverless looks like. One of the things the industry, and our customers on other platforms, have had issues with is the cold start: that unpredictable delay that can hit end users. So I think it's really cool that we've been working on that. Like you touched on, in the past we've had cold starts under five milliseconds, which was already really fast, especially compared to containers and some other technologies. So being able to bring that down to zero milliseconds, which is essentially no cold start at all, is pretty game-changing for the industry, for how people think of serverless, and for those barriers that we're slowly unlocking. Why don't we touch on cold starts for anyone out there who maybe doesn't know what they are, or has only heard the term.
Do you want to share a quick overview of what cold starts are, and why they cause such problems for our customers? Yeah, absolutely. The cold start problem is a sneaky little problem, but it's actually a lot bigger than you might think. In most serverless platforms, in order to handle requests on the fly, providers have to start up the runtime that has all your code. Most platforms use containers, and containers have a lot of layers. There's a lot of processing that takes place to load it from disk, put it in memory, and get all of the resources ready to serve your request. As we mentioned earlier, Workers has historically done all this in under five milliseconds, which is quite amazing when you compare it to almost any other serverless provider using containers, where it's measured in hundreds of milliseconds, sometimes full seconds and beyond. That's a really bad customer experience: you're deploying a production service on a serverless provider, and your customers are seemingly sporadically getting two-second delays, five-second delays. So why do these delays happen in the first place? It's not just that it takes a long time to load a container. It's that serverless providers would have to prepare in advance for when a request comes in, and it's not economical to keep everyone's code in memory all of the time. So what many providers do is very simple: when the first request comes in to your function, they take the time to create the runtime, set up the code and the resources, and then serve the request. Subsequent requests won't have that cold start delay. The issue is that if you have a service that only gets requests once in a while, or you have bursty traffic, you just don't know when there's going to be a cold start. It's totally unexpected, and there's not a lot you can do to stop it.
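To make the "start on the first request" behavior concrete, here is a toy model, not any provider's actual scheduler. It assumes the runtime is evicted after a fixed idle timeout; the timeout and the request timestamps are hypothetical values chosen for illustration. The point is that which request pays the cold start depends entirely on the gaps in traffic.

```python
# Toy model of "start a runtime on the first request, evict it when idle".
# IDLE_TIMEOUT_S is a hypothetical value chosen for illustration; providers
# generally do not publish a fixed eviction timeout.
IDLE_TIMEOUT_S = 600.0


def cold_start_flags(arrivals: list[float], idle_timeout: float = IDLE_TIMEOUT_S) -> list[bool]:
    """For each arrival time (seconds, in increasing order), report whether it is a cold start.

    A request is cold when no runtime is loaded: either it is the first request,
    or the gap since the previous request exceeds `idle_timeout`, so the runtime
    has been evicted. Request duration is ignored for simplicity.
    """
    flags: list[bool] = []
    previous = None
    for t in arrivals:
        cold = previous is None or (t - previous) > idle_timeout
        flags.append(cold)
        previous = t
    return flags


if __name__ == "__main__":
    # A burst, a long quiet period, then another burst (hypothetical timestamps).
    arrivals = [0, 1, 2, 3, 2000, 2001, 2002]
    for t, cold in zip(arrivals, cold_start_flags(arrivals)):
        print(f"t={t:>5}s  {'COLD' if cold else 'warm'}")
```

With these inputs, the requests at t=0 and t=2000 are cold and the rest are warm. Move the quiet period or change the timeout and a different request pays the penalty, which is the unpredictability described above.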
So what other providers have done to hack around this issue is recommend that customers create synthetic requests, and sometimes even charge them for it, just to keep their function warm. To step back for a second: because other platforms have this cold start problem, where every once in a while, more often than you'd like, your function takes a long time, you have to pay to keep it in memory and keep it hot to get around the problem. We think that's a really bad solution for customers. So when we were looking for a solution, we wanted a technology that could solve it without any pain for the customer, without the function having to wait, and completely out of the box, so the problem is solved for everyone. Yeah, and just to reiterate some of your points there: with cold starts, a big thing you mentioned at the beginning is the unpredictability. In one instance a request might come in at, you know, five milliseconds, and another time it can be much slower, so you get this unpredictable experience. So either you have that, or you have the alternative you mentioned, these hacks and workarounds, where the time spent pre-warming to bypass the cold start is actually charged to the customer: it uses up CPU time and goes onto the customer's bill. It's kind of a catch-22 that a lot of providers offer: either you suffer from the unpredictability, or you get caught paying for extra CPU time on the alternative serverless provider. So I think that's a really interesting thing, and it's a pain point for a lot of developers out there who don't want to have to take either of those trade-offs.
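The keep-warm workaround is easy to quantify. The sketch below counts the synthetic "ping" invocations a function accumulates over a month at a given ping interval. The 5-minute interval and 30-day month are assumptions, and the per-invocation price is deliberately left as a placeholder parameter rather than any provider's real rate. In this simple model one ping keeps one instance warm, so keeping several concurrent instances warm multiplies the count.

```python
import math

MINUTES_PER_DAY = 24 * 60


def keep_warm_pings_per_month(interval_minutes: float, warm_instances: int = 1, days: int = 30) -> int:
    """Synthetic invocations needed to ping `warm_instances` instances every `interval_minutes`."""
    if interval_minutes <= 0 or warm_instances < 1:
        raise ValueError("interval must be positive and at least one instance is required")
    pings_per_day = math.ceil(MINUTES_PER_DAY / interval_minutes)
    return pings_per_day * days * warm_instances


def keep_warm_cost(interval_minutes: float, price_per_invocation: float,
                   warm_instances: int = 1, days: int = 30) -> float:
    """Request charges for the pings alone; `price_per_invocation` is a placeholder.

    Any compute time billed while each ping runs would come on top of this.
    """
    return keep_warm_pings_per_month(interval_minutes, warm_instances, days) * price_per_invocation


if __name__ == "__main__":
    print(keep_warm_pings_per_month(5))      # 288 pings/day * 30 days = 8640
    print(keep_warm_pings_per_month(5, 10))  # ten warm instances = 86400
```

None of these invocations serve a real user; they exist only to dodge the cold start, which is the hidden cost discussed next.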
And not only is it a big pain point, but as you mentioned earlier about it slipping into your bill, for customers who may not know, it's a real hidden fee that racks up on your bill. That's not something we think customers should have to deal with. Right. So why don't we take a second on this: a lot of people have asked us what V8 isolates, which we run Workers on, really are, and how they compare to containers, which, like you mentioned, can take longer to spin up and end up adding delay for the customer. Can we do a quick overview of V8 isolates versus containers? Yeah, absolutely. As mentioned earlier, most serverless providers spin up a Docker container, or some other type of container; that's what is commonly used. When we made the architectural decisions for Workers, we looked at containers and thought, well, they're good for some things. If you have a centralized service that maybe doesn't run too frequently, maybe that's a good solution. But if we want to run code really fast, distributed on the edge in 200 data centers, the cold start just doesn't scale. So we picked V8 isolates. V8 is the JavaScript engine that powers a lot of browsers, particularly Chrome: if you use the Chrome browser, it's powered by V8. It's the technology that loads websites really fast when you open a new tab, and it's secure and sandboxed. So we took that idea of the sandbox and said, well, what if we applied it to server-side code and distributed it across all our data centers?
So that's the fundamental architectural difference: containers have a lot of layers and take a lot of time to start up, whereas isolates, exactly like your browser tabs, are really fast to load. You can have dozens of tabs open in your browser, and we applied that exact same principle to the cloud. Yeah, that's pretty interesting as well. Something to add there is the automatic scalability that V8 isolates and Workers give our customers; not having to go in and manually configure scaling can be pretty huge, and I think that's something a lot of developers who've worked with Workers have found particularly useful. If you've ever deployed containers, you've always had to deal with questions like: how many replicas do I have? At what CPU utilization do I want to autoscale? Those are really tedious manual things that you have to tune, and it's just a really bad experience. It's kind of arbitrary to just pick a number of replicas. What would be really magical is if you didn't have to do any of that scaling at all. And because we have isolates, we're able to scale concurrently to thousands, tens of thousands, hundreds of thousands of requests per second. Yeah, that is something very exciting. And for anyone who's been tuned in to Serverless Week, as you touched on at the beginning, we've had a very interesting article on our blog about the security of isolates as well. I know some people have had questions about isolates running in a shared runtime and whether that would pose any security issues, and the answer is really no. As that blog post lays out, coming from Cloudflare, security is in our DNA.
Having things like Spectre mitigation and V8 patching as part of the platform, and making sure it's not only highly performant but also secure to its core, is pretty exciting. I know some people had questions around isolates in that way, and it's exciting to see the security we have there. Yeah, we spent a lot of time thinking about security; our engineers spend a lot of time thinking about it. As the blog post discussed, we worked with some of the leading Spectre researchers to look at our platform and ask: how would Cloudflare Workers hold up in a situation like Spectre? And we did a lot of great proactive work to make sure our runtime has the capability to detect possible attacks and isolate them dynamically. Yeah, it's pretty cool. So if anyone listening hasn't read Kenton's blog post yet on what we're doing for Spectre mitigation and automatic V8 patching, definitely check it out. I do want to bring this back to cold starts. I know you started talking about what's going on under the hood, but I thought we could spend a second explaining the TLS handshake a little, and the pre-warming, if we could touch on that. Yeah. When we posed this challenge to our team to race to zero cold starts, we had a lot of really interesting ideas. The great thing about running an intelligent network like Cloudflare's is that we get insights from all sorts of different parts of the Internet, from DNS to, as you mentioned, TLS and encryption. So we thought, well, most Worker requests are handled over HTTPS; in fact, over 95%.
So we thought, what if we take the encryption protocol behind HTTPS, TLS, and see if we can get some inherent performance improvements out of it? For people who don't know what TLS or SSL are, with all these acronyms going around: whenever you send a secure HTTP request from your browser, it first sends a packet called a client hello to the server. That packet contains a whole host of encryption details, but we don't need to get into the specifics. What's important is that this first packet contains the hostname the request is going to. This encryption process is also known as a handshake, because the client sends something to the server, the server sends something back to the client, and the client verifies it. It takes time for the server and client to handshake and negotiate before the actual request gets sent, and previously that was just dead time we spent waiting. So we thought, well, the first packet has the hostname, so we can guess where the request is going. What we decided to do is that when Cloudflare receives that first packet, we send a hint to the Workers runtime saying, hey, the Worker on, let's say, cloudflare.com is likely going to execute very soon; you should load it early so that when the request arrives, you're hot and ready to go. And it turns out that with this hinting strategy, hinting to the runtime as soon as we receive the first packet, we have enough time to load the code and the resources, so when the actual request comes in, it's ready to go and we don't wait to process it. That's pretty exciting. And like you said, in the past we would warm up on the first request, right?
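The hint is possible because the hostname travels unencrypted in the client hello's Server Name Indication (SNI) extension. The sketch below parses SNI out of a single raw TLS record, following the ClientHello wire layout from the TLS specifications, and then calls a `prewarm` callback. It is an illustration of the idea, not Cloudflare's implementation: `on_first_packet` and `prewarm` are hypothetical names, and real-world details such as a ClientHello split across several records are ignored. A tiny builder is included only so the example runs without a live connection.

```python
import struct
from typing import Callable, Optional

HANDSHAKE = 0x16          # TLS record content type: handshake
CLIENT_HELLO = 0x01       # handshake message type: ClientHello
EXT_SERVER_NAME = 0x0000  # extension type: server_name (SNI)
HOST_NAME = 0x00          # SNI name type: host_name


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI hostname from one TLS record holding a ClientHello, else None."""
    try:
        if record[0] != HANDSHAKE or record[5] != CLIENT_HELLO:
            return None
        pos = 5 + 4                          # skip record header (5) and handshake header (4)
        pos += 2 + 32                        # client_version + random
        pos += 1 + record[pos]               # session_id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher_suites
        pos += 1 + record[pos]               # compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == EXT_SERVER_NAME:
                # server_name_list: 2-byte length, then (type, 2-byte length, name) entries
                list_end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
                p = pos + 2
                while p + 3 <= list_end:
                    name_type = record[p]
                    (name_len,) = struct.unpack_from("!H", record, p + 1)
                    if name_type == HOST_NAME:
                        return record[p + 3 : p + 3 + name_len].decode("ascii")
                    p += 3 + name_len
                return None
            pos += ext_len
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                          # truncated or malformed input


def build_client_hello(hostname: str) -> bytes:
    """Build a minimal, illustrative ClientHello record carrying only an SNI extension."""
    name = hostname.encode("ascii")
    entry = bytes([HOST_NAME]) + struct.pack("!H", len(name)) + name
    sni_data = struct.pack("!H", len(entry)) + entry
    extensions = struct.pack("!HH", EXT_SERVER_NAME, len(sni_data)) + sni_data
    body = (
        b"\x03\x03" + bytes(32)                # client_version (0x0303) + zeroed random
        + b"\x00"                              # empty session_id
        + struct.pack("!H", 2) + b"\x13\x01"   # one cipher suite (TLS_AES_128_GCM_SHA256)
        + b"\x01\x00"                          # one compression method: null
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = bytes([CLIENT_HELLO]) + len(body).to_bytes(3, "big") + body
    return bytes([HANDSHAKE]) + b"\x03\x01" + struct.pack("!H", len(handshake)) + handshake


def on_first_packet(record: bytes, prewarm: Callable[[str], None]) -> None:
    """Hypothetical hook: as soon as the ClientHello arrives, hint the runtime to load the Worker."""
    host = extract_sni(record)
    if host is not None:
        prewarm(host)  # the handshake carries on in parallel while the Worker loads


if __name__ == "__main__":
    warmed: list[str] = []
    on_first_packet(build_client_hello("example.com"), warmed.append)
    print(warmed)  # ['example.com']
```

The parser only reads the first packet and never waits on the rest of the handshake, which is what lets the load overlap with the negotiation.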
And now what we're doing is pre-warming, as you're saying, so that by the time the request comes in, the Worker is already running and we don't pay that performance or speed penalty, which is pretty interesting. And this is also where the advantage of isolates comes in, because you can't do this type of optimization with a container. With isolates, as we said, it takes under five milliseconds to load. If you tried this with a container that took several seconds to start, there wouldn't be much of an improvement; maybe a little, but you'd still have the cold start problem. Right. So it comes down to two things: because we have so many data centers, over 200, the latency between our data centers and users is really, really small, and pairing that with the five millisecond cold start time is what lets us achieve zero millisecond cold starts. Yeah, and that's very exciting. So why don't we tell people when this is available. Is it available today? What can people expect? I think people should expect that we're going to continue to roll out awesome performance improvements like this on the fly. We actually deployed this last week to everyone, so you get this performance improvement out of the box, with no extra fee. We really believe that when you build on Workers, you're buying into a really performant and intelligent platform, and we keep improving it day in and day out. So we're really excited that this feature is available to everyone, completely free. Great. So to reiterate, it's available to our free Workers users as well as on other plans, is that right? Yes. If you use free Workers, the improvement will be there as well. Okay. And I know there are some limitations, based on the hostname and so on, that come with these zero millisecond cold starts.
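Why isolates make the trick work while containers do not comes down to a subtraction: starting the load at the client hello overlaps it with the handshake, so the only delay left for the user is whatever part of the load outlasts the wait for the actual request. A minimal sketch, assuming the roughly five millisecond isolate load mentioned in the conversation; the handshake duration and the two-second container start are illustrative assumptions, not measurements.

```python
def visible_cold_start_ms(load_ms: float, handshake_ms: float) -> float:
    """Delay still visible to the user when loading starts at the ClientHello.

    `handshake_ms` is the time between the ClientHello arriving and the HTTP
    request arriving. Loading overlaps that window, so only the part of the
    load that outlasts it remains: max(0, load - handshake).
    """
    return max(0.0, load_ms - handshake_ms)


if __name__ == "__main__":
    HANDSHAKE_MS = 10.0  # assumed; depends on the user's distance to the data center
    print(visible_cold_start_ms(5.0, HANDSHAKE_MS))     # isolate: 0.0, fully hidden
    print(visible_cold_start_ms(2000.0, HANDSHAKE_MS))  # container: 1990.0, barely helped
```

Any load time shorter than the handshake window disappears entirely; a multi-second container start only shrinks by the length of the handshake.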
Can you share any limitations or caveats people should be aware of? Yeah, there's one small caveat right now. As mentioned, the client hello, the first packet that gets sent, has the hostname, and only the hostname: cloudflare.com. It doesn't have the path, so if I were going to the Cloudflare homepage, it wouldn't say cloudflare.com/home. So if I have multiple Workers on cloudflare.com right now, we don't know which one to warm up, and in that case we won't warm any of them. So if you really want to optimize for zero millisecond cold starts, it works on any zone that has just one Worker on the whole zone. We're looking at future improvements, so maybe we can guess the path based on traffic and various other factors. But for now, we really wanted to roll out a V1 so we could push out the performance gains for customers to see. Yeah, that's very exciting. You mentioned at the beginning that part of your time here has been focused on performance. What are some other things we're doing in that space? Yeah, one of the really powerful things about Workers is not only our ability to run code with zero millisecond cold starts, but actually running the code faster than a lot of our competitors. When we announced Workers Unbound earlier this week, we also showed some performance tests that we did with a very basic GraphQL server. So imagine just a very basic Worker that sends back hello world. You'd expect the performance to be consistent across serverless providers. In fact, we found that not to be the case. Workers was able to process a hello world in less than one millisecond, whereas other competitors were trailing at five milliseconds, 10 milliseconds. We even sometimes saw requests take up to 30 seconds on some other providers that use containers.
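The hostname-only caveat can be expressed as a small decision rule: find the zone the SNI hostname belongs to, and pre-warm only if that zone has exactly one distinct Worker. The sketch below uses a hypothetical zone-to-routes table and simplified route patterns; it illustrates the rule as described in the conversation and is not Cloudflare's routing code.

```python
from typing import Optional

# Hypothetical zone -> {route pattern: Worker script} table, for illustration only.
ZONE_ROUTES: dict[str, dict[str, str]] = {
    "example.com": {"example.com/*": "site-worker"},
    "example.net": {"example.net/*": "site-worker-2", "example.net/api/*": "api-worker"},
}


def zone_for(hostname: str, zone_routes: dict[str, dict[str, str]]) -> Optional[str]:
    """Find the zone a hostname belongs to (the zone apex or any subdomain of it)."""
    for zone in zone_routes:
        if hostname == zone or hostname.endswith("." + zone):
            return zone
    return None


def worker_to_prewarm(hostname: str, zone_routes: dict[str, dict[str, str]]) -> Optional[str]:
    """Return the Worker to pre-warm for an SNI hostname, or None to skip pre-warming.

    The client hello carries no path, so we only warm when the zone has exactly
    one distinct Worker; with several, we cannot tell which route will match.
    """
    zone = zone_for(hostname, zone_routes)
    if zone is None:
        return None
    scripts = set(zone_routes[zone].values())
    return next(iter(scripts)) if len(scripts) == 1 else None


if __name__ == "__main__":
    for host in ("example.com", "www.example.com", "example.net", "other.org"):
        print(host, "->", worker_to_prewarm(host, ZONE_ROUTES))
```

Here example.com gets warmed, while example.net, which splits traffic between two Workers by path, falls back to loading on the actual request.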
And that really highlights the key difference between containers and isolates. With isolates, your code is screaming fast; with containers, it's a mixed bag. We really wanted to show that performance comparison in a palpable way, so we put all our performance tests, our raw data, and our methodology on GitHub, so anyone who wants to can check our numbers. We're really confident in the speed of our platform, and we wanted to show that off. Yeah. And what was striking when I was looking through some of those numbers is that, even at the beginning, you mentioned under 0.9 milliseconds in comparison. At around the 50th percentile, it was pretty similar across some of the other competitors. But at the higher percentiles, it was interesting to see how significant the gap becomes: Workers stays very low, in the milliseconds, while some competing options run significantly slower, and that gap keeps growing further out. You really see a noticeable difference, which I think was pretty striking in that data. Yeah. So let's imagine you have a really high-volume serverless function, hundreds of thousands of requests per second. On another provider, you'll have to set a concurrency setting: how many instances of your function you're willing to have loaded or processing at a given time. With Workers, there is no such setting, because we essentially give you unlimited concurrency. When our servers start seeing a lot of requests for a specific function of yours, we'll just move it to another server and process it there, and because we have so many data centers and so many servers, we're able to handle that volume. So as you mentioned, some of those other providers have really extreme edge cases where you might have one request that takes 30 seconds.
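Benchmarks like the one discussed here are usually summarized as latency percentiles, and the effect described, similar medians but a widening gap in the tail, is easy to reproduce. The sketch below uses the nearest-rank percentile definition on a made-up sample; the numbers are purely illustrative and are not taken from the published benchmark data.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need a non-empty sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Illustrative latencies in ms: 98 fast responses plus two slow outliers,
    # e.g. requests that paid a cold start. Not real benchmark data.
    latencies = [1.0] * 98 + [2000.0, 30000.0]
    for p in (50, 90, 99, 99.9):
        print(f"p{p}: {percentile(latencies, p)} ms")
```

In this sample the median and p90 are both 1.0 ms while p99 jumps to 2000.0 ms: two slow requests out of a hundred are invisible at the median and dominate the tail.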
We actually have an example of that in the data. If you can imagine, as a customer, waiting 30 seconds for a hello world to show up, that's pretty significant. Yeah. That's the real performance power of using isolates on Workers: you get not only great performance with zero cold starts, but it's consistent. Consistent, and you're not paying a penalty in fees or any hidden costs that might come from cold starts or elsewhere. Exactly. One of the other points we made is that when you sign up with another provider, it really does act as a walled garden. They make you pay for almost every little bit. You might pay for DNS requests. You might even pay for HTTPS: some providers charge you for using encryption on the edge, and it just racks up on your bill. And when we were testing this, I was just going to say, we were totally surprised when we looked at our bill for a simple function and saw charges for the number of logs; they charge you for how many log entries you write. On Workers, it's very straightforward: we just charge you for compute and number of requests. And of course, with Unbound, we introduced variable CPU, so the CPU limits are essentially lifted. Right. So since it's Serverless Week, maybe we can give a quick recap of some of the other things, like Unbound, which you mentioned, and then give a little teaser of what's coming. As you mentioned, on Monday we released Workers Unbound as a beta, which really extends our CPU time limits. The beta signup is available, so if anyone is interested, you can search for "Cloudflare Workers Unbound beta" and it should come up right away. We've gotten a lot of positive feedback. Ashcon, I don't know if you've been reading some of it, but Hacker News, Forbes, TechCrunch, a whole bunch of places have been talking about it, and we've had tons of signups.
When I last looked at our signup list, it was nearing 700 interested signups for the beta, which of course we'll have to go through to come up with a reasonable number to get started with. But those new compute-intensive workloads that are now opened up will be really exciting to see. And of course, now with zero cold starts, it'll be interesting to see how that plays out for our customers. Yeah, I think for our customers, one of the great takeaways from all of Serverless Week is that when it comes to Workers, we're just getting started. We have a lot of exciting features on the roadmap. And things like Workers Unbound not only increase the limits for our customers, for you, they increase the limits for us too, which allows us to build even more powerful features and capabilities into the platform that we're really excited to give to you. Yeah, that's great. Coming up, tomorrow is the last day of Serverless Week, so everyone should stay tuned for what we're releasing around improving our developer experience. From performance to CPU time limits to developer experience and security, we have a whole bunch of exciting things. This was really great, Ashcon. I hope everyone tuning in learned a lot. Absolutely. Thank you so much, Jen. And thank you all for tuning in.