This Week In Net

Presented by: John Graham-Cumming

Originally aired on October 13, 2021 @ 10:00 PM - 10:30 PM EDT

A weekly review of stories affecting the Internet, brought to you by Cloudflare's CTO. We'll look at outages, trends, and new technologies — with special guests to help us explore these topics in greater depth.

Original Airdate: July 31, 2020

Transcript (Beta)

All right, let's do it. Welcome to This Week in Net, which is my weekly show about stuff that's happened on the network or on Cloudflare over the last week or so. And this is quite a big week because this week we've been running something called Serverless Week, which has been all about improvements and changes to Cloudflare's Workers product, which is the product which allows you to write code pretty much in any language you feel like and execute it around the world on Cloudflare's servers. It's sometimes called serverless, sometimes called functions. There are lots of different names for it, but the big idea is that you can just run code without worrying about scaling or where it executes.
So Serverless Week has had a number of announcements, four so far, Monday, Tuesday, Wednesday, Thursday, and there'll be another announcement today, Friday, which you'll have to wait a couple more hours for, and I'm not going to preview what it is, but it's really all about making things better for developers. You'll see later, at about 2 p.m. London time, 1400, so at about 6 a.m. on the West Coast in the U.S., what that announcement is all about. But I thought I'd take you back through this week and talk about the things we announced. And we actually didn't just announce things for Cloudflare Workers, for the serverless portion, but also for our general service. So what we're going to do is essentially read the Cloudflare blog together, and we'll take a look at what's been announced.
And the first announcement, or the last announcement in a sense, but the first one I'm going to look at, is bringing your own IPs to Cloudflare. And this is an interesting product that is now being launched fully for everybody. For many years, if you signed up for Cloudflare, the way in which it typically works is that we handle DNS for you. And in certain cases, it might be through a CNAME, but we're the one giving out the IP address.
And Cloudflare has very large IP ranges, which are published at cloudflare.com/ips, which will give you the full set of ranges that we use. But essentially, any of the 27 million websites, Internet properties, whatever, on Cloudflare would have an IP address provided by Cloudflare. And it's fairly easy to see that it was a Cloudflare IP and know that it was one of our customers.
Over time, some of our customers have asked to use their own IPs. So some of the larger customers already have IP ranges that they own and control, and they don't necessarily want to use Cloudflare IP ranges. There are a number of reasons for this. One might be branding. Another one might be that they've given out those IPs to their customers, and their customers have hard-coded them or put them in their DNS systems. And it's difficult for the company to actually go back to all their customers and change the IP addresses they're using. And as hard as it is to believe, there are systems out there that operate on IP address rather than DNS. So there are actually, for example, industrial control systems where you type in an IP, you don't type in a DNS name. And so it can actually be really hard for some of our customers to go back to their customers and say, change the IP because we're moving to Cloudflare.
And so bringing your own IP is a way of allowing us to appear as our customer. And what that means is if the customer has a range of IP addresses that they control, but they would like us to handle that for them, we can do it at the IP level. And this now applies to everything. So our classic Layer 7 services with the CDN, with caching, with WAF, with DDoS, all those things can be behind an IP that isn't Cloudflare's. But also Spectrum, which is our product that allows anyone to proxy traffic that isn't HTTP. So any port, basically, any port can be proxied to us, be it a TCP or a UDP protocol.
And then there's Magic Transit, which allows us to do it at an even lower level, which is to move IP traffic through us. Now, all of these are now supported by bringing your own IP. So if you are a large customer with a range of IP addresses you want to bring, then you can bring them to Cloudflare. And we'll just scroll down in here and just give you sort of an idea. You bring your prefix. If you own 192.0.2.0/24, you bring it to Cloudflare. We will announce it from our routers globally, just as we do for everything else. You can choose where. It can be everywhere or in specific countries, specific regions. And then the traffic will come to us, and then we can pass it back on to the real website behind it.
So that was bring your own IP. And the blog post takes you through the different ways in which it works. But basically, what happens is you have to have what's called a letter of authorization, which details your ownership of those IPs and the fact that you want Cloudflare to announce them. And then we can work with the tier one networks around the world to make sure the prefix is announceable and start announcing it from our network. So that was announcement number five, essentially, this week. But there was more. So let me just go back to the blog.
Yesterday's announcement was a really interesting one, and it's about the problem of cold starts in serverless. So often when you think about serverless things like Lambda@Edge or Lambda and other service providers, there's a concept of a cold start or a warm start. And a cold start means that a request comes in from outside to execute a particular function, and the code for that function is not available to run. And so the system, Lambda, Workers, whatever it is, has to go get the code and load it into memory and get it ready to execute and then start executing. In a sense, that code was cold. It was on disk somewhere. And this delay can be really big.
It can be many, many hundreds of milliseconds to seconds of delay while that code is fetched, pulled from disk, got ready to execute, containers are put in place, all that kind of stuff. Now, Cloudflare has always had really fast cold starts, something like five milliseconds to get going, which is very, very quick because of the way in which we use the V8 engine and its isolate technology and the stuff we built around it to make that possible.
But what you find in serverless things is that people worry about this variability between a cold start and a warm start because it's not possible to know which server a customer is going to hit, and whether the code is going to be warm or cold, ready to go or not. And that creates a horrible variability. And, in fact, what people do to work around this is something even more ugly, which is they synthetically make requests to their own code running on serverless platforms to keep it warm. They call this pre-warming. And they have to do this all over the world to make sure the code is running everywhere. And, of course, they end up paying for this because the service provider, be it Amazon, Azure, Google, whatever, is charging when those requests are happening to pre-warm the code. So we thought this was a ridiculous situation, and really this distinction between cold start and warm start needed to go away.
And so yesterday what we announced was that we completely eliminate cold starts for Cloudflare Workers. And how do we do it? Well, fundamentally the idea is that we have a hint that something's going to happen in TLS. So if you remember, TLS is the protocol that underlies HTTPS, and HTTPS is how we do secure things on the web. The very first thing that happens in an HTTPS connection is that a message comes from your browser, the app running on your phone, or whatever, to Cloudflare saying, I want to start doing a secure connection with you. It's called a client hello.
I want to start doing it with these parameters, and this is the website or the domain that I'm going to be asking you about, and that's called the Server Name Indication, SNI. And then the server replies and says, okay, I'll agree to work with you. Here's the certificate. So this goes back and forth: the client hello, server hello, key exchange. And then once that is done, that means we've established a secure channel, and the client can send the request to say, okay, do this particular thing. And when that request comes in, that's when you hit the cold or warm start situation. And if it's cold, the code has to be loaded from somewhere, and even though it's fast on Cloudflare, it's still, you know, single-digit milliseconds.
What we realized was that we had a hint about the code we were going to need to execute the very moment the client hello comes, because when the client hello comes and says, hi, I'm a client, I want to go to something on example.com, we can guess the Worker code we need to execute. So we actually do the cold start while the exchange is happening, while that handshake is happening. And because that handshake takes many, many milliseconds more than the cold start, what happens is when the request comes in from the client and hits the web server, the code is already warm. There's no cold start, and it starts executing immediately.
And so we put this into place. This is a cooperation between the protocols team at Cloudflare and the Workers team so that the hint can come from the protocol level, go to the Workers team, and actually load the relevant code and have it ready to execute the moment the request comes in. And what this does is it effectively eliminates cold start completely. We think that even the terminology cold and warm start ought to die. There's just running code, and we know how to pre-warm it and get it ready to go. This is rolled out. This is now available too, and it says it right here. Everybody.
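To make the timing argument concrete, here is a minimal simulation of the idea, not Cloudflare's implementation. The durations, the function names, and the idea of keying the load on the SNI hostname alone are assumptions for illustration: the five millisecond cold start is the figure quoted in the talk, and the handshake time is invented.

```python
import asyncio
import time

HANDSHAKE_SECONDS = 0.05    # assumed duration of the TLS handshake
COLD_START_SECONDS = 0.005  # roughly the five milliseconds quoted in the talk


async def load_worker(hostname: str) -> str:
    """Pretend to fetch and compile the Worker script for a hostname."""
    await asyncio.sleep(COLD_START_SECONDS)
    return f"worker-for-{hostname}"


async def finish_handshake() -> None:
    """Pretend to complete the server hello and key exchange."""
    await asyncio.sleep(HANDSHAKE_SECONDS)


async def handle_connection(sni: str, prewarm: bool) -> float:
    """Return the delay (in seconds) the request sees once the handshake is done."""
    # With prewarm, loading starts as soon as the client hello reveals the SNI.
    warm_task = asyncio.create_task(load_worker(sni)) if prewarm else None
    await finish_handshake()
    request_arrived = time.perf_counter()
    if warm_task is None:
        await load_worker(sni)   # cold start happens on the request path
    else:
        await warm_task          # already finished during the handshake
    return time.perf_counter() - request_arrived


if __name__ == "__main__":
    for prewarm in (False, True):
        delay = asyncio.run(handle_connection("example.com", prewarm))
        print(f"prewarm={prewarm}: extra delay {delay * 1000:.2f} ms")
```

Because the simulated handshake is longer than the simulated load, the pre-warmed path finds the code ready when the request arrives, which is the effect described above.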
It's rolled out to everybody as an optimization for all Workers, whether they're on the free plan or not. And it happens completely automatically. There's no extra charge for it. So today there's no distinction between a cold start and a warm start on Cloudflare Workers. So that was the announcement we did yesterday.
Now, if you've got a lot of time to read, I really urge you, if you're going to read one blog post this week, to read this one. So Kenton Varda, who is one of the lead engineers and the original designer of Cloudflare Workers, wrote a very long blog post about the security of Cloudflare Workers and how we think about it. Because whenever you're running someone else's code on your system, which is what we're doing, right? A huge number of people are writing code, deploying it onto Cloudflare servers. You have to worry about the security of that and how you make sure that Worker code can't interfere, can't get access to data it shouldn't get access to, can't crash the server, can't create a DDoS. And so Kenton wrote a very long blog post, about 5,000 words, about mitigating the security threats that we have in the platform. And it's really interesting and worth reading. But I'll just take you through a few of the highlights of that blog so you can get a sense for it.
So first of all, if you look at this architectural diagram, there are a few interesting things going on here. So we use the Chromium V8 engine to run JavaScript and to run WASM and anything that's compiled to either of those. If it's transpiled to JavaScript or compiled to WASM, it's running in V8. And V8 has a concept of an isolate, which is an isolated execution context for code, which itself protects what the code has access to. We run our code, the Worker's code, within those isolates, and that runs within an instance of the V8 engine. And V8 is running within a sandbox that we put in place.
So there's an outer sandbox, as we refer to it, which is mostly based on using seccomp on Linux to control what syscalls the entire V8 thing has access to. And we've made a number of modifications. So, for example, the sandbox does not have access to the file system at all. We don't need file system access. And it has access to the network only via proxies. So we have control over that environment. So this gives us the general architecture.
But then there are a number of things we do with isolates, which are interesting. And this is really where Kenton's blog post goes into a lot of detail. So we have this area which allows us to decide on the mapping between isolates and processes. So, in general, if you use V8, you'd just be running in one process and you'd have multiple isolates, as in your browser when you're using JavaScript. But within our context, we can move things around. So, for example, we can decide that, let's say, a new piece of code coming from a free customer must never run in the same process space as code from an enterprise customer who has been with us a long time. So we're able to separate out different classes of code in different ways. But we're also able to look at whether we think we trust the code or not. So we're able to assign a trust level to what the code does and decide whether it just needs its own isolate or whether it needs its own process. And that allows us to protect at a higher level.
And then we can move stuff around dynamically. So, for example, if we see an isolate is using a lot of CPU and we think, well, maybe it's just using a lot of CPU or maybe actually it's doing something nefarious, we can actually retarget that into its own process and give it an extra wrapper of protection. So there's a lot of stuff that's been done building on top of the V8 engine here to give us control over isolates, processes, and then the entire outer process. And you can really read all about that in Kenton's blog.
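As a rough illustration of the isolate-to-process mapping just described, here is a toy model. Everything in it is hypothetical: the class names, the plan labels, and the CPU threshold are invented for the sketch and say nothing about how the Workers runtime is actually built.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Isolate:
    script: str           # name of the Worker script (hypothetical)
    plan: str             # e.g. "free" or "enterprise" (hypothetical labels)
    cpu_ms: float = 0.0   # CPU time observed so far


@dataclass
class Scheduler:
    cpu_threshold_ms: float = 50.0  # made-up threshold for "suspicious" CPU use
    processes: Dict[str, List[Isolate]] = field(default_factory=dict)

    def place(self, isolate: Isolate) -> str:
        """Put a new isolate in the shared process for its plan."""
        name = f"shared-{isolate.plan}"
        self.processes.setdefault(name, []).append(isolate)
        return name

    def rebalance(self) -> None:
        """Move any CPU-heavy isolate out of a shared process into its own."""
        for name in list(self.processes):
            if not name.startswith("shared-"):
                continue
            for isolate in list(self.processes[name]):
                if isolate.cpu_ms > self.cpu_threshold_ms:
                    self.processes[name].remove(isolate)
                    self.processes[f"private-{isolate.script}"] = [isolate]
```

In this sketch, code from different plans never shares a process, and calling rebalance() after updating cpu_ms gives a heavy isolate its own process, mirroring the "extra wrapper of protection" mentioned in the talk.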
The other thing in here is a whole discussion about Spectre. Let me just go down to the... oh, so let's just talk about V8 bugs. So if you think about V8, it is a very highly attacked piece of software, right? It's running in the Chrome browser. So it's something that people really want to attack and try and break into because they'd love to break into browsers. And so what happens is you have an active community trying to break it, and Google has a massive fuzzing infrastructure looking for bugs in it. So they put out fairly regular patches to V8. And if you look at this, there's a concept of a patch gap, which is that between Google announcing in the open source project that they fixed a bug and the fix getting into the Chrome browser, it's about 15 days. It's quite a long period before it gets into Chrome and it's out there. What we did is we decided to automate this so that if V8 pushes out a new fix for a problem, like a security problem, then we can get it into production very, very quickly, you know, under 24 hours and actually typically much less than that. So we're constantly able to update and keep on top of patches to the V8 engine.
The thing we often get asked about is Spectre. So Spectre is this interesting attack on CPUs where you trick a CPU into doing some work with data it shouldn't have access to. And when it then fails, you have some way of getting access to what it operated on. So you get a little bit of signal, like did you see a zero or a one, or was that byte FF or was it 00? And usually you do that through timing information. And so we actually thought about this way before Spectre, early on in the development of Workers, where we restricted access to anything that allows timing. And you can go through this in Kenton's blog post. First of all, we don't give people access to timing information within a Worker. But then we can do things to actually mitigate how Spectre might happen.
So I talked a little bit about the process isolation. You should definitely do that. But our big goal here is to say if an attack is found, then we want to slow it down. And so we have all sorts of ways we can do this. So some of the basic things are, first of all, we don't allow native code. So you've got to write JavaScript or WASM. And that means we can stay away from any of the interesting timer things you can do in the CPU and interesting ways you can actually really tickle the Spectre attack in the CPU. We don't have timers. We don't allow multi-threading. Any Worker is single-threaded. And things like Date.now() in JavaScript are actually fixed. They don't change while your code runs. So you can't do timing within an actual piece of code. So that allows us to remove timing completely. We actually implemented this long before Spectre because we were worried about what might happen. We also don't allow you to run multiple threads of the same Worker. So you can't actually use different threads to achieve a timer.
And then we go deeper. So I talked a little bit about this, but there's the dynamic process isolation I talked about before, which is if we see an isolate that we think is behaving in a way that's doing something unusual, then we can move it into its own process and then push off mitigation down into the kernel. And so we're able to do that. We can reschedule at any time. And we do. We also worked with the team at Graz University of Technology. That's the team that discovered Spectre, and they looked for attacks in our system. And that's one of the ways we came up with this idea of dynamic isolation. And we can detect very quickly if a particular Worker looks like it's doing something that we are suspicious of and actually move it out into its own process. And that allows us to go into the process space. Of course, processes don't necessarily provide the ultimate protection because there are operating system level things you have to worry about.
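The point about Date.now() being fixed can be illustrated with a small sketch. This is a Python analogy rather than Workers code: the FrozenClock class and its on_io method are invented names, and the assumption that the clock moves forward only when I/O happens is a simplification for the example.

```python
import time
from typing import Callable


class FrozenClock:
    """A clock whose reading stays fixed while code runs (hypothetical model)."""

    def __init__(self) -> None:
        self._now = time.time()

    def now(self) -> float:
        return self._now

    def on_io(self) -> None:
        # Assumed here: the visible time only moves forward at I/O boundaries.
        self._now = time.time()


def time_a_loop(clock_now: Callable[[], float], iterations: int = 200_000) -> float:
    """Try to measure how long a busy loop takes using the given clock."""
    start = clock_now()
    total = 0
    for i in range(iterations):
        total += i
    return clock_now() - start


if __name__ == "__main__":
    frozen = FrozenClock()
    print("measured with frozen clock:", time_a_loop(frozen.now))
    print("measured with real clock:  ", time_a_loop(time.perf_counter))
```

With the frozen clock the measured duration is always zero, so code cannot use it to observe the tiny timing differences a Spectre-style attack depends on.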
And so as we slow down the speed of attacks, we're now able to say, okay, well, let's suppose we've now slowed the attack down to take a few hours, and then we might want to shuffle memory every few hours or every few days. And so we can actually do that by restarting the entire process space for the V8 engine and actually the Workers runtime so that it makes it very, very, very difficult to pull off these attacks. So it's worth reading the 5,000 words on Workers security. It'll give you a sense for all of the work that we're doing. And I think one of the really important things is that it's a continuous effort. It's not done and finished. We're constantly looking at what we need to do to improve the security of Workers. And I think you'll see in that post a good sense of how we're thinking. And obviously we work with outside experts too.
All right. So that was on Wednesday. On Tuesday we announced this broad language support. So natively we support JavaScript and WASM as the things we actually execute on the machine. But anything that can be targeted to those things can be executed. So we see, you know, Rust, C, C++, all the things that can be compiled into WASM being executed. And then, you know, things that can be compiled down into JavaScript can be executed. So this blog post was about supporting Python, Scala, Kotlin, Reason, and Dart, and gave you some examples of how to do it. And, you know, we have this really nice tool Wrangler, which allows you to write code on the edge and debug it and get hold of logs. And it now has some really nice tools for generating projects in different languages. So this was an example of generating a Python project just to do hello world. And you can go do that. And then if you have a look, we also have a Scala example, again using wrangler generate, and you just wrangler publish it and it becomes live code running on the edge. And, you know, similarly for other languages.
And there's a larger example in here of actually developing an application in Scala just to show you how you might build an API. So we should think about Workers, although the foundation is the V8 engine, as something where, because of the incredible power of WASM and because of the power of transpiling, you can pretty much pick the language you want. And as you know, in the past, we somewhat jokingly got COBOL working and actually ran COBOL code on the edge. So if you want to play around with any of those languages, you absolutely can.
All right. And then, let's just go to the blog. There was an announcement on Monday about something called Workers Unbound. So we've now split the Workers product into two products: Workers Bundled, which is what you're getting today, and Workers Unbound. And Workers Unbound is about removing restrictive CPU limits. So as you may know, Workers, or Workers Bundled as it's now called, has a restriction on CPU time, tens of milliseconds of CPU time. It's designed to run things that are relatively small. And we came under increasing pressure to have large projects running on the edge on serverless because people really love the automatic scaling. They love the way in which it's easy for them just to deploy some code and it's running everywhere. They love Wrangler so they can debug stuff live, but they wanted more than 50 milliseconds. So we went out and we thought, well, we're going to do this. This is Workers Unbound. And it allows you to run a much larger application. So again, it's still using the same isolate model, but we allow much, much longer execution.
And here's the big thing. If you look at the pricing for Workers Bundled, it's very simple pricing. But what we've done here on Workers Unbound, because we're now dealing with large amounts of CPU, is we're now charging in a similar manner to what you would see in other serverless platforms.
So this is just for Workers Unbound. For Workers Bundled, nothing changes, no change to what's there. And you'll see it's much, much cheaper than equivalent things, much cheaper than Lambda, much cheaper than Lambda@Edge. And you're getting incredible performance from 200 locations around the world. So it's very fast. This gives you some sense of how fast it is. Remember, we announced this on Monday. We hadn't actually yet announced the zero millisecond cold start. So this is all around looking at that warm stuff. And so you can see what this looks like. We're faster, we're cheaper than all of the equivalent platforms. It's not generally available yet. So this is something that is in beta. You can use it, but you have to sign up. And there's a simple signup form here. Go ahead and do that. If you have a larger project you want to run on Cloudflare, in our serverless Cloudflare Workers platform, you can go ahead and do that.
So, four announcements for Serverless Week: Workers Unbound on Monday. Multiple language support on Tuesday: choose the language you want to use and use it on Workers. A real deep dive on what we're doing around Workers security on Wednesday. Yesterday, it was zero millisecond cold starts and how we achieved that by racing, basically: once TLS tells us where we're going, we can actually do zero milliseconds. Another announcement today in about two hours, 10 minutes.
And then there's a post which is written by Matthew Prince. And that was on Sunday. And this is a long read, too, without being deeply technical. But it has an interesting thought in it, which is that, you know, although we've talked a lot this week about the speed of Cloudflare Workers and how we want people not to think about cold starts and warm starts because it's a weird variation, Matthew's idea here is that there's actually a hierarchy of needs.
And that, in fact, the speed of the platform is not the thing that's really driving developers to use it. So if you look at his hierarchy of needs, he breaks it down like this. He says that, you know, the big thing he believes is going to drive people is compliance. And that might seem really weird. Why compliance? You think of compliance as something that, you know, a compliance group has to deal with and you have to worry about it and decide what you're going to do. Well, the problem with the Internet right now is that the Internet is very slowly dividing up into different areas, with Brazil, India, the EU, China, Russia, all these places saying, we want to have some control over the Internet. And this is driving needs of real developers to say, okay, I need to know where my code is executing. I need to know where my security keys are. I need to know where my data is. And when we think about this at Cloudflare, because we have 200 locations around the world, we're able to do very fine-grained breakdowns for people about where their code should execute, where the data should be, where their TLS keys are. And so we believe in this compliance, which might be as simple as a bank saying everything has to stay in the EU or somebody in Brazil saying this is sensitive, it needs to be in Brazil. And this will actually drive serverless platforms because the serverless platforms have the scope to do this. And it's very hard to do it if you're building massive data centers in just a few locations.
And then we get to the ease of use, right? Developers care about ease of use. I mean, I wrote a blog post a while ago called Free to Code, which is like, you know, in the eighties, I had a home computer. You'd switch it on. You could immediately type code and execute it. It was in BASIC, but you could write code. And realistically it's what developers want. They want ease of use. They want ease of use in terms of deployment. They want to be able to write code.
They want to be able to debug code easily, get log access easily. And I think serverless platforms can really enable that. And we do that through the Wrangler tool. And more about that in our blog post today.
And then there's cost, the cost of these platforms. If it meets your compliance needs, if it's easy to use, then you start to say, well, okay, what does it cost? And obviously we believe we are much cheaper than the alternatives, but cost is not necessarily the driver to take you to serverless. It's actually more likely to be: what can I get done? That's ease of use. And where can I get it done? That's compliance. But cost is important.
And then as we go down, the bottom one is speed. And speed is, you know, we're closer to everybody in the world. And it's important. And, you know, when you're checking something off on your list of things, performance is going to matter. And it's tied up with consistency. But the more interesting ones are ease of use and compliance. And I think if you look at what we have in place, you'll find that we have the solutions for those things. But, you know, give it a read. Matthew, obviously, has written something which I think tells you a lot about what we're thinking, you know, and takes us through some of the history of the platform. And then there's the fact that, you know, he would like it to support Perl. We don't currently support Perl, I don't believe, but that could certainly happen. And, you know, it takes us down to what we've got. So we hope that you will build stuff.
So that's Serverless Week. There are, you know, a lot of blogs coming out, a lot of announcements coming out. There'll be another one today in a couple of hours. And over the weekend, there are a couple more blogs that will be about other things that are happening in Cloudflare. Obviously the rest of Cloudflare hasn't slowed down.
This is a big investment in the serverless platform, but we're also looking at the rest of things. So we're looking at, you know, bring your own IPs, which we announced yesterday. And there are a number of other things that will come out next week, as we continue to enhance the entire platform. So that's it for This Week in Net. I hope you've enjoyed watching and, you know, stay tuned for more stuff on the blog. I'm John Graham-Cumming.
I'm just going to check to see if anybody sent in any questions. I'm going to have a look in the questions room. Well, I would, if Google Chat had not decided to steal my entire CPU and fail. Okay. Well, I unfortunately can't see if there are... Oh, you know what? I can look on my phone and do it live. Let's do this. Let's see. All right. Are there any questions? Doesn't look like it. Okay. Well, given there are no questions, look, I look forward to seeing you all read the blog post at 2 PM and have a good day. Okay. Bye.
What is a WAF? A WAF is a security system that uses a set of rules to filter and monitor HTTP traffic between web applications and the Internet. Just as a toll booth allows paying customers to drive across the toll road and prevents non-paying customers from accessing the roadway, network traffic must pass through a firewall before it is allowed to reach the server. WAFs use adaptable policies to defend vulnerabilities in a web application, allowing for easy policy modification and faster responses to new attack vectors. By quickly adjusting their policies to address new threats, WAFs protect against cyber attacks like cross-site forgery, file inclusion, cross-site scripting, and SQL injection.