This Week in Net: Cloudflare's farewell to Railgun (and its relevance)
Presented by: John Graham-Cumming, João Tomé
Originally aired on February 20, 2024 @ 5:30 AM - 6:00 AM EST
Welcome to our weekly review of stories from our blog and other sources, covering a range of topics from product announcements, tools and features to disruptions on the Internet. João Tomé is joined by our CTO, John Graham-Cumming.
In this week's program, we discuss our CEO, Matthew Prince, visiting Lisbon, Portugal (which includes a picnic). We delve into the Regional Tiered Cache, which offers an additional layer of caching for Enterprise customers with a global traffic footprint, enabling faster content delivery by reducing network latency.
However, the main focus is bidding farewell to Railgun, a web optimization technique launched by Cloudflare ten years ago. We explore why it was developed by John Graham-Cumming, its relevance, and the decision to deprecate it in January 2024.
Lastly, we provide a guide on how to perform dynamic data collection using Zaraz Worker Variables (Cloudflare Zaraz being our server-side third-party manager designed for speed, privacy, and security).
Hello and welcome to This Week in Net, everyone. It's the June 2nd, 2023 edition. And although we're not discussing why happiness is a warm gun, according to John Lennon, we are going to talk about something called Railgun.
I'm João Tomé, based in Lisbon, and with me I have, as usual, our CTO, John Graham-Cumming.
Hello, John, how are you?
I'm good, thank you, João. Nice to see you again. And yeah, we're going to talk about Railgun, which is the very first thing I worked on at Cloudflare when I joined the company, which is kind of wild to think about that now.
But I tell you what, I can't believe it's June. How did June happen? Exactly. We're already in June, 2023.
I feel like May was missing. I feel the same thing. Time seems to fly by.
It seems to be flying by, yeah, absolutely. Well, good, yeah.
So you want to talk about Railgun, the first thing? Sure, sure. But before we go there, let's talk about Matthew Prince, our CEO. Last week he was in London, at our Connect event, and now he's in Lisbon.
He's in Lisbon, yeah.
He came to visit the office here because we've got a lot of people. I think we're getting 250 people, something like that now, in Portugal, most of them in Lisbon, although some folks in Porto, some folks in...
I think we have somebody in Bragança, I think we have somebody in the Algarve.
I mean, we have people all over Portugal.
So yeah, there's a lot of people here and Matthew came to visit and we had a picnic.
So if you're watching this and you would like to work for Cloudflare in Portugal, we are hiring.
So go to Cloudflare careers. Take a look.
We have definitely lots of openings and we're growing very rapidly. So yeah, we'd love to have you.
Exactly.
And like you were saying, people can work for Cloudflare, not only in the Lisbon office, but also elsewhere in Portugal.
That's right. It depends a little bit on the role, depending on whether it's one where we want to have it in the office versus it's remote friendly.
But yeah, there's lots of different options at Cloudflare now.
Exactly. I will show, through video editing, some of the images of the picnic and all of those things.
Matthew shared on Twitter also images.
That didn't work.
However. And you did a fireside chat with him, an internal one.
But are you surprised at some of the things you still discover about Matthew when you do those types of conversations?
No, I'm not surprised by the things I discover about him because I've known him for about 20 years at this point.
So it's been a while. But I always learn something from him.
So one of the things I always do is I try to watch his videos and his talks and his press, stuff like that, because it's interesting to learn from him.
And I think as a wider thing in Cloudflare, I think that culture of learning is pretty important to us, trying to learn from others and really understand what we're doing and how people do things.
I think that's been a big, a very powerful thing for Cloudflare.
So yeah, I always have a listen to what he has to say. Yeah. And one of the things that I noticed is the fact that you have known each other for so long and Cloudflare is only a part of that.
The knowledge is much deeper than Cloudflare. Right?
Well, I mean, you know, Matthew has an unusual background, right? Because he has a degree in English, a minor in computer science, a law degree and an MBA.
So, you know, you can't pull the wool over his eyes on most subjects.
Exactly. So let's start with some of the blogs we had this week, three blog posts this week.
Before we go to Railgun, why not start with "Reduce latency and increase cache hits with Regional Tiered Cache"?
What is this all about? Let me share my screen. Yeah.
So, well, so take it back to zero. Cloudflare operates a cache that we store stuff, images, web pages, whatever you want us to cache across our network, which is in 285 cities worldwide.
And the interesting question is how does stuff get into cache?
And so what actually happens is if somebody requests something that isn't in cache, we'll go back to the customer's origin server, their web server, and get the thing and then hang on to it in cache.
And if you have customers all over the world, then what can happen is you can have those requests coming in from all over the world.
So we introduced a thing called tiered cache. And the idea of tiered cache is that we can have levels of caching.
So maybe, you know, a little data center like the one in Lisbon, which is relatively small compared to the rest of the world.
So I go to some website that uses Cloudflare and I hit the Lisbon data center because here I am in Lisbon, right?
And we don't have that thing in cache in Lisbon.
Well, rather than going all the way back to the origin server, we can go and ask a bigger caching system somewhere in Europe: maybe you've got a copy of this web page, or you've got a copy of this image.
So it might be that you would go to, say, Paris or London, and the customer can configure that.
They can actually configure it in such a way that only certain Cloudflare data centers will talk to their server.
And this increases cache hits, it decreases latency, and it decreases load on the origin server.
So that's great. And that's been around for a long time.
And then we introduced this new thing, which is regional tiered cache, where our data centers can go from one data center to another, kind of in region, sort of locally and say, hey, have you got this item?
So it's another level of caching and another level of taking load off of the origin server.
And so, you know, we brought this out as part of our tiered caching functionality.
And you can see some diagrams here about what I just described, which is like the lower and the upper tiers, where you had the ones we call the upper tiers, maybe the big data centers that talk to the origin server and the lower tiers being the rest of the world.
And now you have like sort of the best of both worlds, which is like this, the regional area, which is like, go and try it out locally to see if there's a copy somewhere else.
So like, you know, you imagine we're here in Lisbon and some news story breaks in Spain, and maybe everyone in Spain has already gone to that news story.
And it's looked at the image of, I don't know, the King and Queen of Spain doing something.
And then in Portugal, some people say, oh, I'm interested in that too.
And it's not available in the Lisbon data center. Well, we can pop over to Madrid and say, hey, give me that image before having to ever go to the origin.
So that's what this is all about. And there's, you know, customers can set up the topology they want for this and how it operates.
And I think this is just another way of decreasing load on the origin server, decreasing latency and increasing cache hits, because we've probably got this, the thing you want somewhere in our network.
And so this helps with that. So it's a new piece of functionality.
It's available. You know, if you're a customer who's using our CDN, who's really concerned about any of the, you know, latency or cache hits or origin load, here we are, we can help even further.
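The lookup order described above (local data center, then a nearby regional tier, then an upper tier, and only then the origin) can be sketched in a few lines of Python. This is a toy model rather than Cloudflare's implementation: the `CacheNode` class, the node names, and the Lisbon to Madrid to Paris topology are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CacheNode:
    """One data center's cache. `parent` is the next tier to ask on a miss."""
    name: str
    parent: Optional["CacheNode"] = None
    store: Dict[str, str] = field(default_factory=dict)

    def get(self, url: str, fetch_origin: Callable[[str], str], path: List[str]) -> str:
        path.append(self.name)            # record which tiers were consulted
        if url in self.store:             # cache hit: stop here
            return self.store[url]
        if self.parent is not None:       # miss: ask the next tier up
            body = self.parent.get(url, fetch_origin, path)
        else:                             # top tier miss: go to the origin
            path.append("origin")
            body = fetch_origin(url)
        self.store[url] = body            # keep a copy on the way back down
        return body


origin_requests: List[str] = []


def fetch_origin(url: str) -> str:
    """Stand-in for the customer's web server (hypothetical)."""
    origin_requests.append(url)
    return f"content of {url}"


# Hypothetical topology: Lisbon (lower) -> Madrid (regional) -> Paris (upper).
paris = CacheNode("paris-upper")
madrid = CacheNode("madrid-regional", parent=paris)
lisbon = CacheNode("lisbon-lower", parent=madrid)

# A reader in Spain requests the image first; nothing is cached yet.
spain_path: List[str] = []
madrid.get("/news/photo.jpg", fetch_origin, spain_path)

# A reader in Portugal asks later; Lisbon misses but Madrid already has it.
portugal_path: List[str] = []
lisbon.get("/news/photo.jpg", fetch_origin, portugal_path)

print(spain_path)
print(portugal_path)
print(len(origin_requests))
```

In this sketch the second request never reaches the upper tier or the origin, which is the load and latency saving the regional tier is meant to provide.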
Exactly. And it must be enabled in a sense, right?
But yeah, this is a feature. This is not something we do completely automatically, right?
This is a feature that, you know, is for the customers who really want control over their cache.
Exactly. But it has all of those benefits. You mentioned enterprise customers can enable it via the Cloudflare dashboard.
And here's an image actually of that.
That's actually the dashboard. Yeah. Where it sort of shows how this, how this kind of operates.
So yeah. Yeah. And depending on what sort of plan you want, you can either, you know, there are different things where you can actually set up the configuration with a custom cache topology.
If you really want to do something specific or you can let us do it, or you can just, you know, there are different options about how we decide which cache service to use.
Exactly. A lot to unpack here in terms of new functionality to reduce latency, which nobody likes.
So why not go to Railgun already?
Yay. Okay. Railgun. I have this image. We have this image. Yes. Yes. I love this image.
And can you first describe why, what is Railgun? For those who don't know.
I didn't want to call it Railgun. Okay. So you want some Cloudflare history.
I wanted to call it Rocket Sled. Okay. So when I joined Cloudflare, Matthew gave me a slip of paper, and on that slip of paper was a list of things that needed doing, code that needed writing.
And one of the things on the list was figuring out how to optimize the latency between a Cloudflare data center and the origin server, because on a cache miss, so this fits in really nicely to what we were just talking about, which is like, if you, if we don't have it in cache, we have to go back to the origin server.
And that latency could be long depending on where the data center is and where the origin server is.
And we want to absolutely minimize that latency. And so what do you do?
Well, one of the things you do is you grow your network, like Cloudflare did, you introduced tiered caching and you do all this kind of stuff.
But the other thing is to look at that connection between the two.
And the idea of this was to figure out some mechanism by which we could, we could make it, you know, make it much lower latency.
And so this was my first job. And this was prior to HTTP/2 existing.
So remember we're in an HTTP/1.1 world, TLS, SSL at the time.
And the idea I came up with is that for many webpages, we're particularly concerned about webpages here because they're often not cacheable because maybe you're logged in or maybe they're updated dynamically.
The actual change in the webpage between refreshing, if like, if you go to a webpage and you hit refresh, get the next version, you might actually discover it's only changed a very small amount.
For example, I remember looking at the BBC News website and when you hit refresh, unless there was a breaking news story between accesses, the only thing that changed was the date and time.
They updated a little tiny thing. So I was like, well, look, if you could only send the change, which we call a delta, then you could send, you can get a crazy compression ratio because you send like this tiny change.
And so this, what this Railgun thing did and does is you have something on the server and something on Cloudflare and they coordinate what's the latest copy of a specific webpage they've got in their local cache.
And then so what will happen is when you need to get some webpage through Cloudflare that's not in cache, it goes to Railgun and Railgun sends to the other Railgun, I need this webpage and I've got version blah.
And the version was actually just a hash. And on the other end, if it's got the same version in its local cache, it goes and gets webpage, does the diff, does the delta and just sends over the delta.
And on the other side, we reconstruct it.
We go, oh, just this little change has happened.
And so that was the first thing I worked on. There were some attempts to do this actual type of compression in web browsers, interestingly enough, a thing called SDCH, which was going to use this kind of differential compression.
But we actually implemented it in Railgun.
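The exchange described above (the requester names the version it holds by hash, and the other side sends back only the difference) can be illustrated with a small Python sketch. Railgun's actual wire format and diff algorithm are not described in this episode, so everything here is an assumption: `difflib.SequenceMatcher` stands in for the real delta encoder, SHA-256 stands in for the version hash, and the function names are invented for the example.

```python
import difflib
import hashlib


def version_of(page: str) -> str:
    """Identify a cached copy by a hash of its content (SHA-256 is an assumption)."""
    return hashlib.sha256(page.encode("utf-8")).hexdigest()


def make_delta(old: str, new: str) -> list:
    """Origin side: describe `new` as copies from `old` plus literal inserts."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse old[i1:i2], nothing to send
        elif tag in ("replace", "insert"):
            ops.append(("data", new[j1:j2]))  # only these characters travel
        # "delete": the old range is simply never copied
    return ops


def apply_delta(old: str, ops: list) -> str:
    """Requesting side: rebuild the new page from the cached copy and the delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


# Hypothetical news page where only the timestamp changes between refreshes.
stories = "".join(f"<p>Story number {i}</p>" for i in range(40))
old_page = f"<html><h1>News</h1><p>Updated 10:00</p>{stories}</html>"
new_page = f"<html><h1>News</h1><p>Updated 10:05</p>{stories}</html>"

# Both ends hold old_page; the requester sends version_of(old_page) with its request.
requested_version = version_of(old_page)
assert requested_version == version_of(old_page)  # origin side has the same copy

delta = make_delta(old_page, new_page)
rebuilt = apply_delta(old_page, delta)
literal_chars = sum(len(op[1]) for op in delta if op[0] == "data")

print(rebuilt == new_page)
print(literal_chars, "literal characters sent instead of", len(new_page))
```

When the two ends do not share a version, a scheme like this would have to fall back to sending the full page; the sketch leaves that case out.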
And that was, I'm going to say that was 11 years ago.
And the compression ratios are completely crazy when you do this stuff. And it's been a great, great part of our product suite, but we are very different to how we were in the early days.
When I was doing this, we probably had a handful of data centers.
And so those latencies were really long. We didn't have tiered caching.
We didn't have Argo Smart Routing, which would optimize the path across the world.
We didn't have a fiber optic backbone that we manage ourselves. And so this technology really mattered a lot.
And lastly, we didn't have Cloudflare Tunnel.
And Cloudflare Tunnel makes that connection from origin server to Cloudflare.
And so I think that it's time to say goodbye to Railgun because we've taken the ideas in Railgun, the performance improvements, and we put it across the entire network.
So as sad as I am to say goodbye to something I wrote, I think one of the great things that Cloudflare has done over time is say that some things are no longer the right technology and replace them.
Some of them, in this case, it's an external product.
In other cases, it's something which is internal.
So we've rewritten so many parts of our system as we've grown as things have changed to take into account the changing landscape.
So I think Railgun is going away next January when I think it will probably be 12 years old.
And so it's probably a good time to let it go.
But that was my first Cloudflare product. That's very interesting.
So you were at Cloudflare for a few months, weeks, when you did this?
Well, Matthew told me that he gave me the list of things that he would like worked on in the first week, actually.
I flew out to California. He gave me the list.
I got on BART to go back to the airport in San Francisco. And I sat on BART.
I looked at this list, and I went down the list. And I was like, that compression or latency thing seems like a kind of interesting thing.
Of course, not long after this, you start to get HTTP/2.
HTTP/2 has some characteristics that Railgun had.
Railgun was managing multiple. So one of the things it did was it kept a TCP connection open to the server.
And it multiplexed requests over that connection.
So you don't get the TCP slow start problem. You solve all this kind of stuff.
HTTP/2 in some ways replaces some of the things that Railgun was doing, including actually some of the sort of compression technologies.
So the world moved on.
And I think Railgun has reached the end of its life and has been replaced by so much more in our network.
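To see why keeping one TCP connection open matters, here is a deliberately simplified back-of-the-envelope model in Python. It counts round trips for a brand-new connection (a handshake, then a congestion window that doubles each round trip) against a connection whose window has already grown. The window sizes and segment count are illustrative assumptions, and the model ignores TLS, packet loss, and everything else a real TCP stack does.

```python
def round_trips_to_send(segments: int, window: int, new_connection: bool) -> int:
    """Count round trips needed to deliver `segments` under idealized slow start."""
    rtts = 1 if new_connection else 0  # TCP handshake costs about one round trip
    sent = 0
    while sent < segments:
        sent += window                 # send a full window, wait for the ACKs
        window *= 2                    # slow start: window doubles per round trip
        rtts += 1
    return rtts


# Illustrative numbers: a response of 100 segments.
cold = round_trips_to_send(segments=100, window=10, new_connection=True)
warm = round_trips_to_send(segments=100, window=100, new_connection=False)

print("new connection:", cold, "round trips")
print("reused, warmed-up connection:", warm, "round trip(s)")
```

On a long path between a data center and a distant origin, each of those extra round trips is paid in full, which is why reusing a warm connection helped.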
I have a question. Actually, I asked ChatGPT to explain to me.
That's true. To explain to me what is Railgun. You could ask me, right?
You could ask me. True. True. True. And I already knew a little bit, but I wanted to see what the system would say.
And I think mostly it got the information from Wikipedia.
So mostly it's Wikipedia. But it did a sum up in terms of I asked how successful Railgun was.
And Railgun has been successful in improving website performance and speed for websites utilizing Cloudflare services.
It has had a positive impact on the Internet by reducing bandwidth usage and enhancing the delivery of dynamic content, resulting in faster load times, improved user experiences for websites using Railgun with Cloudflare CDN.
However, it's important to note that the impact is specific websites that have implemented Railgun in conjunction with Cloudflare's infrastructure.
So that's the sum up that ChatGPT gave in terms of Railgun.
Well, thanks. I guess. I mean, I guess so. Yeah. Yeah.
I mean, I think that's pretty accurate. I thought at the beginning it was kind of a lot of flattery, in fact.
Right. So I thought ChatGPT was trying to butter me up or something.
But yeah, that's a pretty good summary. ChatGPT usually does that.
He likes to please humans. I already know that by using it.
And here is the blog post Sam Rhea wrote, "Cloudflare is deprecating Railgun", explaining when it was launched.
Why was it launched 10 years ago? What is it?
You explained a bit already. And also, why deprecate that product specifically? I'm interested in understanding how some of these products, like Railgun, are very important at a specific moment in time.
But this also shows us how technology evolves, how the Internet evolves.
What was needed and important for a company at a specific time gets replaced by new things doing the same job, possibly better because they're newer.
That's also the process of doing technology, right?
It is. I think in our case, I mean, the big difference, I suppose, is that we have just reached a level of scale that just didn't exist when Railgun was around.
So Railgun had to operate when there might be very long latency between us and the origin server.
It had to operate when there was no optimization of traffic moving around it beyond Anycast, going to your nearest data center.
And I think that just the architecture of what we do has changed so much, right?
I mean, look, we have a data center here in Lisbon. You and I talked, the Lisbon data center is milliseconds away from us.
That data center is then using Argo Smart Routing to work around latency problems and packet loss in the Internet.
If it needs to go somewhere far away, it may well go over our private fiber backbone.
There's all sorts of stuff that's happened to our network since then.
So yeah, this is a lovely piece of technology and I'm sorry to say goodbye to it, but it also makes sense for it to go because our network has changed so much.
And frankly, Cloudflare Tunnel is a very, very important product. And in some ways, once you take Tunnel and you take all of the optimization we've done in our network, particularly to scale, you don't really need Railgun anymore.
So we're going to say goodbye to it next January. Exactly. Another thing you didn't answer to me was why Railgun?
You said you were thinking of another one. It was not your main name, but why Railgun?
I'm pretty sure that Matthew thought of the name Railgun.
I think I have a memory of sitting in the San Francisco office and I said, I wanted to call it Rocket Sled.
And I think it was him who said, no, no, no, no.
Railgun's a way better name. And actually that picture you have of the Railgun toy there, we actually gave away these big plastic water pistols, huge ones, with a Railgun sticker on them at some hosting conferences.
And I have a good memory of sitting in the San Francisco office, unboxing them, sticking stickers on them, putting them back in the box.
I don't remember how many hundreds we did.
And there's actually one other funny history story. You see it says 2408 on there?
That has a meaning, right? Right. It's the port number that Railgun was using.
So this is the TCP port number it was using. And the reason it's 2408 is way back in 1996, I was working at a company called Optimal Networks, which was doing network optimization software.
And we had a product that needed an assigned port. And there is this list of assigned ports from IANA.
And I asked IANA for a port. So like HTTP is port 80 and HTTPS is port 443.
Well, there was this protocol called OptimaNet, which I created.
And I got assigned this port 2408, all those years ago in 1996.
And what happened was that company is defunct. So the port was sort of reserved for something that was never going to come back.
And we managed, I managed to ask IANA and they were like, yeah, sure, your name is on it.
We'll kind of hand you that port.
So that for a long time was kind of John's assigned TCP port was 2408.
Your own port. My own kind of TCP port. And so we used it for Railgun. I think now if you look in the IANA assigned ports list, it says 2408.
Of course, assigned ports are not so important anymore.
We have lots of ways to discover, you know, service discovery.
But, you know, back in 1996, it certainly seemed important.
And it was necessary for Railgun because people wanted to punch a hole in their firewall and use Railgun with, you know, with an assigned port.
And there's a bit of John Graham-Cumming in this image because of that, because of that port.
There you go. Also, that's really interesting. Oh, by the way, I forgot to mention that railguns, for those who don't know, are advanced electromagnetic weapons that use powerful magnetic forces to propel projectiles at incredibly high speeds, but they're not in use.
Like, I think this is a technology that never really succeeded, in a sense.
You know, I'm hardly an expert on actual Railguns. I know that I remember thinking that I didn't really like the name because it was a weapon.
I didn't think of what I was doing as a weapon. And I liked Rocket Sled better because it sounds like you're sitting in a rocket, which is shooting along the ground at some crazy speed.
But you know what? The name is there now. The product's reached the end of its life.
So goodbye, Railgun. Thanks for, you know, thanks for being an interesting project to work on, both like from an algorithm perspective.
And also it's the program I really cut my teeth with Go on as well. That was the first thing in Cloudflare written in Go.
That was, I decided to do it there because it needed lots of concurrency and networking, and Go is really great for that.
Yeah. A bit of Cloudflare history there. And also customers that have Railgun will receive an email notice later.
And I got one.
I'm a customer who uses Railgun. So I got the email yesterday. Makes sense. CTO that is also a user.
Oh yeah. Yeah. I have multiple paid Cloudflare accounts. It's, I like using it as a user.
I'm not sure people internally necessarily like me using it as a customer because I tend to find bugs and drive people crazy.
I can see that.
More things. This week, just today actually, this Friday, we also have had a blog post about Zaraz Worker Variables, dynamic data collection.
What is this all about?
So first of all, you're going to need to know what Zaraz is. So, websites have a load of JavaScript in them.
Some of that's for interactivity, right?
You're searching on the page. Something's really nice and interactive, but some of it is third party.
So it might be something like marketing, ads, a chat bot.
I mean, all sorts of things get added in as third party stuff. And fundamentally that third party stuff is someone else's, but you've injected it into your website.
And that has all sorts of security implications.
It has all sorts of performance implications because you put a whole load of stuff in there.
Well, guess what?
Cloudflare has a platform, Cloudflare Workers, where you can run JavaScript and other stuff inside our network rather than inside the browser.
So Zaraz allows you to take all that third party stuff, run it on our servers, and just have a little bit of communication with the real website.
And that's much safer.
It's quicker. It's easier to update. There's all sorts of reasons why this is really good.
However, in the real world, there are also sometimes needs to configure those third party JavaScripts with bits of information that are specific to them or need to be shared across multiple of them.
And so they give some examples here of like, there are sort of functions that need to be added, variables that need to be set up, and some of these things.
And what this new feature does is allow you to use Worker Variables to do that within Zaraz.
And so if you have something, you're using some piece of third party JavaScript, it needs to be configured in some specific way, then you can do it here.
And I think that allows you to use this wider range of use cases.
There are a lot of examples here, and benefits too, in terms of Cloudflare Zaraz Worker Variables: build with context, speed, isolated environment.
There's a lot to unpack here if you need these types of tools to add stuff to the things you're already using.
Yes, absolutely. I mean, it's something where you can get in there and you can configure things in a way that what you would have previously done.
They give an example like there might be a page name attribute, which needs to be made available.
Well, you can now do that within Zaraz using Worker Variables.
So yeah, it really makes it much more configurable and really allows you to use it for many types of third party stuff and really integrate it into like how you do it.
Because lots of websites have a ton of this stuff in their web pages, tons of JavaScript, all having to be configured in certain ways.
And there's also a description here on how can you start using it, create a worker, there's steps in terms of Cloudflare dashboard.
So again, a blog post also helpful in terms of following some steps to do it, right?
Absolutely. This is a technical post where you can get information about how to do it and how to set it up.
In terms of blog posts for this week, that's it.
I asked ChatGPT also some... No, you can still talk to me about it as well, sorry.
I can, but actually I like doing both because it makes things a little bit more interesting.
I asked ChatGPT one thing that also shows us how sometimes unreliable the systems could be, which was, what are like fun facts related to the Internet history from this week?
In specific. And it brought a few examples from this week, but nine examples, to be honest, and it misfired on all of those examples in terms of dates.
The trend, the fun facts are true, like the Witty Worm, one of the earliest Internet worms to specifically target network infrastructure.
It's from 2004. It says it's from May 2003. There's some things there, but the one thing it got right, which is May 27th, 2003, which makes it 20 years, was the year and the month that the first version of WordPress was launched.
So WordPress, in a sense, has 20 years now.
So that's the fun fact of the week this week.
Happy birthday, WordPress, 20 years old. That's kind of amazing. And I've spent a lot of time administering WordPress websites, Jetpack, fiddling around with MySQL and all that kind of stuff.
So yeah, happy birthday and an important technology.
And of course, Cloudflare has a WordPress-specific product, Automatic Platform Optimization, which figures out how to do intelligent caching of WordPress websites, thus reducing load, lowering latency, all that kind of good stuff.
So if you're using WordPress and Cloudflare, check out APO because that can really help.
That's the end of this week. On a high note: WordPress is 20 years old.
Why not? There you go. Congratulations. That's a wrap. Goodbye, Railgun.
Goodbye, Railgun. Thank you. Bye-bye. Bye-bye.