All about Internet Measurement, Resilience, and Transparency
Presented by: João Tomé, Marwan Fayed
Originally aired on October 31 @ 11:00 AM - 11:30 AM EDT
In this episode, host João Tomé talks with Marwan Fayed, Principal Scientist and Research Lead at Cloudflare, about the science behind understanding and improving the Internet.
They explore the Research Week blog takeover on Measurement, Resilience, and Transparency, discussing the tricky science of Internet measurement — including a traffic spike in Ukraine that revealed how complex it is to explain data at scale. Marwan shares how Cloudflare is building a framework for Internet resilience, preparing for a post-quantum future with Merkle tree certificates, and tackling the “store now, decrypt later” risk. They also cover Cloudflare’s work to identify users behind Carrier-Grade NAT, innovations in protocol defense, anonymous credentials, and the story behind WARP VPN.
At the heart of it all: how to make the Internet safer, faster, and more transparent, at global scale.
But then as we were assembling things, we realized the Internet is much more than that — it's the things that don't get a lot of attention.
Measurement is one of them, certainly. But also when we talk about resilience, robustness, reliability, these are not performance, they're not hot and fast.
And they deserve attention too, I think as we've seen with the recent failures with Azure and AWS.
And in addition, the way we interact in the Internet is changing with all the AI agents and so on, and that needs attention.
But so too does privacy, because people are looking for more and more ways to know something about you, while we also want people to be more and more anonymous online.
And so finding that balance can be a little bit tricky. And that's fundamentally what the week was about.
Let's look at the Internet, build some trust, talk about reliability and resilience, and see where is privacy going in the future.
Hello, everyone, and welcome to This Week in Net.
It's Halloween day, so it's October the 31st.
And this week, we're going to talk about a blog takeover, not necessarily about Halloween, but a blog takeover from our research team and friends, all about Internet measurement, resilience, and transparency.
And this week, we're also going to have a second part of this episode with André Jesus from our Radar team, all about the new Radar tools for you to explore.
So stay tuned for that after this first part of the episode.
I'm your host, João Tomé, based in Lisbon, Portugal.
And with me, I have Marwan Fayed, our research lead. Hello, Marwan, how are you?
How are you? Good. Where are you based, for those who don't know?
Most of the year, I'm in London, in the UK. Despite the accent, it is where I think of as home these days.
On that note, actually, we were together in person a few weeks ago at Connect, our big event.
One of the things that I came to understand there is how connected Cloudflare is in terms of partners and industry.
Were you surprised with some of the networking that took place during that event?
The boring answer, honestly, is no. I don't think I was surprised at all.
One, it was great to see people from across the company show up, because being as spread around the globe as we are,
it's very hard to connect person to person.
You can get a lot done. But from the perspective of the partners and the customers, seeing the networking and the talking in the meetings — that was all great.
I think what I didn't anticipate walking in was how tuned in so much of the customer and the partner population is.
They really know their stuff.
They asked some really insightful questions, and managed to connect dots in interesting ways.
I think that was one of the most fun aspects for me. Because it's Halloween...
Halloween was not a thing in Portugal a few years ago. Now it is. Actually, my kid just went to school in a costume: a zombie firefighter.
Actually, I will go to the Cloudflare Lisbon office later today with my kid for a Halloween party for kids.
Do you have a Halloween story that you can share with us?
Yeah, so I grew up in a part of the world where Halloween was a thing, and we went out every year.
In fact, oftentimes as you were growing up into your teens, you would look for ways to find costumes that would allow you to go out into the street trick-or-treating, even though you were too old.
My favorite Halloween memory is from when I was much younger. A number of us were going out, and many of my friends dressed up, as we do.
But one friend showed up, and I didn't quite understand his costume.
He'd gone to a thrift store, secondhand shop, to buy a baby blue suit from the 1970s.
And then he had an electrical cord tied around his neck, and to the other end of the cord, there was a rubber chicken.
And so we said, what kind of a costume is this?
It's a bit haphazard. You're just mixing things that are unrelated.
And he said, come on, can't you see that I'm chicken cordon bleu?
Or if you say it, it's chicken cord on blue. And so I quite literally fell to the floor laughing.
I thought it was just an element of creative genius. That's a good one.
By the way, where was this? This is in Toronto, Canada. Toronto, Canada. Okay, moving on.
We're on our Research Week Takeover. What is the main takeaway from the week, and how did it come about, really?
So one thing is, we call it research and friends.
And that's really important. The way research works at Cloudflare is that we only manage to do work by working with other teams that own the features and the products.
So this week — there's a professional association called the ACM, the Association for Computing Machinery.
It's been around for decades. And one of its research publication venues is a conference they run called the Internet Measurement Conference, IMC.
Measurement is one of these things that, in most domains, tends not to get a lot of attention.
And I think what's not obvious is that measurement from a scientific perspective is actually much more than pulling out a measuring tape.
Measurement is about finding ways to gather data, curating the data, assessing the data, building models with the data that then you can use to make predictions about the world.
You may have heard of the Large Hadron Collider — it's a great example of how we gather data informed by old models, in order to build new models and learn about the universe.
This happens across the sciences, and the Internet is no exception.
So we do a lot of measurement at Cloudflare, some more involved than others in terms of projects.
And we wanted to celebrate Internet measurement.
But then as we were assembling things, we realized the Internet is much more than that — it's the things that don't get a lot of attention.
Measurement is one of them, certainly. But also when we talk about resilience, robustness, reliability, these are not performance.
They're not hot and fast.
And they deserve attention, too, I think, as we've seen with the recent failures with Azure and AWS.
And in addition, the way we interact in the Internet is changing with all the AI agents and so on, and that needs attention.
But so, too, does privacy, because people are looking for more and more ways to know something about you, while we also want people to be more and more anonymous online.
And so finding that balance can be a little bit tricky. And that's fundamentally what the week was about.
Let's look at the Internet, build some trust, talk about reliability and resilience, and see where is privacy going in the future.
And one of the things that I noticed first is the number of interesting things we put out this week across different topics, as you mentioned.
Some really thinking about the future, others thinking about the present with AI agents.
Even last week on the show, we spoke about the recently announced partnership with Mastercard and Visa regarding AI agents — making the relationship between agents doing financial things on our behalf more secure and more trustworthy at the protocol level.
And in the background there, research is also making a difference.
So you can see the spread of areas that the research team actually touches.
Even with the partnerships and with friends, it's really spread out.
And this was a very cool week, I feel, in terms of the research team showing the different elements around privacy, trust, resilience, all of those things, and measurement.
I'm actually having a conversation after we talk with André Jesus from our Radar team to show the new Radar TLDs page.
So it was really a very wide week in terms of topics, which is really interesting, to be honest.
Thank you for saying that — and thank you for noticing.
I think people here are really happy that we managed to put it together and get all that diversity of things out the door.
Why not start with a day-by-day perspective on the blog posts that came out?
Sure. Monday was the first day of the week in a sense.
It's not what we call Innovation Week, but it is a blog takeover week.
Can you run us through a bit of the announcements on Monday?
Right. So first, of course, Mari gives an overview of the week. Then I think one of the standouts — I tried to focus on a bit more of an overview blog — is "The tricky science of Internet measurement." And it is tricky, I promise.
I don't know if you can, if you see that one and can bring it up.
There it is. Yep. So this one, you might want to actually load it up and we'll kind of go through it.
This comes back to that measuring tape analogy.
And really, so this blog talks about what is involved in Internet measurement.
How does it actually work? There's a really great example that kind of motivates this.
If you can scroll down, there's this first Radar graph, by the way, about traffic that's coming to us from Lviv in Ukraine.
Partway down should be the first visual that you see. The visualization you're looking for was taken about four days into the invasion of Ukraine by Russia.
And what we noticed, nope, scroll up, keep going, keep going, keep going, keep going.
You're looking for a graph. That's the one right there.
Okay. So you'll see here that this is a representation of traffic coming to Cloudflare.
And you'll see on the right, it just starts to skyrocket.
Now, this is a time when everyone is highly alert, trying to understand what's happening on the ground, trying to understand and detect if infrastructure is affected.
And this happened, we saw this at Cloudflare. And of course, eyes light up and say, is this an attack, for example.
Now, we very quickly realized it wasn't an attack.
Our DDoS defenses didn't trigger. We had other signals as well.
But we genuinely had no explanation. And I think this is the interesting thing: when we talk about measurement, sometimes you can see data, but that's it.
It's not possible to know anything more. And there's nothing that came to Cloudflare that would have explained what this was.
The only reason we did finally come to understand it — the same evening actually, and we blogged about it three or four days later — was that there was a mass migration of people going west.
And they hit Lviv, I hope I'm pronouncing that correctly, which has the last train station before exiting the country in a westward direction.
One of our employees saw this on the BBC nightly news.
And that's the only reason we were able to explain this.
So this sort of motivates the problem: you've got a bunch of data that you're working with, and sometimes you need to ingest more than just the data you're looking at before you can start to use it to explain the world.
Fundamentally, that's what this blog is about. So it goes through the measurement cycle of data curation, modeling around the data, and then validating the model.
An interesting thing — I think validation is really important.
A common mistake people make is that they oftentimes validate with data from the same sources as the data they ingest.
Very early on, the machine learning community did this — I write about it in the blog.
They'd take a data set, split it up, and use 70% for training.
Then they'd get a model, and the model would correctly predict the remaining 30%.
And hey, that's success.
Except this is very much how you get to algorithmic bias. If you train on a population of a particular ethnicity, for example, you're unable to have the same success when applying the model to different populations.
Long documented.
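To make that concrete, here's a minimal sketch of the pitfall using synthetic data invented for illustration (it's not from the blog): a model validated on a held-out split of the same source looks excellent, then collapses on a population it never sampled.

```python
# Sketch only: same-source validation hides bias that shows up on a
# population the model never sampled.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_population(shift, n=2000):
    """Synthetic binary labels; `shift` skews the feature distribution."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 2 * shift).astype(int)
    return X, y

X_a, y_a = make_population(shift=0.0)   # the population we sampled
X_b, y_b = make_population(shift=2.0)   # a population we never sampled

# The classic 70/30 split: train and validate on the SAME source.
X_tr, X_te, y_tr, y_te = train_test_split(X_a, y_a, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

print("same-source validation accuracy:", model.score(X_te, y_te))   # looks great
print("unseen-population accuracy:     ", model.score(X_b, y_b))     # much worse
```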
I highly encourage people to read this. This sort of sets the stage for a lot of what we talk about during the week.
For sure, for sure. During that day, Monday, we had other perspectives.
Of course, this is the intro blog. Exactly.
But we had other perspectives too. Let's not go too deep here, mostly because I have a conversation with André about it later.
But there was also "Data at Cloudflare scale" — some insights on measurement.
Right. So that was a former research intern.
He worked with us on a connection tampering project. We published that work in 2023 and actually managed to get it up as a dashboard onto Radar, like a connection resets and timeouts dashboard.
This is really interesting. I think really what he's writing about, I'll tell you, at the end of the internship, Ram came and he said, you know, I really want to thank you and the team for this experience.
Hey, no problem. You did great. Knocked it out of the park. And his reply, I hope he doesn't mind that I'm actually sharing this.
His reply was really interesting.
He's like, oh, well, yes. Thank you for that, too. But that's not actually why I'm thanking you in this moment.
What else could it be? And he said, well, you know, out in the world, we always have this notion, students, researchers, all kinds of people.
We have this notion that the large operators have all the answers.
And if you could just spend a little bit of time inside and see all the data, then you would know all the answers, too.
And he said: what's been amazing about this experience is realizing that, if there's any truth, it's that it's amazing large operators can know anything at all, because the scale of the data is so vast that one can never be sure they're observing what they think they're observing.
And so the extra care that's required to get there is, let's say, non-trivial — to be kind.
And that's effectively what Ram's post is about. He talks a little bit about that experience and how he came to that realization.
I thought it was refreshing to hear.
That's interesting. Sure. And also how interns can contribute as well.
Absolutely. So this is crucial at Cloudflare. Interns at all levels across the company actually work on stuff that gets delivered.
And certainly that's true on the research team as well.
Sure. There's also "Announcing Workers automatic tracing, now in beta," but that's not from this specific takeover.
That was another announcement, related to our developer platform.
We have so many things and blogs in any given week, so that's also one of the blog posts from the week, just not from the takeover specifically.
But there's also "A framework for measuring Internet resilience."
That's all about resilience, from the team that has been doing this work as well.
Yeah, so this comes back to the multi -stakeholder model.
So the Internet is an open system, but it's an open system of many closed networks.
Now, I don't think that when people first imagined the Internet, anyone saw it as critical infrastructure, except insofar as one of the design goals was to be able to communicate despite failures.
At the time — the 1960s, the Cold War — I don't think anyone anticipated that it would also become the critical infrastructure on which so much other critical infrastructure relies today.
So increasingly around the world, water and transport and energy and so on in some form will be transmitting data using the Internet.
So how do you trust this thing?
The funny thing is, the telemetry lives in every one of these closed networks.
There are signals that you can get back from them, which are really important.
And the Internet measurement community over a few decades now has been working really hard to get better and better at this.
But there is this notion of how do we think about Internet resilience?
Now, the funny thing is, we had a guest speaker in the London office a few weeks ago who was talking about individual resilience.
And the striking thing to me was he pointed out that we, as human beings, tend to think of resilience as an individual trait.
But increasingly, the evidence is clear that it's very much a community or a cultural attribute — there need to be processes involved.
Resilience is a shared trait, not an individual one.
And the Internet, it turns out, is very much the same.
So it's natural, at a regional level, to think: what can I control?
That's the space in which the telephone companies and the energy companies and so on emerged.
But the Internet doesn't work that way. And if regions around the world start to close it off, then you lose reliability and resilience.
So it's very much the kind of system where if everybody does the things that they should be doing for resilience, then you build up resilience across the community and everybody does better.
So that blog was very much talking about what does resilience mean on the Internet and how to think about it.
But in addition, it lays out the starting points to evaluate it, and the different layers to think about.
Because it's possible, for example, that I think I have two different routes to the same destination.
But it turns out they might use the same cable underneath.
When you're looking across the Internet, there are pieces that might signal resilience, but you have to go above or below in the network stack to understand if it truly is.
And that's really what this blog is about.
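As a toy illustration of that last point (the `conduit_of` mapping below is hypothetical, not Cloudflare data or method): two routes that look diverse at the IP layer can share a single physical conduit underneath.

```python
# Toy shared-risk check: IP-layer diversity vs. physical-layer diversity.
# `conduit_of` is an assumed mapping from an IP-layer link to the
# physical cable or duct that carries it.
conduit_of = {
    ("r1", "r2"): "subsea-cable-7",
    ("r1", "r3"): "subsea-cable-7",   # a different link, but the same cable
    ("r2", "dst"): "terrestrial-12",
    ("r3", "dst"): "terrestrial-9",
}

def links(path):
    """Turn a hop list into (hop, next_hop) links."""
    return list(zip(path, path[1:]))

path_a = ["r1", "r2", "dst"]
path_b = ["r1", "r3", "dst"]          # looks fully diverse at the IP layer

shared = {conduit_of[l] for l in links(path_a)} & \
         {conduit_of[l] for l in links(path_b)}
print("shared physical risk:", shared or "none")   # -> {'subsea-cable-7'}
```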
One of the most popular blogs of the week, at least on Hacker News, was this one, Keeping the Internet Fast and Secure, introducing Merkle Tree certificates.
What is this all about? So I will say that I am a networks and systems person.
I'm not the applied cryptographer in any sense of the word. But what is fascinating, certainly for me, and what motivates this, is the notion that when we build for performance, it's a mistake to treat security as an afterthought.
Now we have the Internet, which exists, and we have the certificate authority ecosystem, which exists, and PQ comes along.
And the problem is that post-quantum certificates are much larger than the certificates we have been using up until today.
And in a standard TLS connection, transmitting a post-quantum certificate in its entirety back to the user, to the client, is very much going to degrade the performance of the connection overall.
So what motivates Merkle Tree certificates is this idea: you need something that is equally trustworthy, but has minimal impact on the performance of the system.
And that's where Merkle Tree certificates are born.
They're a really, really cool idea. I highly recommend it. It's no wonder that it was a popular one.
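For the curious, here's a minimal Merkle tree in Python. To be clear, this is not the Merkle Tree Certificates design itself, just the core primitive it builds on: a verifier holding only a small root hash can check membership with a logarithmic number of hashes, rather than receiving a large object in full.

```python
# Minimal Merkle tree sketch: inclusion proofs in O(log n) hashes.
# Assumes a power-of-two number of leaves to keep the toy short.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return all levels, leaf hashes first, root level last."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def inclusion_proof(levels, index):
    """Collect the sibling hash at each level, leaf to root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])   # the paired node at this level
        index //= 2
    return proof

def verify(leaf, index, proof, root):
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

leaves = [b"cert-0", b"cert-1", b"cert-2", b"cert-3"]
levels = build_tree(leaves)
root = levels[-1][0]
proof = inclusion_proof(levels, 1)
print(verify(b"cert-1", 1, proof, root))   # True, using 2 hashes, not 4 leaves
```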
And after the Merkle Tree certificates, why not go to post-quantum Internet?
Something that we've been discussing for a while now.
And we have like great numbers already in terms of adoption, right? Yes. So this one's actually really exciting.
And it's one of those, we won't see the effects of it for some time.
Well, look, it could be two years. It could be 20 years.
I know people have a lot of feelings about this, but fundamentally, I think there are two things to understand about post-quantum.
The first is that the reason we've started encrypting right now for a post-quantum environment is "store now, decrypt later."
Any data, any secrets that might be sensitive today — what we wouldn't want is for someone who can see the data on the path, for example, or get it by nefarious means, to hold on to it until whenever quantum computers become viable and it can be decrypted.
So the reason we encrypt in this fashion today is to prevent storing now and decrypting later.
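As a conceptual sketch of how deployments hedge against that today (the two secrets below are stand-ins, not real key exchanges): post-quantum TLS uses hybrid key agreement, deriving the session key from both a classical and a post-quantum secret, so a recording made now stays safe unless both are eventually broken.

```python
# Hybrid key-agreement idea, in miniature. The secrets are placeholders:
# in real TLS they would come from, e.g., X25519 and ML-KEM-768, and the
# combination happens inside the TLS key schedule rather than a bare hash.
import hashlib, secrets

classical_secret = secrets.token_bytes(32)      # stand-in for an X25519 output
post_quantum_secret = secrets.token_bytes(32)   # stand-in for an ML-KEM output

# Both sides derive the session key from both secrets, so "store now,
# decrypt later" requires breaking BOTH exchanges.
session_key = hashlib.sha256(classical_secret + post_quantum_secret).digest()
```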
When we're tracking the adoption, it's not just one side. So Cloudflare now has full support for PQC communications, but it also requires that the end users that reach out to Cloudflare support it.
And as far as I know, most modern browsers do.
So as more of those browsers are being adopted, this will go up. But also on the servers, the origin servers of our customers, not all of them support it yet.
And we're certainly helping them to get there.
So that's one piece, store and decrypt later.
The other piece — and we blogged about this a couple of weeks ago, and I think it's a really important message — is that all of this can be done in software.
Now, I know there's a growing industry around this, and there are all kinds of hardware boxes that offer some of the features that are required to do this successfully.
But as the blog from a couple of weeks ago points out, everything that's required for post-quantum, right up to the random number generation, it can all be done today in software.
And I think that's a really important message. Makes perfect sense.
Well, it's quite interesting to see, especially that half of human-initiated traffic with Cloudflare is already protected.
This is half of all eyeballs, half of all clients reaching out to Cloudflare.
Half of all connections are post-quantum encrypted.
Hopefully, for whatever is in the remaining 50%, there's an operating system upgrade or a browser upgrade that is going to start to tick this upward.
But certainly, as old devices disappear that aren't capable of running this software, and new ones appear, we're going to start to see this edge up over time.
Regarding other topics of the week, there was an important one regarding defending QUIC from acknowledgement-based DDoS attacks as well.
That was more on the protocol side, right? Yes. Yeah. So this is why we say research and friends, of course.
This one's a lot of fun. So QUIC is a new transport protocol.
Now, a lot of people might have heard HTTP. They see it in their browsers.
HTTP is the application protocol, the application language that a web browser uses to communicate with a web server, for example.
But in order to do that, the two endpoints, the two devices, first have to create a logical connection on the Internet, sort of like a telephone call.
Historically, the protocol to do that is called TCP.
But as we started to want to make new transport protocols, it was really, really hard to change the computers and the operating systems.
So a number of providers and researchers — initiated by Google way back when — put together this new protocol called QUIC, which allows faster innovation.
So QUIC is out there.
Around 30% of HTTP connections use QUIC today, according to Radar, which is really spectacular.
And that's set to grow as well. But it's still new, relatively speaking.
So every now and then, we find some improvement that needs to be made, and this blog very much talks about that.
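One published defense against so-called optimistic ACK attacks — sketched below as a toy, not the actual quiche code — is for the sender to occasionally skip a packet number: an acknowledgement for a number that was never sent unmasks a receiver lying to inflate the sending rate.

```python
# Toy optimistic-ACK defense. A real implementation skips numbers
# unpredictably; skipping every 10th here just keeps the demo short.
class Sender:
    def __init__(self):
        self.next_pn = 0
        self.skipped = set()

    def send_packet(self):
        if self.next_pn % 10 == 9:        # deliberately skip this number
            self.skipped.add(self.next_pn)
            self.next_pn += 1
        pn = self.next_pn
        self.next_pn += 1
        return pn

    def on_ack(self, acked_pns):
        if acked_pns & self.skipped:
            raise ConnectionError("ACK for a never-sent packet number")
        # ...otherwise update RTT and grow the congestion window as usual.

sender = Sender()
sent = {sender.send_packet() for _ in range(50)}
sender.on_ack(sent)                       # honest receiver: fine
try:
    sender.on_ack({9})                    # 9 was skipped, never sent
except ConnectionError as e:
    print("attack detected:", e)
```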
Another one from the week is "So long, and thanks for all the fish: how to escape the Linux networking stack."
Linux, actually — even today, I was discussing this with someone from the team.
A news publication in France noticed, using Radar, actually, that Linux was increasing in adoption and usage in France, which was quite interesting to see.
But Linux is quite important in this case.
It's also part of the week. It's no secret that we use Linux on the servers at Cloudflare.
It comes from a long tradition of Unix-inspired operating systems — one of the oldest operating system lineages out in the world today.
And Chris, the author, he talks about how we use the networking stack in Linux.
So this is the part of the operating system that does all of the communication things.
He covers how we've been able to make use of it and modify it to do not just web services, but things like Private Relay and the soft anycast technology.
All of that comes down to innovations that we've implemented into the Linux kernel.
And so Chris talks a lot about that.
Actually, this is, personally, this is one of my favorite blogs of the week because I love implementation things in the operating system.
Makes sense. It's always interesting to see, even from the research side, you'll see different people having different favorite blogs depending on their experiences.
Indeed. I mean, part of this is also when I enrolled in graduate school for my PhD, my very, very first project was an operating systems project.
Having to work with the Linux kernel at the time, it was a nightmare.
I much would have preferred one of the mature operating systems because Linux was still fairly brand new.
But now, I mean, it's virtually everywhere. So it's a lot of fun to see this work.
For sure. Let me share my screen again. Another one from the week is "One IP address, many users: detecting CG-NAT."
CG-NAT, yeah — carrier-grade network address translation.
So I'm going to ask you to go into that one and scroll down to the first map that you see.
While you do that: what is a carrier-grade NAT? First of all, so that people watching can understand what a NAT is — it's very likely your home broadband router implements NAT, network address translation.
So this is born out of a scarcity of IPv4 addresses. The devices in your home very likely have what are called private addresses or addresses from private ranges in IPv4.
And these are addresses that should never, ever leak out to the Internet.
And that's fine. Within your home environment, all the devices can talk to each other.
And we do this because, remember, there aren't enough IPv4 addresses in the world.
Even though we thought 4 billion would be enough, it turns out, clearly, we were kidding ourselves decades ago.
So now your home broadband router has one public IP address — one address that can be reached from elsewhere in the world.
Increasingly, it has a different kind of private address, but we'll come on to that in a moment.
So when traffic leaves your home, your broadband router translates the private address into the public one on the way out.
And when traffic comes back to the public address, it translates it back into the private address, so it gets to the right machine.
Okay. So it's sort of like having one street address with a bunch of people living behind it.
It's a similar type of idea. Except you get to a stage where you think: actually, there are still too few IPv4 addresses.
And so now you add what is effectively enterprise-scale network address translation — carrier-grade, run by the telephone or Internet carrier.
It's the same principle, but on a much larger scale. So you can have thousands of users in a neighborhood or a municipality, all represented behind a single address.
In the mobile space, in fact, the users could be anywhere.
So mobile companies, I remember the last time I was exposed to this, one of the largest mobile carriers in the United States had 11 of these running in the country for all of their subscribers in the whole country.
It's a large country.
In France, at the same time, one of the largest carriers, I learned, had seven.
Now, this is about 10 years old, so no one should run away thinking this is still true.
But the point is, it's a very small number of interfaces to the Internet behind which are a huge number of legitimate users.
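A toy version of the mechanism (illustration only, with addresses from RFC-reserved example and CG-NAT ranges): many private (address, port) pairs get multiplexed onto one public address by rewriting ports, and carrier-grade NAT simply applies this at ISP scale.

```python
# Toy port-rewriting NAT. 100.64.0.0/10 is the real shared address
# space reserved for CG-NAT (RFC 6598); 203.0.113.7 is a documentation
# address standing in for the one public IP.
class Nat:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 40000
        self.out = {}    # (private_ip, private_port) -> public_port
        self.back = {}   # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip, private_port):
        key = (private_ip, private_port)
        if key not in self.out:                 # allocate a fresh public port
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]

    def translate_in(self, public_port):        # reverse-map return traffic
        return self.back[public_port]

cgnat = Nat("203.0.113.7")
print(cgnat.translate_out("100.64.1.20", 51000))   # subscriber A
print(cgnat.translate_out("100.64.9.83", 51000))   # subscriber B, same port
print(cgnat.translate_in(40001))                   # return traffic -> B
```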
Now, there are two things that are very important about this map.
The first is that, because the Internet largely launched in Europe and North America and spread from there, there are more addresses in Europe and North America than in the rest of the world.
And that's what this map tries to show. When we think about most Internet penetration maps — the number of users per country — Europe, North America, Australia, these places really light up.
What we did instead was normalize all of the IP addresses around the world by the number of users in each country.
And what this shows is many parts of Africa, South Asia, for example, there are more users with fewer IPv4 addresses.
And so suddenly these are the regions that pop up.
The second thing that's really important: for all of these ways I can be represented behind a different IP — VPNs, proxies, and so on, even enterprise firewalls at work — either I choose them or I know they're there.
No one knows when they're behind a carrier-grade NAT. Users do not elect to use them.
They are just there. Now, when we think about serving users, all of these security features that exist in the world, oftentimes they use the IP address as some sort of a signal.
How many requests are coming? How many user agents?
How many whatevers? And that can trigger some responses. And if you look at the map, what we're saying is if we treat all of these signals the same everywhere in the world, then we might be doing a disservice to people in the global south.
This work is about figuring out how do we detect where these addresses are so that we can do the right things by the users.
And one of the really lovely things that's come out of this blog — usually ISPs are fairly silent about where their carrier-grade NATs are and what the addresses are.
In response to this blog, we put out a call, hey, if you want to contribute, if you want to provide some data, learn about it, please reach out.
And we've already had a couple of regional ISPs do that, and it's really, really exciting stuff.
So I'm hoping eventually there's a coalition that builds, that convinces the large ISPs that it's worth actually sharing this information with other operators so that we can all do right by each other.
What is the most consequential thing about that in terms of operators reaching out?
Very simply, right?
So very simply, you could imagine that if you're sitting behind a carrier-grade NAT, you might be more susceptible to things like CAPTCHAs or rate limits.
Because the view is: here's an IP address, and there are so many more requests coming to us from it.
We can't say definitively that it's malicious just because of volume, but clearly it looks different, so it gets treated a little bit differently.
In the worst cases, these IP addresses would be blocked by some operators or some features.
I hope that that's never true.
But certainly they are more susceptible to being rate limited or categorized in a way that isn't really justified.
In terms of experience, things could be worse there because of that.
It's about the user experience and doing the right thing by the customers as well, right?
Customers don't want legitimate users affected.
So it's really important on both sides of the equation.
For sure. There's this other one: "How to build your own VPN, or the history of WARP."
What can we say about this one? So WARP is a tunneling product provided by Cloudflare.
It comes in two forms. There's the enterprise version used in the Zero Trust suite of products, but there's also a consumer WARP.
So this is the free to download, free to use.
By default, it encrypts DNS, but it can also be used to encrypt regular traffic.
One of the nice things about using it — I turn it on sometimes.
Remember the global Anycast system: it creates an encrypted tunnel to the closest Cloudflare data center.
And oftentimes that improves my performance, because it's encrypted.
The networks in between can't see it, can't judge it. But there's this question: how would you build one?
And that's what this blog is about. Cloudflare is built by and large on open standards, open source where we can.
And this is built right out of the Linux kernel, of course, with some candy around it.
And this blog talks about how we do it. If you like implementations, you're interested in operating systems, how to do these sorts of things, it's really cool.
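In miniature, and only as a sketch: WARP is WireGuard-based, and the essence of such a tunnel is wrapping each IP packet in an authenticated-encryption envelope and shipping it over UDP to the nearest data center. ChaCha20-Poly1305 below is the cipher WireGuard actually uses; everything else is simplified (real WireGuard derives keys in a Noise handshake and uses counter nonces).

```python
# Toy tunnel encapsulation: encrypt-and-wrap an IP packet.
import os
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

key = ChaCha20Poly1305.generate_key()   # in reality: agreed in the handshake
aead = ChaCha20Poly1305(key)

ip_packet = b"\x45" + os.urandom(39)    # stand-in for a packet read from TUN
nonce = os.urandom(12)                  # toy; WireGuard uses a counter nonce
envelope = nonce + aead.encrypt(nonce, ip_packet, None)
# `envelope` is what travels over UDP; middle networks see only ciphertext.

# The far end reverses it and injects the packet into its own stack.
assert aead.decrypt(envelope[:12], envelope[12:], None) == ip_packet
```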
Next, we have "Policy, privacy, and post-quantum: anonymous credentials for everyone," from Lina and also Chris Patton.
Yeah, and alongside it, there's another anonymous credential one for agents.
Exactly. They're related.
The post-quantum one is certainly important. So imagine you're traveling the world.
So the example I like to use is you're a citizen of some country, you're traveling the world, and you need to access a government website for your home country.
And your government website decides you are not in the country, and so you should not be able to access the website.
Clearly, it's a little bit silly.
And that's being kind. The problem is, if we want people to be safe, protect their data, feel some privacy, doing things like creating an account just to browse a website and get information is a little bit too much of an ask.
So the research group, along with others, both in the company and with other organizations and other research environments, are driving a few different ways to do this collectively.
One of those is this notion of anonymous credentials. An earlier version of this might be more recognizable in Privacy Pass.
I know the authors of the blog might be upset that I say that.
But this is fundamentally where anonymous credentials come from.
So it involves getting these credentials that are unlinkable to each other, and I can redeem them and spend them.
But they can prove something about me.
So, for example, I would go to a trusted party on behalf of my government's website service, prove that I actually am a citizen of the country, and then when I want to visit that service wherever I am, I can redeem the credentials in order to get access, without revealing anything about myself, certainly without having to create a user account.
And this blog is talking about how we're investigating how to do that in a post-quantum setting.
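To show the shape of the idea, here's a toy Chaum-style RSA blind signature — illustration only, with tiny insecure parameters; the real systems (Privacy Pass, and the post-quantum designs in the blog) use different, stronger constructions. The issuer signs a blinded value, so what it sees at issuance is unlinkable to what gets redeemed.

```python
# Toy blind signature: issue a credential without learning the token.
import hashlib, secrets
from math import gcd

# Tiny, INSECURE toy RSA parameters, for illustration only.
p, q = 1000003, 1000033
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))

def H(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

# Client: blind the token before asking for issuance.
token = b"one-anonymous-credential"
m = H(token)
while True:
    r = secrets.randbelow(n - 2) + 2
    if gcd(r, n) == 1:
        break
blinded = (m * pow(r, e, n)) % n

# Issuer: checks eligibility (e.g. citizenship), then signs the
# blinded value without ever seeing `m` or `token`.
blind_sig = pow(blinded, d, n)

# Client: unblind. The pair (token, sig) can't be linked to issuance.
sig = (blind_sig * pow(r, -1, n)) % n

# Service: verify at redemption with the issuer's public key.
assert pow(sig, e, n) == H(token)
print("credential verified; issuer cannot link it to the issuance")
```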
Yeah. There are a couple more from the week: "Measuring characteristics of TCP connections at Internet scale."
This one's specifically about how we've been studying connections almost as long as the Internet has existed, right?
Yeah. TCP connections in particular — for longer than I've been around, this has been a fascinating topic of conversation, certainly in the research community and among practitioners.
Our former CTO, John Graham-Cumming — I remember, let's call it an off day, because I certainly laugh about this, and I hope he laughs about it as well.
He once said to me: Marwan, why do we — and by "we" he really meant me and others — why do we care so much about measurement?
And I was baffled by the question, because I think of John as being one of the smartest people I've ever known.
And I said, but John, how do we improve systems if we don't understand them?
And how do we understand them if we don't measure them?
And to his credit, he, upon hearing the answer, he just laughed at himself.
I think recognizing that the answer is entirely obvious. When we want to make improvements on the Internet, how do we do that if we don't know what the Internet is doing, what it's designed to do?
So the connections, like I said, they transport data back and forth.
And the whole purpose of this blog is to share the characteristics of the connections that Cloudflare sees, because this information is really, really hard to get out in the world.
Anyone can do it on their own local environment.
You can install some software and you can start to take measurements and, you know, see how many bytes are in every connection or how many packets or how long connections last and so on.
But to do that on a global scale, there are very few places where that can happen.
Cloudflare is one of them. And so we decided we would take some of what we see and share it.
One of my favorites, this is a lovely one.
Historically, we call these elephants and mice connections. So we know the vast majority of connections are very short, but there is a small number of connections that are very large.
It's never been really clear where the dividing line between them is.
So, if I remember right — again, I'm working from memory here.
I think what was really lovely was to see, for the first time, that 90% of connections are fewer than 100 packets.
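That shape — lots of mice, a few elephants — is easy to see even with synthetic numbers. The sketch below draws connection sizes from a heavy-tailed distribution chosen for illustration, not from Cloudflare data, just to show the kind of summary the blog reports.

```python
# Heavy-tailed connection sizes: most are tiny, a few are enormous.
import numpy as np

rng = np.random.default_rng(1)
packets = np.ceil(rng.pareto(a=1.2, size=1_000_000) * 5) + 1   # synthetic

print(f"share of connections under 100 packets: {(packets < 100).mean():.1%}")
top_1pct = np.sort(packets)[-10_000:]
print(f"share of all packets carried by the top 1%: "
      f"{top_1pct.sum() / packets.sum():.1%}")
```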
Moving on, one that is bot related — and we were also discussing Web Bot Auth, the protocol, before.
It's "Beyond IP lists: a registry format for bots and agents," which Ivo wrote.
What can we say here about this one?
Right. OK, so just to give some background. When we think about automated clients — bots that serve all kinds of purposes — and people try to understand what bots are doing, rate limit them, block them in some cases, the conventional means historically are IP addresses and user agents.
Neither of those is reliable. In fact, anyone who's done this for long enough will know that if they rely on only those two things, eventually they will cause an incident, because somebody out in the world changed an IP address, or there's an actor that uses a different user agent.
So they're very unreliable.
Web Bot Auth is about enabling clients to reliably identify themselves.
So they have a public key and a private key. They sign the request in a header and send it off to the server; the server can see the signature and validate that the request could only have come from the client holding that key.
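Here's the signing idea in miniature — loosely modeled on HTTP Message Signatures (RFC 9421), which Web Bot Auth builds on, but not a conforming implementation; the signature base below is a simplified stand-in.

```python
# Toy request signing with Ed25519: sign on the bot side, verify on
# the server side against the bot's published public key.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Bot side: sign a canonical string covering parts of the request.
bot_key = Ed25519PrivateKey.generate()
signature_base = b'"@method": GET\n"@path": /\n"@authority": example.com'
signature = bot_key.sign(signature_base)

# Server side: fetch the bot's public key (in practice, from a registry
# like the one the blog describes) and verify the request.
public_key = bot_key.public_key()
try:
    public_key.verify(signature, signature_base)
    print("request verifiably came from the key holder")
except InvalidSignature:
    print("signature check failed")
```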
You can imagine that if I'm a server — Cloudflare or anybody else — I now need to keep track of these keys.
You can do that manually if there are five or ten of them.
But what happens when you get to a thousand, or a hundred thousand, or more? No manual process scales in that fashion.
So this blog talks a little bit about how Cloudflare is moving forward to create these registry lists so that customers can have access to all of the agents that want to identify themselves.
And we're trying to automate this and then work on it with other people.
And it's one of those things that's quite important for enabling what's potentially coming in terms of agents: trust, and making decisions you can rely on when leaving agents to do work for you — buying things and moving things on your behalf.
That trust is really important.
Is this really something that I should trust? And, coming back to the anonymous credentials, there's this notion that we're doing Web Bot Auth for the bot clients that want to reveal themselves, right?
It's in their best interest to be transparent.
But automations are different than people. And so this is why we create the anonymous credentials for people.
There's one blog post, not from this week specifically: "Go and enhance your calm: demolishing an HTTP/2 interop problem."
This is very much an HTTP/2, protocol-specific blog.
So interop is short form for interoperability, which means, let's say, two different vendors, two different implementations trying to talk to each other.
This is how we reach consensus, how we build specifications for Internet technologies as people implement what we believe is the same thing.
And the true test is when they can talk to each other successfully.
HTTP/2 is the dominant version of HTTP that's used out there.
QUIC is associated with HTTP/3. And so, honestly, this one is news to me.
I'm excited to read it as soon as we end this conversation, because interop is fascinating.
Let me give you a case in point.
When I moved to the UK, I think it took me a little while to realize it's very easy to underestimate differences despite the shared language.
So there are some words within the English vocabulary that, if you go to different English-speaking countries around the world, are used differently.
So interop, in that sense, has failed.
And so this is why it's so important. But this is regarding the Internet specifically.
This is regarding the Internet specifically. If there's an HTTP/2 interop issue that needs to be repaired, what that means is there are two HTTP/2 endpoints — let's say a client and a server — trying to talk to each other, and something goes awry somewhere.
Because each of them has implemented a feature slightly differently.
One of them has made assumptions about a feature that the other one has not.
This is why as soon as we hang up, I'm going to go read.
Just to wrap things up, what is the main takeaway you would take from this week?
Or you would like for the audience to take from this week, really?
Ask a harder question, why don't you? On a personal level, I cannot stress enough the importance of measurement and transparency.
How we build those into our systems.
By transparency here, I mean a large bucket. The anonymous credentials, Web Bot Auth — these are forms of transparency, right?
Where we want to keep people safe.
But clearly there needs to be some information in order to be able to transact.
You might recognize my voice when you can't see me, for example.
How we do this at scale is always the challenge. And I think maybe that's a big part of what this week is about:
things that we feel are important, but that at scale take on a whole new life.
And that, at its heart, is what this week is about.
How we measure things at scale that are so incredibly noisy. What resilience looks like and reliability.
And how we build transparency and safety into the system for users of all kinds.
This was great, Marwan. I have a lot to read. I've read a few, not all.
But there are many more things to explore and to learn. It's hard to keep up, yeah.
Thanks, João, for having me.
Thanks for doing this. And that's a wrap.