Cloudflare TV

This Week In Net

Presented by John Graham-Cumming, Martin Levy
Originally aired on 

A weekly review of stories affecting the Internet, brought to you by Cloudflare's CTO. We'll look at outages, trends, and new technologies — with special guests to help us explore these topics in greater depth.

English
News
Interviews

Transcript (Beta)

Good afternoon from Lisbon, good morning from California. I'm John Graham-Cumming, and this is a silly show we're doing called This Week in Net. The idea is to talk about something that's happened on the Internet this week, or maybe many things. Last week I did this all by myself and I got lonely, so this week I've got a buddy with me.

Martin, do you want to introduce yourself? Yes, hi, I'm Martin Levy. I'm a distinguished engineer at Cloudflare, I've been with the company six years, and I'm joining John this morning for my debut on Cloudflare TV, so this is my first time on.

Well, I hope it's painless. We've got 30 minutes, and what made me want to get you on the show today was that we saw Hurricane Electric announcing that they were filtering RPKI invalids. Is that right?

That is correct. Okay, what the hell does that mean?

And who are Hurricane Electric and why is that important? So many questions this early in the morning for me.

You've got your tea though, right? Yeah, I have my tea, so I'll keep going quite well.

All right, so the Internet is a collection of 800,000 individual IP routes and 60,000 to 70,000 individual autonomous systems.

Each autonomous system has a number, and that number is allocated to a particular network.

Cloudflare is allocated the number 13335, and the interconnection of one network to another is what makes up the Internet.

You may be watching from a cable TV connection or a mobile connection anywhere in the world; that provider has an autonomous system number.

Now, the networks don't connect in a random manner but in a very organized one: Internet peering, the connection between one AS and another, occurs at local peering points, and then there is a large set of international backbones that transmit bits around the globe.

And those backbones, many of them run by well-known companies, are responsible for the majority of the connectivity that the Internet operates on.

One of those companies, Hurricane Electric, with AS number 6939, announced this week that they had implemented filtering using RPKI, and this is an important step for the Internet in general.

So let's take a step back and decode the RPKI acronym: Resource Public Key Infrastructure. Internet routing needs a second database: not just the database of where and how to route from one network to another, but a database of who is allowed to route from one place to another.

RPKI is a protocol defined by the IETF, which defines nearly all the protocols we use on the Internet, and it provides a way to decide whether a route should be accepted by another network.

It does this using a signed attestation, in fact an X.509 cryptographically signed object, whose root of trust is the RIRs that own the address space.

These are the regional Internet registries: one for North America, one for Latin America, one for Europe and beyond, one for Africa, and one for Asia-Pacific.

The ownership, or at least the allocation, of an IP address is controlled by those five entities. Therefore each of those five entities runs a trust anchor, an X.509 certificate that says "we are responsible for the following set of IP addresses". Downstream from that, in a tree, is another certificate and another certificate, forming a hierarchy of essentially who owns what IP.

Within that tree, again signed using X.509, which is a well-trusted method of handling certificates, there is an object called a ROA, R-O-A.

It's an attestation of route origin. Take a particular route that exists on a network, say Cloudflare with its AS number 13335.

We announce the prefix 1.1.1.0/24. That's a block of IP addresses, and a ROA exists in that hierarchy of signed information that says AS13335, Cloudflare, is in fact legitimately allowed to announce that particular prefix.

At the moment there are about a hundred thousand ROAs, and the number is growing quite healthily. That means that should somebody else announce a route that is covered by a ROA, whether incorrectly, by mistake, or maybe in an attempt to hijack it, then any network that implements RPKI will drop that route as invalid.

Which then brings us on to the isbgpsafeyet.com website.

Let's bring that up. I think I have that here somewhere.

Yes, and the news is that Hurricane Electric did this in the last week. It's a very large network, probably the network with the largest number of BGP peers and interconnections of anybody on the Internet.

There it is. There we are. So on June 16th they made an announcement, in fact quite a detailed email to the NANOG mailing list.

A fairly easy list to search for.

NANOG. Actually, did we link it? Yes, we linked it from that website.

Click on it. There's a link here to the source for that.

So they're announcing it, and then there's a conversation about it.

There's a long conversation about it, yes. So that means that should a route be announced that is invalid, either by mistake or for a nefarious reason, a large network such as Hurricane will now drop it at its edge, whether the route comes from a peer or a customer.

That's a very important point.

Let me interrupt you. So why does this all matter? This all sounds great, but why are we trying to eliminate this problem of people saying, "Hey, I'm 1.1.1.0/24"?

There are some horrid truths about the growth of the Internet. BGP as a protocol, circa the early to mid 90s, 94-ish approximately, really didn't take security into account, predominantly because, well, we all knew each other back then.

And we, well, we didn't trust each other, but we at least knew each other.

And if a mistake was made, we could normally go tell somebody and fix it.

And pretty soon, of course, filtering existed.

And then pretty soon after that, routing registries existed in order to keep a listing of IP addresses that are owned by one network and another.

But the Internet is not what it was in the nineties. In fact, not even close.

It has grown astronomically, and it has players you simply don't know: you can't just text or message somebody, because you don't know everybody.

So filtering has turned out to be important. Also, the Internet has grown to the point where, to be honest, fat-fingering will occur.

Statistically, it will occur a lot more, because the Internet is bigger. But as you and I know, and as we have written about on our blog, route leaks and, more importantly, route hijacks have happened and have been quite successful, either locally or globally.

And this has to stop. RPKI as a protocol came out of the IETF, was well thought out, and has grown specifically to solve one problem: route origination.

Not the path, that's a different conversation, but just route origination.

And so, since you've got that up on the screen, scroll down and we'll look at the greens and the oranges.

But what we have here is we have networks that have taken route security to the fullest.

In green, we have networks that we have measured and that have clearly stated both that they have signed their own routes, in other words made their own announcements secure within networks that implement RPKI, and that they are also filtering others.

And that list is growing.

The ones that are partially safe are the ones that are partway into the process.

There's no getting away from the fact that this is a hard process. It does take time.

It's much easier now. In fact, it is phenomenally easy now compared to where it was many years ago.

And then there's the red ones who are unsafe. But let's talk about what we mean by that.

In this context, what does unsafe mean? If I sign a route, and Cloudflare signs all of its routes, we want to make sure that a network is capable of dropping somebody else's announcement if they mistakenly or nefariously announce that route.

The ones that are marked as unsafe won't do that. Won't do that as of today.

That list, by the way, has been moving into the orange and into the green, absolutely, since we opened up this website.

And the unsafe part simply means that they are not filtering.

And not filtering means that there is the possibility that a route hijack will affect their backbone.

It's not guaranteed, by the way, because there are many, many aspects of this.

We have a nice animation.

If I press this button, "hijack the request"? Yes. Right. Boom. An attacker could divert the traffic that was meant to go to, say, Google.

It happened with YouTube, right?

Very famously, Pakistan announced that it was all of YouTube, and all the YouTube traffic went there.

And so without this filtering, without looking at these ROAs and these RPKI announcements, the traffic can actually literally get diverted to somewhere else.

And actually, not very long ago, there was quite a famous attempt to do this to steal cryptocurrency, by directing traffic to servers the attackers controlled.

Yes. And that brought that particular cloud provider into the world of RPKI literally overnight, because they saw it as an issue.

We have that often on the Internet where an event occurs and it just gets people into the new technology quicker than they expected.

But let's go back and talk about RPKI in general.

The idea is to protect the route origin, not the path. There are many, many aspects to securing the Internet.

This is just one of them. And there have been many other areas that have been quite successful in securing the Internet in one form or another.

But for RPKI, we're looking at route origin protection specifically.

We do it using an X.509-signed assertion that can be traced all the way back to the allocation by the RIR, the owners.

These are the people that keep the phone book of who owns what IP. This is the definitive list.

So the important part about RPKI is that we're solving one particular problem, and we're solving it with a cryptographic, assertion-based process.

And so far, it's been quite successful and has brought about a lot more safety on the Internet.

We have a long way to go. We always do. That turns out to be true of everything we've worked on, whether it's TLS for secure websites, or this, or time: we're now bringing security to time as well.

There are many, many areas where security is being brought in and this is the work in the route space.

And Cloudflare has done not only this website, but also we've done some open source software.

Our OctoRPKI release, which is on our GitHub pages, is an RPKI validator.

This is the ability to query the five RIRs, bring down the whole cryptographic tree, and build the database of what is considered valid and invalid.

And the other piece of code we released, GoRTR, is the piece of code that takes that validated list and feeds it directly into the Internet routers, which are the core of the Internet.
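Between the validator and GoRTR sits a plain list of validated entries. As a sketch only, assume the validator writes that list out as JSON; the field names below (`roas`, `prefix`, `maxLength`, `asn`) are our assumption for illustration, not a documented OctoRPKI or GoRTR format, and the IPv6 entry uses documentation address space and a documentation ASN. The point is that after this step nothing cryptographic remains: just prefixes, lengths, and AS numbers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
)

// vrpJSON mirrors one entry of a hypothetical validator export.
type vrpJSON struct {
	Prefix    string `json:"prefix"`
	MaxLength int    `json:"maxLength"`
	ASN       uint32 `json:"asn"`
}

type export struct {
	ROAs []vrpJSON `json:"roas"`
}

// VRP is the post-cryptography form: no certificates, no signatures.
type VRP struct {
	Prefix    netip.Prefix
	MaxLength int
	ASN       uint32
}

// loadVRPs parses the export, rejecting malformed entries instead of
// passing them on to routers.
func loadVRPs(data []byte) ([]VRP, error) {
	var e export
	if err := json.Unmarshal(data, &e); err != nil {
		return nil, err
	}
	out := make([]VRP, 0, len(e.ROAs))
	for _, r := range e.ROAs {
		p, err := netip.ParsePrefix(r.Prefix)
		if err != nil {
			return nil, fmt.Errorf("bad prefix %q: %w", r.Prefix, err)
		}
		// MaxLength must lie between the prefix length and the address width.
		if r.MaxLength < p.Bits() || r.MaxLength > p.Addr().BitLen() {
			return nil, fmt.Errorf("bad maxLength %d for %s", r.MaxLength, p)
		}
		out = append(out, VRP{p.Masked(), r.MaxLength, r.ASN})
	}
	return out, nil
}

func main() {
	data := []byte(`{"roas": [
		{"prefix": "1.1.1.0/24", "maxLength": 24, "asn": 13335},
		{"prefix": "2001:db8::/32", "maxLength": 48, "asn": 64500}]}`)
	vrps, err := loadVRPs(data)
	if err != nil {
		panic(err)
	}
	for _, v := range vrps {
		fmt.Printf("%s max /%d AS%d\n", v.Prefix, v.MaxLength, v.ASN)
	}
}
```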

That's where an Internet route exists.

And by the way, it's rather important to understand that this is all nigh on invisible to the end user.

The end user will go to a popular social media site, or to their bank, or to a blog, or to pictures of cats.

It doesn't matter where they go.

Every single request they make converts to an IP address. That IP address falls within an IP block, which is then routed by this underlying infrastructure, sometimes locally to a content delivery network like us, and sometimes remotely over undersea cables to somewhere far, far away.

All of those bits, when they fly on the wire, are directed by the protocol BGP and therefore hopefully protected by RPKI.

So this isn't something that generates a green bar on your web browser.

This is something that is the responsibility of the Internet service provider or the mobile provider or the fiber to the home provider that is giving you service.

What do you have on the screen there? That's the, maybe I should increase the font size.

That's OctoRPKI. This is our RPKI tool set. There are multiple tool sets out there.

You're a software developer. You, I know, appreciate this.

OctoRPKI is written in Go, as is GoRTR. We have other software written in Rust.

We have software written in C++, and we have other software written even in Java, all doing the same interpretation of the RPKI protocols and data sets and bringing the data back into an ISP for filtering.

This diversity of software has truly helped the protocol move forward, because diverse groups writing software against an identical protocol means that the protocol itself gets interpreted independently and then verified, hopefully.

And a bug that may exist in a particular nuance of a particular language will get flushed out, because the code may do one thing in Go and a different thing in Rust, or you interpret the spec one way in Go and another way in Rust, and ultimately you prove the protocol to be correct.

Yep, yep. And so people actually use the Cloudflare tool set to implement RPKI?

Yes, absolutely. So we wrote the tool set in a different methodology than other people, which by the way is also in the software world a good idea.

So if you go look at GoRTR, I think there's a section on the GitHub page that shows who uses it.

GoRTR is a very lightweight piece of code that converts the validated database after you've finished all the crypto work.

In other words, you have looked at the X.509 certificates and decided who owns a particular route and whether it's valid or invalid.

It then takes that, strips the cryptography out of it, and, again using a very lightweight protocol, builds a routing database that is shipped into those routers.

So those routers are the things with 10 gig and 100 gig and now 400 gig interfaces that are moving bits at ridiculously fast speeds.

They need to make a decision as to whether to forward a packet in nanoseconds.

So doing a cryptographic calculation is just not practical.

So GoRTR (there's the list of users) takes the post-cryptographic data and generates the RTR protocol, which again comes from the IETF.

It's an RFC8 something, something, something. And it talks specifically to the routers.

And the interesting thing is you only need to run one or two validators.

Maybe you run one in North America, one in Europe, one in Asia, for diversity reasons, but you can run hundreds of these lightweight GoRTR processes.

We run ours, by the way, in a Kubernetes environment, with the Cloudflare load balancer as the front end in front of that.

All the things that one would do in a good enterprise network at scale type setup.

And the GoRTR process, which is used by various carriers, is moving those 100,000 routing pieces of information, more than 100,000 now, in and out of the routers in a very efficient manner.

Ultimately, you're dropping... So this was the carrot to get RPKI, was here's some free software that will actually make it work.

And by the way, Cloudflare is doing it.

And then isbgpsafeyet.com is kind of the stick? It is. It is.

And at some point in time, putting up a wall of shame is simply the only way to go.

And while we've done that, in this particular case, what I would tell you is that it has also...

The website has got the ability to test. So you can actually click on a test right there and then.

And I'll tell you how we do that test, because this becomes a very interesting situation.

In order to test something... I don't know what ISP you're on; it'll be interesting to see what yours is.

I'm not going to shame my ISP publicly, because I can tell you they'll be shamed.

Okay.

Let me tell you how we do this test. We announce hundreds of routes within Cloudflare for all our web properties, for our DNS, for WARP, for Access, for all of the services that we run on our edge in 200 places around the world.

And then we pick an IPv4 block and an IPv6 block, and we mark them as invalid.

So even we aren't allowed to announce them in the RPKI world.

And yet they're accepted. And they're accepted by people that either...

Well, they aren't running RPKI. And so when we test, what we're doing, in the background in JavaScript on that website, is a query to Cloudflare for a known good address, which always works.

If you have this website up and running on your screen, then you know that you can talk to anything.

But we also request an invalid. So we have a URL: invalid.rpki.cloudflare.com.

It's a very simple website that responds. If we get no response, well, then it's blocked, because no one should accept that route.
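The browser test comes down to two fetches and a comparison. The site does this in JavaScript; here is a rough Go equivalent as a sketch. The invalid hostname is the one named on the show; using https://www.cloudflare.com/ as the known-good URL, the HTTPS scheme, and the five-second timeout are our assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// verdict turns the two fetch results into the answer shown to the user.
func verdict(knownGoodOK, invalidOK bool) string {
	switch {
	case !knownGoodOK:
		return "inconclusive: even the known-good request failed"
	case invalidOK:
		return "not filtering: the RPKI-invalid prefix is reachable"
	default:
		return "filtering: the RPKI-invalid prefix is unreachable"
	}
}

// reachable reports whether a GET to url gets any HTTP response before the client timeout.
func reachable(client *http.Client, url string) bool {
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	resp.Body.Close()
	return true
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	good := reachable(client, "https://www.cloudflare.com/")         // assumed known-good URL
	bad := reachable(client, "https://invalid.rpki.cloudflare.com/") // hostname from the show
	fmt.Println(verdict(good, bad))
}
```

A filtered route usually shows up as a timeout or connection failure rather than an error page, which is why the client sets a deadline instead of waiting indefinitely.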

And in fact, what we do is we measure whether that route shows up in particular backbones.

But what we also give is the ability for an individual, an end user, to test their ISP using that link.

And we do that. And so this... And they can tweet it as well, right?

They can tweet out that they... Oh, well, that's the whole... That's the wall of shame part of this, and the multiplying factor.

And unashamedly, I would say that that is a very powerful tool.

We make sure people understand what they're doing, but we also give them the ability to say "my ISP", and we won't name any, "is in fact filtering, congratulations", or "is not filtering, shame on you".

It's been pretty successful, to be honest. You said about 100,000 out of 800,000.

So that's roughly 12.5 percent of the routes signed at this point? How significant is it that large networks like this are actually implementing it?

Oh, it's beyond significant.

Let me deal with two aspects of that number. First of all, not every route on the Internet is equal.

I run a highly insignificant home website, which maybe gets three hits a day, if I'm lucky.

And it sits on an IP block, and that IP block is not very important.

But an IP block for a very popular social media site, or a very high bandwidth content delivery network, or a government website, these are important sites.

So the key here is to have the most important websites, measured maybe by how often they're accessed, covered by RPKI.

And then you have this sort of trailing list.

So we look at the social media sites, at the content sites, and say, this is very important.

But there is one other aspect of this. Signing a route protects yourself.

Implementing RPKI and dropping routes from others protects others, and protects your access.

It makes your ISP, and therefore your users, have a safer experience.

But the Internet does have a slight hierarchy. At the top are the tier one ISPs, of which there are a baker's dozen or so, and they need to implement RPKI.

Because the moment those core networks implement this level of filtering, it stops any propagation that may occur.
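Why filtering in the core matters so much can be seen with a toy model. The Go sketch below floods a bad announcement breadth-first across a small, hypothetical AS graph and lets filtering ASes refuse it. It deliberately ignores real BGP policy (customer, peer, and provider relationships, and path selection), so treat it as an illustration of the argument's shape, not a simulation of the Internet.

```go
package main

import "fmt"

// spread returns the set of ASes that end up accepting an invalid
// announcement from origin. An AS in filters neither accepts nor
// re-announces it, so the flood stops there.
func spread(graph map[string][]string, origin string, filters map[string]bool) map[string]bool {
	accepted := map[string]bool{origin: true}
	queue := []string{origin}
	for len(queue) > 0 {
		as := queue[0]
		queue = queue[1:]
		for _, nb := range graph[as] {
			if accepted[nb] || filters[nb] {
				continue
			}
			accepted[nb] = true
			queue = append(queue, nb)
		}
	}
	return accepted
}

func main() {
	// Hypothetical topology: a leaking network behind a regional ISP,
	// which buys transit from tier one T1a; T1a interconnects with T1b.
	graph := map[string][]string{
		"leaker":   {"regional"},
		"regional": {"leaker", "cust1", "T1a"},
		"cust1":    {"regional"},
		"T1a":      {"regional", "T1b", "isp1"},
		"T1b":      {"T1a", "isp2"},
		"isp1":     {"T1a"},
		"isp2":     {"T1b"},
	}
	fmt.Println(len(spread(graph, "leaker", nil)))                          // 7: the leak reaches everyone
	fmt.Println(len(spread(graph, "leaker", map[string]bool{"T1a": true}))) // 3: confined to the region
}
```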

So in green, we see a couple of those tier ones. For the tier ones, you can go look at the Wikipedia page to see who believes they are a tier one.

Some are clearly tier one, some question it, but we see Telia, Cogent, GTT, NTT, Hurricane, etc.

And then we start seeing, in the yellow, Tata. But in the red, we see Level 3, Sparkle, which is Telecom Italia, Zayo, Vodafone, etc.

These are clearly very important networks that have yet to implement filtering, or even to protect their own routes.

If it says "started", then we know that they've started.

This is a very nice, somewhat open community in the Internet. We do help each other heavily.

And we know that it's in one backbone's interest to help another backbone in this area of security.

The moment that all of those tier ones are dropping routes, because they're invalid, the Internet will suddenly clear up.

And what will happen is that we will then be dealing with large but more localized ISPs.

So let's say you have a European-wide ISP, a Latin America-wide ISP, or a Japan-wide ISP: they are phenomenally big in their own right, but they're not part of the global tier one set.

When those filter too, route leaks such as the ones we've seen, whether from Pakistan, or Malaysia, or the one we saw in Nigeria about a year or so ago, will become highly localized and will not propagate around the globe. So last week on the show, you weren't here, but I was talking about Starlink.

And Starlink has started sending us some traffic, and actually, they connect to Zayo.

So maybe there's a risk here that Starlink can get hijacked, given that Zayo doesn't actually do any RPKI at all.

Any network that's marked as unsafe has this issue.

And those networks have some significantly important customers. And that has to be addressed.

They have to bring RPKI into their world. But I will make one comment, slightly in their defense, only slightly.

These are big networks.

These are big networks with a ridiculously large deployed base. So I'll give them a little bit of leeway, because it's hard.

Except for when we look at the list of big networks, and we see the NTTs of the world, and the AT&Ts of the world.

There's an old phrase, you know, if AT&T can do it, then well, so can everybody else.

So can everybody else. And recently, we've seen Cogent come in here, and they're one of the recent wins, right?

A couple of weeks ago. Yes, Cogent and Hurricane: these are both very well-connected networks.

And therefore, by filtering routes from that many customers and peers, they are improving the Internet dramatically.

And my hat's off to both of them. All right. Well, I think on that hat's off note, we're done with RPKI, and we're done with NTT.

And tea, the tea is cold now?

The tea is cold, but I have a teapot. This is real tea. There you go. It's real tea.

It's not fake morning TV tea. That's good. All right. You've got tea.

Well, listen, Martin, thanks for coming on the show. Thank you for telling us about RPKI.

Brilliant stuff. Cheers. All right. Take care. Enjoy the rest of your day.

Thanks. Bye. Now we sit drinking tea until they cut us off, basically.

Yeah, yeah, yeah, yeah. Exactly. Except I don't have any tea.

So unfortunately, so we'll be here. But thanks very much for that.

And hopefully, you'll come on the show again at some point and tell us that all of the major tier ones are now doing RPKI, and everything's fantastic.

That will be the day.

You'll see me celebrate that day. It won't be tea. I'll be putting on a V6 t-shirt.

This V4 t-shirt has got to go. It won't be tea you're drinking that day.

No, it won't be. I will celebrate that day. All right. All right. Brilliant.

Thank you very much, Martin. Take care. Bye-bye. Have a good day. Bye.