🔬 Cloudflare and the IETF

Presented by: Chris Wood, Eric Rescorla

Originally aired on March 27, 2024 @ 3:30 AM - 4:00 AM EDT

Join our research team as they discuss Cloudflare and the IETF.

Transcript (Beta)

Hey folks, my name is Chris. I'm a research lead on the Cloudflare team and I'm joined here by Eric Rescorla, Firefox CTO and IETF extraordinaire. We're here to talk about some, I guess, new, interesting privacy and security stuff that's happening in the IETF and what companies like Cloudflare and Mozilla are doing to further advance these technologies. So Eric, yeah, thanks for hopping on the call.

Great to be here.

Yeah. So I think, to level set, before we talk about the particular protocols in question, it's probably useful to make sure that our end goals are clear to everyone. And Eric, I don't want to speak for you, but from my perspective, a lot of the work that I do personally in the IETF in various capacities is primarily to make the web and Internet more private and secure. And that comes in a variety of shapes and forms: adding more encryption everywhere, making things, quote, more private, for some definition of more private. And I'm curious if you have anything else to say on that particular front.

No, I mean, I think, well, yes, I guess. So both personally and as Mozilla: Mozilla is what we call a mission-driven organization. We have this manifesto that has a bunch of points. And one of the points is that privacy and security on the Internet are fundamental. So I think we're really about trying to do something about that. And as people probably know, when the Internet was first developed, this was not a high priority. I don't mean to cast aspersions on the people doing the work. The technology really wasn't there. But nevertheless, the Internet as Internet 1.0 really had quite poor privacy and security properties. And we've spent the past, I don't know, 25 years trying to backfill those security properties in an acceptable way. So it's a long project. But that's what we're trying to do.

Yeah, totally.
So I guess on the backfilling front, what do you think are the most important challenges that the IETF is working on right now with respect to security and privacy?

Yeah, I mean, it's probably useful to be historical about this. Initially, all the data was just there in the clear. So there was no meaningful security or privacy against anybody on the Internet. If anybody could access your link, it was basically game over. And so it's been a very long process to get to the point where even the majority of the traffic is encrypted. We're there for the web. I think we're there for messaging. We're not there for some other things necessarily. But we're getting there. So that was the first step. That's the basic substrate which you need to have, and frankly, which is almost necessarily required for any other kind of security or privacy. But that doesn't actually do as great a job of privacy as you'd like. And there are a number of ways that manifests. It's probably pretty widely known at this point that the web is sort of this nightmare of tracking. But it's also the case that even when you have encrypted traffic, people on the network with you are able to determine information about what you're doing. That may be what websites you're going to. It may be what images you're retrieving on the websites or which sections of the website you're going to. That used to be, of course, very easy when the data was unencrypted. But the technical problems around protecting that information are harder than the technical problems around just encrypting it. And so as I alluded to earlier, when the thing was first designed, people did what they could. And then we spent the past 20, 25 years doing what we could. And a lot of that was just getting the substrate for ubiquitous encryption.
But now that that's starting to happen, what we need to do is figure out how to take the next steps and protect against the kinds of metadata surveillance that we still see. And of course, against the kinds of web tracking that we still see.

Yeah, totally. And metadata safety on the web is one of the things that my team at Cloudflare, as well as many others, are working to plug. So metadata leakage in protocols like TLS, where you have the plaintext contents of the ClientHello sent in the clear, or in your DNS queries, where who's sending a DNS query is visible even though the query itself may be encrypted: closing those leaks is critically important, I think, to prevent this tracking nightmare that you speak of. You and I collaborate on a number of protocols related to this particular metadata leakage problem. I think one of the most important ones right now is TLS Encrypted ClientHello (ECH), which, as the name suggests, is a protocol to encrypt the ClientHello. And I don't know about you, but in my experience, it comes across as somewhat surprising to people when they learn that there's still so much metadata leaked in TLS 1.3 in the ClientHello, especially around the SNI. So I'm curious, given how deeply involved you were with the standardization process of TLS 1.3, how did we get to this state where we need ECH to begin with? And was there something we could have done differently back when 1.3 was being standardized to address this?

Yeah, I mean, so this obviously came up quite early. There were a number of very early discussions, before 1.3 was really on track, about solving this problem. Daniel Kahn Gillmor from the ACLU, and I'm trying to think of other people whose names escape me, suggested a bunch of options for doing this. And the bottom line was they didn't really work very well.
And I think in particular, we were really struggling with exactly what information the client had to have to make this work. This is going to get a little technical fast, but there are two kinds of attack models you can imagine. One is an attack model where the attacker is passive and just watching on the wire. And one is where the attacker is interfering with the connection. So, I'll take a step back. There were designs that would allow you to have passive protection, but would not secure against active attack. And those typically required extra round trips, so people didn't like them very much. And then there was some thought that there might be designs that would allow you to have protection against active attack and wouldn't require a round trip, but they required a way to publish the key material you need. And no one had really solved that problem. We had ideas, but we weren't quite sure how to make it work. And so we reluctantly made the decision to throw this overboard and make incremental progress. And what the incremental progress consisted of was encrypting basically everything after that point. That's critically important, because if you don't encrypt, for instance, the certificate, then there's no point encrypting the SNI. But it doesn't get you all the way there. That said, TLS 1.3 took on the order of four to five years to finish. And so I think we made the right choice, in the sense that I worry we never would have finished, and that would have been much worse. Because I think it's been a big advance and it set the stage for us to do this work now.
And I don't know how much we want to get into history, but the history of this is sort of confusing. We'd sort of given up and said we didn't know how to do it. In particular, as I said, we were very worried about how to seed the key material and how to set it up. And I got an email one day from Matthew Prince, CEO of Cloudflare. And he says, well, why don't we just give you a list of every IP address that Cloudflare has, and you'll have a public key, and you'll ship that. And I was like, that's a really interesting idea, but I'd like to solve this more globally. But the thing we realized collectively was, if you said, look, we're just going to solve the problem for the subset of entities which control their DNS and which have a bunch of customer sites co-located, like CDNs, then we have the possibility to solve this problem, because they can have one key established for everybody they serve. And then we can cut off a big part of the problem without solving the whole problem. And that's what the current approach really involves. I just want to circle back to this notion that we're trying to make incremental progress. When TLS was initially done, they did what they could. When we did 1.3, we did what we could, and now we're doing what we can. And this isn't a complete solution, because there are still times when, for instance, sites aren't co-located, in which case concealing where you're going doesn't help very much. But then maybe we'll do what we can later with that. So we're trying to do what we can.

Yeah, totally. Incremental progress is the name of the game with these problems. So much so that I think it's influenced the design of ECH in many respects.
Like, we have for a while now been going back and forth on what the ideal threat model is that we want the protocol to target. Is it a fully active, interfering adversary trying to figure out not only whether you're capable of ECH, but whether you actually used ECH? And then there are potentially weaker models, where we're okay with an attacker learning that you used ECH, as long as they don't learn what the underlying plaintext was. In the interest of making forward progress, we had to compromise and realize that we're not going to protect against all active attacks in the world. There's some information that's going to be leaked by the protocol, but that's a fine compromise in the name of making forward, incremental progress. So in many ways, trying to figure out what this trade-off was has been somewhat of a blocker for progress in ECH. Because it seems to me that, in contrast to prior extensions, this really forced us to evaluate what the attacker is capable of and what the ideal privacy goals are. Whereas with vanilla TLS, it's: do you have a secure traffic or main secret that you can use to encrypt your traffic? Traffic analysis and everything else is out of scope. But now we're asking these questions: what role does traffic analysis play in this particular protocol? What additional things do servers and clients have to do to make sure that they're not inadvertently leaking information about the plaintext? And the process of developing ECH has been pretty informative in terms of what these different attack vectors are. But I think we've converged on something that's pretty deployable. In fact, earlier this week, we announced that we've started our initial deployment of ECH. So once the browsers are ready, they can start talking to us, and we can see how this thing fares in the wild and whether or not the models that we have in our minds match what we've designed for.
And yeah, it's going to be interesting to see where ECH goes in the next couple of months, in the next couple of years, and what the future of TLS privacy looks like beyond that. I was asked recently by a colleague: if you assumed a world where ECH was deployed universally, what remains of TLS from a security and privacy perspective? Or can we declare victory and move on to higher layers of the stack?

Yeah. Well, I think there's still some stuff to do. And I think what you're alluding to, first of all, reveals the immaturity of the technology we're working with. I mean, we've had a lot of experience with security protocols, and even so, it's very hard to get right. And this is the first time someone's tried to do this, so it's a much harder problem. It does require grappling with these questions of what's acceptable as opposed to what's ideal. I think the aperture is probably getting opened about what TLS means. But the two major privacy leaks that would remain if we had universal ECH would be, first of all, sites that are not co-resident with other sites. There are lots of sites on Cloudflare, and so if I go to Cloudflare, it's not clear which site I'm going to. But look, the only ones that are on Facebook's infrastructure are Facebook, and maybe Instagram. And so it doesn't really help to conceal whether it's Facebook or Instagram you're going to. That's a problem that is probably not solvable at the TLS layer and probably involves technologies more like Apple's Private Relay or VPNs or stuff like that. But then at the TLS layer, we already know that it's possible to look at the packet sizes and delivery times and stuff like that and use that to infer what site people are going to. And I think there was some hope this would get better with QUIC. But it doesn't seem to have. It seems like it may actually be worse.
And I think it's still unknown whether that's actually true. But there were hopes it would be potentially better. So there's been some work on trying to conceal that information, doing padding and stuff like that. And I know you have a draft describing some of that work. So I think that's probably the last big thing we really know about as a work item: trying to address that problem in a serious way. It's not really a TLS problem, but it's a problem you have to solve to provide the service you want to provide all the way up the stack.

Yeah, totally. It's another incremental improvement that we can make after we've plugged all the obvious leaks. You mentioned something quite interesting, and that is the metadata that's leaked by the client IP address, for example, and our need to address that when you don't have servers that host more than one particular website. And so that leads nicely into another line of work that's happening in the IETF right now around these oblivious protocols. There's Oblivious DoH, or Oblivious HTTP, or OHAI. I don't know what you want to call it.

I think we should call it OHAI now, right?

Yeah. So you've been following that work for quite a while. Can you summarize for folks what the main purpose of something like ODoH or OHAI is compared to, say, a CONNECT proxy?

Yeah, yeah. Let's situate this in context, right? There's a long, long history of attempting to address these IP address concerns. So first of all, there are really two major threats you have to be concerned about. The first is people on your network seeing where you're going. That's a real concern. That, to some extent, as you say, is solved in some cases by ECH, but not all. And then the other concern is people on the other side seeing where you are.
And so there's a long set of technologies around this, ranging from, I would say, a fairly weak threat model, which is what VPNs really address, to a very strong threat model, which is what Tor is really designed to address. But they all share the same basic idea, which is that you proxy the data as a stream, effectively, from one side to the other. So from the client's perspective, it's making a connection over a tunnel, but it's still connecting end to end to the server. This is what Apple Private Relay does as well, and also Firefox FPN, things like this. That's very good for a number of settings, and it's very good for basic web browsing. But what it's less good for is settings where what you have is a relatively small number of queries that you want uncorrelated. So for example, if you go to Facebook and you make a lot of HTTP requests, there's a cookie. Those requests are all coupled together, and so there's no point in trying to conceal that they go together. But say you're connecting to a DNS resolver, and you don't want your DNS resolver knowing the pattern of queries you're making. In that case, there probably wouldn't be a cookie, and you want to break them apart. And so having those all in the same connection is actually very, very inconvenient. The work on this, I think, really started with a paper called Oblivious DNS by Feamster et al., describing how to tunnel this kind of stuff over DNS packets. But the first real work on this that I think got a lot of traction was called Oblivious DoH. It was done by, I think, Cloudflare, Apple, Fastly. I know you were involved.
That basically said, OK, instead of forming a connection between the client and the server and having it proxied, we'll still have a proxy, but the proxy will be switching individual encrypted messages as opposed to switching connections. So you don't have all the connection overhead. And the point is that each DNS query gets processed separately, and it's not really possible for the server to figure out which queries correspond to which people, because they're all intermixed in the same connection from the proxy. That got a lot of interest. I know people are doing it now. And then I think collectively, we all started realizing that that mode of operation was more generalizable than just solving DNS problems and could be used for things like, in particular, browser telemetry, statistics submissions, maybe Safe Browsing queries: anywhere you had this setting where you had a bunch of things happening that you didn't want coupled together, and they were disconnected request-response kinds of activities. And so the IETF has just now chartered a working group, which, as you alluded to, is called OHAI (there was a bunch of back and forth about the name), that is basically attempting to standardize this kind of message-based proxy technology. And there's been a lot of confusion about this, so I do think it's helpful to clarify: this is not designed for general web browsing use. It's actually not really usable for general web browsing, for performance reasons, and also because there's really no point, because web browsing uses connections. You don't try to disconnect the requests, because you have a cookie or something like that. It's really designed for these non-web client-server applications, where you have a single thing you want to do, and you don't want to keep any state between multiple requests.
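
[Editor's illustration] The split of knowledge described above, where the proxy switches individual messages and the server cannot tell which client sent which query, can be sketched as a toy model. This is only an assumption-laden illustration of who learns what, not the ODoH or OHAI wire protocol: the class names `Sealed`, `Relay`, and `Target` are hypothetical, the addresses are documentation placeholders, and `Sealed` performs no real encryption (a real deployment would encrypt each message to the target's public key).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sealed:
    """Placeholder for a query encrypted to the target's public key.

    No real cryptography happens here: the class only marks data that
    the relay is expected to treat as an opaque blob.
    """
    inner: str


@dataclass
class Target:
    # Everything the target learns: (address it saw, decrypted query).
    seen: list = field(default_factory=list)

    def handle(self, sender: str, message: Sealed) -> None:
        self.seen.append((sender, message.inner))


@dataclass
class Relay:
    address: str
    target: Target
    # Everything the relay learns: client addresses, never query contents.
    log: list = field(default_factory=list)

    def forward(self, client_address: str, message: Sealed) -> None:
        self.log.append(client_address)
        # Each message is passed on individually under the relay's own address.
        self.target.handle(self.address, message)


target = Target()
relay = Relay(address="relay.example", target=target)
relay.forward("198.51.100.7", Sealed("example.com A?"))
relay.forward("203.0.113.9", Sealed("example.org AAAA?"))

print(relay.log)    # client addresses only
print(target.seen)  # queries, all attributed to the relay
```

In this model the relay's log holds client addresses but no queries, and the target's view holds queries that all appear to come from the relay, which is the property the discussion relies on.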
Yeah, I've been describing it as appropriate for transactional protocols, like DNS or analytics submissions, like you said.

Yeah, that's a great one.

Yeah, and I think that sums it up quite well. So the first OHAI meeting will take place at the upcoming IETF meeting, IETF 112. Folks who are interested should check it out and learn more about what's coming down the pipe for that particular protocol. So you mentioned one of the use cases for OHAI was this analytics submission task, which many clients do. They have some metric they want to report up to some central server, some collector, so that this collector can do data analysis to see what popular domains people are visiting, maybe what websites cause users' browsers to crash, or what have you. There are all sorts of scenarios where this would be useful. But OHAI and OHTTP in general, these protocols that just proxy individual metrics between a client and some collector, are not great here, because the collector still learns all the individual submissions. Even though it can't link them to a specific client, that can be too revealing. So you've been heavily involved in this effort to bring up PPM, Privacy Preserving Measurement, or PRIV, I guess, is the name now.

Yeah, we renamed it again. We're really good at names, it turns out.

And so, yeah, I was wondering if you could speak to why something like PRIV or PPM is really needed beyond what we can already do with these oblivious, message-proxying protocols like OHTTP.

Yeah, so I think there's a general pattern in attempting to improve privacy: taking systems which are very, very powerful and convenient, but also had terrible privacy properties, and trying to make them have better privacy properties.
And what you typically end up doing is saying, well, that one-size-fits-all solution, where I just did everything with terrible privacy, doesn't work, and I need to find a set of new solutions that do different things. So OHAI covers one piece of the space and PPM covers another piece of the space. And PPM really covers two sets of tasks well. One task is when you want to collect numeric data that you then want to be able to slice and dice in various ways. If you think about doing a poll, for instance, you often do cross tabs. You'd ask, how do people with different educational attainments respond? How do people with different incomes respond? And how do people with a specific combination of educational attainment and income respond? So what you do is you just collect it all. You say, who did you vote for? And by the way, what's your household income, and what census tract do you live in, and what's your educational attainment? You quickly get to the point where the demographic information is revealing enough that you can tell which people actually did which particular thing. So you have this nonsensitive information tied to the sensitive information. And if you just collect all that stuff, even with OHAI, then someone can go back and take the nonsensitive information and use it to de-anonymize the sensitive information. There's a long history of this kind of work: some famous work by Latanya Sweeney and some other work by Arvind Narayanan on various kinds of data sets, showing how you could do this kind of thing. So that's one problem. The other problem is the problem of free-form, non-numeric responses. So, for instance, take URLs.
So what you often want to do is ask, what are the most popular URLs, or the ones that are causing the most crashes, or something, right? But if you just say, please send me the last URL you visited when you crashed, sometimes what you're going to get is the URL of someone's Google Doc that has link sharing on. And that doc will not be something they want shared. So what you often want is to say, let me collect these free-form values, but let me only see the most popular ones. And OHAI is also not good for that, for the same reason, because the problem isn't learning which IP address had this doc; it's the doc itself. Fortunately, there's been an enormous amount of really nice crypto work on this over the past three to five years. In particular, two pieces of crypto have come out: one called Prio, which was designed by Henry Corrigan-Gibbs and Dan Boneh, and one which doesn't have such a cool name, but they call it privacy-preserving techniques for heavy hitters, also by Corrigan-Gibbs and Boneh. What these do is take advantage of a multi-server architecture. You take your data and you break it up into two shares, and the shares individually can't be read. Then you send one share to one server and one share to another server, and the servers together can find the answer to the question, for instance, what are the most common values, or what is the average income, but neither individually sees anybody's data. So this is, like I say, some very, very cool crypto tech. And it's already been getting some initial deployments. We did a test in Firefox. And I know Apple and Google and ISRG did a test deployment of some Prio-related stuff with their exposure notification work.
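
[Editor's illustration] The two-share idea described above can be sketched with plain additive secret sharing. This is a minimal sketch under stated assumptions, not the Prio protocol itself: it leaves out the validity proofs that Prio uses to stop clients from submitting malformed values, and the modulus, function names, and income figures are all made up for the example.

```python
import secrets

# Hypothetical modulus for this sketch; all arithmetic is done mod this value.
MODULUS = 2**61 - 1


def split(value: int) -> tuple[int, int]:
    """Split a value into two additive shares.

    Each share on its own is a uniformly random number and reveals
    nothing about the value; only their sum (mod MODULUS) does.
    """
    share_a = secrets.randbelow(MODULUS)
    share_b = (value - share_a) % MODULUS
    return share_a, share_b


def aggregate(shares: list[int]) -> int:
    """What one server does: add up the shares it received."""
    return sum(shares) % MODULUS


def combine(total_a: int, total_b: int) -> int:
    """Merge the two servers' partial totals into the true total."""
    return (total_a + total_b) % MODULUS


# Made-up client values (for example, incomes) used only for illustration.
incomes = [52_000, 61_000, 48_000, 75_000]

shares = [split(v) for v in incomes]
total_a = aggregate([a for a, _ in shares])  # server A sees only first shares
total_b = aggregate([b for _, b in shares])  # server B sees only second shares

total = combine(total_a, total_b)
average = total / len(incomes)
print(total, average)
```

Neither server ever holds an individual income, yet the combined partial totals give the exact sum, which is enough to compute the average.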
And so now there's a real effort to bring this to standardization, a quite aggressive effort, actually. The main people involved, and I know you're too modest to say, are you and myself and some other people from Cloudflare, some people from ISRG, and some people from Google. That's one notch behind OHAI in the IETF queue, and we're hoping to bring it to the IETF in the fall. I think it's really important to understand that these are complements; they're not competitors. First of all, there are some tasks for which OHAI is really good and some tasks for which these crypto things are really good. The crypto things are by and large slower, but they're in some ways more powerful. And then there are also times when you do both, when you say, well, we're going to do the crypto, and we're also going to do OHAI, because we're trying to protect the identity of who made each individual submission as well. So I think we're entering an era where we're going to be able to do some really, really cool stuff. We're just entering that era, so I think we're going to see a bunch of refinement of these techniques as we go, but it's really exciting.

Yeah. And you mentioned that the underlying cryptography of this is relatively new, within the past couple of years, which sets it apart from prior work that the IETF, and the CFRG in particular, has embarked upon standardizing. So there's going to be that new challenge, where we're working with and defining these new constructs that may be unfamiliar to people. But I think another important point that you mentioned is that this is a multi-party protocol. There are multiple servers involved, and clients interacting with multiple servers, which is quite different from other protocols that the IETF has standardized, where typically any more than two parties in a protocol causes people to get confused, myself included.
So I'm wondering, why do you think the IETF is ready for this transition to what is really a multi-party computation protocol? What has changed, technologically speaking, that enables this? Why is now the time, I guess?

Yeah, yeah. Well, I think three things. First, the technology is now ready, just from a technology perspective. The gap between what you can do without this and what you can do with this has gotten very large. It used to be that this stuff was really crufty and slow, and so it wasn't very attractive, and now it's become very effective. That's the first thing. The second thing is that the experience of TLS 1.3 in particular, and then of MLS, has really taught the IETF how to collaborate with the academic cryptographic community to deploy cutting-edge stuff. When 1.3 was designed, I mean, 1.3 is based on very, very old cryptographic primitives and designs, but there was extensive academic collaboration on review and verification of those things. So we worked very closely with the academic community. And then MLS actually involves a bunch of, not new crypto, but really new protocol concepts that weren't previously available. And again, those were developed in collaboration with the academic community. And so that has gotten us to the point where we're trying to cut the time between when things come out in the crypto world and we think they're ready to go, and when they can actually be brought into standardization and wide deployment. So that's the second thing. The third thing is really not about crypto or security at all. It's about changes to the way software is deployed, and the way that it's now become so much easier to roll out new pieces of software very quickly, especially on the server.
And I think we saw this, circling back to the very beginning, with ECH, where when we first started with ECH, the time between the first drafts and when there was a version up on Cloudflare was weeks, and a version in Firefox was weeks. And so the fact that it's now possible to deploy a large-scale server farm that will do something like this, and do it very quickly, means it's actually plausible to believe that we'll have two large-scale servers operating this in a short period of time, as opposed to years and years out.

Right. Right. Yeah. And I think that pretty much sums up my feeling about the situation as well, and puts us in a good place to wrap things up here. So we talked about a number of different things, up and down the stack, from TLS to DNS to application-layer privacy and measurement and whatnot. And all of these different protocols that we're working on are complementary, as you say. They're each plugging a particular gap or filling a particular role, and you can deploy them in unison, depending on your use case and your environment. And it's all towards the end goal of making the web and Internet more secure and more private for people. So in the little time that remains, a very quick question for you. If you're a newcomer to the IETF coming to this meeting, what would you recommend checking out?

I think OHAI is definitely going to be interesting. A bunch of the things we've been talking about are, I think, hard to come up to speed on. So OHAI is a good place to start. I think the PRIV work will also be a good place to come, because again, they will be presenting it to people for the first time, so people will be able to follow it.
So it'd be very easy to come up to speed. I think, what else is going on?

Yeah, go ahead. Just so we don't cut you off: those are two good places to start, so check them out if you're interested.

Thanks, Chris.
