Mike Rosulek: visiting researcher on cryptography and security

Presented by: João Tomé, Mike Rosulek

Originally aired on May 10, 2023 @ 11:30 AM - 12:00 PM EDT

In this segment, João Tomé gets to know Mike Rosulek, a visiting researcher on Cloudflare’s research team. We go over his experience as a computer scientist interested in cryptography and security, his work on “The Joy of Cryptography” at Oregon State University, and why the area of MPC (secure multiparty computation) is relevant for the present and future of Internet security and protocols.

Transcript (Beta)

Hello, everyone, and welcome to our Research Corner segment, and this is a name I just invented right away, so hope it sticks. And I'm João Tomé, storyteller at your service, and I'm here with Mike Rosulek. And you're a visiting researcher at Cloudflare, right? I got that right, right? Yes. And you're also an associate professor at Oregon State University, so how much time do you spend at Cloudflare as a visiting researcher? I've been here since July, and actually I'm finishing up here later this week, so it's been about a six-month stay here, and I'm working, you know, mostly full-time, like three-quarter time. Six months, it's already a lot of time, at least for a tech company. I bet that in the academic area, six months is not a lot of time, right? No, it felt like a few weeks, to be quite honest. Yeah, it's been an adjustment. It's been a fun adjustment, but yeah, things move at a different pace here in industry, so it's been one of the things that I've had to learn. In six months, a lot happens, a lot of products are shipped, right? A lot of changes, right? Yeah, although I've been on the research team, and we're a little bit insulated away from product releases, but we do have the quarterly project proposals, and that's a new thing for me, thinking in terms of planning according to the quarters of the year. Yeah, usually we're planning for maybe an entire year at a time, so a lot of fast-paced planning, and projects spinning up, and projects winding down pretty fast. Let's go back to the beginning, actually. As a computer scientist, you're interested in cryptography, of course, and security, and you have a lot of work there, but can you give us a view of where it started for you, when you first got interested in this area, and where in the United States? I mean, it depends on how far back in history you want to go. Just where are you from? Give us a tour. Sure. 
I'm originally from Iowa, from a small town in Iowa with less than a thousand people, so not what you would consider the tech capital of the United States, but my dad was a pretty technical guy, and we were early adopters of computers, so growing up, even in the early to mid-80s, we always had computers in the house. I was always interested in programming and computers, and I knew that I wanted to study computer science in college, so that was no mystery, and then I realized I liked the theory behind computer science, because I always liked math as well, and so the theory of computer science was a good mixture of programming, which I liked, and computations, and math, so I went to grad school at the University of Illinois. Just one thing. Before then, when were you passionate about this area? Like, in a sense, this is really interesting, I can do stuff with this. Was there a time, was there a software in specific, or a moment, or a figure? Well, when I was about seven or eight years old, my dad started teaching me BASIC, the programming language BASIC, and I think it was GW BASIC, which came, it was a DOS. That's from the 80s, right? Yes, we're talking late 80s at this point, and GW BASIC, I think, was the flavor of BASIC, and I mean, I was obviously not a very good programmer at that age, but was able to write really simple games where the computer would ask you a question, and it would say, do you want to turn left or right? You would say left or right, and there's maybe like three questions that it would ask, and there was one correct sequence of answers that you could give, otherwise, you'd die a horrible death or something in this game. So, I always thought that was pretty cool. One day, I came home, and my dad had written this thing that printed out your name, but in different colors. I thought that was amazing, that you could print stuff to the screen in different colors. So, the 80s were a magical time of computer stuff. 
So, ever since then, we always would put together our computers. So, I enjoyed assembling our computers from the individual parts, which I think was pretty common nowadays, but it was pretty rare back in the day. So, in that sense, that was the hardware part, right? The love for the hardware. You prefer hardware or software, in a sense? I like doing stuff with my hands, but I don't do voltages or capacitors. I don't do soldering or anything like that. I like it when the hardware plugs together like Legos. So, I mostly prefer software. I mean, I'm comfortable plugging pieces together if they fit together like Legos, but I'm not a hardware hacker. And then, you had to go to study, to the university, and you went to Iowa State University, right? When did the passion for, more specifically, cryptography, which is all related to mathematics, come about, in a sense? Yeah, the cryptography, there wasn't like a moment where I decided that I must be a cryptographer. It was really by accident. The thing that was not by accident was that I wanted to do theory, computer science theory. So, I had a couple of classes from a professor. His name was Jack Lutz, who was just, the material just blew my mind, and he was a great professor. So, that was at Iowa State. He taught the theory courses, and I thought, wow, this is amazing stuff. And then, in grad school, I knew I wanted to do theory, and I tried. I took a bunch of classes, and it's like, this stuff is okay, but maybe grad school is not for me. Maybe I should just go become a software engineer. And then, it just happened that we had hired, at the University of Illinois, when I was at grad school, we hired a new professor, and he was doing cryptography. I honestly didn't know that cryptography was about proving theorems, and the theoretical stuff that I enjoyed, which is proofs, and definitions, and stuff like that. I didn't know that cryptography was like that. I thought cryptography was more like a cat and mouse game. 
You try to design something clever, the bad guy has a clever attack, and you try to have a more clever way around that attack. I didn't realize there was a provable security aspect to cryptography. So, once I learned that, I thought, well, this is a perfect combination of everything. Yeah, but I didn't really know that. I just learned it by accident that there was this new professor, and someone said, he's going to be looking for students, maybe you should talk to him. So, that's how it happened. I really love, sometimes it's by chance that you go and develop an area. You were talking about how cryptography works in a different way than you initially thought. Can you explain a bit more about that? Well, I had actually taken a course as an undergrad. I took a cryptography course, because everyone knows cryptography is cool. That's pretty obvious to everybody. It's intertwined with the Internet completely, right? Yeah, exactly. It has cool stories about Alice and Bob, and hacking, and it's sexy stories. But the way it was taught when I took it as an undergrad, it was from the math department. So, it wasn't grounded in the computer science style of computational complexity and provable security. It was like, here's a thing, and here's the math behind it, and here's an attack on that thing, and here's a different thing, and some kind of attack on that thing. It was interesting, but it didn't seem very systematic to me. I wasn't too impressed with, here's a thing, here's an attack on that thing, and here's a way to avoid that attack. I thought it was just like I said, a cat and mouse game. My advisor, Manoj Prabhakaran, who joined at Illinois, his research was in provable security. So, formally defining what it means to be secure within some model, and then constructing a thing, and then proving that it's secure, which is something we didn't touch on at all when I was exposed to cryptography as an undergrad. 
So, once I knew that you could do proofs of security, then I was hooked. Exactly. That is a very science-like type of thing, because it's putting things to the test, right? It's all about that. You work especially, your usual research area is MPC, which is all about secure multi-party computation. How do you define this? What is secure multi-party computation? Well, how long do we have? Do we have a few hours? We don't. How is the high-level, hey mom, here's what I'm working on type of thing? Yeah. So, one way to motivate MPC is, I think most people are familiar with encryption. And when you encrypt something, either you don't know. So, you see something that's encrypted, either you don't know the key, in which case you learn nothing about what's inside, or you know the key, in which case you learn everything that's inside. It's all or nothing. So, MPC is a middle ground, where we have some private data, and we want it to remain private, but we want to learn some partial information about it. And MPC is best when different people have private data, and they would love to be able to put that data together and run a computation on it, but there's no one place where they can send their data that they both trust. They would like to learn the output of some computation, but they don't trust each other, they don't trust some central server where they can both send their data, the server does the computation and throws the data away and only gives the result. So, MPC is a way to achieve that same effect, but using cryptography. So, it's an interactive protocol. That's all very abstract, and I'm sure the next thing you're going to ask is for an example. It is, but we can actually just, in a sense, how is it implemented? It's a protocol, it's related to encryption, and it's like a Zero Trust protocol, in the sense that you don't trust any of the players there. But in what way is it already implemented on the Internet, or is it not? 
Where is the state it's in, related to the Internet, as it is? It's still relatively new. One place where it's... So, it's not implemented in the Internet, as far as we know? It's implemented and it's out there. It's not really part of standards, so it's not in the fundamental infrastructure, like TLS, for example. There are real-world applications, and I can name a few. One of them is... So, there's a few password managers that have this feature, where they will check whether your password has shown up in some sort of breach. So, there's a website that many of the viewers will know about, Have I Been Pwned, where it will check whether your email address has been compromised, your password has been compromised. So, there are password managers that check this for you automatically, and they check whether your password has been used before. So, how do you tell whether your password has been used before? Well, you can just report your password to the server and say, hey, here's my password. Can you check your database? But obviously, we don't want to just share our password with everyone. So, they're using a special kind of MPC, where as the client, I have a password. My password manager has a password that it knows, one of my passwords. And this big server has a huge database of billions of compromised passwords, and we just want to know whether there's anything in common. That's a very specific use case there, for sure. Yeah. So, that's a very specific computation that we would want to do. There's two pieces of data. There's my password, and there's this big database of passwords. And I don't want to share my password. They don't want to share their huge database of compromised passwords. But we would like to learn the outcome of some simple computation, like just a membership test. So, that's one example. That's one of my favorites. It is a very good one, for sure. You get a sense that it's a comparison without compromising the data, in a sense, which is really interesting. 
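The membership test described above can be illustrated with a small sketch. This is a toy version of a Diffie-Hellman style private membership check, not the protocol any particular password manager uses. The group (integers modulo the prime 2**255 - 19), the `BreachServer` class, and the function names are all assumptions made for illustration, and a real deployment would use a carefully chosen prime-order group and would not publish the whole database in one piece.

```python
import hashlib
import math
import secrets

# Toy group for illustration only: nonzero integers modulo a prime.
P = 2**255 - 19
ORDER = P - 1  # exponents behave modulo the group order


def hash_to_group(password: str) -> int:
    """Map a password to a nonzero group element (toy hash-to-group)."""
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent() -> int:
    """Pick a random exponent that is invertible modulo the group order."""
    while True:
        k = secrets.randbelow(ORDER - 2) + 2
        if math.gcd(k, ORDER) == 1:
            return k


class BreachServer:
    """Hypothetical server holding the list of compromised passwords."""

    def __init__(self, breached):
        self._key = random_exponent()
        # Each entry is masked with the server's secret exponent,
        # so the published set does not reveal the passwords directly.
        self.published = {pow(hash_to_group(pw), self._key, P) for pw in breached}

    def evaluate(self, blinded: int) -> int:
        # The server only ever sees a blinded value, never the password.
        return pow(blinded, self._key, P)


def is_breached(password: str, server: BreachServer) -> bool:
    """Client side: blind, ask the server, unblind, then test membership."""
    a = random_exponent()
    blinded = pow(hash_to_group(password), a, P)
    response = server.evaluate(blinded)
    # Removing the client's exponent leaves hash(password) ** server_key.
    unblinded = pow(response, pow(a, -1, ORDER), P)
    return unblinded in server.published
```

The client learns only whether its password is in the set, and the server sees only a randomly blinded group element. Modular inverses via `pow(a, -1, n)` require Python 3.8 or later.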
There's more use cases that you remember that not only are in use, but they maybe could be in use in the future. Yes, there are. I was going to say that in the cryptocurrency space, they seem to have found several applications for these kinds of techniques. One simple one is just threshold signing, which is also similar to a project that I've been working on here at Cloudflare. So, maybe that would be a good example. So, in a digital signature, you authorize this piece of information, and you have a public key. Anybody can check that you signed this, and you're the only one who can sign this. This is the main technology behind TLS certificates. Every TLS connection is signed with a digital signature. Cryptocurrencies and blockchains work the same way. When you send money or make any transaction, you sign that transaction. That's what authorizes you. You're the owner of this pile of money. The pile of money is associated with a key, and only you can sign away money from that pile using a digital signature. It's registered that it was you also. That's also important, right? Yeah. It's depending on what kind of cryptocurrency. Some of them are anonymous, some of them are pseudonymous. But in any case, the money is associated with a public key, and the only way to get money out of this pile of money is to sign something with that public key. So, the secret key is a very valuable asset because maybe it's protecting a huge pile of money. So, one idea is to not have that key sitting anywhere. The key is split into many pieces. Let's say it's split into two pieces just for simplicity, but it could be many more. Then it's possible for two servers that each hold a piece of the key, neither of them by themselves can sign this data. But if they talk to each other, then without ever reconstructing the key in one place, it all happens inside of the cryptographic domain. But they do this clever protocol, and at the end, what they learn is some signature using that key. 
No one has the whole key. It's separate. The key is from no one. It's from those who have it, but they only have a part. They have to reach out to the others. Yeah. Collectively, they have the information that comprises the key, but individually, they're holding something that's meaningless by itself. It's like, I have a random number, and you have a random number. Individually, those numbers are meaningless, but if we happen to add those two random numbers together, that's the secret key. So, individually, neither of us knows it. And how can we transport that to the work you did specifically on that at Cloudflare? It's related to Cloudflare metals, right? Yeah. So, we were looking at this kind of technology in the context of TLS. So, as I said, every time a Cloudflare metal terminates a TLS connection, let's say on behalf of some customer, we're signing stuff on behalf of that customer. We hold the customer's private key. We hold their TLS signing key, and we sign to say that we are authorized by xyz.com. We are the correct xyz.com here. We'll sign something under the certificate that proves that it's us. So, we were looking at whether this kind of technology could be fast enough that you could use it in TLS termination. So, the idea is maybe two different metals, either in the same colo or maybe in different colos or different geographic regions. Metals, just to explain, is like the computer in a sense, and colos are data centers in a sense, right? Yes, that's my understanding. I have a very superficial understanding, just living in the research land, but yes, that's my understanding. Yeah. So, two completely separate machines would each hold shares of the key. And when a connection comes in, then both of them spring to life and say, let's sign this connection. But in a way that if one of those machines happened to be compromised, then the attacker wouldn't be able to walk away with our customer's TLS keys. They would only get some meaningless random stuff. 
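The "two random numbers that add up to the secret key" idea mentioned above can be written down in a few lines. This is a minimal sketch of additive secret sharing only; the modulus `Q` and the function names are assumptions for illustration, and the signing protocol itself (which never puts the key back together) is not shown.

```python
import secrets

# Assumed modulus for the example; a real scheme uses the order of its signing group.
Q = 2**255 - 19


def split_key(secret_key: int):
    """Split a key into two shares that each look uniformly random."""
    share_a = secrets.randbelow(Q)
    share_b = (secret_key - share_a) % Q
    return share_a, share_b


def recombine(share_a: int, share_b: int) -> int:
    """Add the shares back together. Shown only to check the arithmetic:
    in threshold signing the key is never reconstructed in one place."""
    return (share_a + share_b) % Q


key = secrets.randbelow(Q)
machine_1_share, machine_2_share = split_key(key)
```

An attacker who steals only `machine_1_share` holds a uniformly random number that carries no information about `key`.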
They would have to compromise two machines in order to walk away with the customer's signing keys. So, we worked on that together with some awesome interns, making some prototypes and just seeing whether this was fast enough to be in the critical path of TLS handshake termination. And it's promising. So, it's still a work in progress, but it's very promising results so far. So, it's good to avoid hackers because it's difficult to compromise two. One server may be compromised, don't know why, but in some way it could be. But two is really, really difficult. Really, if they're more separate in some way, that is like an effort out of this world. Let's say it like that. Yeah. I mean, all of the different... These are some discussions that we've had. Is it really harder to compromise two machines? Because the entire premise of threshold signing is that it only makes sense if it's harder to compromise two machines than it is to compromise one. So, we had a lot of discussions to ask whether this was true. I think in the case where the machines are in different geographic locations, then some of the machines might be compromised because of physical access. If they're together, it's more easy to be compromised, right? Yeah. If they're together in the same data center, all running the same... I mean, there's a good reason why all the Cloudflare machines run identical or at least very similar software. If it's a software problem, then maybe compromising two is as easy as compromising one. I don't know. I'm just a theory guy. But if something is compromised due to physical access, then just put the two pieces of the key in different parts of the world or under different political structures. And it certainly makes physical compromise harder for two machines than for one. Sure. Sure. Work in progress in that one. Let's go to privacy preserving measurement. What is this and why is it important? 
So, privacy preserving measurement is maybe the most pure application of these MPC technologies. So, this is an idea that's been proposed for web browsers. So, people who make a web browser would be interested to know about when people are experiencing errors in their browsing or delays or some kinds of problems. So, it would be nice if the browser would report back to, let's say, Firefox. The browser would report back to Mozilla that this user had an error at this time. And then they could run some statistics and figure out whether it's a problem with the browser or some systematic thing going on that they can debug. But there's like an obvious issue where I don't want my browser continuously reporting back to Mozilla all my browsing history. Exactly. So, for privacy concerns, it's, yeah. Yeah, that's... And this works like a layer on that, right? Yeah. If a browser wants to be like a privacy-friendly browser, they would not have such a feature. So, this privacy preserving measurement is just applying these MPC tools to the browser. We have a bunch of measurements from millions of users of the browser. And we don't need to see all of those measurements individually. All we want to know is some aggregate statistical information. Like, oh, there's... Like, all of a sudden, everybody's having problems with this website. Or suddenly, people are having trouble with DNS. We just need aggregate statistical information. So, that's an example of MPC, right? So, we have some private data. We only want to learn the result of some computation on that data. But we don't want to learn every single detail of that data. Of course. So, this privacy preserving measurement is part of some activity at the IETF working on standards for this, trying to standardize different protocols that could be used for the clients to report their measurements and for the server to analyze them, so that the server learns only some aggregate information. 
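One way to picture "learn the aggregate but not the individual reports" is a sketch in which each browser splits its measurement into two random-looking shares and sends one to each of two servers. The two-server setup, the modulus, and all names below are assumptions for illustration rather than details from the conversation, and a real protocol would also need to check that each report is well-formed.

```python
import secrets

# Assumed modulus; it only needs to be larger than any possible total.
MODULUS = 2**61 - 1


def share_measurement(value: int):
    """Client side: split one measurement into two additive shares."""
    share_1 = secrets.randbelow(MODULUS)
    share_2 = (value - share_1) % MODULUS
    return share_1, share_2


def aggregate(shares) -> int:
    """Server side: each server adds up only the shares it received."""
    return sum(shares) % MODULUS


def private_sum(measurements) -> int:
    """Simulate the whole flow and return the combined total."""
    inbox_1, inbox_2 = [], []
    for value in measurements:
        s1, s2 = share_measurement(value)
        inbox_1.append(s1)  # sent to the first (hypothetical) server
        inbox_2.append(s2)  # sent to the second (hypothetical) server
    # Only the two partial sums are combined, never individual reports.
    return (aggregate(inbox_1) + aggregate(inbox_2)) % MODULUS


# 1 means "this user hit an error on the site", 0 means "no error".
error_reports = [1, 0, 1, 1, 0]
total_errors = private_sum(error_reports)
```

Each server on its own sees only random numbers, so the privacy of a single report depends on the two servers not pooling their inboxes.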
In a sense, it gives that layer in terms of Internet. First, it's private. Private data is not... Your private data is not going into the browser. But in a sense, those who operate the Internet networks need to learn if a website is fast enough. If it's not, they have to adjust. Performance is all about also knowledge of what is happening. Not necessarily to a specific user, but to a specific website. The experience in general. So, it's not private data, but the networks need that information to make the Internet faster. In a sense, also more secure. So, this helps there, right? Yeah. And it's just one of those cases where if you really stop and think, you don't need all of the data individually. You don't need every single report. You don't need to know that João was visiting scandalouswebsite.com. But you need to know if 10,000 people are having the same issue suddenly. Or one website is having an issue for 10,000 different people. So, the thing that's private is all the data individually. And the thing that you want to learn is just the aggregate information about the data. The statistics part, yeah. Is there a performance improvement that this could also help in some way? Well, performance improvement over... It depends on what you're comparing it to. So, if you're comparing it to, let's just collect the data and we don't care about privacy, well, then it's never going to be faster than that. Because we're adding on some extra cryptography and some interaction to do this in a privacy-preserving way. So, I think maybe I misunderstood the question, but I think adding this layer of privacy doesn't ever make things faster, really. But it could be efficient enough that it's invisible to the user. And that's what the goal is. Exactly. And to have this layer of privacy in a performant way that doesn't trouble the experience in some way, that's important too, right? Yes. Let's move on to Privacy Pass, another of the things you worked on here at Cloudflare. 
This is an existing tool, right, at Cloudflare too, that lets end users obtain unlinkable tokens that they can redeem at websites. And I'm reading, as is clear. What is this all about? Since I explained a bit, but... You explained it so well. How could I improve upon your explanation? I read it really well. But how can we explain the work you did on that matter? Yeah. So, Privacy Pass is used... One of the most visible ways that it's used... Recently, Cloudflare and Apple have this collaboration where if you're using Apple hardware, the Apple hardware can prove that it's a unique Apple device. It can prove to the Apple servers that it's a unique physical piece of Apple hardware. And so, if you're a website, you would want to know that... Cloudflare websites want to know when they're talking to a human or a bot. One way to know that you're talking to a human is that this human has an actual physical piece of hardware. It's not just 10 million humans on the same virtualized device. So, Apple has this feature where you prove to Apple that you're a human. And so, Apple gives you a token that says, yes, this person convinced me that they're a human. They're running Apple hardware. And you take that token and you bring it to Cloudflare when you connect to a site. And Cloudflare says, oh, yeah, well, this is reasonable. I don't need to show this person a CAPTCHA because they're a human. So, that helps eliminate the CAPTCHAs that we all hate. I definitely hate. Yeah. And so, the privacy challenge there is you don't want Apple to learn what website you're asking for. And you don't want the website to know necessarily when it's the same person over and over. So, each of these tokens is kind of anonymous. It can't be linked to other tokens. So, the website can't say, oh, this visit and this visit and this visit are all the same person. So, that's what Privacy Pass does using its cryptography magic. 
And so, one of the things that we were working on during my time here is whether you can do some rate limiting. So, you don't want one piece of hardware, one Apple device, let's say, generating millions and millions of these tokens. Because that could mean that this person is generating tokens and selling them on the black market to these bot farms. And this is called farming. It's called farming the tokens. So, you don't want one device to farm a bunch of tokens and then distribute them out. Because the point of these tokens is they can't really be linked to the same source. So, would there be a way to be able to detect even if the system can't tell which website a user wants a token for, they'll be able to tell when the same person wants a token for the same website repeatedly. So, it's kind of a subtle thing. But we want to release the minimum amount of information such that we can still rate limit. So, maybe the system will see that João has requested five tokens for the same site. I don't know what site that is. But five tokens in the last 30 seconds is too much. So, I'll just refuse this one. But then he requests another token. And I can see that it's for a different site. But I don't know which site either of them are. But since this was his first request for this site, I can grant that one, but not the other one. So, that's the idea. So, we've been working on ways to achieve that. We've been working on figuring out whether these are secure, that kind of stuff. In terms of the overall experience, you already discussed at the beginning that it's definitely a more fast-paced experience working at a company, even on the research area. What is the overall experience you saw? And do you think Cloudflare is distinguishable in some way in terms of culture, in terms of how it operates? Well, this is my only experience in industry. So, it'd be hard for me to compare to other places. But I can compare to my experience in academia. 
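The rate-limiting goal described above (notice repeated requests for the same site without learning which site it is) can be illustrated with a deliberately simplified sketch. This is not the Privacy Pass protocol or the construction worked on at Cloudflare. It assumes the client derives an opaque per-site tag with a keyed hash and is trusted to do so honestly, which a real protocol would have to enforce cryptographically. The limit of four tokens per 30 seconds, the `RateLimiter` class, and the example site names are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict, deque

WINDOW_SECONDS = 30          # assumed window, echoing the "30 seconds" example
MAX_TOKENS_PER_WINDOW = 4    # assumed limit: the fifth request is refused


def site_tag(client_secret: bytes, site: str) -> str:
    """Client side: an opaque tag that is stable per site but does not
    reveal the site name to anyone who lacks the client's secret."""
    return hmac.new(client_secret, site.encode("utf-8"), hashlib.sha256).hexdigest()


class RateLimiter:
    """Hypothetical issuer-side limiter that sees only (client, tag) pairs."""

    def __init__(self):
        self._history = defaultdict(deque)

    def allow(self, client_id: str, tag: str, now: float) -> bool:
        recent = self._history[(client_id, tag)]
        # Forget requests that have fallen out of the sliding window.
        while recent and now - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        if len(recent) >= MAX_TOKENS_PER_WINDOW:
            return False
        recent.append(now)
        return True


secret = secrets_for_demo = b"device-local secret (example only)"
limiter = RateLimiter()
tag_a = site_tag(secret, "site-a.example")
tag_b = site_tag(secret, "site-b.example")

# Five quick requests for the same site, then one for a different site.
same_site = [limiter.allow("joao", tag_a, t) for t in range(5)]
other_site = limiter.allow("joao", tag_b, 5)
later = limiter.allow("joao", tag_a, 40)
```

The limiter can tell that the first five requests share a tag and that the sixth does not, but the tags alone do not say which sites they stand for.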
One of the biggest differences is that in academia, we're really focused on writing grants and getting papers published. Those are like our two main jobs. In industry, you don't have to apply for grants from the NSF, for example. So, that's really nice. And the focus is, at least at Cloudflare, there's a strong focus on these Internet standards. So, a lot of involvement in... There's a cryptography research group at the IETF. The IETF is a main standardization body for Internet technologies. So, yeah, in an academic research project, the outcome is usually a paper that you publish. And in a Cloudflare research project, a lot of times, the outcome is a product. And a lot of times, the outcome is a standard that gets published and adopted. So, it's an interesting difference. And that's one of the things about being at Cloudflare, just the idea of working on the Internet as an infrastructure. I think working on standards, working in an open way, it's clear that the attitude is making the Internet better. That was definitely the message that I got before I started. And I was wondering, is this really sincere? Is this just marketing speak? But my experience was that it's very sincere. And all the discussions in the research group are about, are we making the Internet better? How can we make things better? It's not about extracting machine learning models from customers so that you can serve them better advertisements. True. No advertising specifically. Yeah. So, that was refreshing to me to see. And just an interesting difference in what the target outcome of a research project could be. And it really, even for me, working at Cloudflare in a way that is exploring and learning things mostly, it's really amazing to see how, first, a lot of work is still being done. And it needs to be done because there's new actors, new tools to do harm in a sense. So, and putting the standards to the test, making them more resilient, better, more performant. Those things matter. 
They have a real-world impact. So, always interesting to see that, right? Yeah. This was great. Thank you so much. I think we got a good scope, high-level scope for sure, of what you did here. A lot of stuff in different areas. So, it was great. If we had more time, we could break down a few more things, but I think it was a good high-level view of your time here. And where are you going next? Back to the university, giving classes? So, I'm on sabbatical now, which is why I'm visiting Cloudflare. I'm continuing on my sabbatical until September. So, I have another nine months until I have to teach again, which is surreal for me to think about. So, I don't have any official business lined up for the next few months, but I have a lot of writing projects that have been put off to the side for many years. And so, I'm going to just give myself a personal writing retreat and do lots of writing for the next many months. So, papers are in order, are coming? Yeah, papers and books and proposals and all sorts of things. Sure. Thank you so much. And that's a wrap. All right. Thanks, João.
