How to Prove that a Computer System is Secure

Originally aired on June 15, 2020 @ 5:30 AM - 6:00 AM EDT

Karthik Bhargavan - Researcher, Inria
Tanja Lange - Professor of Cryptology, Eindhoven University of Technology
Moderator: John Graham-Cumming - CTO, Cloudflare

English

Internet Summit

Security

Transcript (Beta)

🎵Outro Music🎵 Alright, a slight change of gear. I've got a couple of mathematician cryptographic nerds with me, which makes me very happy because that's my background. Furthest away from me is Karthik Bhargavan, who is Director de Recherche at INRIA in Paris, which we'll translate as researcher, shall we, into English. And then we have Tanya Langer, who is Professor of Cryptotology at Eindhoven University of Technology. And our topic is how to prove that a computer system is secure. But before we do that, there was a bit of a controversial comment, I think, this morning in the first talk. So Sir David Ellermann, who ran GCHQ, he was wanting crypto backdoors. So I'm guessing you two have some opinions on crypto backdoors. So should we start there? Not a controversial beginning at all. Well, maybe I take this one first. I'm working on low-level crypto, really designing systems. I work on figuring out how these systems are secure, trying to break them, figuring out how we choose parameters so I can't break them, that the NSA can't break them, that even the hyper, hyper, hyper NSA can't break them. And that's pretty damn hard. And we barely know how to do those things. And we have no idea, we don't have any great ideas of how to make a backdoor into this. So all the backdoors we know are something which weakens the system. And so this is actually working against what we are paid to do. I mean, we're trying to make the system secure. I'm not saying we always succeed, but at least we're trying to make it work. And so any backdoor that we know of is introducing vulnerability. And from what I got from the details, he was saying, oh yeah, you just give one key to the government and one key you keep yourself, and then it's all going to magically work. This is like you're giving another key to your door to somebody else and then trusting that somebody else not to enter unless you're a bad person. Or unless the new government thinks you're a bad person. Unless what you're doing has been redefined as being bad. And at some point it gets really, really uncomfortable. Apart from us not knowing how to do this technology-wise. I mean, of course you could say every good user has to give an extra key to the government. But then you're only getting the good guys. If you're willing to overthrow the government, if you're willing to blow up people, you don't care whether you're violating a law to give your key to somebody. I mean, this is like a minor misdemeanor compared to a gigantic crime you're going to commit. So I don't see this working at all. Plus, I worry that all solutions we come up with are weakening security. Yes, I mean, I can see that there is... I've spoken to a lot of people who feel that there is a need for some visibility into encrypted devices. This is encryption in general, right? End -to-end encryption, TLS 1.3, all of these things. And they're very worried about it. But indeed, the problem is that it is actually not very easy to even design a system with a backdoor that doesn't hand it over to the bad guys. A classic example in our field is that a couple of years ago, some of you might have heard about this attack. A couple of attacks in TLS, one of which was called Freak, and another one was called Logjam. And these were things that were exploiting these cipher suites in TLS called export cipher suites that were introduced in the 1990s, which were very obsolete. They're not supposed to be used anymore, but they were introduced primarily as, let's say, a backdoor into TLS connections from foreign countries into the US. And at that time, this was considered important for security. Later on, it became obsolete. Nobody was supposed to use it, but that code lies around forever. And an attacker could trigger it, open it up, and therefore downgrade the connection down to something that is exploitable with a laptop these days. So that's my worry. It's like, maybe you can do it right now and maybe it'll solve some problem, but most likely it will stick around in unusual places that you'll never expect and break security for everybody. But if I'm a policymaker and I've got in front of you, you know, me very, very smart people, surely I'm just going to turn to you and say, come on, guys, you're really smart. You can figure this out, right? Come on. You've got to figure out all this other smart stuff. Why is this a problem, this concept of a backdoor? Because it goes against what you're asking to have. If you're asking for a secure system, which is absolutely secure or computationally secure, except for some other people can break it. And we don't have any such systems. I mean, it's either secure or it's insecure. It's not like a little bit pregnant. It's either this or that. Okay. That explanation works for me as the policymaker, the little bit pregnant one. So let's just switch gears. So Tanja, you've got this particular interest in two kind of freaky kinds of cryptography, which some people will know about, some won't. So there's curve-based and post -quantum. Can you just explain what those are? Because I suspect there are a few people here in the audience know what those things are. All right. Let me first say where these belong in your security stack. They belong in the part that you hopefully will never see. Because the only reason that you would ever see them is if we did something wrong. Like you'll notice them if they break, if they're on the front page of the New York Times saying, oops, the world is insecure because Tanja did something wrong. Well, okay. You might also see them if you click on the lock icon in your browser and you actually go through all these two, three, four-letter combinations. And if it says somewhere EC, that stands for elliptic curves. And that is one of the areas I've been working on for many years. I'm very happy to see elliptic curves coming into more use these days. It is a very nice system in that it's very efficient. It's very small. It fits basically anywhere. And it means that we can use secure systems in areas where we didn't have it before. So you can now find in like very small sensor nodes and so on. The system is powerful or low demand enough that you can use it there. Now, the other area is a little sad because all this beautiful elliptic curve stuff will break the moment we have a quantum computer. So all the math we're doing there, where we're currently benefiting from having this really nice, efficient, small crypto, will not stand up to a quantum computer. So what can we do instead? So all the places we're using crypto these days, there is one part which is using a system, either elliptic curves or finite fields or RSA, whatever these are, to do something to start a connection. And all of these things will be broken with a quantum computer. So my other research area is finding alternatives and then making sure that these alternatives are fit for all the applications, that they fit on the Internet. I mean, Google has been doing some experiments with some of them, saying, OK, it can work for some. I also have been doing some experiments, including Cloudflare, saying, OK, what happens if I send not one of the cute little elliptic curve keys, but the larger keys that we're unfortunately facing when we have a quantum computer to fight with? Then you're dealing with a thousand kilobytes, like a megabyte is going across the wire. And then your experiment was doing like, what happens? Will they still answer? And unfortunately for the really big keys at this moment, no. So we're working on making it smaller and figuring out how secure it stays. So despite the fact that we don't have a quantum computer, we are able to anticipate its capabilities, and you can therefore come up with algorithms which will be not breakable on the quantum computer. So we're doing all that work in anticipation of someone having this machine. Yep. So we have a pretty good idea of what a quantum computer can do. So we know all the damage it does. And afterwards, it's the same as we have as current crypto. It's only good up to our current knowledge. It might also be that there's some smart person who has another idea that nobody had so far, and suddenly it all breaks down. But we want to get to this level of certainty, of this level of comfort with our newer systems. Now, some of those are not so new. There are some which are like 40 years old from 1978, but we haven't used them nearly as much. So they're not as approved and as well implemented as the other systems are. And they're also not as nice and speedy and small and fitting everywhere. One thing is a word of worry is it's not just for the one future day when there's a quantum computer. If you're doing anything that you still care about 20, 30 years from now, then you have to be aware that there are some adversaries. Like the US is known to record all your data. They're tapping the Internet cables wherever they can and then put everything in Bluffdale. And over there, they have a lot of stored ciphertexts that they can't break now. But all of those involve some of the crypto which will be broken once a quantum computer exists. So if any of your stuff is still interesting, be it trade secrets that you actually keep for yourself and don't publish, be it other sensitive information like personal data, privacy information that you have an obligation to keep secret, or whether you're in some government which by then or now is evil and might disagree with what you're saying. All of those things, if they're still interested 20, 30 years from now, you have a problem. You basically have a problem now, except for you don't know it yet. Great. So you're working on these kind of algorithms and what we do in the future. Katha, you're working on something different, which is we've got an algorithm and then we actually make a computer implementation of it and we mess it up or it isn't quite as good as we thought it was. So what's the approach? You're trying to prove things are as good as we say they are? Right. So, I mean, when cryptographers like Tanya design these new schemes and new constructions, they usually come with proofs, mathematical proofs that they're achieving something that they set out to achieve. And then somebody is going to implement them in Google Chrome, right? And the gap between what is actually proved about the crypto construction to the protocol that was standardized in the IETF, to the implementation that was finally implemented in thousands of lines of C++, that's a huge gap. And so when you see that there are attacks on software like OpenSSL or whatever, and you say, well, this is all supposed to use the latest, greatest crypto. Why is there an attack? The attack is because of these huge gaps. So a lot of the work that I and my team do, and there's a whole field of people who work in this intersection of software verification, applied cryptography, protocol analysis, is to really try to narrow this gap. And over the years, there have been many formal models and proof techniques and even automated tools that have been developed where you can feed it your new cool design for doing, I don't know, key escrow in the cloud or whatever it is that you want to do. You can feed it and you can actually verify that it does at least what you intended it to do, whether that's good or not. You don't want the key escrow in the cloud. You don't want to do that. But whatever your coolest new idea is, you can analyze it. And a lot of these techniques were used by various groups around the world, research groups, to contribute to the design of TLS 1.3, the version of TLS that was just standardized. So I think all of this area, including the tools that we use, have now come of age, and we can actually apply them to real world stuff. So it's kind of an exciting time. Just dig into that because I think you talked about two different things, one which is software verification, one which is protocol verification, and then people might not understand the difference. So tell me about what is a protocol in this sense? And what is this? I mean, software is probably more obvious. Right. So a protocol would be something like TLS, for example, which is the protocol that secures HTTPS connections between clients and servers. There's also a whole bunch of other protocols you're using. If you're using WhatsApp, most likely you're using end-to-end encryption underneath WhatsApp, which uses a protocol called Signal. And if you're using the Wi-Fi, you're using some combination of protocols which have been standardized by the IETF. So protocols are the way you use cryptography to compose it in a way that multiple parties can agree upon certain things, including keys, can secure connections between two or more parties, and so on. So already, when you go to the level of a protocol where it starts getting really complicated, there is a gap between what the underlying crypto may have guaranteed you and what the protocol actually wants to achieve, which is something usually higher level. I want end-to-end security of some kind. And to do this, I'm going to use elliptic curves, I'm going to use post -quantum chem as a construction inside it. But in the end, I want a high-level user -understandable view that I have a secure connection to my bank, right? So the protocol is basically taking you up to that level where you can give this end-to -end, easy-to-understand, but often not very easy-to-understand guarantee. And we've managed to mess this up quite a lot of times, right? We've designed protocols that have had lots of flaws in them. Right. And we have. And we have, yes. Again and again. And sometimes, even protocols that were designed by expert cryptographers, they, 20 years later, someone finds a flaw in it. And once you find a flaw in this kind of protocol, you're like, of course, how didn't people see it before? And the answer is, well, this was a three-line protocol. The flaw was found 17 years later. This is the Needham-Schroder low attack, for example, lowest attack on Needham-Schroder protocol. And that's because there was one particular kind of threat model that was not considered when people analyzed the original protocol. So when you're thinking about whether a protocol achieves its goals, there are at least three elements which you have to kind of formalize. The first thing is, well, what is the security goal? And often these things are implicit in the head of the designer, and it's never written down. Then what is the threat model? And this may be written down. But in fact, what you think of as a threat model might evolve. There might be new threats that you didn't anticipate when you designed the protocol. And then you have to do the proof, right? So then you're going to show that your protocol, which is a mechanism, matches the security goals under this threat model, given certain assumptions on the underlying crypto. So that's usually the structure of how we do these proofs. And most of the time, even when you just write down the assumptions on the threat model, you start finding attacks. You don't even have to do the proof. Because just making that explicit, writing it down, is already going to expose all kinds of funny things that you didn't think about when you first thought about the protocol. And that's at the protocol level. That's before you've written any code. This is how it's meant to work. You're analyzing how it's meant to work. And then you write some code. And there's another level of, it all goes wrong. Right. And I would say that's probably where most stuff goes wrong is in the code, right? Because we all know software bugs occur everywhere. Usually, if you have a bug in something that's rendering an animoji, it shows you the wrong thing. Who cares, right? Or, OK, if it's something worse, your server crashes. OK, you restart it. Or your connection fails. You reload your page. That's kind of OK. But in security applications, if something bad happens, even with seemingly very low probability, 2 to the minus 128 probability, the attacker might be able to drive your system into exactly that case and exploit that to completely break your connection. So small, low probability errors, very hard to test things to become attacks. And so that's why the software bug problem for cryptographic software, for protocols, becomes much harder to protect against. I want to ask you a different question about actual crypto. So why do we need these curve-based things? What's wrong with what came before? So RSA, the thing that I learned at university, seemed pretty good. What's wrong with it now? So nothing is really wrong with RSA. It's just much, much bigger and much, much slower. So elliptic curves are really a nice area. Also, you mentioned three-line protocols. You can still find errors in it. Of course, RSA feels simpler. But the attacks on RSA that we know of are much more complicated. So it's really hard to estimate how fast an attacker can do those. And the best attacks we know are much more powerful than anything we know for elliptic curves. So elliptic curves, you need to invest a little bit more time to think about how they work. But then the attacks are super simple. So it's very easy to estimate how long it takes an attacker to break a system that you are executing. Assuming the attacker doesn't have a quantum computer. If the attacker has a quantum computer, it's also very easy to estimate. You're just not happy about the result. Right. So in the end, when we see elliptic curve cryptography now, we have actually developed the implementations together with knowing the attacks. And the attacks are not just like the math attacks. Somebody gives you a stale key and says, now go and break it. But we typically assume that our attacker is a little bit more active. They'll be sitting on the computer, like in your browser, you have your normal thing running, and then your web form also has some advertisement. Maybe this is a spy process. So we always generally assume that there is some attack that is much closer to you that can spy on how long does it take to do the operations. And these things might depend on your secret in there. So these are so -called side channel attacks. And with elliptic curves, they are a bit more recent that they come into the system. So we are much more aware of it, and we have better protections against these side channel attacks. And also, I mean, you've been looking at the verification of them. So we are more advanced in those than we are with RSA. Yeah, this brings up an interesting issue, which is that once you get into the actual implementation, you might be accidentally leaking information because of the temperature of the CPU when you're doing it, or the power line fluctuating, or something that can be read by a hostile processor. How do you go about doing that sort of verification to prevent that sort of attack? So it's hard, primarily because a lot of the attacks are very low level, and you don't want to do all your proofs at least at that level. So a lot of the things that you're trying to prove is that your code is memory safe. It doesn't have buffer overflows or underflows. It's that simple. You want to prove that it is functionally correct. It's actually calculating the mathematical function that you wanted to calculate. And then you also want to show that it does not have these kinds of side-channel attacks. Now, the side-channel attacks only start appearing in hardware. You can see some of them at the assembly level. But in fact, with the recent Spectre and Meltdown-type problems, those are even lower than the assembly. Even if you write perfectly good assembly, you might still be vulnerable in certain cases. So there, it's kind of an arms race. So you can actually put some protections in your source code. Then you can do some analysis and assembly. Then you can use some other hardware protections to kind of protect against these kinds of things. And there's entire industries that are building hardware devices that are better shielded against certain kinds of side channels. But in the end, you're always going to have this gap. You're going to have to trust the manufacturers, the compiler writers, and all of these people. And you have to make that decision for yourself. Whether I want to verify all the code down to the lowest, lowest thing, or do I stop somewhere higher and I trust that GCC, which is the compiler I'm going to use, and Intel, which is the hardware I'm going to use, are not going to mess up my proofs by adding new side channels, which I didn't realize would be there. We just saw yesterday Intel FPU bug that potentially can leak AES keys directly from the processor if you can get it going right. So you are right. There's a cutoff point where you have to say, well, I trust below this point. Because in the end, the hardware has the secrets, and it is going to do some calculations based on these secrets. So it is fully capable of leaking those secrets. So there is some trust that you need for that. There's a different sort of trust in the mathematics. So when I was learning about RSA, one of the interesting things about RSA is it relies on the belief that a particular mathematical operation is hard to do. Do curves have the same thing? There's a belief there as well, right? Well, there's always an assumption, but it's not a belief. It's not like we every day get up and give a donation to the god of factorization and discrete logs. I do that. You don't? So it's more like we pay a bunch of PhD students and other smart researchers to do real attacks and see how far they can get. And well, whenever I have some free time, I think about it. Of course, when all the Snowden stories came out, I was like, hey, where is the secret in those curves? Because the NSA has been putting out some parameters. Is there something wrong with those? Before, it was something where, well, every few years, I would think about it. And then actually, well, I spent some time, which didn't get any results out of it. But we do look into all levels of attacks. Is there something wrong with particular choices? There's something wrong with all choices. Are there some which are particularly hard to implement? And I think that would be my guess for what the NSA curves are doing. These are hard to implement correctly. Those are the ones where, well, it usually goes fine. But with one out of 2 to the 256, you get something which you can't handle. It will not happen with testing. But it can happen if somebody feeds you a bad parameter. And so when they were proposed in 2005, we didn't see much of formal verification coming around. So there's a lot of progress in getting better implementation. And also understanding better parameters. But at the end, it is just humans who have some ideas. And maybe we missed the big ideas. And somebody has, in one of the three-letter agencies, some write-up that they can break elliptic curves. And we just don't know about it. And what's interesting is both of you have this sort of a limit to which you decide to say, up to this point, we trust it. From your perspective, it's the discrete logarithm problem. We think based on evidence. But it's not a proof. You don't have any proof that that is not actually a reversible thing. I can't actually do it. Which is a fascinating thing. Because our lives rely on people like yourself saying, I'm pretty certain that that can't be done. And very educated is pretty certain. And you have the same thing, which is, well, I'm pretty certain the CPU is at this level. So we're not so certain about the CPUs. We're not so certain. Not so certain about the CPUs. So you're more certain about the mathematics. We just have to make pragmatic assumptions there. But indeed, the nice thing about having, I think, mechanized proofs and things that you can actually check using a computer, which is something that a lot of people are doing now, is that you have to make explicit, of course, all the assumptions that you have. I mean, even if I said, I have to trust the compiler, I have to make that assumption explicit. So first of all, I'm making all the assumptions both for RSA and for the machine, everything explicit in my proof. And the second thing is, I can then, once I've done my proof, I can turn some knobs off and see if it still works. I can say, OK, suppose this assumption failed, but I had something else. And that's what a lot of crypto papers do. They're like, OK, can I still prove this protocol if I did not have this Diffie-Hellman assumption, but I had that one, and so on. So if you can swap around certain assumptions, does it still work? How do you continue to make it work, and so on? And that's the kind of analysis that it enables, I guess. Are there examples of actual, real, proved-correct programs out there? So Firefox is running a verified cryptographic library that we built as part of their huge cryptolibrary. There's some small things which are verified. Google Chrome is also running some verified cryptolibrary built by people at MIT. Intel's chips are probably verified, right? I mean, to a large extent. So probably. And so on. So I mean, there's really large projects which have successfully deployed verified code at this point. But a lot still remains to be done. So you're both in this field. What do you worry about, personally, from a cryptographic perspective? You go home and think, oh, I shouldn't do that, because I know that actually there's that dark corner that we don't talk about with cryptography. Or is there something you worry about from your own use of the Internet, or things like that, because of your knowledge? I mean, there's the obvious thing, post-quantum crypto. So I worry about quantum computers. I actually care about my privacy. So I would like to use those things. But you can't just deploy a system on your computer. You also need the rest of the world to use the system. Else it would be a very boring conversation. And then I couldn't actually keep it secure. So that's the part I keep secure, or that I keep worrying about to keep secure. Yeah. And for me, I think a lot of our focus and a lot of the work I've seen is trying to protect things between endpoints. But once it gets to the endpoint, it's out, right? So in particular, I don't see any good solutions right now for secure erasure of stuff. I mean, I cannot really say that this data should eventually get erased. And at least the mechanisms that we are deploying right now don't do that. So I have very little confidence in where my data is spread across the Internet, and how long it's going to stay there. It's probably forever. All right. I think we're at the point where we should ask the audience if they want to ask us any questions. Hello. To follow up from John's question, how worried is the panel about random number generators or pseudo random number generators? Well, I was on the team that did some experiments with Dual EC and showed that you can actually use the backdoor. So it is something where I'm very worried about. But it's something where I have a better feeling of we can get this under control. It's more political issue than a science issue. So we are understanding pretty well how stream ciphers work. And you can turn the stream cipher into pseudo random generator. Then it's just the political will to rip out the crap we currently have deployed and replace it with something better. A different way to look at it as well is that we are currently using random numbers in far too many places, even in some places where it may not be needed. But where if the random number generator breaks, things are catastrophic. So there's lots of new designs for symmetric encryption and asymmetric encryption and signatures, which are trying to reduce this dependency on having a secure random number generator, the fragility of the system depending on this kind of thing. We have time for another one. Hi. You said that elliptic curve cryptography was more vulnerable to quantum computing. Was that purely down to the key length? Or is there something about it that makes it more vulnerable? I didn't say that, actually. So it is often held belief that elliptic curves are more vulnerable than RSA. And it's something which for very large security levels is true. But it is really a question where the crossover point is. And yes, the only reason is the size of the quantum computer you need to build. At this moment, elliptic curves are more secure. So it takes us longer to increase the sizes. On a quantum computer, both RSA and ECC are equally insecure complexity -wise. But elliptic curves are still getting a factor of 10 extra, whereas RSA only gets a factor of one extra. So then you get a crossover point. But it's only asymptotics. So we are working on that to figure out exactly where the crossover is. But basically, neither of them is good. I don't understand why the NSA is saying, oh, yeah, you can switch to RSA 3072. It's not a good option. I might know the answer to that. I think you just said it. It's not a good option. Is there another question? What sort of mathematical objects do you, in your opinion, hold the most promise for post-quantum encryption? So that's a hard one to answer, because most promising could be interpreted as it's definitely good. And then my answer is code-based, so something which is using error-correcting codes. It's a thing from 40 years ago. And we're very convinced that it's good. So it has lots of eyes on it. More promising would be some of the more recent designs. But that also means you go into an area where it might totally break. So lattice-based crypto is a lot nicer. It's a lot speedier. It's a lot smaller. Will fit in one of your packets, but has had less study. So that's more risky to use. There's an even more recent thing called isogeny-based cryptography, which, well, I just, with some colleagues, published a paper, posted a paper, which would be like almost the same size as elliptic curve keys, which would be awesome. But hey, this is something we came up in January. Do not use this. Use it in five years if it's still not broken. So very promising, of course. Hey, I like mine. But also very dangerous. All right, Tanja Karthik, thank you so much for sharing your knowledge. Thank you.