Master of (Computer) Science
Presented by: Nick Sullivan
Originally aired on February 7, 2021 @ 10:00 PM - 11:00 PM EST
Nick Sullivan, Head of Research, interviews heavyweights in computer science research in areas such as Cryptography, Artificial Intelligence, Databases, and more.
This week's guest: Kenneth G. "Kenny" Paterson, who leads the Applied Cryptography Group at the Institute of Information Security at ETH Zurich.
Transcript (Beta)
Hello everybody on the Internet, everybody on the Internet who's watching Cloudflare TV.
This is a segment called Master of Computer Science and I'm Nick Sullivan. I run the research team here at Cloudflare and it's my pleasure to introduce Dr. Kenny Paterson, who's a professor of cryptography and computer science.
Welcome to the show Kenny.
Thanks, thanks for having me. Hi everybody. So Kenny is a very well-known professor who's worked on all sorts of different cryptographic attacks as well as constructions and he's made a lot of impact on the world in terms of how we can make it safer from a perspective using cryptography and using other technologies.
So Kenny, how did you get into computer science research? How did you get into cryptography?
I'm a late bloomer when it comes to computer science. So when I was a kid I really liked watching science programs on TV.
I was like a real geek and I had a home computer from about the age of 12 or 13.
Something called a BBC micro which was like a UK specific thing named after the BBC because they were trying to get everybody into home computing at that time.
So I would sit there and write programs on this computer and I even wrote a program that enabled you to copy the kind of magnetic tapes that we recorded our programs on.
So I wrote tape copying software. I was like a software pirate aged 14.
You can imagine me in the school playground, the schoolyard dealing in these tapes and being the go-to guy to get the latest programs from.
That was fun because that involved like quite low level programming to get underneath the operating system.
Well, to use the OS functionality to get to the RS-232 interface so you could talk directly to the bit stream going onto the tape and exactly record some of these bespoke tape formats that companies were using to stop their tapes being copied.
So I kind of went to the lowest possible layer in order to break through their protection systems.
So I kind of had a little bit of a head start on computer science, on computing.
But I actually went to university to study physics because I was absolutely fascinated by this kind of like fundamental nature of the universe, quarks and black holes and all that kind of stuff.
And a lot of that was inspired by watching BBC2 science programs. There was this thing called Horizon which was an amazing science program that was on, I think it was on Thursday evenings on BBC TV in the UK.
And I just got the bug for science, particularly physics.
But eventually I realized that physics was, all they were doing was building models of the universe.
They didn't really understand how the universe really worked.
And I was kind of, I wasn't mature enough at that age, like age 19 or so, to realize that, you know, it's all we do all the time.
All science is really about building models of one kind or another. So then I diverted into mathematics and my first degree and my PhD are actually both in mathematics.
And then, you know, I did some postdocs and I got a job in industry working for Hewlett-Packard Laboratories.
And there I kind of came all the way back around again towards computer science.
And so, you know, it's been a long, long journey through starting with computing and computers and kind of hobby computing into physics, then into mathematics.
And finally, now I'm a professor of computer science for the first time, actually.
So, you know, since 2019, since I moved to Zurich, I suddenly became a qualified computer scientist somehow.
So it's been a long, strange kind of route.
That's an amazing roundabout story. The funniest part, sorry, just to jump in, is that like, what do I do now in cryptography, particularly when we're doing, say, a formal security analysis of some cryptographic system, is we're building models, right?
We talk about oracles for encryption and decryption and so on.
And so, you know, I'm back to doing what I was doing 30 years ago in physics class, building models of a slightly different universe, maybe.
So let's go back and dive a little bit deeper into some of these things.
So what was the data format for these? I'm not familiar with this. I think it was just like some 8-bit format where you just could basically read, I think it was like one bit of parity or something, and then seven bits of data, and you could directly get these bytes off of the magnetic tape.
You know, it was like a cassette tape.
Probably a lot of our listeners are too young to remember cassette tapes, but that's what you use to store your programs on.
So you had to load, this is before, you know, you could buy hard disks for these machines, but only kind of later on, and they were incredibly expensive.
They cost, you know, about what a small house would cost today to buy a hard disk, a floppy disk, sorry, for one of these machines.
So tapes is what we used. So I had all these tapes lying around my bedroom.
Yeah. And what sort of game, were these games mostly, or were they productivity software?
It was all about games. It was all about, you know, things like Space Invaders or, yeah, there were lots of kind of Atari clones that we used to get.
These were all 8-bit computers. The CPU was a 6502 running at two megahertz, which seemed incredibly fast in about 1982, 83.
So computers now are a thousand times faster and, you know, have bazillions times more memory and storage and everything attached to them.
So it was a different programming paradigm.
So when I started off, I remember we took this computer home and as we were walking out of the shop, my dad said, oh, we'll get one game for the computer.
And then I had to explain to him that we only had the model A computer, not the model B.
And so none of the games would run on it. It only had 16K of RAM.
So it was a pretty challenging environment to do anything. And you were always running out of memory and finding hacks and stealing pages from the operating system that weren't particularly stable, but then maybe they weren't being used for anything either.
You could get yourself an extra 250 bytes of memory, which was like an enormous amount of memory to do things in.
So it's a lot of fun.
I never did any kind of commercial game development or anything like that.
I wasn't really good enough to do that. I did write software for moving sprites around the screen, moving them as smoothly as possible, you know, by interacting with the VDU controller and making the movements of your pixels coincide with the refreshes of the screen, all that kind of syncing stuff.
But I didn't really go much further than that, whereas other people did.
I knew people a couple years older than me who ended up writing games commercially and making quite a bit of money, quite a bit of pocket money, let's say, whereas I was just ripping off their games and selling copies of them in the playground.
So how did these old games, not to go too much into this, but you mentioned how there was some sort of copy protection.
Right. How did this copy protection work? It seemed to be obscurity in some broad sense.
There wasn't anything... Yeah, so there was always like the operating system functions for, you know, saving and loading files, which most games used, most software houses were using those.
But I think some software houses had written their own bespoke tape formats and they wrote the data in such a way that it was hard to just make a direct tape-to-tape copy without degrading the signal quality.
So somehow they were maybe just on the edge of what could be read from the tape, so that when you made the first copy, the quality would degrade sufficiently.
There was also an OS-enforced access control mechanism called locking on this particular operating system, the BBC Acorn, whatever it's called, Acorn operating system.
This is like, this is 35 years ago, 40 years ago, I can barely remember.
But there was like some specific way you could bypass that by making the right operating system call at the right moment.
So what I did was like, I wrote this software that at the kind of very lowest layer recorded the bytes and the timing of those bytes directly off of the tape.
And so just kind of recording at the kind of the lowest possible layer, just above the physical layer, what the bits were and what the timing of those bits were, and then wrote that back again when it was time to write.
So, you know, you could, thereby you were building basically a tape copier that ignored whatever bespoke format the developers were using and just kind of went below it and said, okay, you know, I'll just do it from here.
So I don't know whether that was influential in making me think like, you know, like breaking things.
I don't know. I never really thought about that before.
Maybe it did. Maybe I've always been into hacking around and stuff and seeing how it works.
Like I'm one of those guys when like, if I buy a new gadget, I never read the manual.
I just play with the buttons until something happens or it breaks.
And I think a lot of people are like that probably.
Well, this idea of having a signal that represents data that is, I guess, fragile or close to meaningful, but very close to not meaningful, it kind of reminds me a little bit of the image that's behind you in this picture.
So maybe tell me a little bit about your Zoom background.
Oh, sure. So this is a representation of the RC4 stream cipher.
So RC4 was this incredibly beautiful and popular algorithm designed by Ron Rivest in the early 1990s.
And RC4 was actually a trade secret of the RSA company for a while until eventually it was reverse engineered from some software.
And what's beautiful about the RC4 algorithm is you can run it and you can write it in something like four or five lines of C.
It's a very, very simple algorithm.
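To illustrate how small the algorithm is, here is a minimal Python sketch of RC4 as it is publicly documented. The function names `rc4_keystream` and `rc4_encrypt` are chosen for this illustration only, and the code is for study, not for protecting real data.

```python
def rc4_keystream(key: bytes, length: int) -> bytes:
    """Return `length` bytes of RC4 keystream for `key`."""
    # Key-scheduling algorithm (KSA): shuffle a 256-byte state under the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Pseudo-random generation algorithm (PRGA): one keystream byte per step.
    i = j = 0
    out = bytearray()
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(state[(state[i] + state[j]) % 256])
    return bytes(out)


def rc4_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR `data` with the keystream; the same call also decrypts."""
    keystream = rc4_keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))


ciphertext = rc4_encrypt(b"Key", b"Plaintext")
roundtrip = rc4_encrypt(b"Key", ciphertext)
```

Because encryption is just an XOR with the keystream, calling `rc4_encrypt` a second time with the same key recovers the original bytes.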
And so it became really popular with software developers in the 90s and the 2000s.
If you had to hack up an encryption system in a hurry, you might, you'd find some code for RC4 lying around and you would just use it.
Such that it then ended up being a very popular option in TLS, which I guess most of our viewers will know is like the secure protocol for web browsing and many, many other things.
So in about 2012, 2013, RC4 was being used to protect about 50% of all of the TLS traffic on the Internet at that point.
The alternative at that point was this thing called CBC mode, which was using a block cipher.
So whereas RC4 is a stream cipher.
And what this picture represents is the fact that actually it's not a very good stream cipher when you look into it carefully, because each of those kind of spikes in the background represent some kind of bias in the outputs of the algorithm.
And these biases are kind of magnified in this picture, but they're actually, they're big enough to enable you to do attacks against the stream cipher in certain circumstances when it's used in TLS.
So this kind of picture, and then really represents a research effort to break RC4 in TLS that we published in 2013.
And this led to eventually over the next two or three years after that, to RC4 really being abandoned as an encryption option in the TLS system.
Yeah. I remember when I was at Cloudflare at the time and that paper came out and we had a kind of a panic moment because it was our preferred cipher for a lot of our customers.
And I think we were among the first to disable RC4 completely on the strength of this research.
So thank you so much for doing this. Oh, thank you for writing the blog posts about it because actually those blog posts really helped me to understand some of the industrial context for why it was going to be difficult to switch off RC4.
So those blog posts wrote about old feature phones that you were trying to support that couldn't run block ciphers efficiently or didn't have AES in hardware or whatever.
And so this is maybe something we'll go on to talk about a little bit more, but we sort of did that research almost in vacuo without really being aware of who all the industry players were at that point.
Even in 2013, I was still pretty wet behind the ears when it came to talking to the industry about crypto issues.
So those were great explainers, I think, those blog posts.
Yeah, so RC4 going away put us in this kind of awkward position, right, where you have to support this other mode that is known to have a certain type of attack and then this RC4 attack was a certain type of attack and you're left with the situation where TLS 1.0, which was the dominant version of TLS at the time, had no completely 100% safe options.
And even further, you had older machines out there that were running versions of Windows that did not support AES, which is the advanced encryption standard, the one that went through a whole contest and was chosen.
It only supported DES, which is of a much older vintage, let's say, triple DES.
It was pretty slow actually compared to AES or RC4. Yeah, it's very slow and then the key size was very small.
So there was, I guess after that, an attack against triple DES and you're left in the situation where what do you do with these old computers and what do you do with these folks who are trying to connect to the Internet and they want to get secure content and they want to use HTTPS and everybody on the web wants you to use more secure encrypted things but you're left with a situation where all the choices are bad in one situation or another.
Was this something that you were anticipating when you were doing this work or was this kind of a surprise?
Yeah, I don't think I had enough awareness at that point in time to realize the length of the long tail of old code that's out there and old phones, old hardware.
I wasn't aware, for example, that lots of medical equipment, for example, just you can't upgrade these things in the field easily and maybe there's system critical components that are protected by bespoke firewalls that can never be touched and must be kept running because entire banks rely on them, for example.
So I think a lot of that stuff I started to really become aware of through this work, which is for me one of the big plus points of it actually.
It was a door opener for me to be able to talk to lots of different people in industry and understand what kind of problems they faced.
So no, we didn't really think that through.
At the time, we knew that TLS 1.2 existed as a standard. It had been out since 2008, I think it was published in an RFC and we were kind of vaguely aware that it wasn't really supported yet in most of the mainstream implementations.
But from my sort of ivory tower at that point, it was just my expectation that people should just make the switch.
Why can't they just make the switch naively? I think I understand a little bit better now.
But I think that work and lots of people were involved in that too.
So there was a lot of really good follow-up work on RC4. I think one of the key papers was by Mathy Vanhoef.
He wrote this paper called All Your Biases Belong to Us, which is like a play on all your bases, the meme, which really showed the direction of travel for RC4 and showed that these attacks were only going to get stronger as time went by.
Actually, the big dirty secret from the academic perspective is that at that point, we ran out of ideas collectively.
We didn't know how to make these attacks any stronger.
So we created the illusion of momentum, which encouraged the industry to move forward.
But that was it. That was the end.
We didn't have more to say on the matter after that point. It's also true that you get less and less academic credit for coming up with the next delta improvement on an existing attack.
It's always the first paper that gets you the kudos and gets you the credit, which is a fundamental issue about alignment of reward systems between academia and between industry.
They're very differently aligned.
So to go back to your original question, at the time, I just assumed that everybody would roll over to TLS 1.2, where you have things like AES-GCM.
So you have proper, hardcore, authenticated encryption modes that you can switch to.
And indeed, that did happen, but it took a couple of years.
I've written papers that have these graphs in them that show the amount of RC4 traffic over time.
And there's a point where IETF deprecates RC4, nothing very much happens.
And then there's a point where the major browser vendors actually switch off support for RC4 in their browsers, and then the traffic really drops.
And at that point, sort of mid-2015, they were able to switch over then to using TLS 1.2, which has better encryption options.
So how did you decide to get into attacking something like this?
RC4 is ubiquitous and very well known, and it's so small, it can't be perfect.
But what was the inspiration for tackling that? Where were you in your career?
What sort of things were you looking at before this particular piece of work?
That's a really interesting question. So I can tell you exactly the circumstances under which this happened.
We had done this work on CBC mode in TLS, which led to this thing called the Lucky 13 attack.
That was a paper I wrote with Nadhim AlFardan, and that was a kind of an advanced padding oracle attack that showed that the other major option also was looking a bit shaky.
As you mentioned before, this was like, there are no good options left.
So we had done that, and that work was in submission and had been disclosed.
And I was at Real World Crypto in January 2013 in Stanford.
That was the second edition of the Real World Crypto Conference.
And Eric Rescorla was there from Mozilla, and Eric is kind of a major figure in the area of TLS generally, but Internet security as well.
And he was giving a talk about this, and he said something like, yeah, we still have RC4 in that spine.
And I was like, meh. And actually, that first thing he said to me maybe a couple of months before, I'm trying to think exactly what, maybe early 2012, mid-2012, he'd said, well, now that you've done this thing on CBC, what about RC4?
And I remember he was at the conference, and we turned to each other and we looked at each other, or we sent each other some email in the middle of Rescorla's talk.
So it's actually Eric's fault, right? It's Eric Rescorla's fault that we did this.
He provoked us by saying, ah, but we have RC4. And we just did these quick back-of-the-envelope probability calculations and figured out, well, the biases were this big, so you would need this many ciphertexts.
And then we realized at that point that it wasn't going to be that secure after all.
So then we just put together a team over the next few weeks and months to try to develop this idea.
And actually, I should say as well, at the same time, simultaneously, some Japanese scientists led by Isobe had also done some really nice work looking at these biases in RC4.
But I think our work was more comprehensive and actually demonstrated the vulnerability in the TLS context more precisely than their work did.
So there was a number of ideas in the air at this time. So it just seemed like the logical thing to do.
Well, people are switching to RC4 because they're worried about things like the BEAST attack, and they're worried about this coming Lucky 13 attack.
So let's reduce the number of options by one more.
No, let's really understand the security of this entire system and look at the next option and see what we can do there.
So before we move on to TLS, I'm sorry, what were you going to say? I was just going to say the attack actually turns out to be really simple in a conceptual way.
You just need lots and lots and lots of encryptions of the same plain text over and over again.
And then these tiny biases that you get in the RC4 key stream kind of come into play.
And you can see the plain text through the ciphertext if you have enough encryptions of the same plain text.
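The idea can be sketched with one well-known RC4 bias, the Mantin-Shamir observation that the second keystream byte is 0 about twice as often as any other value. The Python simulation below is only an illustration under that assumption, not the full attack from the 2013 work: it encrypts the same two-byte plaintext under many random 16-byte keys and guesses the second plaintext byte as the most frequent second ciphertext byte. The helper names and the sample count of 30,000 are choices made for this sketch.

```python
import random
from collections import Counter


def rc4_keystream(key: bytes, length: int) -> bytes:
    """Plain RC4: key scheduling followed by keystream generation."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    i = j = 0
    out = bytearray()
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(state[(state[i] + state[j]) % 256])
    return bytes(out)


def recover_second_byte(ciphertexts) -> int:
    """Guess plaintext byte 2 as the most common value of ciphertext byte 2.

    Since C2 = P2 XOR Z2 and Z2 = 0 is the likeliest keystream value,
    C2 = P2 is the likeliest ciphertext value.
    """
    counts = Counter(ct[1] for ct in ciphertexts)
    return counts.most_common(1)[0][0]


rng = random.Random(2013)  # fixed seed so the run is repeatable
plaintext = b"OK"          # the second byte is the target
ciphertexts = []
for _ in range(30000):     # takes a few seconds in CPython
    key = bytes(rng.getrandbits(8) for _ in range(16))
    keystream = rc4_keystream(key, len(plaintext))
    ciphertexts.append(bytes(p ^ k for p, k in zip(plaintext, keystream)))

recovered = recover_second_byte(ciphertexts)
```

With this many samples the biased value stands out clearly from the roughly uniform background, which is the same counting argument that the smaller biases require far more ciphertexts to exploit.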
And TLS actually allows you to kind of get into that situation because you have like, say, session cookies being sent over and over again to a website.
And you can even manufacture the kind of ciphertext that you need by having JavaScript running in the browser.
And this was like a technique that we borrowed from the hacker community from Duong and Rizzo who'd worked on the BEAST and CRIME attacks.
So there was actually a really nice interplay there between what they had done and us borrowing their techniques and reusing them.
So learning from these different communities.
But in the end, the attack is really kind of trivial. It's like five lines of equations in a paper is everything you need to do.
It's like almost the most brain.
I mean, I think a lot of professional cryptanalysts, the kind of guys who analyze block ciphers and publish in conferences like Fast Software Encryption or Asiacrypt or whatever, were looking at it saying, what's all the fuss about?
Your attack is trivial. And in a sense, they're right, but none of them did it.
The point is to do it, right? And to demonstrate that it has an impact, that it makes a difference for something like TLS.
Yeah, that's amazing. It's all the different pieces coming together.
It's how cryptography is used in the real world, plus hacker groups.
It was down in Argentina, I think, when the attack was happening.
Yeah, Juliano Rizzo and Thai Duong.
I mean, Thai works for Google, for example. So he's gone legit, right?
Yeah. Okay, so with respect to TLS, this leaves one cipher left. What's the chance that AES-GCM is going to be a lot more insecure than we currently think it is?
Well, I mean, there have been some issues. There's still ChaCha-Poly as well.
So there's a kind of a backup algorithm, but I guess that you'll know better than me, but I mean, AES on most platforms, now you have specific CPU instructions to make it go faster.
Whereas I don't think ChaCha has that on any CPUs.
So I guess, but then ChaCha has this very wide block, so maybe you win.
But I think it's hard to beat the performance of AES if you have the hardware support.
So it's like an unfair comparison, actually. There have been some problems in AES-GCM.
So repeated nonces lead to real security disasters. So it's actually quite a fragile algorithm.
And again, this was known for a long, long time, but the kind of exploitation of it wasn't really well understood, I suppose, or the potential for exploitation.
There was one paper called Nonce-Disrespecting Adversaries that was published back in 2016, I think Hanno Böck was the lead author and others too, where they actually went out and did a scan of the Internet and found that some servers, or clients, were indeed repeating their GCM nonces, which for AES-GCM is a disaster, right?
It enables you to recover the integrity key, and it also enables you to recover some plain text and some key streams and do packet forgeries.
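The plaintext-recovery part of that failure can be sketched without any real cipher. GCM encrypts in counter mode, so a repeated nonce under the same key means a repeated keystream. The Python sketch below uses a SHA-256 based stand-in keystream (explicitly not AES-GCM, and the function names are hypothetical) to show that two ciphertexts made with the same key and nonce XOR to the XOR of their plaintexts. It does not demonstrate recovery of the integrity key.

```python
import hashlib


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in for a counter-mode keystream (NOT AES-CTR or AES-GCM)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = key + nonce + counter.to_bytes(4, "big")
        out += hashlib.sha256(block_input).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    return xor_bytes(plaintext, toy_keystream(key, nonce, len(plaintext)))


key = b"\x01" * 16
nonce = b"\x00" * 12          # the mistake: the same nonce is used twice
p1 = b"attack at dawn"
p2 = b"retreat at ten"

c1 = toy_encrypt(key, nonce, p1)
c2 = toy_encrypt(key, nonce, p2)

# The keystream cancels out, leaving only the XOR of the two plaintexts.
leak = xor_bytes(c1, c2)
# An eavesdropper who knows or guesses p1 reads p2 without the key.
recovered_p2 = xor_bytes(leak, p1)
```

With a fresh nonce for the second message the two keystreams would differ and the XOR of the ciphertexts would reveal nothing useful.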
So it's kind of, it's pretty bad. But it seems like as long as you avoid that vulnerability, and as long as you avoid kind of the obvious, well, by now obvious side channel opportunities against AES, so like AES is maybe quite difficult to implement in a constant time manner on a standard CPU without the hardware instructions.
As long as you avoid those things, it seems pretty robust.
There's not, I mean, there's been cryptanalysis of AES as a block cipher over the years, but nothing really significant that really dents our confidence.
I think all those attacks have really increased our confidence.
And then AES-GCM, there've been some issues with security proofs for AES-GCM over the years.
So the original proof turned out not to be completely correct, and it's been corrected.
And we know, I think over time we've developed quite a lot of confidence in AES-GCM as a mode, combining the confidentiality and integrity into one algorithm.
So I think AES-GCM will be with us for a long time. We can do better.
There are faster algorithms like OCB, for example. But once an algorithm dominates its market, it becomes quite difficult to displace unless it has some kind of catastrophic security issue.
Yeah, well, AES-GCM is, yeah, it's what we have now.
And it seems it's relatively fast and relatively secure, but it does have its flaws.
Don't truncate your GCM MAC fields, otherwise you might get into trouble.
Yeah, there's a lot of really iffy little ways of implementing cryptography correctly.
So this kind of leads into something you hinted, but maybe we can tackle it from a specific angle of this, is that industry adopts cryptography that's often developed in conjunction with academia or from academia writ large.
And sometimes these lessons or these specific security properties are missed or forgotten when moving from the context of a paper to the context of an implementation.
What sort of things are in place to... How come every time industry comes along and grabs a shiny new algorithm, they tend to hit these rough corners?
I don't want to really apportion blame one way or the other, but I think it's actually a two-way street, this interaction between academia and industry.
So I've done quite a lot of work as a kind of cryptographic consultant, helping industry adopt and get ready for production some of these cryptographic ideas that come up in papers.
And sometimes the developers I work with have the craziest questions and you're like, well, how can you possibly think that?
And then you sort of realize that cryptography is this very, very specialist thing.
It's equally art as science, even today.
And there are all these little kind of wrinkly, crinkly corners in all of these algorithms that you need to be very careful, sharp edges that you have to be careful of.
So I sort of see sometimes developers asking very, very basic questions, but on the other side, as academics, we do a terrible job in making our work accessible.
We write for each other and we don't write for that more general audience of, say, a general software developer who maybe has some exposure to cryptography in an undergrad class, but is not a crypto developer, just a general software developer.
I mean, I've done consulting jobs where all I've had to do is explain how to choose an initialization vector and something as simple as that, because these things are mysterious and can I use a counter?
Wouldn't it be easier to use a counter for my IV, et cetera, et cetera.
And then you have to kind of explain why that's not a good idea.
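One way to see the problem, assuming a CBC-style mode (the conversation does not name one), is that a predictable IV lets an attacker who can request encryptions test guesses about an earlier message. The Python sketch below uses an HMAC-based stand-in for the block cipher's encryption direction, which is enough here because the attack never decrypts. All class and function names are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher's encryption direction (NOT a real cipher)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class CounterIvCbc:
    """Encrypts single blocks CBC-style with IVs 0, 1, 2, ... (the flaw)."""

    def __init__(self, key: bytes):
        self.key = key
        self.counter = 0

    def encrypt_block(self, plaintext_block: bytes):
        iv = self.counter.to_bytes(BLOCK, "big")
        self.counter += 1
        return iv, toy_block_encrypt(self.key, xor_bytes(iv, plaintext_block))


def next_counter_iv(iv: bytes) -> bytes:
    return ((int.from_bytes(iv, "big") + 1) % (1 << 128)).to_bytes(BLOCK, "big")


def guess_matches(oracle, victim_iv, victim_ct, last_iv, guess):
    """Submit a crafted block so the cipher input equals guess XOR victim_iv."""
    predicted_iv = next_counter_iv(last_iv)  # attacker predicts the next IV
    crafted = xor_bytes(xor_bytes(guess, victim_iv), predicted_iv)
    used_iv, ct = oracle.encrypt_block(crafted)
    return ct == victim_ct, used_iv


oracle = CounterIvCbc(b"k" * BLOCK)
secret = b"PIN=4921".ljust(BLOCK, b"\x00")
victim_iv, victim_ct = oracle.encrypt_block(secret)

wrong_guess = b"PIN=0000".ljust(BLOCK, b"\x00")
wrong_matched, last_iv = guess_matches(oracle, victim_iv, victim_ct, victim_iv, wrong_guess)
right_matched, last_iv = guess_matches(oracle, victim_iv, victim_ct, last_iv, secret)
```

An IV drawn from a cryptographic random source, for example `secrets.token_bytes(16)`, removes the attacker's ability to predict `predicted_iv`, so the crafted block no longer cancels out.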
So I think as a community of academic cryptographers, we could do a lot better job in explaining or helping to make our stuff more developer friendly, especially if you're not just doing the kind of deep theoretical crypto, but you're actually trying to do something that's more directed towards applications in the shorter term.
But I think we're making progress though. So I'm hoping that the future will be better than the past has been in that regard.
So for example, with things like real world crypto, the symposium that we set up, I mentioned the 2013 edition earlier, and it's now still going strong in 2020.
I'm not sure what will happen in 2021.
We're supposed to be in Amsterdam and I hope it will go ahead as a physical event, but even if it doesn't, we'll do something online.
But that's created a forum where lots of academic and industrial cryptographers are coming together to discuss crypto and give talks and learn from each other.
And it's becoming the case that sometimes you can't tell the difference between the academic cryptographer and the industry cryptographer.
And I think that's a really positive development.
There's a lot more interplay. And I see now a lot of large companies actually hiring PhD level cryptographers to do crypto development and to do crypto engineering for them.
So Cloudflare is a great example. The crypto team there is constantly growing, it seems, hiring lots of really good people out of really good PhD programs.
AWS is busy hiring people. Of course, Google, IBM, et cetera, have always done that.
So it seems to me like there's a better understanding that if you're a large tech company, then crypto probably is part of your core business and you better get it right.
And then B, the right way to do that is to hire some bloody good cryptographers.
Not every company needs them, and most companies don't need very many, but those that do, really do.
And I think they're gradually recognizing that fact.
And I think that's really, really positive.
So I think these barriers are breaking down and I think in future things will go much better.
Having said all of that though, having said all of that, sorry, we're trying to do much more complicated things with cryptography now than we ever did in the past.
So once upon a time it was about secure communications, but now we're doing these complex zero-knowledge proofs for cryptocurrencies or we're doing multi-party computation with really quite complex crypto libraries being used there.
So I do slightly worry that maybe as we increase our ambition level in terms of the complexity of the crypto that we're trying to deploy in real applications, so we will maybe reinvent entire classes of attack or introduce entirely new types of attack that we didn't think about before.
So let's see. Yeah, it's interesting.
I've seen so much new cryptography based on 2000s era discoveries rather than discovered in the 90s and 80s and 70s.
And a lot of groups who mentioned cryptocurrencies, blockchain are kind of grabbing these new cryptographic constructions and running full speed ahead.
Yeah. With real money involved, or at least something that's semi-fungible and could be one day converted into real money.
I find that pretty scary. So I wrote a paper, when did we release that?
I guess earlier this year or late last year with Dan Boneh and Florian Tramèr from Stanford, where we looked at the anonymity guarantees of Zcash and Monero, which are two leading anonymous cryptocurrencies.
And we found pretty bad flaws in the way that they were doing quite basic things like public key decryption, which because of the way they were doing it led to anonymity breaks against those cryptocurrencies.
And so when you grab these different primitives and throw them together to make a system, security does not compose.
This is something we've learned very much the hard way with, say, MAC-then-encrypt in TLS.
And indeed it's the same issues, not exactly the same issues, but the same class of issues arising from lack of composability guarantees also come into play in things like cryptocurrencies.
So I expect to see a lot of that. Maybe the base kind of algorithms will be okay.
The primitives will be okay, but the composition of them won't be okay.
So all these compositions of different algorithms, yeah, you mentioned MAC then encrypt, those are two of the very basic, the very basic things you can do with cryptography is make sure that there's integrity on a message guaranteed by symmetric key or that there's confidentiality.
And now we're talking about composing things like zero knowledge proofs and public keys and schemes, all kinds of things.
Yeah, absolutely. So in the future, how do you see these attacks?
You mentioned the ability to break anonymity guarantees, which is actually not something that traditional cryptography provides in any sort of strong sense.
Confidentiality and anonymity are two very different things.
So how would you consider automating these attacks that you found against Zcash or Monero, considering that there's an exponentially growing number of different projects that are picking and choosing different cryptographic pieces and throwing them together?
Yeah, that's a great question.
I'm not really the right person to ask about automation because I spent too long thinking about mathematics, which is like a pursuit that you follow with your brain rather than building a system to do things automatically.
But there are people in our field who are thinking in that way about cryptanalysis.
So I actually spoke to one of them just this afternoon, Juraj Somorovsky, who's at Paderborn University in Germany. He and his colleagues have written this tool called TLS-Attacker, which enables you to very rapidly prototype different attacks against systems like TLS and DTLS, and then try those attacks out against lots of different implementations very, very quickly.
Because with TLS, there's like, I don't know, a dozen or 20 different, maybe even more different implementations that are used, and you want to find out which ones are vulnerable to this new attack idea you have and which ones are not.
So that's an example where automation has been brought into play, but it's still human-guided automation.
It's not like you press a button on some AI system and it says, yes, here's your next USENIX paper.
There's a lot of work and a lot of skills still involved in using those tools.
There's also a nice paper, I think it's going to appear at USENIX Security in a couple of months' time.
One of the authors is Matt Green from Johns Hopkins University, where they looked at automating padding oracle attacks, which is a class of attack against symmetric encryption schemes.
And they had some success using constraint satisfaction as a tool to guide their search towards optimal attacks that consume the fewest ciphertexts or the smallest amount of time.
So there are hints of this kind of automated approach starting to appear.
Also on the flip side, automated or machine-assisted approaches to proving the security of systems are something that's coming up a lot.
Again, I'm the wrong guy to ask. I'm too old and stuck in the mud to really be able to adapt to that stuff.
But there's really great work going on in our field in that domain as well.
And this comes out of, maybe comes less out of the crypto community and more out of the language security community.
People who are used to thinking about type-safe languages and annotating code with complex security-related types and then running compilers or running proof systems on the code and seeing what they can get.
Famously, though, those tools are really difficult for mortals to use.
They tend to be used by the people who first invented them to write the next paper or to develop the tool a little bit further.
And traditionally, they've been really quite difficult for ordinary users to pick up and start doing useful things with.
It was like a very steep learning curve on these tools.
So there's a long way to go to get these tools to the point where an everyday standard software developer could use them to say something meaningful about the code that they're writing.
Then, I guess, also there's the rise of things like Rust, for example, as a programming language, which I don't think is perfect.
I'm not an expert, but it sort of forces you to think about security from the get-go instead of worrying about your data structures afterwards.
You have to really design the security in mind from the beginning.
Yeah. Yeah, it's challenging, right? Because it seems like here's all this code in the world and look at everything that results on the timing of doing a public key operation.
And how do you generalize that? How do you make that something that anyone can figure out?
Oftentimes, we have specific challenges here inside of Cloudflare where somebody needs to use cryptography for something, but they have an odd set of requirements where it's like, well, I need this to look random, or I need this to have XYZ property, or I want to do TLS, but I want it to be one way only.
And so there's a lot of these protocols and different constructions of cryptography that get developed.
I can imagine projecting outwards here at Cloudflare and previous companies that I've worked with that these are all throughout the industry and not necessarily using standards.
But if it's a niche enough project, you can't use a standard for it.
So you end up with all of this cryptography that people don't know to look to attack because it's proprietary in some way or another.
Yeah. There's a lot of stuff that's out there that's hidden.
So a lot of what academics do is really looking at open source libraries and finding the next bug in OpenSSL, elliptic curve Diffie-Hellman, something, something, because that's what you can look at.
But it's probably a real mess behind closed doors. It's really hard to see what's going on there.
I did hear a great story. I think I can tell the story. So Google in Zurich has a very big presence and they have a big security testing team.
And one of the nice things about Google, I guess, is they have a single source tree for the entire...
All Google code is in one repository, apparently.
And so what they're able to do is they can data mine on that repository. They can see when people are using unsafe crypto constructions, and they have a system where that fires off an alert, maybe like an automated email that says, a software engineer in Mountain View is trying to use MD5.
You may wish to contact them and tell them not to do it.
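As a rough illustration of the kind of repository-wide check described here, the following Python sketch scans source files for mentions of MD5 and reports where they occur. It is not Google's system, whose details are not given in the conversation; the rule table, the `Finding` record, the file suffixes and the function names are all assumptions made up for this example, and a real tool would parse code rather than match text.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Hypothetical rule table: rule name -> pattern that suggests risky usage.
# Only MD5 is listed because it is the example mentioned in the conversation.
WEAK_PATTERNS = {
    "MD5": re.compile(r"md5", re.IGNORECASE),
}


@dataclass(frozen=True)
class Finding:
    path: str
    line_no: int
    rule: str
    line: str


def scan_text(path: str, text: str) -> list:
    """Return one Finding per (line, rule) match in the given file contents."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in WEAK_PATTERNS.items():
            if pattern.search(line):
                findings.append(Finding(path, line_no, rule, line.strip()))
    return findings


def scan_tree(root: str, suffixes=(".py", ".c", ".go", ".java")) -> list:
    """Walk a source tree and scan every file with a matching suffix."""
    findings = []
    for file in Path(root).rglob("*"):
        if file.is_file() and file.suffix in suffixes:
            text = file.read_text(encoding="utf-8", errors="ignore")
            findings.extend(scan_text(str(file), text))
    return findings


def format_alert(finding: Finding) -> str:
    """Render a finding as a one-line message a reviewer could act on."""
    return f"{finding.path}:{finding.line_no}: possible use of {finding.rule}: {finding.line}"
```

Running `scan_tree(".")` over a checkout and passing each result through `format_alert` gives a list that could be mailed to a security team. Plain text matching like this produces false positives (comments, test vectors), which is one reason a single well-indexed repository makes the job easier.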
And this is great. This is one way forward for large organizations that have that single view, that single view of control over their code.
But I suspect many, many organizations don't have that much control over what's happening across their entire code base.
It's probably very, very, very fragmented.
There's another famous story about Huawei, the Chinese telecoms company.
And this is in a public report published by the UK government who does inspection on Huawei products in a special facility somewhere in the Oxfordshire countryside.
And every year they publish a report saying what have they found.
And the report the year before last found multiple, like dozens of, versions of OpenSSL being used across the Huawei products whose source code they were looking at.
So there's an example almost at the opposite extreme where maybe there's room for improvement in the software development processes inside some companies.
Yeah, cryptography libraries are often just picked up and plugged in and off you go.
Yeah, dependency management is a really challenging thing across companies because...
So I did start thinking recently about trying to design better crypto libraries.
I haven't got very far yet, but I started by thinking about primality testing and thinking of primality testing as like a very low level basic operation that we do a lot in cryptography.
Maybe we'll do it less when we get to post-quantum algorithms because you don't have so many primes lying around or the primes are fixed.
But like in the RSA setting or in standard discrete log setting, primality testing is still pretty important.
And when we went to look at what all the crypto libraries were doing for primality testing, which is like a very classical, mathematically well-understood operation.
It was a disaster. It was a disaster.
And again, it's public. Apple were actually doing it wrongly, their primality testing.
So they were choosing certain values in the Miller-Rabin primality test that should have been random.
They were choosing them in a fixed way, which meant you could come up with numbers that the primality test would tell you this is definitely a prime when actually the number was a composite number.
And that could have big implications in certain circumstances for things like backdooring crypto or inserting bad parameters into crypto systems.
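The failure mode described here can be reproduced in a few lines of Python. The sketch below runs the Miller-Rabin test twice on the same number: once with a fixed list of bases and once with bases drawn at random. The fixed bases 2, 3, 5, 7 are chosen purely for illustration and are not claimed to be what Apple or any other library used; the function names are likewise invented. The number 3215031751 = 151 × 751 × 28351 is a known composite that passes Miller-Rabin for all four of those bases, so anyone who knows the bases in advance can hand the test a composite that it accepts.

```python
import secrets


def _passes_round(n: int, a: int) -> bool:
    """One Miller-Rabin round for odd n > 3 and base a with 1 < a < n - 1."""
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return True
    return False


def _small_cases(n: int):
    """Handle n < 4 and even n; return None when a real test is needed."""
    if n < 2:
        return False
    if n < 4:
        return True
    if n % 2 == 0:
        return False
    return None


def miller_rabin_fixed(n: int, bases=(2, 3, 5, 7)) -> bool:
    """Miller-Rabin with predictable bases: can be fooled by crafted input."""
    small = _small_cases(n)
    if small is not None:
        return small
    # Skip bases that are multiples of n (only relevant for tiny n).
    return all(_passes_round(n, a % n) for a in bases if a % n != 0)


def miller_rabin_random(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin with fresh random bases in [2, n - 2] on every call."""
    small = _small_cases(n)
    if small is not None:
        return small
    for _ in range(rounds):
        a = 2 + secrets.randbelow(n - 3)
        if not _passes_round(n, a):
            return False
    return True


CRAFTED = 3215031751  # composite: 151 * 751 * 28351
```

Calling `miller_rabin_fixed(CRAFTED)` reports the number as prime, while `miller_rabin_random(CRAFTED)` rejects it except with negligible probability, because at most a quarter of all bases can vouch for a composite and 40 independent random bases would all have to land in that set.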
And Apple were not alone. Almost every library we looked at had some kind of problem.
And part of it was about API. So part of it was about developers being asked to use a complex API with many different parameters instead of like a very robust but simple API that would be secure across like a really broad range of use cases.
So in the end, what we ended up doing was designing a new API for the OpenSSL primality test.
And this is joint work with Jake Massimo that will appear at CCS this year.
And what's really cool about that is that OpenSSL adopted all of our recommendations for how to design the API.
And so for OpenSSL 3.0, the new release that's coming out in the next few months, they're changing the API for primality testing across the entirety of OpenSSL.
It's the first time they've really significantly changed it in 20 years.
So the code had become like really ossified.
And so I'm really excited about that, actually, as like a different way of thinking about cryptography.
Don't think about how to design a better AES, but think about how to design cryptography that's more safely consumable by developers who maybe only have limited exposure to crypto.
Yeah, the APIs and the API surface really determine how developers interact with a library.
So that's a fascinating approach and one that I haven't seen too often in academic cryptography.
One that I feel would be very effective actually going forward is how do we define simple APIs that are misuse-resistant?
So I guess, why haven't we seen this type of work before?
And what kind of led you to really cross over from cryptography to this project, which seems almost like developer experience in a way or system security?
I think it's part of my philosophy for research, which is like, I want to make a difference in the world.
And you only have so many years in your career before you run out of steam or they make you head of department.
And so you better get something done while you can.
And this seemed like a good way to actually make a concrete contribution that would improve the state of real-world cryptographic software.
And I'm kind of at a career stage now where I can do what I want without worrying about, is there a market?
Is there an academic market for this? Will other academics find this interesting?
And I just don't, I sort of care less than I used to about that kind of thing.
And that's not possible for everybody, right? People need to get tenure and they have to impress committees and all kinds of things.
But I got to the point where somehow I don't care as much anymore. So I just do whatever is interesting.
And this just seemed to be something that was kind of within reach based on what we had previously done.
Looking at primality testing, it seemed like the natural next step to pursue.
And I worked with this really great PhD student, Jake Massimo, who actually is joining or has joined AWS already as a crypto engineer.
And we just figured it all out. And it's really, really interesting because we submitted it to a couple of conferences and it got rejected with very, very bad reviews.
Like there's no scientific contribution here. You haven't designed a new primality test.
And like, yeah, you're right. That's not the point of this work.
The point is, we've been designing primality tests for years, but people are still getting primality testing wrong in practice.
So designing another primality test is not going to improve that situation.
We need to ask a different question.
And so really it just kind of comes from an accumulation of experience and kind of developing a devil-may-care attitude to the kind of problems that I work on, and that got me thinking about this.
Whether it's generalizable, I'm not sure.
So, I mean, we've had, say, authenticated encryption as a primitive for a long time, thanks to people like Phil Rogaway, who really was thinking about what's the right API for symmetric encryption a long time ago, 15 years ago now.
And now we see authenticated encryption everywhere. You can also say that X25519, or Curve25519, is about providing a very simple interface to a Diffie-Hellman functionality that tries to take away as much of the complexity as possible for developers.
So other people have gone down this route before a little bit, but trying to turn it into like a systematic research activity, I'm not quite sure how to do that yet, because I'll admit it does take me quite far outside of my comfort zone, it's not, I'm not a professional software engineer.
And I don't, you know, I don't have real world experience in API design, for example.
So maybe it's a question of bringing together the right group of people to do this kind of thing, rather than trying to do it alone, actually get some professional software developers and talk to them.
Yeah, it seems like this work is extremely valuable, but you have to build up this armor of proving yourself to the academic world.
Yeah, yeah. And you have to become impervious to rejection, right?
I mean, that actually is essential anyway in academia, long before you get to my kind of gray-haired status where you don't care anymore.
But as you're coming up, you have to really learn to deal with rejection and people not understanding your work the first time or the first two times, or the first three times.
And eventually, you know, people get on side with you and understand what you're doing.
And actually, I think that puts a lot of people off academic careers who would otherwise be really good at it.
They just don't like the kind of adversarial nature of the review process.
I think that would be something nice to try to change as well.
But it seems very, very difficult because of the way all the incentives are aligned.
So maybe that's something for me to try to do in my last few years before I retire.
I think you have a lot more years than that, Kenny.
Yeah, well, I think real world crypto, as you mentioned, is a good venue that allows a crossover between the two, academic cryptography and real world cryptography.
There might be some place to expand that somehow in a more formal way. And so given this, I guess, what do you think are the important things that researchers in industry do differently from researchers in academia?
And what kind of motivates the difference?
Okay. I mean, obviously, timescales is one aspect, and then motivations is another.
So, you know, there's a product release cycle. You want to add a feature to your browser that does something fun with crypto, like Privacy Pass with Cloudflare, for example.
And you can say more about this than me, of course, but, you know, you've got a feature, you want to get it shipped, right.
And so you might, in the end, make some compromises, or there might be an idea that's like, you don't really have time to go and fully explore where you might find a better solution.
But you have to be pragmatic and say, okay, we've invested this many engineer months in developing this solution, let's stick with that and go with that.
So there's like a lot more, I guess there's a lot more pragmatism.
Academics also have a kind of product release, because there are conference deadlines, which come around on a pretty regular cycle.
So you know, typically, we plan for, okay, we want to get this piece of work finished in time for this deadline.
But you can always slip and you can always go to the next deadline.
And then maybe the student graduates three months later, or six months later or something.
And there's also, I guess, the reason a lot of us are in academia, which is the intellectual freedom that it gives us.
So if I'm interested in x, I can go and read about x, and I can spend, it's my personal responsibility to spend as much time as I can find if I want to read about that topic and learn about it and become an expert and then start doing research on it.
And I guess those opportunities are less available in industry, though they're not completely unavailable.
You know, there are still companies where you have a certain amount of your time notionally given over to doing research or thinking about new topics, you know, skunkworks projects, that kind of thing.
But academics certainly have more time to do that.
And then I guess the reward mechanisms and the incentive mechanisms are quite different in the two worlds.
So, you know, how do engineers get rewarded in large software organizations?
Is it for shipping features? Or less? Or is it something else?
You tell me. Well, it depends. It depends on whether you have different structures inside the organization that incentivize different things.
So there are different theories about, you know, how companies develop and how companies innovate, and sometimes having different incentives and different structures in terms of what's important for people can result in, you know, absolutely brand new ideas coming from left field.
And there's this idea that we talk about a lot here, from Clayton Christensen, about the innovator's dilemma, where the main revenue-generating part of your company is potentially adversarial to something new that's coming out, which might kind of take a piece of their pie.
So you really do have to give people freedom and create structures that allow folks to really focus on the new parts of what the business could be in the future.
But it really does have to, you know, tie back down to the business.
Right. When I worked at HP back in the 90s and early 2000s, it was at that point, a $50 billion a year company in terms of sales.
And I was working in the research labs.
And unless you could somehow pitch an idea that was going to turn into a billion-dollar business line, it was hard to get attention from senior executives, because that's the kind of increment that they were thinking in.
And I think that's another aspect of the innovator's dilemma, right, which is why a lot of companies grow through acquisition rather than through internal new technology development for themselves.
Absolutely. So, Kenny, you have recently moved, you mentioned ETH Zurich.
And this is your first computer science posting.
So how's your move to Zurich been? What are you planning to work on now that you're at this new organization, which is a top-notch research university?
Thank you. Yeah. I mean, the transition has been great.
It takes time to learn a new system, to figure out what the rules are, what you can and can't do, and how hard you can push the envelope before people start complaining about, no, you can't, we don't do things that way here.
Learning the ropes takes a little while, but I've been there for a year now.
So I started April the 1st, 2019. Not clear whether the April Fool's joke was on ETH or on me.
We'll find out over time. But now I've managed to establish a group there.
I've got several really talented postdocs and PhD students working with me now.
So the machine is up and running. And that really has taken maybe about a year, just to get everything up and running.
I just finished teaching my first course at ETH as well on applied cryptography.
It was interrupted somewhat by the coronavirus epidemic.
So we had to switch to Zoom teaching.
And we'll see what effect that has had on the student performance.
We're marking the exam right now. So we will see whether the students learned anything.
I do hope so. We're going to find out pretty quickly. So what am I going to be working on?
So a couple of years ago, I started thinking about encrypted databases.
So I did a lot of work on TLS data in transit, but then I got interested in this whole area of, well, how do we encrypt data at rest and how do we make it searchable?
How can you do, you know, operations on encrypted data?
You know, in principle we have fully homomorphic encryption, so it's a solved problem in theory.
In practice, it's not because we have massive scales and, you know, we want to minimize the amount of computation we're doing and the amount of data that we're moving around.
And so there's a lot of room for bespoke solutions that give you a good trade-off between security and performance.
And I spent the first few years writing attack papers, trying to break everybody else's schemes as a way of kind of figuring out what's hard about this problem.
So a lot of these schemes that have been developed, they have some kind of leakage, which is identifiable.
You can write down, you know, in a very kind of mathematically precise way, how much information these schemes leak.
For example, if you use deterministic encryption so that you can do equality comparisons, you can compare two encrypted items.
If the encryption process is deterministic, then if the plaintext was the same, the ciphertext will be the same and vice versa.
So you can do searches over encrypted data using deterministic encryption as a technique.
However, that clearly then leaks a lot, right?
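A small Python sketch can show both sides of this trade-off. It is a stand-in under stated assumptions rather than a real encrypted database scheme: instead of deterministic encryption it uses a deterministic keyed token (HMAC-SHA256), which supports equality matching but not decryption, and the names `det_token`, `build_column`, `search` and `leaked_histogram` are invented for this example. The server, holding only tokens, can answer equality queries, and anyone who sees the column also learns exactly how often each hidden value repeats.

```python
import hashlib
import hmac
from collections import Counter


def det_token(key: bytes, value: str) -> str:
    """Deterministic keyed token: equal values always map to equal tokens."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def build_column(key: bytes, values: list) -> list:
    """What the client uploads: one token per row, no plaintext."""
    return [det_token(key, v) for v in values]


def search(column: list, query_token: str) -> list:
    """What the server does: plain equality matching, no key needed."""
    return [row for row, token in enumerate(column) if token == query_token]


def leaked_histogram(column: list) -> list:
    """What any observer of the column learns: the frequency of each value."""
    return sorted(Counter(column).values(), reverse=True)


# Example data (made up): a column of diagnoses.
KEY = b"k" * 32  # fixed demo key; a real key would be random and secret
ROWS = ["flu", "flu", "cold", "flu", "asthma"]
COLUMN = build_column(KEY, ROWS)
```

Here `search(COLUMN, det_token(KEY, "flu"))` returns the matching row numbers, and `leaked_histogram(COLUMN)` shows that one value occurs three times and two values occur once. Combined with public knowledge about how common each diagnosis is, that frequency profile can be enough to guess the plaintexts, which is the kind of leakage the attack papers exploit.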
And we've seen what happens if you use electronic codebook mode, ECB mode, which is a deterministic encryption scheme.
You see the penguin, right?
The penguin, right. The famous Linux penguin, Tux, comes through, right?
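The penguin effect is easy to reproduce on a tiny scale. The Python sketch below encrypts a made-up "image" in ECB mode and then prints a map with one letter per distinct block. Since Python's standard library has no block cipher, it uses a toy four-round Feistel construction built from HMAC-SHA256; that cipher and all names here are assumptions for illustration and should not be used for anything real. Because ECB encrypts each block independently with the same key, equal plaintext blocks give equal ciphertext blocks, so the map of the ciphertext has the same shape as the map of the image.

```python
import hashlib
import hmac

BLOCK = 16
HALF = BLOCK // 2


def _round_fn(key: bytes, round_no: int, half: bytes) -> bytes:
    """Keyed round function for the toy Feistel cipher."""
    return hmac.new(key, bytes([round_no]) + half, hashlib.sha256).digest()[:HALF]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation on 16-byte blocks. Illustration only."""
    left, right = block[:HALF], block[HALF:]
    for round_no in range(4):
        mixed = bytes(a ^ b for a, b in zip(left, _round_fn(key, round_no, right)))
        left, right = right, mixed
    return left + right


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """ECB mode: every block is encrypted independently with the same key."""
    if len(data) % BLOCK != 0:
        raise ValueError("data length must be a multiple of the block size")
    return b"".join(
        toy_encrypt_block(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)
    )


def block_map(data: bytes) -> str:
    """Label each distinct block with a letter, in order of first appearance."""
    labels = {}
    out = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        if block not in labels:
            labels[block] = chr(ord("A") + len(labels))
        out.append(labels[block])
    return "".join(out)


# A made-up one-row "image": W = a white block, B = a black block.
PIXELS = "WWBBWWBBWBWB"
IMAGE = b"".join(b"\xff" * BLOCK if p == "W" else b"\x00" * BLOCK for p in PIXELS)
CIPHERTEXT = ecb_encrypt(b"demo key", IMAGE)
```

Comparing `block_map(IMAGE)` with `block_map(CIPHERTEXT)` gives the same string of letters even though the ciphertext bytes look random. A mode that mixes a per-block counter or the previous ciphertext block into each encryption would break that correspondence.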
So we know there's leakage there. And yet we started using those kinds of schemes for doing encrypted database techniques or database systems.
And, you know, people wrote attack papers on that.
And then, you know, kind of, that's like the zeroth generation of attack papers.
And now there's one or two generations beyond that.
But what I'm doing now is saying, okay, having learned what makes these schemes hard to design, let me now start to try to design my own schemes with, you know, minimizing leakage, reducing the leakage, and also trying to do the cryptanalysis afterwards to sort of show that there is still leakage, but that leakage is not exploitable in a meaningful way.
And that part's really hard because there's this concept of friendly cryptanalysis, where if you design a scheme, you're never incentivized to break it in the same way as if it's somebody else's scheme, right?
Because, you know, it's a psychological thing.
It's very hard to gear yourself up to break your own schemes. And so working with a team I put together at ETH, we've been designing new encrypted database schemes that enable keyword searches, for which we borrowed some ideas from oblivious RAM and so on, while trying to avoid all of the overheads that come from using ORAM.
And we got some very, very nice results out of that, quite preliminary, you know, some maybe not full scale databases yet, but we want to develop those ideas further.
So that seems like an exciting direction to go in over the next few years.
But I think I will continue to do whatever I find interesting, because as a strategy, it's paid off reasonably well over the last like 10-15 years.
And so that's what I'll keep doing. So what do you think the next 10 years is going to look like in terms of cryptography?
Yeah, you're getting to the hard questions now, Nick.
No, there's only three minutes left. I mean, just so real quick, I guess we're now kind of, we talked about this a little bit earlier, we're seeing much more complex crypto primitives being deployed in, say, electronic cash schemes, e-voting, MPC systems, zero-knowledge proofs, and so on.
And I think that's like a trend that's only going to continue. So people have realized that crypto can be really, really useful.
It can solve problems that we can't really solve any other way.
But I guess we have to be very, very careful that we just don't create this entire minefield of new broken implementations.
And there'll be plenty of work to do to get the systems right and to make sure that the theoretical proofs that we have actually match the reality of the systems that we build.
And as those systems get more complex, that gets harder and harder to do.
So taming the complexity is like a really interesting challenge. And that relates back to the things we talked about, composability, finding ways of guaranteeing that plugging together these different components doesn't undermine security and actually gives you the security that you expect.
And it seems to be a very, very challenging and exciting area to work in.
So there's no shortage of things to work on, that's for sure.
And more and more people are coming into the field.
One of my aims is to stop all the talent going to work on machine learning. I think we have a chance of doing that though, because I think we are really living in exciting times for cryptography.
There's so much opportunity and such a much broader understanding now of the importance of cryptography in these systems.
So I think it's a good time to think about becoming a cryptographer. Excellent.
Well, thank you so much for your time, Kenny. And it was great to see you virtually and have a great rest of your day.
Everyone else, stay on Cloudflare TV and on to the next show.
Bye.