🔒 Nigel Smart & John Graham-Cumming Fireside Chat
Presented by: John Graham-Cumming, Nigel Smart
Originally aired on May 19, 2022 @ 5:00 AM - 5:30 AM EDT
In this Cloudflare TV Data Privacy Day segment, John Graham-Cumming will host a fireside chat with Nigel Smart, Professor, imec-COSIC, KU Leuven.
English
Fireside Chat
Data Privacy Day
Transcript (Beta)
All right good morning or good afternoon wherever you are to the third chat I've been doing this morning for Data Privacy Day.
I am joined by Nigel Smart who is a professor at COSIC.
COSIC, what's COSIC Nigel? COSIC is a research group in the KU Leuven which is the main university in Belgium and we are probably the biggest cryptography research security group in the world that I know of in academia.
We have like about 80 odd people doing research and we do it all from very theoretical stuff through to things that people might see.
We broke into the Tesla key lock system so we can steal people's Teslas.
That's probably the most famous thing we've done.
Is that what you do? Yeah breaking the key lock to break into Teslas.
We're probably watching this encrypted because Zoom is now encrypted I've heard via AES encryption, the advanced encryption standard which everyone uses on the Internet for everything.
It was invented in this building.
The inventor is a few doors along in that direction and yeah so we do everything and we're kind of huge.
Huge cryptography brain box in Belgium. Brain boxes yes indeed.
There you go all right and also you have got this company called Unbound right?
Yeah so a few years ago I founded a company with Yehuda Lindo who was a professor at Bar-Ilan University.
I stayed in academia. I was in England at the time but now I've kind of joined the Brexitus to Belgium and Yehuda Lindo was an intro but he's now left university and he's now actually the CEO of the company.
All right and I thought we would talk about what Unbound does and also about the theoretical underpinnings of this stuff which is like a multi-party computation and what the hell that's got to do with privacy.
So you want to tell us what MPC is to start with?
Okay so there's a MPC multi-party computation. It's kind of in the title.
It's multiple parties doing a computation. Pretty simple. The example I always use is basically because I come from England and there's lots of very rich bankers in England and so all these bankers there they've just been working on the stock exchange and they've been given a bonus and they want to know who's going to pay for lunch.
Now clearly it's fair and equitable because bankers are always fair and equitable people.
The person who should pay for lunch is the person who's got the biggest bonus.
So you John you've been given a bonus of I don't know a hundred thousand dollars and I've been given, I'm a poor person so I've only been given a bonus of fifty thousand dollars.
So we want to work out who should pay for lunch and clearly it should be you but you don't want to reveal your bonus value of a hundred thousand dollars and I don't want to reveal to you that I've been so much less.
So all we want to do is learn the bit of information that John should pay for lunch without learning anything else.
So what we learn afterwards is that John has a bigger bonus than me and I have a lesser bonus than John but that's what we want to do and doing that's what's called a multi-party computation.
We are computing a function and then we get an answer at the end and the answer all that's revealed about the secret inputs is what could be deduced from the answer.
So that sounds pretty easy, well sometimes it sounds easy and it also sounds like magic so we kind of put it in historical context.
So this idea came from the 1980s, it's called the millionaires problem.
Andy Gow came up with this in the 1980s and he had a protocol in theory that could do this, that could compute which one of us should pay for lunch and it wasn't until maybe the mid-2000s that we could actually execute the protocol in an efficient manner.
So for example me and Yehuda, my co-founder, the first paper we wrote together was a paper which showed that you could actually execute this millionaires problem in a environment in which you assume the worst possible adversarial behavior from one of the parties.
So it's a cryptographic protocol where one of us is going to be cheating.
So suppose I'm the bad guy, I want to learn what the value is, your value is and therefore I can deviate from the protocol and cheat as much as I want and that in the late 2000s it took us about 10 minutes or so to compare two numbers.
Now that's like you know just like 15 years later we can actually compare two numbers, we don't even bother counting whether we can compare two numbers, it works very fast.
Let me go back to something, when you say compare two numbers, you're just you're literally talking about the scenario, the two bankers, the two bonuses, they both feed their bonus into this secret computation and 10 minutes later it comes out and it's like A is bigger than B.
That is what happened, that was the state of the art in the early 2000s.
Now we can run almost arbitrarily complex operations, there's been deployments and use cases all around the world.
You can think of this as another really cool example you might want to think about is electronic voting.
When I want to vote I actually put my vote into the system and we want the system to compute who's won the election but we don't want the system to learn how each individual has voted.
So this is again, this is a form of multi-party computation but that's quite easy, it's just adding things up.
A cool example is in Boston, there's what's called the Boston wage survey.
They wanted to see how much was the average pay of women compared to men in various companies in Boston and so they've done this for a number of years now and they do that via multi-party computation because what they're worried about, well what companies are worried about is companies don't want to reveal their payroll data obviously but you want the city to know what's the overall average things that are computed and they do this via a multi-party computation to actually get social benefit out to work out what's the difference in hourly pay between a male and a female in various, a large number of organizations in Boston.
Yeah there's been loads, I mean we can go on talking about applications and use cases as we go through, yeah.
So okay, so how?
I mean how does it work? You can't just come on and say you got magic and you're going to do magic.
We've got magic, it's magic, it really is magic, it's really really hard to explain.
It's kind, it really is mind-blowing when you kind of think about it is what you're doing.
Okay we should first say what it's not, right. Okay so there is a way of computing on secret data which is, which people might have heard of, which is SGX.
Now this is like an enclave inside your Intel computer where data is encrypted on the way in, it's decrypted inside the enclave and then it's encrypted on the way out and that really, one it's kind of cheating and two it's not that secure because the way Intel processors are built you can execute so -called side channel attacks on the SGX module itself.
So this is okay, let's just unpack this a bit.
So you're talking about a physical thing in a chip. A physical thing in a chip on the motherboard in a data center.
And you have a way of feeding in encrypted data and the chip knows how to decrypt it and nobody else does.
It can do some sort of computation and spit out an answer and say yes.
Okay great that's SGX.
So that's SGX but the problem with that is it's easy to use in that almost every microprocessor has an SGX thing on it these days.
So you send it into the cloud data center and therefore you can compute on data in the cloud without the cloud provider in theory being able to know what's going on.
The problem with this is though is that due to various things like all the attacks we've heard of like Spectre and Meltdown and all these things.
The design of microprocessors makes this very very hard to do in a secure manner and so you're susceptible.
You really don't get full-blown security.
So that's what it's not. So what multi -party computation is, is a way of what we do is we split the data into two.
So my input I kind of split into two.
So I take, so suppose my input was number three. I might then split it as seven minus four.
So I keep the value minus four. I give you the value seven.
So we shared the data. So now you've got a bit of the data but it means nothing.
It's just a random number. I've got a bit of the data which is a random number and now we can process between us by executing a protocol such that the data never comes back together but we're able to do computation.
So in essence the issue is if you split it like A plus B equals the secret that you're trying to keep secret then you can do linear functions for free and the complexity is actually doing multiplication.
Multiplication is expensive but to do multiplication we just exchange messages.
There's a protocol to do multiplication and so and if we can do addition and multiplication we can do everything because everything can be written as a sequence of additions and multiplications.
So of course if you did it like that it's very inefficient and you have to use all sorts of tricks but the basic philosophical premise is you kind of split the data, process the data linearly and then if you want to do a non-linear operation you have a special protocol to do the non-linear.
And so just in the example where like it was three and you put seven in and you kept minus four to yourself do you then get the output and combine minus four with it in some way and say oh now I know the correct answer.
Well no so as you as you engage them in the protocol it might be at the very end the answer could be 12 okay and I might hold the number seven and you might hold the number five so if I want the answer you send me the number five I add it onto my seven.
So in some sense every piece of data has itself a key which I keep and then I give you the other bit right together and then from your point of view the thing you hold is the key and I've got so we're kind of in a schizophrenic world where my bit of data is my key and you've got the text and your ciphertext is your key and my key is your ciphertext.
It's kind of weird yeah yep yep and we can scale this up to large numbers of parties we don't just have to have two people we go three four five six and there's some people run experiments with like a hundred parties but usually in commercial people are interested about two or three parties.
What were the sort of commercial uses so you mentioned some of the beginning what are commercial uses with two or three parties?
Okay so for example with so I talked about the Boston wage survey that's one other ones you can think of voting is kind of weird so we don't actually do voting via MPC because vote is a bit more complex but it's a nice example to see what we actually mean by computing on encrypted data but you could imagine you want to do some statistical analysis of between a number of firms so firms may have corporate information and they might want to work out what the return on investment is of a certain type of investment or get sector-wide statistical information out so a company in Estonia called Cybernetica has been doing surveys of companies to find sectorial information, high level statistical information given low level data.
It's a bit like the Boston wage thing because it's a survey in some sense you're combining but it's kind of doing it in a different application domain.
You can imagine every year there's a competition to do things to do with this applications in genomics and medicine.
You can imagine you've got two drug companies they've got data and they want to pool their resources to develop new drugs you know but they might have have experiments they want to not don't want to duplicate experiments but they don't want to actually give away that much information because it's commercially secret so you can get companies to come together and get more efficiency.
A great example probably the first commercial deployed example is an auction so in the past this was in the mid -2000s they did sugar beet selling in Denmark so it's a famous sugar beet example that you can't give a talk on MPC without talking about it.
Is that right? So in Denmark in the mid-2000s they had this issue of how do you work out what the price the clearing price is for the market for sugar beet and they did this with a three-party computation and it worked it was the first commercial deployment but now we're kind of looking at we're talking with various organizations of how you could actually do everything's an auction in some sense allocation of resources is an auction so you think of stock markets so we've done a lot of bit of work working with how you would make dark pools which do a lot of trading on stock markets how you'd make those secure because you've got an auction and you maybe don't trust the auctioneer you don't trust the stock market because the stock market could itself cheat I could do front money.
Another example commercial example which was pioneered by the company SAP they looked for a number of years at looking supply chain management you've got a number of people in the supply chain and you want to organize how much stuff goes on this boat compared to this other boat but your capacity you don't really want to reveal because if you've got a lot of capacity on the boat then the people who want to use the boat know that they can ask for a less of a price because you're desperate to get rid of the capacity so you want to do the supply chain management without revealing how much stuff you want to put on what the demand is and what the supply is of the data so there's all sorts of commercial applications where you want to maintain privacy so it's part of what we call a privacy enhancing technology it's one of the things that are used to enhance privacy in various applications.
Which is kind of the theme for what I'm doing today which is all around data privacy obviously from cloud players perspective a lot of that means on the web but there are lots of other places where data privacy matters.
Are you seeing applications on Internet things with MPC? Yeah so it depends what you mean by Internet things.
Internet things is everything in some sense everything's on the Internet so what we've done what we do in Unbound is we kind of take this MPC thing in what we talked about before is where you put data together you've got a party who's got some data let's go back to who's going to pay for lunch you've got John's bonus and my bonus and we want to put them together to produce some value in putting them together who should pay for lunch or we have the supply chain we've got amount of capacity in boats the demand for stuff to be put on them and we want to keep those values private but we want to extract value of the of the data by doing some computations that's putting data together okay now what Unbound we do is we split three people apart so instead of acting as a marriage counselor we kind of for data what we act out is is divorce a lawyer for data so in on the Internet and various applications you have cryptographic keys so these are the keys these are the key the gate holders to the the kingdom they secure the SSL links they might they secure your if you're in a blockchain world they they'll secure whether you can pay pay for things from your bitcoin wallet if you're doing a software updates they will they're the thing that signs the code signing to allow you to execute the you know so that your client will download the right software now these are very very important keys that you want to keep secret now one of the way the well there's a number of ways of doing this you could call us traditional you could write them down on a piece of paper if you're kind of in the old-fashioned bitcoin wallet you would write it down on a piece of paper which isn't very secure because these are very long keys you could store them on in hardware on a smart card or in what's called an hsm or you could just leave them on disc somewhere um yeah and then throw them away and then yeah so there's all sorts of different ways of doing this but the problem is is that nowadays we have these elastic um environments on the cloud so we can't really rely on hardware pieces of paper don't really cut it um and and other sort of applications so what unbound does is it says okay instead of putting the key in a in a a single secure location and therefore having to put a lot of protection around that single secure location let's split the key up into two parts or three parts or four parts distribute these keys around the world in different locations different service providers on different computers and then use an npc protocol to actually execute the cryptographic function without ever bringing the key back into a single location so what you do is you obtain security by separation and an advantage of this is that we can then tie the separation to the authorization so if we go to the code signing example you might have your organization um needs to sign some code and it might be that you need two out of three managers to actually do the signing to enable the code to be executed and what we allow what npc allows is it allows you to actually embed that authorization within this distribution of the key so we split the key up in such a way that um that two out of three people in your organization and the specific two out of three people need to authorize the code before it is um pushed out you can kind of so you get the ability the benefit of security by separation you get the benefit that it's a software solution so therefore it's very elastic you can you know spin up and spin down um entities to do the calculations at will and you also get this ability this tight ability to to tightly link the authorization with the actual cryptographical operation which is doesn't exist enough which therefore makes key management uh if someone leaves the organization re-sending their key without having to re-key everybody else becomes so much much more easier so it reduces the total cost of key management it makes your system simpler more elastic etc so that's what we do so we're using npc to break vader apart in some sense how is that related to shamir's secret sharing scheme it is shamir secret sharing so basically at the bots at the at the really lower level of an npc system we use some form of secret sharing whether that's shamir secret sharing or we use other forms of secret sharing to split the data up so it's the splitting of the data up is done by secret sharing and then the ability to compute on the data whilst it's in that secret shared form is npc interesting ah fascinating let me go back to something when you were talking about the millionaires lunch thing um you mentioned it was very very slow and it got a lot faster is that because hardware got faster or because we had a algorithm it's a number of things so um on one hand hardware got faster so when we do the millionaires example especially with two parties we actually use a lot of um as computations so if as is built into the chip as it is now with the x86 is that that runs very much faster it also npc requires you to communicate so having very fast reliable networks which don't drop packets and that's also gives you an advantage but it's also a huge um around the turn of the last decade so about 2010 2011 2012 there was a huge advance in theory and it just seems to be rapidly advanced so people could start actually implementing these things and also once you start implementing things it changes what the research is about up until maybe the mid 2000s people were looking at this as it's a theoretical thing therefore we're looking at theoretical questions and maybe they would go okay we're going to try and optimize this this protocol to try and make it more efficient but because they couldn't implement it in the real world they were often looking at the wrong thing to optimize when you see a protocol so our first paper where we did the comparison the main point of the paper was is that all you theoreticians have been trying to optimize this bit of the protocol but it's actually this other bit which is actually costing the time we need to look at this other bit to make everything speed up so it's a kind of and then you get this you know virtuous circle you can implement the protocol therefore you can come up with a better theoretical understanding therefore you can implement a better protocol therefore you can come up with a better theoretical understanding and it just accelerates and accelerates and accelerates maybe this is one of the interesting things with you know the with the quantum computing world coming we have all these algorithms which we worked out for quite a long time and I swear that as soon as we have a quantum computer we're going to go oh wait a minute we can do this we can do that and it's going to get even even faster are these mpc things fundamentally safe against quantum computer attacks yes so unlike the cryptography that's used to secure the Internet rsa elliptic curves or whatever cryptography that's underlying mpc is all secure against quantum computers so it's either something which shamir sharing is what's called information theoretically secure no matter how big a computer you have whether you let it run for the infinite length of time you still can't break the scheme or it relies on security of the as encryption scheme which its security goes down a little bit due to quantum attacks but not that much it's not broken this is not broken by a quantum computer or we use lattice based schemes to do help do mpc and lattice based schemes are known to be or believed to be secure against quantum computers just speaking about the lattice based schemes are they practical are we seeing them used practically lattices well yeah so there's various levels of that so for example lattice based encryption schemes have made it probably the most promising form of post-quantum security um if you look at the nist uh competition competition in quotes we're not allowed to call it a competition but nist is running um for the next signature algorithm post-quantum signature algorithm post-quantum encryption algorithm almost all the major candidates are based on lattices um so they're very efficient but that's the general usage in your browser doing key agreement google microsoft and i think Cloudflare as well have done uh experiments using quantum key exchange within the browsers and within within within the systems and then when you scale that up and you want to do um bigger things so for example you can use lattices to do fully homomorphic encryption and ah we get to talk about how we're getting yeah we're going to get there i thought i put that in so you had that link so so you can use lattices to do secure computation both for multi-party computation and for fhe and we'll talk about differences maybe in a minute um but we can do that in a way um they're also relatively efficient yeah all right look you brought it up i didn't what is fhe why do i care about it is it a pipe dream okay so fhe is a is another method for computing on encrypted data so it's npc is a method for computing on encrypted data where we share the data and the and the encryption is a sharing and to do the calculation many parties have to send messages backwards and forwards in fhe instead of encrypting via secret sharing we encrypt using one of these lattice-based encryption schemes and then the lattice-based encryption scheme is a bit weird in that it allows us one server to do computation so it can compute on encrypted data so it says this is an encryption of x this is an encryption of y it can blindly produce an encryption of x plus y or encryption of x times y the disadvantage so that's an advantage and there's two disadvantages for fhe one it's incredibly costly on the server but you only need one server as opposed to many but the other key difference which is makes an issue when you apply it to use cases is that it the the secure computation produces an encryption of the result and therefore you need someone to decrypt the result and the person who decrypts must be different from the person who does the computation so often fhe doesn't fit a number of commercial use cases because it's the person who does the computation wants the answer and if you have to give them the key they might as well just decrypt the data in the first place right right yeah so there's a kind of a bit of a dichotomy yeah but it's still kind of magical that you're doing calculations on data that is encrypted without decrypting it yes that's the key in both technologies npc and fhe we are able to compute on data without actually seeing the data now this causes problems so anyone out there who actually programs you'll know you have if statements if statements are a real pain because if you branch on something that should be kept secret it means you have to execute both branches and then multiplex to get the right answer out at the end because actually knowing which way you branched reveals information about the oh that's what happens spectre basically that's basically what happens in spectre and that's why you see why sgx is not going to be a solution right because you have to do branches so what sgx will do is it will execute that branch because the data will be decrypted within the chip and therefore the side channel will detect that the side channel will detect that and that is why so it's exactly this it's exactly the same problem fair enough all right we've got a couple of minutes left what's next what's coming bigger faster better it's always but currently we can only compute relatively simple functions i mean complex ones but relatively simple there's a company in paris that can evaluate evaluate neural networks in fhe for example but training a network on encrypted data so we can evaluate deep neural networks with npc or fhe but actually training a deep neural network on encrypted data that's out of the question at the moment generally speaking so that's the i think we just need to be able to cope with more but is there a future a near future where this is happening in my browser or in my phone that this is becomes a fundamental technology that's helping keep my data private yeah i think yeah you're seeing we're seeing a number of applications i mean we we do something we're i'm bound to something with the phone google does analysis of ad auctions with npc operations you can think of some of the data discovery things in your phone recovering who you're sharing contact addresses have you got contacts in your two phones that are in common that's also an npc calculation what's called private set intersection so there's all sorts of applications that you can see coming up or actually deployed or in proof of concept or yeah everyone's looking at this all the big companies there's loads of startups in the space it's it's it's the next big thing in security or is the currently the big thing in security it's currently the big thing fair enough fair enough all right listen we're almost out of time and thank you for you know talking to me if someone is really interested in this stuff and wants to go learn more where should they begin so okay so we're about to put on youtube in the next couple of weeks um some uh videos the other thing you should do is that there is a alliance called the npc alliance.org which also has a list of books videos um it's a bunch of all the companies in the space so there's books videos um lecture series all available there um and there's a similar one for the fhe situation but i can't remember what they're called but npc.org is the place to go for npc i would say all right we'll do that listen we're almost out of time nigel thank you so much for taking the time to chat with me on data privacy day good luck with npc things sounds fascinating thank you cheers