CTO Interview with Laurent Cerveau and John Graham-Cumming
Presented by: John Graham-Cumming, Laurent Cerveau
Originally aired on January 28, 2021 @ 5:00 AM - 5:30 AM EST
In this Cloudflare TV Data Privacy Day segment, John Graham-Cumming will host a fireside chat with Laurent Cerveau, CTO of Zenly (part of Snap).
English
Fireside Chat
Data Privacy Day
Transcript (Beta)
Okay, welcome to Data Privacy Day on Cloudflare TV. I am John Graham-Cumming, Cloudflare's CTO, and I'm joined by another CTO, Laurent Cerveau from Zenly.
Hi John.
Welcome, thanks for coming and talking about data privacy. I'm betting quite a lot of people sadly haven't heard of Zenly, so why don't you give us a quick introduction to what it is and what it does.
Definitely. So Zenly at its core is a mobile application, so it's a product, and it's a product that basically provides you live maps showing, among other things, your friends and your family.
The core idea behind Zenly, it's a company that was founded in 2014, was to say, okay, originally maps were based on geographical information and were the same for everyone, but we kind of want to go more into, okay, the map belongs to me and I want to see my life on the map, my social context, my friends on the map, what's happening and what they can be up to, and go and see them in real life also.
And so we developed this application, yes, since 2014 and in 2017 we've been acquired by Snapchat, so we're part of the Snap family also.
So we continue to operate independently, we're based in Paris and we have a few millions of users a day, I mean more than a few millions I should say, I would not go into, I was underestimating.
So what, you know, there must be lots of location tracking things, I mean the iPhone has, I can look at my wife's location, well I could if she let me, but you know you can look at those sorts of things.
What makes Zenly in particular interesting?
Zenly, we had originally, so yeah it's a lot about location, it's many things around location, we extract contextual information from location, we have features going from real-time location to I mean more understanding what's behind location, like what could be your home, your workplace and this type of things.
And originally, I mean we had this unique point about we were able to make this type of software work without, I mean eating all the battery of your phone.
When we started, I mean the main concern around location data was mainly about, oh if I'm using too much location then I have no phone in two hours.
And we found some way to do that in a, I would say, in a battery-savvy way.
And as we moved forward and we developed more features, we understood more the complexity of location data, the quality, the inherent quality, I would say inherent bad quality of location data, which may be a surprise to some people, but if you're not in some areas of the world with the proper coverage, I mean it can put you in different strange places.
And yeah, we've developed many features around location.
Okay, and so you got this working and clearly this is a lot of private data where people's families are, where their kids are, and stuff like that.
So let's talk about privacy in this context.
I understand that you actually train people in Zenly on what it means to be private, to use private data.
Can you talk a little bit about that?
Yes, so basically privacy, if you allow me to go back a little bit in time, I mean I would say the world started to deal massively with privacy when GDPR came into effect in 2018.
It doesn't mean that people didn't care before, it's just that suddenly you had no choice but to deal with it.
Zenly, we knew from the start we are dealing with information and data that can, that is I mean very private, that may require I mean a lot of computation to be understandable, but basically I mean it shows private people.
And so we've decided to say, okay, how are we going to tackle the topic generally in the company?
And what I found out is that something very effective is first to explain to people what privacy is and what will be part of their job.
And it can extend into the job in areas they may not expect.
And so we've developed internal trainings and I found this a very efficient way to have people suddenly understand, oh this is all about it, this is not about this idea I have about privacy.
Because in privacy you have a mix between legal text, you have the mix between the culture of the people and how they're going to be receptive to the notion of privacy, just by their education, where they grew up, how they grew up.
And you also have behind it everything you need to do technically or product-wise in the company.
The implication, the technical implication, can go very far and have a lot of, I mean, impact on the way you're going to store and manipulate the data.
On the product side, I mean you have to decide, oh, what does it mean to be private?
Zenly is definitely for close friends and you don't want to be in a case where basically privacy is going to be handled at many level and product level, not only from the, I would say, position point of view, but also about, hey who is this person that for example is sending me an invitation?
You want to protect the person sending an invitation, but still you want to have the person receiving the invitation being able to refuse if they say, no this is too far from me, I have no idea about who this person, no idea about who this person could be, I would say.
And so this is why we develop some trainings and then something I'm really, I found out like in many things, education is the first basis you need to have because you can give guidelines, you can say, you can give rules, but inherently people need to get a kind of internal feeling about, okay what I'm doing here is right or what I'm doing here is not on the right way.
So that's there are many parts in this privacy training, if you allow me to continue.
Yeah I'd love to, I mean one of the things you mentioned was you have like legal texts, like privacy policies and stuff like that, how do you translate that into something that practically an engineer for example can interpret and so they can understand when they're designing something and actually when they're coding it?
So that's, you first need to understand about, so the legal texts, they can be so very dense, but they can be also, sometimes you will find them pretty gray, it's not black and white.
I'm going to give a few examples, going to translate it to technical areas.
You, many people expect when they have to deal with legal topics that somebody will tell them, oh you're right or you're wrong.
And if you look at the legal text, it's kind of funny, I take two definitions, like if I take the definition of user personal data, personal data is data that may contribute, that can contribute directly or indirectly to the identification of someone, so it's finding back.
Then you have this notion about indirectly, what does that mean? Does that mean is it enough?
No. Does that mean it can help? Yes. Does that mean it needs some context to help?
Probably. So all these texts, when an engineer will tell you, okay just tell me what is private or not private, sometimes you cannot answer, it may depend on what's going to be available at that time to consider it private or not private, so you will transfer data where you will do the processing and this type of information.
Other, another fine line in the text which I found always interesting is about anonymization.
Anonymization: the data is considered anonymized when basically the means needed to identify someone from it are above what's reasonable, or some type of blurry line like that. Or if you think about, you know, I remember this text in GDPR about parental consent, where it's written that we count on companies to be creative, to provide good solutions without interrupting their workflow. So you have a lot of places where you need to invent things, where you need to understand, okay, what's going to be the implication. So you have to come back to the basics, and the basics are just, if you think about privacy, you're going to deal with data, you're going to deal with users, you're going to deal with product.
Product, there is one basic rule, the basic rule is just like the user must understand what's happening, clarity is number one rule, nothing can be blocked.
So, of course, many people will consider that clarity is about having online text, oh please go and read our terms, please go and read our privacy policy, everything is there, it's been written by a lawyer and then it's okay and it can be legally okay, but it can be not clear at all.
I must say Zenly, I mean the privacy policy, many people and there were some lawyers, but I mean many people not lawyers contributed to it, so it can be understandable.
People, I mean we know that basically we have user from many ages, we want them to be able to read and answer questions.
When we do translation, I mean we are a French-made company, of course, I mean we're reading the translation ourselves and we're going to say send it to a translator, because then I mean if you translate it from legal document to legal document and we try to have them read in every country properly by people, so it can be understandable and that's a lot of work and then you need when you translate it, the concept, so you translate the concept about what do you do, how do we use the data, what do you do with the data, which data do we get, all this type of information and then you need to translate them, but okay I can use them, I mean in my daily job and in the daily job you have mainly two components, I mean the product part and the technical part.
I'll focus on the technical part for a while.
The right of the users and we're moving to the from the product to the user part, right of the user, for example, is to be able to have their account deleted in an easy way.
This has a lot of implication, this has I mean drastic implication from the way you're going to store the data, the way you're going to, I mean, handle the deletion process and think about all the implication of it.
So, for example, Zenly, you want to delete your account, you go, you press a button, delete your account.
Every company has what's called a retention policy, so basically they must have a retention policy.
Deleting your account may not be immediate.
Deleting your account, I mean the law is not mandating it to totally happen one second after, and if you talk to some engineers, they'll say, I'm not going to delete the account data in production, because if I have too many of those requests, then it's going to be chaotic and things are going to be unsynced between multiple devices. It kind of scares people.
So, you have to have people thinking about, okay, how do you, when you're building this feature, what data will you delete?
What will be the impact on the production?
For example, I was talking about personal data, so those elements of data which help identify someone. You're going to say, if I want to store them in a way where basically later I want to do computation, what am I going to store?
I may store the data per se. I may store an anonymized version of the data with a key in, I mean, another database, so basically we have pseudonymized data, and when I delete the account, then basically I'm just going to remove the key or the multiple keys.
You may decide one individual is going to go to multiple keys for easier protection.
You may decide also the storage of one individual is not going to be in one place but spread over multiple places on your storage, to kind of bring some noise to it. You may also effectively add noise to it, but you need some way, I mean as long as the user is an active user, to be able to denoise it, and make sure that you cannot, I mean, denoise it when the user has gone.
You may decide to split a data structure into multiple parts that's going to be stored in different places because they need different treatment and requirement.
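The key-based deletion described here can be sketched roughly as follows. This is a minimal illustration in Python, not Zenly's actual design: the class name PseudonymizedStore, its methods, and the use of in-memory dictionaries in place of two separate databases are all assumptions made for the example.

```python
import secrets


class PseudonymizedStore:
    """Toy store: the identity mapping and the event data live in separate tables."""

    def __init__(self):
        # In a real system these would be two separate databases (assumption).
        self._keys = {}    # user_id -> random pseudonym (the "key")
        self._events = {}  # pseudonym -> list of events, no user_id inside

    def record(self, user_id, event):
        # Create the pseudonym on first use, then store data under it only.
        pseudonym = self._keys.setdefault(user_id, secrets.token_hex(16))
        self._events.setdefault(pseudonym, []).append(event)

    def events_for(self, user_id):
        # Lookup only works while the key still exists.
        pseudonym = self._keys.get(user_id)
        if pseudonym is None:
            return []
        return list(self._events.get(pseudonym, []))

    def delete_account(self, user_id):
        # Dropping the key breaks the link between the person and the data.
        self._keys.pop(user_id, None)

    def total_events(self):
        # Aggregate computation still works after accounts are deleted.
        return sum(len(events) for events in self._events.values())
```

After delete_account, the stored events can still be counted but can no longer be looked up by user. Whether what remains really counts as anonymized still depends on what the events themselves contain, which is the gray area discussed earlier.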
You also need to understand what you want to do, I mean, with the data when the user is active, when the user is inactive.
Just take the basic example. I want to know how many people have registered on Zenly since the beginning.
Well, then you have to think about what's going to be the internal identifier for user.
I mean, obviously, I mean, you're not going to say, oh, let's do an application.
It's going to be, I would say, a social issue, I mean, a health issue, sorry, and I'm going to use a social security number as main identifier.
You know, that's kind of not a good idea, you know. It's a very basic example, but it's really not a good idea.
So, all the implications in the technical world, you see that they are not anymore just legal considerations.
They're also production considerations.
They're also safety considerations, because if you start to say, I'm going to do this stuff in a privacy way, then inevitably you have to think security, because the things kind of go hand in hand.
You have to think also about the cases you never, let's say, stability of production.
You think your data is your account going to be deleted.
You take like any kind of application, please remove my account, and then you may have something where people will tell you, they will answer you an email and say, okay, we've taken your consideration, we have considered your demand.
It will take X days. If you come back to our website or application within those X days, then you're going to be automatically resubscribed.
You know, this is written in small letters. This is not what I call, this may be privacy compliant legally.
This is not what I call privacy safe.
So, when I explained to all the employees, just like, hey, yeah, you may be legally okay, but this is not what we want to be.
Then, if you delete your account and you open the application again, there is a panel.
It says, okay, you have X amount of days to change your mind, but if you go further, you're going to be resubscribed.
So, if you're not sure, don't go further. You know, it's very clear.
If you don't use Zenly for one year, your account will be deleted automatically.
This type of things, you know, you have to think about them and suddenly it's a lot of technical implication because I don't know many companies and where if I don't use it, if I don't use the product, suddenly my data will disappear from their stuff.
Everybody knew that, you know, when GDPR came into effect, you know, everybody has received this email.
I got your email in my database and you were saying, okay, I don't remember I was on those people.
I don't remember I was on this service and if you are unsubscribing from all, you can be sure Zenly, you don't use Zenly, you disappear.
Your account will disappear.
What do you do about backups?
Because you keep backups as well and backups inevitably end up containing old data.
Yeah. So, backup. This is also the line you can see when you go and read some privacy policy on the web.
If you read them, you're going to read that after your account is deleted, your data may still be there because we need them, or for legitimate interest, for x amount of years.
Practically, this translates into, oh, we have data into the backup.
We really don't want to touch the backup because, I mean, if a catastrophe happens, then we really want to be able to go back two years in time. And so this means that basically, if you delete your account, you cannot expect, or if you go and say to your infra team, please open every backup and remove the data from John or from Laurent.
They will go, no, I can't do this.
So, this implies, and in every retention policy, you have the time to do the deletion.
I mean, you have the time, so, I mean, you have the time where people can change their mind.
It's not mandatory, but you can have it. Then you have the time where you say it's going to be deleted from all our live database, and then it's going to be deleted from the backup, and this can be different times.
Nobody really knows it because, I mean, you know, it's not translated in the legal text, but this is what's happening practically.
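The three stages just described (a window to change your mind, deletion from live databases, then deletion from backups) can be written down as a small schedule. The sketch below is in Python and every duration in it is a placeholder chosen for illustration; the interview does not give Zenly's real values, and the names RetentionPolicy and schedule are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    # All three durations are placeholders, not values from the interview.
    grace_period: timedelta = timedelta(days=30)      # user can change their mind
    live_purge_delay: timedelta = timedelta(days=7)   # time to purge live databases
    backup_lifetime: timedelta = timedelta(days=90)   # how long a backup is kept

    def schedule(self, requested_at: datetime) -> dict:
        """Return the moment at which each deletion stage completes."""
        grace_ends = requested_at + self.grace_period
        live_purged = grace_ends + self.live_purge_delay
        # Backups are never edited: the last backup that still contains the
        # data is taken just before the live purge and simply ages out.
        backups_clean = live_purged + self.backup_lifetime
        return {
            "grace_ends": grace_ends,
            "live_purged": live_purged,
            "backups_clean": backups_clean,
        }
```

The point of the sketch is that the backup date is derived from the backup lifetime rather than from any attempt to open old backups and remove one person's data.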
So, one of the problems with the legal text is they can be very hard to read because, you know, if you explain to people this is what's going to happen, then I think people have much greater understanding if you say, well, we have backups, and you're going to inevitably need a backup for six months, for example.
But the legal texts often are really, they often have caveats that are like for legitimate purposes or things like that, and it's like that could mean anything.
Yeah, you have this many, I would say, you have this basic concept in law, which basically your data can be used with your consent or with legitimate interest, and legitimate interest is when companies say, oh, it's my legitimate interest to keep everything you've done, even after your account was deleted, because imagine I'm doing an e-commerce site, and I say I need to keep it because this is a regulation of this, this, and that.
And nobody will go and verify if this is really the regulation. Look at support data.
Support data needs to be kept this amount of time, unless you have a case on the support data with the user.
The definition of a case is support data for the user.
I mean, sometimes people can make it blurry. If I go back to the topic of backup, it's very easy, you know, like we have a retention policy.
We have, I mean, by getting stable in production, we have reduced the time to a very short time, and basically, I mean, in a lot of places in our storage, we make use of basic techniques, just like TTL, like you can have on GCP, AWS, you know, so everything's going to disappear.
But to reach this point, I'm not going to go into detail because the retention policy is not some, we have one, we must have one, and we have a lot of information in our way, but I don't want to make other people jealous, but we have very short time.
I can guarantee to you that when your data is deleted, when you remove your account from Zenly, basically it will not be, and after two years, we still have this, you know, we're not counting, we can't, I mean, we don't count in years, definitely.
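The TTL (time to live) technique mentioned above means each stored item carries an expiry time and disappears on its own once that time has passed. The following Python sketch is a toy in-memory stand-in for the managed TTL features of cloud storage services, not any provider's API; the class TTLStore and its injectable clock are assumptions made so the behavior can be shown and tested.

```python
import time


class TTLStore:
    """In-memory key-value store whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised in tests
        self._items = {}     # key -> (value, expires_at)

    def put(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it lazily on read.
            del self._items[key]
            return default
        return value

    def purge_expired(self):
        """Remove every expired entry and return how many were dropped."""
        now = self._clock()
        expired = [k for k, (_, exp) in self._items.items() if now >= exp]
        for key in expired:
            del self._items[key]
        return len(expired)
```

With expiry attached at write time, nobody has to remember to run a deletion job for that data; the retention period becomes a property of the storage itself.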
And how do you think about, so that's the privacy, like deleting stuff, the privacy within your app, how does the app interact with third-party stuff?
How do you, you know, if you have to include third-party stuff, how do you think about their privacy?
So, when you include third-party stuff, you have some, you sign an agreement, and usually, I mean, this is where the part of the job is also to mix, I mean, a lot of legal skills, negotiation skills, sometimes product skills, financial skills, and this is why also it's very important to, I would say, let's have a technical guy into commercial negotiation, because you have to understand that the workflow, you have to ask your third party, how do you deal with my data?
And then you're going to sign what's called a data processing agreement, they're going to say, you're going to be the data controller of the data, whatever, you have all these terms, and say, and basically, they will say, okay, we keep guarantee of it, but then you want to decide, you can always decide, because legally, you're all backed up, you know, no company will tell you, hey, I do this, and I'm not GDPR compliant.
I don't think it would be a good advertising, you know. But then you have to think about what will you send to them?
What do you have to send to them?
I take, I take, let's take, for example, the example about, okay, you're going to use, like, this software, because you want to do some data, let's say, analytics on the usage of the application, whatever it is, you can decide about, I'm going to filter things, I'm not going to send, I mean, I can create some additional UID for user events, which are not linked to the internal user ID, which themselves are not linked to their private information.
So, you get this multiple level, you have to have ways behind it to recover them.
You say, for example, if I want to say, you know, sometimes people say, I take my, I'm going to take my data structures, let's say, I have a data structure for the users, and I'm going to pack this data structure and flush it to, you know, like, I make a central point, and it's going to send to everything, to our server, to our third parties, to anything.
And then, basically, you say, no, I'm not going to do this, I'm going to have multiple versions.
One of them, it's not going to be the date of birth, but just a year of birth.
So, I can make age categories, but this is not personal information, because, I mean, in a complete year, many people have been born.
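A reduced "version" of the user record for analytics, as described here, might look like the sketch below. It is Python written for illustration only: the User fields, the analytics_view function and the in-memory mapping table are hypothetical, and the random analytics ID is one possible way to get an identifier that is not derived from the internal user ID while still being recoverable on the server side.

```python
import secrets
from dataclasses import dataclass
from datetime import date


@dataclass
class User:
    user_id: str
    name: str
    phone: str
    birth_date: date
    country: str


# user_id -> analytics_id. Kept server-side only, so the link can be
# recovered internally but is never sent to the third party (assumption).
_analytics_ids = {}


def analytics_id_for(user_id):
    # Random, so it cannot be computed back from the internal user ID.
    return _analytics_ids.setdefault(user_id, secrets.token_hex(8))


def analytics_view(user):
    """Build the only version of the user that leaves for analytics."""
    return {
        "analytics_id": analytics_id_for(user.user_id),
        "birth_year": user.birth_date.year,  # year only, not the full date
        "country": user.country,
    }
```

Name, phone number and full date of birth never appear in the outgoing structure, so age categories and per-country statistics remain possible without packing and flushing the whole user record.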
So, you have to think about what you will send, and never send something, basically, you have to think, if I'm sending my, this data to this partner, this partner, I mean, we have all the agreement in place, they're going to take care about it, they certainly don't want to leak it, because it's not a good advertising for them, they certainly don't want to have a problem, but I can make my own set of precautions here.
I can add my additional work here, to ensure that, oh, I'm going to limit what I'm sending, and to keep it to the core of the purposes.
If I want to see if I'm doing, like, let's say, Google Analytics stuff, and I say, oh, I would just want to know if the user has clicked on this button, well, you're not going to send all the information about the user each time, you're going to try to think about, yeah, I know this can be users in this country, in this age category, if you want to make statistics about that, and that's totally, I mean, it's fine for the purposes of doing analytics on that.
If you want to, if you want to use some, you can also blur the data, when you want to, when you send to some services, you know, like, oh, I'm going to say, we're talking about location, I want to look at this, you know, take a wider area, don't give the precise position, add a blur to the position, you know, don't say identify, because don't say, I mean, number that can say this request come from here, make all the requests you do to a third party come from your server, and not from the client, because then you have one unique IP, you cannot identify someone, and this type of things, when you deal with third parties, you have to think about, okay, where do I need to add protection?
So, again, this is very, this also practices that you have to teach people, oh, you're going to do this, you're going to call these third parties, do them from server.
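One simple way to "take a wider area" before calling a third party is to round the coordinates. The Python sketch below assumes rounding to two decimal places, which is roughly a 1.1 km step in latitude; the function names and the payload shape are hypothetical, and rounding is a deterministic coarsening rather than the random noise also mentioned above.

```python
def coarsen_position(lat, lon, decimals=2):
    """Round coordinates so only a coarse area is shared.

    Two decimals is about 1.1 km in latitude; the east-west size of the
    cell shrinks as you move away from the equator.
    """
    return (round(lat, decimals), round(lon, decimals))


def third_party_payload(lat, lon):
    """Build the request body to be sent from the server, not the client."""
    area_lat, area_lon = coarsen_position(lat, lon)
    # Deliberately no user ID, device ID or client IP in the payload.
    return {"lat": area_lat, "lon": area_lon}
```

For example, coarsen_position(48.85837, 2.29448) gives (48.86, 2.29). Because the request is built and sent server-side, the third party sees the server's IP address rather than one address per user.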
Well, that's the thing I was wondering about, how do you teach people internally?
Like, how do you do this?
Do you actually have a training course? Do you, is it part of your development lifecycle?
It's part of, first, I have a training session. In fact, they have multiple training sessions.
And there is one where I do it myself, I continue to do it myself.
I'm not delegating this part for now, I will one day.
But basically, I do these training sessions. And then, because people understand the importance of it.
And people, because people understand that this is a topic that is not to be taken lightly.
And also, at the group level, I mean, Snap is a company that really takes care about privacy.
So, we found ourselves, you know, totally right on this.
And this is something that then you create a kind of, I would say, feeling about this is important within the company, we start to add, I can, I can, we can talk about it.
For example, in our marketing messages, or if we want to say, okay, privacy goes with security, and we're going to have a, you know, bug bounty program, you know, this type of thing where you're going to be like, so you're creating the kind of culture internally.
And then sometimes you go and you know, some people are going to say, oh, I'm going to develop a feature.
And originally, they may not think about, okay, we're going to make a privacy review, privacy review, this is about, you look at the feature, you want to see if it's protecting properly, any type of any users, you want to see what the data, if it requires new data, where it's going to be stored for a long time, if it's going to be deleted, as you kind of, it's become the checklist, when you have the checklist, I mean, in all your feature document requirement, when in the checklist area, okay, privacy, then it becomes something that people do.
It's just like you do localization, you do privacy, you do product, you do analytics, you do all this.
This is like one of the other.
And if sometimes you go and I say to people, hey, did you think about that? Because we're still like, I mean, we are still a small company.
So I go and meet the people.
And we talk about a feature. And then I mean, say, oh, do you think about deletion?
Oh, yeah, sure. I mean, at the beginning, it was, I forgot that. No, it's just like, yeah, it goes this way, this way, this way.
And you create this type of habits where people say, oh, your phone number, don't do things on a phone number, you use a crypto hash.
Use this type of things, you can do a lot of operation on data, which is not, I mean, which is manipulated in a way where basically, you have to learn to do your goal without thinking that the form, original form of the data is going to be a problem.
I don't know, I mean, if I have time for a short example, but, you know, we said, I mean, we have a support page, if you type data, data privacy, you have a few pages that come where basically we, well, I was telling you retention policy is not mandated to be public, but you're going to find a lot of it, I mean, on those pages.
So we went further. And for example, it says, your address book data is going to be, I mean, crypto hash, salted and hashed and whatever.
For many people, it can be a mystery, but if you think about any type of social application, like recently, I mean, I had a lot of people arriving to Telegram and Signal.
I'm just going to warn you that you've got one minute, so.
Yeah. Okay. So basically you have this stuff, but, you know, if you have those people receive this notification, you have to think about, oh, they don't know, they don't have my address book.
They just have, I mean, a hash or a crypto hash, so it cannot be reversed of this data in your address book that is solved locally on the device.
And this is where you start, you know, people learn this about, okay, what's happening on server, what's happening on a client.
This is, yeah, you create a culture of that. If you want to be safe, you have to create a culture.
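The "salted and hashed" treatment of phone numbers and address book entries can be illustrated with Python's standard hashlib and hmac modules. This is a sketch under assumptions, not Zenly's scheme: the interview does not say which hash is used or where the salt lives, so HMAC-SHA256, the salt handling and the simplified normalization below are all choices made for the example.

```python
import hashlib
import hmac


def normalize_phone(raw):
    """Keep digits only, so different formattings of a number compare equal.

    Simplified on purpose: real normalization also has to handle country
    prefixes and local formats.
    """
    return "".join(ch for ch in raw if ch.isdigit())


def hash_phone(raw, salt: bytes):
    """Return a keyed SHA-256 digest of the normalized number as hex."""
    message = normalize_phone(raw).encode("utf-8")
    return hmac.new(salt, message, hashlib.sha256).hexdigest()


def find_matches(address_book, known_hashes, salt: bytes):
    """Return the hashes from an address book that match known users."""
    hashed = {hash_phone(number, salt) for number in address_book}
    return hashed & set(known_hashes)
```

Matching then happens on digests only, so the raw address book does not need to be stored on the server. One caveat worth teaching alongside it: phone numbers come from a small space, so a hash cannot be inverted directly but can be guessed by trying every number, which is why the salt or key has to be protected.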
We're almost out of time. Thank you very much for coming and talking about privacy.
I look forward to the day you have your own YouTube channel teaching privacy development practices, because I think that would be very useful for people that you should get working on that.
Thanks for taking the time to be on Cloudflare TV.
Have a good day. Thank you, John. Bye. Bye.