📊 Data and Privacy
An interview with Emily Hancock, Cloudflare's Data Protection Officer, on what data privacy is and how we deal with it at Cloudflare.
Hi everyone. Thank you for tuning in to this session of Cloudflare TV. I'm Kalpana.
I work in the business intelligence team here at Cloudflare. And today's session is a conversation with Emily Hancock, who's the head of legal for product, privacy, and IP, and our very own chief data protection officer here at Cloudflare.
Welcome to the show, Emily. Before we get started, could you perhaps give us a little bit of an introduction to your role here and how you came to be at Cloudflare?
Yeah, glad to be doing this today. So I have been at Cloudflare for about three years and I head up our privacy program.
And that means I do all the things on different privacy policies and work with product teams on privacy by design issues.
So kind of a soup to nuts role. And I started my career in privacy in private practice at law firms in DC, and then moved out to the Bay Area to take a job at Yahoo, and was at Yahoo and then at Evernote, both working on a lot of privacy issues, before I ended up at Cloudflare.
And one of the things about privacy law that's really nice is it's a pretty women-focused area.
And we're here in Women's Empowerment Month. And so Kalpana, I wanted to ask you about, because you are much more in like the product and data side.
So tell me a little bit about your background. And I imagine it's a little bit more of a boys' club than I had to deal with.
Yeah, it's been pretty much a boys' club all my career.
I think I started as a software engineer in distributed systems right out of school.
And I remember when I started my first computer science class that I did in high school, I was one of five girls in a class of about 65.
So, and that's been pretty much a consistent ratio all through my career.
But funnily here in the integrated data and analytics team at Cloudflare, four out of the five senior managers in our team are women.
So I'm pretty excited where we've gotten to all these years, so pretty cool.
That's great. And what's your role, your role now is doing what?
Because I know you didn't start out in the business intelligence team when you joined Cloudflare.
Yep, both of us started about the same time, right?
Three years ago, I started in the product team managing the Cloudflare dashboard when I started.
And I've always been a data-driven person. I actually started my product career way back when at Yahoo, some overlap there, when I launched machine-learned web search at Yahoo.
So I've always been a data nerd, I guess.
And so when the business intelligence team started at Cloudflare, there was just like a happy synergy and I moved here.
So that's a little bit, so back doing engineering stuff, I guess, after many, many years of doing product stuff.
So that's a little bit of how I got here. You know, going back to our discussion, data and data-driven insights are a huge thing today.
All companies want to leverage data more efficiently, but along the way, there are also real people's rights and freedoms that we need to start thinking about because data is not its own thing, right?
It is about people. And we have to, as we think about building the smartest algorithms and the most effective decision-making tools, we also want to think about the things that are important to people.
And I know that's what you think about every day.
So can you like start us off by walking us through what is privacy?
Like how are we thinking about privacy?
Yeah, yeah, sure. I mean, I guess I always think of it as privacy can mean a lot of different things to different people.
So when we're talking about data protection, and more specifically data privacy on the Internet, I think a lot of it comes down to, you know, the idea that you get to control your personal data.
So having some idea of knowing who's collecting it, what they're using it for, and whether you have some rights to be able to restrict that collection and use.
But also some people think about data privacy online as, you know, protection from government surveillance or from ad trackers and targeted advertising.
And then there's other people who think that privacy means that you need to be completely anonymous online.
So there's a lot of different ways to think about it. And then underlying all of that is, I think the idea that comes to us from Europe, which is that data privacy and privacy generally is a fundamental human right.
And so we kind of think about it that way as well.
And, you know, then when you look at the government regulations that are out there, not only do we think about what privacy just means to us as human beings and what that can do, what impact that has for personal information, but we also have to think about what the governments have told us.
And so we have like the European kind of granddaddy privacy legislation called the General Data Protection Regulation, which is one of the most comprehensive data protection laws out there.
And it really focuses on giving people the right to control how their information is used.
And it does treat protection of data as a fundamental human right.
Whereas in the United States, for example, we have the California Consumer Privacy Act, and the United States views privacy a lot more through the lens of protection from government surveillance specifically.
And then also then as a consumer right to have kind of this ability to say, don't sell my information and I get to control how that information is used.
So they take kind of different approaches, and at Cloudflare we are doing our best to kind of take all of that in and think about privacy in all of those ways.
So yeah, in Cloudflare though, right?
You bring up an interesting point, right? Privacy can mean different things to different people and in different geographies, the focus is kind of different.
And at Cloudflare we have both ends of the spectrum, right? Our customers can be individuals, you and I can have like a Cloudflare account or it can be big Fortune 500 companies.
And we are not local to one area, right? We have 200 plus data centers all over the world.
We have customers all over the world. What does privacy mean?
Is it different for an individual versus a company? How do we think about that here?
Yeah, yeah. I mean, it is really tricky because we have users who are individual free users and we have big Fortune 500 companies, like you said.
And so we are trying to think of it from across the spectrum.
And so in one way, we don't really think about data as something that necessarily we wanna view through the lens of any particular government.
We think about personal data as this ocean of data.
And we are trying to figure out how to empower individuals and entities to reduce the amount of data in that digital ocean.
And it doesn't really matter whether there's laws protecting that.
I mean, we obviously are looking at those laws and taking those laws into account and designing our privacy practices around them.
But a lot of things that we do also focus on security as a means of achieving privacy.
And we also have to balance that: our customers that are large companies really need a certain visibility into data.
And that's part of the service we provide to them.
So we're looking at ways of doing that to give them visibility into the data they need in ways that are still privacy protective for their end users.
And for end users, we're looking at technologies that just help make the Internet better overall, which is we think an Internet that is better is an Internet that is more private.
So we have solutions like our 1.1.1.1 public resolver, which means that you can surf the Internet and the ISP sitting in the middle isn't gonna know what sites you're going to.
So, we look at it from a lot of different perspectives and focus on a lot of technologies that we think are going to be enhancing privacy for both those end users and then also empowering our customers to be able to make the privacy commitments to their end users that they feel like they need to be able to make.
Yeah, and then that actually brings us back to like all the different kinds of data we deal with, right?
We have our customers, we have our customers' customers, we have business intelligence.
We do a lot to provide our customers with information.
And I guess if I can paraphrase what you said, you are really thinking about how do we give more tooling or more control to the person whose data it is.
Yeah, yeah, exactly. And one of the things that is really important for our customers, you know, so for example, if Acme is our customer and they want to use our services to safeguard their widget selling site, they wanna know that we're gonna provide at least the same level or better privacy protection for their end users who are buying those widgets.
And so we do a lot with our customers to provide them with information about those privacy and data handling practices and including like signing data processing agreements with them to memorialize those practices.
We have a lot of FAQs and transparency is really, really important because we want customers to understand exactly what's happening with their data.
And then, which in turn helps them understand what's happening with the end user data and then they can translate back to their end users.
And so it just makes the whole cycle a bit more transparent and which is also very privacy protective because as an end user on the Internet, if you have a sense of what's happening with your data, you can be more in control.
Yeah. Where do you think we go from here, right?
We could go like deep dive into the different kinds of data that we have here at Cloudflare and how we use them, or we could go and think in depth about what different customers do when they get these controls.
So where do you think we should...
Yeah, no, I think it'd be really great for you to talk a little bit about the kind of data we have here because, you know, you and I talk a lot about the privacy issues that come up when you're dealing with these different buckets of data.
So maybe you could talk a little bit about what some of those buckets of data are that we're dealing with and that you're having to make some decisions about.
Yeah, so I think when we think about a service like Cloudflare, right?
You come to Cloudflare for security.
So there are marquee security products that are very, very data-driven. Our DDoS and bot management teams work with network data every day on top of traffic to help us better detect attacks or automated traffic or different types of programmatic actors on the Internet.
So that's one kind of data that we have.
And we can actually say that's two flavors of data, right? There are existing marquee products and then there are new products that come up when you kind of look at data and see what newer security or performance solutions can be built.
Argo is a classic example, our smart routing product where we kind of provide a Waze for the Internet by finding out the fastest route during highly congested network times.
Recently, we had phishing detection, DNS tunneling, and these are security products; all these products are built on top of understanding data.
But the thing about these is our customers kind of expect us to do this.
They expect us to look at these things because that's part of the service that we provide them.
And then there are two other buckets of data if you can think about it.
One is customer analytics. I sign up my site for Cloudflare, I wanna know what's happening on my site, right?
What does my traffic look like?
What does my latency look like? What geographies are my users coming from?
And that's also an expected use of the data where people come in and say, I wanna know.
And then there is business intelligence where we try to use the data to make our jobs in Cloudflare kind of more effective and data-driven by bringing together data from various sources, internal and external, to help all our teams do their jobs like more effectively.
And while we say that there are these big buckets, right?
How we build products on top of data, how we show our customers, what is happening on their network and how we make our teams internally effective, they all seem like reasonable buckets of data, data things that we do.
So when we think about privacy or when we think about like designing our data systems, we can even step back even further and say, all these different data things, they either work and this is grossly oversimplifying, but bear with me.
They work by either aggregating patterns or by providing personalized solutions, right?
For aggregate patterns, you don't need to know anything about a specific person.
Like you go to a Starbucks and if Starbucks wants to know what is the hottest selling drink in a particular geo, they don't need to know like Tarek from 123 Easy Street got like a vanilla latte, right?
They just wanna know that store X in Mountain View sells like a lot of vanilla lattes, but store Y sells a lot of pumpkin lattes, right?
So that's aggregation.
And so there are a whole range of problems that you can solve by looking for aggregate patterns.
And then there are problems that need like more personalized information.
Like when you show a customer data that is very specific to that customer, when you send an email saying, hey, this is what happened to your network last week, or when Starbucks sends an email saying, hey, we have a new pumpkin latte promotion today, they are actually connecting the information that they have on their end to a particular person.
And there are very different ways to think about aggregate data versus personalized data that connects data to a particular individual or an entity.
And I know this is where you like to work with the product and engineering and data teams.
What do you think, Emily? Like, what do you think of data in aggregate versus emails that we can send out to people?
Yeah, well, I mean, as you pointed out, you need both things and it's not really possible to just deal in one or the other.
So when I kind of think about it, I'm thinking about some of our fair information principles.
And one of those is data minimization. And so I think one of the challenges that I always kind of put to anybody on the product side is, do you need all that data?
And Cloudflare is one of the places where it's pretty easy to work sometimes when you're in privacy, because usually the product teams have thought about this before I even have to ask the question.
And they've already said like, oh, well, we aren't gonna need that kind of data.
We just need this minimal set.
So it's pretty nice that way. But one of the things that we always kind of look at is how can we aggregate data?
And how can we really think about what trends can we gather if we are truncating IP addresses?
What trends can we gather?
What information can we gather if we're not personalizing it? And so we do kind of wanna push in that direction because aggregation means that you aren't gonna be able to identify a specific person.
And that's a great way for us to do threat intel and research on overall trends, because we don't need to know what a specific person is doing for most of these things.
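The truncate-then-aggregate idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cloudflare's implementation: the function names are hypothetical, and the /24 (IPv4) and /48 (IPv6) prefix lengths are arbitrary choices made for the example.

```python
import ipaddress
from collections import Counter


def truncate_ip(ip: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero out the host bits of an address so it no longer points at one device.

    The default prefix lengths are assumptions for this sketch.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets ip_network accept an address with host bits set
    # and returns the enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def requests_per_truncated_ip(ips):
    """Count requests per truncated address: a trend, not a per-person record."""
    return Counter(truncate_ip(ip) for ip in ips)


if __name__ == "__main__":
    # Addresses from the documentation-only ranges (RFC 5737 / RFC 3849).
    sample = ["203.0.113.77", "203.0.113.5", "198.51.100.9"]
    for network, count in requests_per_truncated_ip(sample).items():
        print(network, count)
```

The counts still show where traffic is concentrated, which is what trend analysis needs, while the individual addresses are never stored.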
But there are, as you said, like there's clear times when you have to have personalization and it's really important for helping our customers and they need to have some more specific information.
And so then that's when we bring in privacy by design. And this is this idea where you bring in privacy at a foundational level.
And so I like to kind of think about this, maybe I don't know how great this analogy is, but the analogy of building a house, right?
So if you lay a foundation and you start framing out rooms and then somebody comes in and says, well, wait, we're building two stories, not one.
And by the way, we wanna have an indoor pool. And so the contractor looks at the foundation they've laid and the framing, they're like, I haven't structurally engineered this for two stories and an indoor pool to contain a lot of water.
So we're gonna have to go back and re-engineer this whole foundation and structure.
That's obviously more time consuming, more expensive. Nobody's happy because it means a lot of extra work.
But if we're building privacy in at the first step, if you're that contractor and you know at the beginning that you're building two stories and an indoor pool, you're going to anticipate how that means you have to build out the product.
And so it's the same way in working with teams that are dealing with data is having conversations early about what the privacy principles are, what are the goals of what we're trying to do?
How can we achieve those goals with minimal personal data?
How can we achieve those goals in a way that is transparent?
How can we achieve those goals in a way that retains data for as little time as possible?
And then I think one of the most cardinally important rules of all is if we've collected data for a specific purpose, we don't then just repurpose that data for something else if we haven't already told the customer or the user what we're doing with the data.
So you never want to use data, tell people you're using it for one thing and then go ahead and use it for something else.
That's kind of like one of the big red lines that you never want to cross.
So I think we try to set up the boundaries around how we're using data, how much data we're using, pushing for aggregation when we can, and then when we do have to personalize, do as much as we can to kind of lock down that data to make sure that we're respecting the privacy of the individuals whose data we're using.
And while at the same time, balancing that against what customers want and what customers are looking for.
So like in analytics, if customers are looking for some pretty detailed analytics and they need to know the IP addresses of their end users, it doesn't do a lot of good for us to then say, no, no, no, we're going to aggregate all that information because that's not going to give them the analytics they need.
So we look for ways to give customers what they need and then explain it in a way so that they can make sure to pass that along to their end users for the transparency.
But if they don't need that analytics, and we have had customers who say, we don't want that data, we don't want to know anything about our end users, coming up with tools to meet that kind of privacy expectation as well.
And so that's why like one of the cool things about Cloudflare is I think in the tools that we give to our customers, we try to give customers these different levers that they can pull to help personalize their levels of data protection and privacy protection that meet their expectations for their end users.
And so working on those kinds of issues with teams like yours and product teams, I think is one of the really interesting parts of the job.
Yeah, and really, I think hearing you talk, I think one of the things that we've tried to internalize here is how Privacy by Design is an operational mindset, right?
It is how you operate day to day, and we have to kind of change how we operate.
Going back to your example on like the foundations we lay as you build a house, right?
When you're working on software and data, you kind of have to expect that you're going to build a 10-story house on top of this, even before you start, even if you have no idea what the 10 stories are going to look like, right?
So that's a little bit of how technology evolves that you start and you start building and then new ideas start coming up.
So when we think of things like Privacy by Design, we try to ask like questions like, can I solve this problem without hyper-personalization?
You know, just asking simple questions like that, you'll be surprised how it helps reduce the need for sensitive, privacy-risky data in our systems.
And the other thing that I think about is time is also a really important factor, right?
Nothing is forever. So we have like time limits on how long you save data for, or how long you act on things that are private.
So that's where things like aggregate data can be stored for longer because you're trending and it doesn't attach itself to one particular entity.
But when it's attached to a person, and I think there are laws also around time, aren't there, Emily, in other aspects of work?
That's something we do here just to be extra cautious.
Yeah, I mean, there are and there aren't.
I mean, there are certainly some laws out there that have some very specific data retention requirements.
A lot of times those are actually geared more at law enforcement interests, that they want to have certain data retained in case they need it for investigation.
But when you look at some of the comprehensive privacy laws like GDPR, there isn't really a statement of X type of data needs to be stored for a certain amount of time.
What the laws are really saying is you only should have that data for as long as you have a business purpose for it.
And that's, I think, why the process you're describing and the privacy team and our product counseling team that work with the product managers and engineers, we're not coming down and saying, here's the rule, follow it.
It's very much of a dialogue and a partnership because I get asked a lot, well, how long are we gonna retain this data?
And usually my first question back is, well, how long do you need to retain the data?
And then that question of need is figuring out what do you really need versus what do you want?
And, you know, cause I think there are some engineers out there in every company who will say, I'm gonna keep the data as long as I can because there's always something I might be able to use it for.
And so you obviously have to curtail that, but you want to make clear that if you have business reasons that you need to keep a certain kind of data for a year, for example, then, okay, let's justify that.
And the privacy laws lay out a framework for allowing you to do that.
And then that's kind of where it comes back again to the issue of transparency and making sure that the data subjects understand what's happening with their data.
So if you say you're gonna keep it for a year, you need to keep it for a year and you don't just kind of hold onto it for longer than that.
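A retention commitment like this can be enforced mechanically. The Python sketch below is hypothetical: the one-year window for personal data echoes the example figure in the conversation, while the record kinds, the two-year window for aggregates, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy: how long each kind of record may be kept.
RETENTION = {
    "personal": timedelta(days=365),   # "keep it for a year" from the example
    "aggregate": timedelta(days=730),  # assumed longer window for aggregates
}


@dataclass
class Record:
    kind: str               # must be a key of RETENTION
    collected_at: datetime  # when the data was collected (timezone-aware)


def purge_expired(records, now):
    """Return only the records still inside their stated retention window."""
    return [r for r in records if now - r.collected_at <= RETENTION[r.kind]]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    records = [
        Record("personal", datetime(2022, 6, 1, tzinfo=timezone.utc)),
        Record("personal", datetime(2023, 6, 1, tzinfo=timezone.utc)),
        Record("aggregate", datetime(2022, 6, 1, tzinfo=timezone.utc)),
    ]
    for record in purge_expired(records, now):
        print(record.kind, record.collected_at.date())
```

Keeping the policy in one table makes the stated retention period and the actual behavior easy to compare, which supports the transparency point being made here.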
And so, yeah, and I think like that's why that partnership and working really closely with teams is so important because as you were saying, you don't always know where you're gonna get.
You kind of think you have this idea of how something's gonna evolve for a product or for a kind of business intelligence analysis.
You think you know where you're going, but after you do some of the research, you may not be at the place you thought you were.
And that's why it's an ongoing dialogue for us to kind of say, you know what?
We thought our business purpose was this and we thought we were gonna end up with this conclusion, but actually as we do some more research, it's taking us in this other direction.
And so we wanna make sure we're staying in touch and having those conversations to mitigate privacy risks and to make sure that we're being transparent and those kinds of things.
I think we're coming up to the top of our hour.
We have like a couple more minutes. Before we conclude, Emily, maybe do you wanna talk about how users and consumers can advocate for themselves?
Or what are privacy advocates thinking about when they are working with the lawmakers, when they make these laws?
Like, what are the fundamental principles of data protection rights?
And maybe we can conclude with that.
Yeah, sure, yeah. Well, it is kind of interesting because I think a lot of it comes down to this issue of having control.
And so companies are trying to give data subjects the ability to access their information, to request that it be deleted or the ability to say like, let's correct the information because I think the information you have about me is wrong.
And that's where a lot of the data regulations that you see globally are focusing their efforts.
And that's particularly where California is focused.
And then we've got a new federal privacy bill that's just been introduced that focuses on a lot of those things.
And this idea that you have to opt in to certain uses.
The challenge, honestly, is that this puts a lot of burden on people who don't necessarily always have the time or interest to exercise those rights.
And so anybody who's ever been on a website has seen a cookie banner pop up and it takes time if you really wanna go in and manage those cookies to make sure that you're not having ad tracker cookies, for example.
And it's time and effort. And when you really wanna get into an article that you have to read or you have to log into a website to do some kind of transaction, most people's reaction is: fine, I'll accept them all.
I don't care. I just need to get this thing done.
And so I think the challenge for companies is while you absolutely need to give people that kind of control over their data, you have to also think about how are we gonna deal with this for those people who don't want to be burdened with those kinds of controls all the time.
And that's, again, where you come back to this idea of data minimization, setting your own retention and being transparent so people understand what's happening with their data.
Because at the end of the day, as consumers, we do have to take some control.
And if you are very concerned about the privacy of your information, then you do need to figure out like, okay, how do I get my data erased from certain places?
And so I think we'll see a lot more laws.
There's a lot of federal laws or national laws out there in a lot of countries globally.
And I think we'll see the United States catch up. We aren't quite there yet, so we'll see.
But for Cloudflare, we're a global company, so we're paying attention to all the laws around the world and trying to find the common threads that make the most sense for us to both comply, but also to help us shape a really holistic approach to privacy of data.
Great, we have to leave it at that.
I think we are at the top of the hour. It was a delight talking to you, Emily.
Thank you for your time today.
Yeah, great to talk to you, too. Thank you so much for sharing information about the data at Cloudflare.
Thank you.