Originally aired on June 9 @ 1:30 PM - 2:00 PM EDT
Join Alex Krivit, Product Manager at Cloudflare, as he sits down for a Fireside Chat with Troy Hunt, the founder of Have I Been Pwned. Hear how Cache Reserve is helping Troy serve over 99% of requests from cache.
Developer Speaker Series
Cool. Hello, everybody out there. My name is Alex. I'm a product manager here at Cloudflare for CDN and caching. I'm joined today by Troy Hunt, who is the CEO and co-founder, co-founder? Founder, I think. Founder, that's me. Just the founder. All right. Of Have I Been Pwned. We're going to talk to you about using Cloudflare services and products to scale websites that are seen and used by millions and millions of people. So maybe as a point of introduction here, Troy, do you want to kick it off and tell us a little bit about yourself and what you do?
Yeah, sure. As it relates to the discussion here, I started a little project called Have I Been Pwned almost 10 years ago at the time of recording; it was December 2013. It was designed to be a data breach aggregation service, with 150-plus million email addresses in there that you could search through from various breaches, most notably Adobe. And then it just grew over time. Now there are about 12 and a half billion unique records in there. There are also, I think, about a billion passwords. I don't know exactly, because more keep going in there. And particularly as it relates to Cloudflare, all the traffic goes through Cloudflare, including Pwned Passwords, the service you use to search for known bad passwords. Looking at my stats now, we're approaching 5 billion requests a month on that service. So this little thing did grow up, I think it's fair to say.
So it's safe to say 5 billion people are requesting access to your service that tells people about their leaked credentials, their leaked passwords and things online?
Well, I think we've got to give a bit of context to it. There are really two parts here. One is Have I Been Pwned itself: there's a front page where you can check your email address against the data breaches it has appeared in.
Then, separately, there's a part of the service called Pwned Passwords, where you can check whether a password has been breached. What's really interesting about that feature is that there's an API in front of it, with an anonymity model that came courtesy of Cloudflare some years ago. That's now built into the registration flow, the login flow and the password reset flow of a lot of really major online services, to the point where almost 5 billion times a month (we're at 4.8 billion-something every month) there are requests to see whether a password has been seen in a data breach. What organizations are trying to do here is stop people from using known bad passwords, because that reduces account takeover attacks and all the other problems that come with them.
That's fascinating. It's obviously a service that's been integrated across the web, visited and used by so many people. Looking back to its inception, when you first got going, did you ever believe you'd get here today, where people are checking your service 5 billion times a month?
I didn't think anyone would use it, if I'm honest. I thought it would be me and some of my mates. Part of the reason I built the service was as a bit of a hello world on a bunch of Microsoft Azure infrastructure that I wanted to play with. So I had no expectations. I wouldn't have given it such a stupid name if I did. And now here we are. What's fascinating about it is that every time I look at it and go, holy cow, how did I get here, suddenly something else happens and it just gets bigger and bigger and bigger.
Yeah. There's this age-old piece of product advice, I guess, where people are always saying you should build the product that people want, that somebody cares about, that matters to people.
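To make the anonymity model concrete, here is a minimal Python sketch of a client for the public Pwned Passwords range endpoint, assuming it behaves as publicly documented: the client sends only the first five hex characters of the password's SHA-1 hash and matches the returned suffixes locally. The helper names (`split_hash`, `fetch_range`, `breach_count`) are hypothetical and this is not code from Have I Been Pwned itself. It also shows why the request space is so cacheable: there are only 16^5 possible prefixes.

```python
import hashlib
from urllib.request import urlopen

# Public Pwned Passwords range endpoint: only a 5-character hash prefix is sent.
RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"

# Every possible request is one of 16**5 hex prefixes, which is what makes
# the whole response space cacheable at the edge.
PREFIX_SPACE = 16 ** 5  # 1,048,576


def split_hash(password: str) -> tuple[str, str]:
    """Return (prefix, suffix) of the password's uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def fetch_range(prefix: str) -> str:
    """Download the list of suffixes sharing this prefix (network call)."""
    with urlopen(RANGE_URL.format(prefix=prefix), timeout=10) as resp:
        return resp.read().decode("utf-8")


def breach_count(password: str, range_body: str) -> int:
    """Find the password's suffix in a range response; 0 if it isn't listed.

    Each response line has the form "SUFFIX:COUNT". The match happens
    locally, so the service never learns which suffix the client wanted.
    """
    _, suffix = split_hash(password)
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0
```

A live check would be `breach_count("password", fetch_range(split_hash("password")[0]))`. Because thousands of different passwords share each prefix, the same cached range response can answer all of them.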
In your time as the CEO of Have I Been Pwned, has there been a moment that really exemplified that, like, oh, this is a service that people actually care about and really want, and that's used by a lot of people?
Yeah. All the time. And it's the weirdest sort of light bulb moments. To give you an example of the weirdest ones: this is a website whose entire purpose is to index data from data breaches. So there has been criminal activity. A lot of people use the word "stolen" for data taken from a website. I'm not sure that's the best word, because when you steal something it's gone, whereas the data still exists. Let's say copied, or accessed without authorization. There's a whole bunch of illegal activity, and a lot of people have gone to jail for breaches that later ended up in Have I Been Pwned. And now I'm in a situation where the FBI is sending me data to load into Have I Been Pwned. The FBI has a fire hose to feed new passwords in, which ultimately end up in that Cloudflare Cache Reserve infrastructure so that other people can query them. I mean, you're querying data from the FBI that they sent to me. And we constantly see law enforcement agencies using it and recommending that people use it, and lots of use by governments. Those are the ones that blow my mind the most, because it's all illegally obtained data, but we're always looking at it going, well, this has happened. Now, what are the best things we can do with it to make the world better for everybody else?
Yeah, that's an incredible story. Having that information, and those relationships with the people who probably care most about password breaches, particularly because they're arbiters of some of the most sensitive information on the planet, is probably a really, really big responsibility.
And so I'm glad, you know, they found their way to you and this great service.
Yeah, now here we are. Crazy.
Yeah. So over the years since you created Have I Been Pwned, what sort of growth challenges have you faced on that path to 5 billion?
Well, it's interesting that the Pwned Passwords service, the one that's about to hit 5 billion, has in many ways been the easiest to fix, insofar as the nature of the service means there are only about a million different possible requests that can be made. That's a very finite set of data, which we've managed to massively cache at Cloudflare, so we just don't have to worry about what's happening with origin services. The biggest problem I've faced is sudden, massive organic increases of traffic to the email address search service. People might think, well, is that when you're in an online publication? Because that gets a lot of eyeballs. But that's not it. The biggest problem is when it's on prime-time TV. There's one incident in particular I wrote about in the past, where it was on a very popular show in the UK on a Sunday night. It's funny, I remember that beforehand the company said to me, look, we're going to put you on TV, and we have a habit of crashing big websites. And I'm like, haha, I've got the cloud, I'm fine. This was before it went behind Cloudflare, as well. And suddenly it's on TV, and I just pictured, I don't know, 20 million British people all picking up their phones at the same time and typing in the URL. My traffic escalated so quickly that the underlying infrastructure just couldn't scale; it couldn't add instances fast enough to deal with the traffic. That was a light bulb moment where I realized I needed something else helping me out here.
That organic DDoS.
Yeah, it is.
And the paradox of it is that it's great, right? It's great to have so much interest in a service. But it kind of sucks when the service can't respond and actually give people what they need.
Yeah, absolutely. So after you gained some traction, doing media appearances and with people checking the service, you finally found your way to Cloudflare. What were your first impressions? What did you like? What didn't you like in those early days?
Well, look, I'd used Cloudflare on a couple of other things. I'd been doing a bunch of training, and this was, interestingly, in an era where there was still a lot of reticence to move to secure connections. People would say certificates are expensive, and they're hard, and all the rest of it. I remember I'd run workshops and go, okay, we're going to spend five minutes, you're going to get HTTPS for free, and it's going to be easy, and we'd just get a Cloudflare setup done. So I'd used it a bunch of times; I think I was using it on my blog as well. The catalyst for actually moving Have I Been Pwned over to Cloudflare was an API that people could query. If they made too many requests, they'd get an HTTP 429, Too Many Requests. In my naivety, I was thinking, well, that will fix it: they'll get a 429, they'll stop, they'll wait until the retry-after time passes before trying again, and it will be fine. Now, it turned out people weren't real happy with having a rate limit, so they just hammered it hard. The epiphany for me was when my origin service was trying not only to serve legitimate requests but also to respond to the excessive ones, all in process in the one app, and it just couldn't do it. So that's when it went behind Cloudflare, and the rate limit got implemented before the origin.
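The idea of enforcing the rate limit in front of the origin, so that rejected requests never cost the origin anything, can be sketched with a simple fixed-window counter. This is a hypothetical Python illustration: the `FixedWindowLimiter` class and its parameters are made up for the example, a fixed window is only one of several rate-limiting techniques, and it is not how Cloudflare's rate limiting is implemented.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """Hypothetical edge-side limiter: at most `limit` requests per client per window."""

    limit: int
    window: float  # window length in seconds
    # client -> (start of that client's current window, requests seen in it)
    _state: dict = field(default_factory=dict, init=False, repr=False)

    def check(self, client: str, now: float) -> tuple[int, dict]:
        """Return (status, headers): 200 means forward to the origin, 429 means reject here."""
        window_start = math.floor(now / self.window) * self.window
        start, count = self._state.get(client, (window_start, 0))
        if start != window_start:
            count = 0  # a new window has begun, so the counter resets
        count += 1
        self._state[client] = (window_start, count)
        if count <= self.limit:
            return 200, {}
        # Tell well-behaved clients how many whole seconds until the window resets.
        retry_after = math.ceil(window_start + self.window - now)
        return 429, {"Retry-After": str(retry_after)}
```

With `limit=3` and a 10-second window, a client's fourth request at t=5.5 s gets a 429 with `Retry-After: 5`, and the origin only ever sees the requests that returned 200.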
So the only traffic coming through to the origin was an acceptable number of requests, and I could control that flow. Trying to do two things in that origin service was what really killed me at the time.
Amazing. Yeah, it's that sort of stampeding herd, I think is what they call it. It's really scary whenever it happens to a customer, but as long as there's an API and Cloudflare is serving some percentage of the traffic and shielding you, I think it's generally a good situation once everything is set up and optimized on Cloudflare.
It is. And sometimes it's the weirdest traffic patterns, as well. I tweeted about one very recently, where I got one of those emails from Cloudflare saying there's a DDoS alert on your website. I was like, okay, this is curious, I'll go and have a look. The origin website shows no abnormal traffic whatsoever, which is great. Then you go to the Cloudflare dashboard, to the WAF, and there's a massive spike of requests to the homepage. And I'm like, well, why? They're literally just requests to the homepage. And then I kind of went, well, it doesn't matter, because it's all cached at Cloudflare. You can make 10 times as many requests if you want; it will make absolutely no difference to me, because there's something sitting in front making sure that the stuff that has to do the processing and the hard work on the back end just sees normal traffic.
Yeah. That's the magic of caching. I love that it's able to scale and you don't really have to worry about it. You can just rest easy knowing it's going to be there and served from cache. You don't have to worry about unwanted traffic going back to your origin, the origin being offline, or any of these services not being available to customers. So one of the reasons I think we're talking today is your implementation of Cache Reserve.
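Why a tenfold spike on a cached homepage makes "absolutely no difference" to the origin can be seen in a toy model of a single cache location that honours a `max-age`. The function `origin_fetches` is hypothetical, and the model deliberately ignores tiered caching, multiple edge locations and eviction. Under those assumptions the origin sees roughly one fetch per `max-age` period, however many requests arrive at the edge.

```python
def origin_fetches(request_times: list[float], max_age: float) -> int:
    """Count origin fetches for one cache that keeps a response fresh for `max_age` seconds.

    A request is a hit while the cached copy is still fresh; otherwise the
    cache fetches from the origin and stores the new copy.
    """
    fetches = 0
    expires_at = float("-inf")  # nothing cached yet
    for t in sorted(request_times):
        if t >= expires_at:  # miss: no copy, or the copy has gone stale
            fetches += 1
            expires_at = t + max_age
    return fetches


# One hour of homepage traffic with a 60-second max-age.
one_per_second = [float(s) for s in range(3600)]
ten_per_second = [i / 10 for i in range(36_000)]
```

Both traffic patterns produce 60 origin fetches: 3,600 requests or 36,000 requests, the origin load is set by the cache lifetime, not by the request volume.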
What was your transition like from using our regular CDN to using Cache Reserve?
Oh, I clicked a button. That was basically it.
That easy?
Yeah, it was. And look, for context, this again goes back to the Pwned Passwords feature, where we're querying passwords nearly 5 billion times a month. We've made a bit of a game out of just how high we can get the cache hit ratio, because it's a finite set of requests: 16 to the power of five different possible requests, just over 1 million. How much of that can we cache at Cloudflare? It wasn't so much to save cost and all the rest of it, because we were already serving something like 99.1% from cache, so less than 1% of requests went to the origin. But less than 1% of nearly 5 billion a month is a substantial number, and we wanted to make it better. Then I was literally just scrolling around the dashboard that I'm back on right now, and it's like, oh, Cache Reserve. I wonder what this is. So I had a chat with some Cloudflare folks. They said, look, basically you just turn it on. It's going to respect all existing cache headers. It shouldn't change anything else about the tiered caching model we already had. It's just going to sit there in persistent storage, I think for up to a month if nothing hits it, and then eventually it'll get evicted. But all that stuff is hit many, many times a day. So we turned it on, and the origin traffic immediately dropped. We went from about a 99.1% cache hit ratio to 99.999%, which, hey, is five nines. That's a different context for five nines, but it's awesome to be able to say it. The value proposition for us, probably the most useful thing, is that all these organizations tying this service into their registration and login flows can be confident they're hitting a Cloudflare edge node, among the hundreds spread around the world.
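A quick back-of-the-envelope calculation shows what going from 99.1% to 99.999% means for the origin. It uses the round figure of 5 billion requests a month from the conversation (Troy also quotes 4.8 billion), so treat the results as orders of magnitude. The helper `origin_requests` is hypothetical; `Fraction` keeps the decimal arithmetic exact.

```python
from fractions import Fraction


def origin_requests(total_requests: int, hit_ratio: str) -> Fraction:
    """Requests that miss the cache and reach the origin.

    `hit_ratio` is a decimal string such as "0.991" so the arithmetic is exact.
    """
    return total_requests * (1 - Fraction(hit_ratio))


MONTHLY = 5_000_000_000  # "approaching 5 billion requests in a month"

before = origin_requests(MONTHLY, "0.991")   # 99.1% served from cache
after = origin_requests(MONTHLY, "0.99999")  # "five nines"
```

That works out to roughly 45 million origin requests a month before and 50,000 after, a 900-fold reduction; each extra nine divides the origin traffic by ten.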
And they're getting an almost immediate response. That's enormously valuable for them, because it takes out so much of the risk of the origin service, whether that's the risk of additional latency or the risk to availability, given how highly available those Cloudflare edge nodes are. Of course, it also drives down our costs, because we pay for requests to the origin and we pay for egress bandwidth. And perhaps most importantly for me personally, I get the satisfaction of seeing that number of origin hits go down and down and down. It's such a tiny, tiny amount.
You get to tweet about it, which we always love seeing. We're just like, oh, you added another nine. That's incredible.
It's fun, right? Because everyone's looking at it going, how have you done this? And look, we'll never get to zero, because the FBI is constantly feeding in new passwords. When they feed those in, we need to evict from cache whatever has now changed. So the goal isn't necessarily to get to zero, because the only way that happens is if we stop getting new passwords from the FBI. And we want them. We want the new passwords.
Absolutely. Keep it going. You touched on one thing previously, which was how expensive egress from origins can be. It's something we blog a lot about and I talk a lot about on a daily basis. But from your perspective, when you're running a website at really tremendous scale, what do those numbers even look like? What are your concerns on that front?
Well, I wrote a blog post about this, I think in early 2022, about how I got pwned by my cloud costs. I shared some figures in there where, long story short, one particular item had grown beyond the maximum file size Cloudflare would cache for us at the time, which sent that traffic back to the origin. Suddenly we were serving something very large straight from the origin and paying egress on every request. By the time I realized, my bill had been running at five figures for weeks.
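The eviction step Troy describes can be reasoned about per hash prefix: a newly added password changes exactly one of the roughly one million range responses, so a batch of new passwords touches at most as many cached URLs as it contains passwords. The sketch below is hypothetical. It assumes new entries arrive as plaintext passwords (the actual ingestion format and Have I Been Pwned's purge process are not described here), and the function name `urls_to_purge` is made up. The resulting list is the kind of input a purge-by-URL request, such as Cloudflare's cache purge API, expects.

```python
import hashlib

RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def urls_to_purge(new_passwords: list[str]) -> list[str]:
    """Return the de-duplicated, sorted range URLs whose cached responses are now stale."""
    prefixes = {
        # Only the first five hex characters of the SHA-1 hash pick the range.
        hashlib.sha1(p.encode("utf-8")).hexdigest().upper()[:5]
        for p in new_passwords
    }
    return [RANGE_URL.format(prefix=prefix) for prefix in sorted(prefixes)]
```

Everything outside those prefixes stays cached, which is why the origin hit count can keep falling while new data keeps arriving.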
And I had to pay for that myself. Fortunately, a combination of Cloudflare and Microsoft actually helped me out with that, so thank you to everybody who looked after me on that one. But it's fascinating, when you have large volumes of traffic, what a fundamental difference it makes when suddenly you're paying for the egress. Bandwidth is still expensive when it comes from a lot of these cloud platform providers, and often that doesn't factor into your calculations, because I think we've become so accustomed to thinking that bandwidth is quite cheap. But something quite cheap multiplied billions of times over suddenly becomes significant.
Yeah, absolutely. And five figures for several weeks doesn't sound like a fun bill to wake up to.
Not when you're one guy and you have to go and talk to your wife and say, yeah, funny thing happened.
Yeah. Another thing I wanted to touch on, with and without Cache Reserve, is latency and any performance improvements you've seen. Have you experienced performance benefits from implementing Cache Reserve?
Well, I think the best thing is that the people who experience the performance benefits are the ones actually consuming the data. I recall Cloudflare saying years ago (someone will get the figures right, edit this later on and put them up on the screen) something to the effect of 99% of the world's population being within 10 milliseconds of a Cloudflare edge node. So think about the value of sub-10-millisecond latency in a process that might run synchronously with other things, where you depend on getting a response from this service. That makes a huge difference for, let's say, the login page of a major online asset. So it's a massively beneficial thing for the people using the service.
Yeah.
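"Something quite cheap multiplied billions of times over" is easy to check with a little arithmetic. In this hypothetical sketch the 20 KB average response size and the $0.08 per GB egress price are placeholder values chosen only for illustration; they are not Have I Been Pwned's real response sizes or any provider's actual price. The 45 million figure is the 0.9% of 5 billion monthly requests that miss the cache at a 99.1% hit ratio.

```python
def egress_gb(requests: int, avg_response_bytes: int) -> float:
    """Gigabytes (10**9 bytes) transferred out of the origin."""
    return requests * avg_response_bytes / 1_000_000_000


def egress_cost(requests: int, avg_response_bytes: int, usd_per_gb: float) -> float:
    """Egress bill for serving `requests` responses straight from the origin."""
    return egress_gb(requests, avg_response_bytes) * usd_per_gb


# Placeholder inputs, not real figures.
AVG_BYTES = 20_000
USD_PER_GB = 0.08

all_from_origin = egress_cost(5_000_000_000, AVG_BYTES, USD_PER_GB)
misses_only = egress_cost(45_000_000, AVG_BYTES, USD_PER_GB)
```

With these assumptions, serving every request from the origin moves 100,000 GB (about $8,000 a month), while serving only the cache misses costs about $72; the same per-gigabyte price, multiplied by very different volumes.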
Just getting that response back and being able to answer their questions quickly is probably much better than waiting around for an origin response, at the end of the day.
Oh, totally, totally.
In implementing Cache Reserve, have you noticed any challenges or limitations so far? Are there areas we should be developing further to help more entrepreneurs like yourself?
Look, I can't think of any challenges, simply because all the mechanics are abstracted away. You've already got items coming back with cache headers that dictate how long they should sit there. You have control over purging them from cache, and you do that with Cache Reserve the same way you do it without Cache Reserve. I think the only thing that gets added, that you have to consider, is that it's a service with a cost: you pay based on the amount of data, the read operations, the write operations and so on. I'm just looking at my stats now, and I can see that I've got 32 gigabytes of data in there, and for our current period of usage that's 73 gigabyte-days. So you start having to think about how much data you've got in there, how long you keep it, how often you write to it and how often you read from it. That, I think, is the only other thing to consider. But you're offsetting all those origin costs, where you've inevitably got services serving that content up. I sit on top of Azure Functions, so I pay for the number of executions and the number of megabyte-milliseconds they use. I offset that by paying a little bit for the amount of data stored and the number of reads and writes, which is a tiny amount compared to the egress bandwidth from before. So for those who really want to analyze costs, that's a calculation you've got to factor in.
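The calculation Troy describes (storage measured in gigabyte-days plus read and write operations, set against the origin costs that go away) can be sketched as follows. The function and class names are hypothetical, the prices are left as parameters rather than quoted from any price list, and converting GB-days to GB-months assumes a 30-day month. On that assumption, Troy's 73 gigabyte-days is roughly 2.4 GB-months.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReservePrices:
    """Unit prices for persistent cache storage (placeholder values, not a quote)."""

    usd_per_gb_month: float
    usd_per_million_writes: float
    usd_per_million_reads: float


def gb_months(gb_days: float, days_per_month: int = 30) -> float:
    """Convert accumulated GB-days to GB-months, assuming a 30-day month."""
    return gb_days / days_per_month


def reserve_bill(gb_days: float, writes: int, reads: int, prices: ReservePrices) -> float:
    """Storage plus per-operation charges for one billing period."""
    return (
        gb_months(gb_days) * prices.usd_per_gb_month
        + writes / 1_000_000 * prices.usd_per_million_writes
        + reads / 1_000_000 * prices.usd_per_million_reads
    )


def net_saving(origin_cost_avoided: float, reserve_cost: float) -> float:
    """Positive when the origin compute and egress removed outweigh the new storage bill."""
    return origin_cost_avoided - reserve_cost
```

The design choice is simply to keep the two sides of the ledger separate: what you add at the edge (`reserve_bill`) and what you take away at the origin (execution time plus egress), so the comparison works with whatever real prices apply.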
What do you take away here, and what do you add on there?
Yeah, it's swapping things in and doing that cost-benefit analysis, I guess. You're going to be paying something, but probably less at the end of the day.
Yeah. And look, at least in my experience, it's a massive difference, largely because egress bandwidth is so exorbitantly expensive and all of that egress cost goes away. The processing cost of whatever runs at the origin goes away as well, and you pick up a little bit of cost at the edge.
Yeah. In wrapping up here, and thinking about advice you might have for other website owners and entrepreneurs looking to build services on Cloudflare, what would you have them look into first? What would you have them do if they're very new to Cloudflare?
Well, I feel it's very much the same discussion we had a decade ago when I started Have I Been Pwned. One of the reasons I started it was that I wanted to build a hello world in anger, because at the organization I was working with at the time, Pfizer, we were trying to drive cloud-first adoption. What I really wanted people there to understand was that we had new paradigms available, meaning new technologies and new ways of building apps. So I didn't want to just take the classic old ASP.NET apps we were running in our on-prem or shared hosting environment and move them over, because now we had things like very fast key-value stores, online WAFs and everything else. I wanted everyone to understand there are different ways of working. And I feel like now is just the next evolution of that, where there's a lot of processing we can do on the edge that we would previously have done on a single origin. We can do a lot more with traffic flow than we could before.
So I think understanding what those modern cloud paradigms are, and they're definitely different from, geez, even five years ago, let alone 10 years ago, understanding what they are and where the value is, is massively important. I use things like Workers really, really extensively, because I can do a lot of logic on the edge and very often respond to requests or block stuff without having to go to the origin. Learn what those things are. I think Workers and cache are probably two of the most valuable things, along with the WAF; if you get a good grasp on those, they're going to make a really big difference to your running costs.
Yeah, amazing. I think that's the perfect takeaway. You should be looking at Workers, cache and WAF, and then you can dive in from there to even more products and tools across the developer platform and really go a long way.
And every time I go back into the dashboard, I see something new and I'm like, oh, that's cool, I hadn't seen that. It feels like the cloud did 10 years ago: every time I opened the Azure dashboard, I'd say, oh, I hadn't seen that before, now I've got something else to learn. So I guess the other takeaway is that it's evolving very quickly and there's a lot of new stuff, which I find very exciting.
Yeah, the next year or two are really going to be an incredible time of innovation and growth. We look forward to working with you and seeing how you use the tools to continue building out Have I Been Pwned into the future.
Awesome. Hey, I'm going to keep sharing it, too, because I love the reactions on Twitter. And if anyone's got any questions, or wants to see stats or figures, tell me, because I publish everything. So just yell out.
Yeah. Thank you so much to Troy. If you want to find him, definitely shout out to him on Twitter; he's really good at engagement there. We look forward to seeing what he incorporates and builds next. So thank you so much.
Awesome.
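The edge logic Troy describes, answering requests or blocking them without going to the origin, boils down to a three-way decision for every request. The following is a plain Python model of that decision, not the Cloudflare Workers API; the `Action` enum, the `decide` function and its inputs are hypothetical, and a real Worker would express the same branches inside its fetch handler using its own rules and cache.

```python
from enum import Enum


class Action(Enum):
    BLOCK = "block"      # rejected at the edge; the origin never sees it
    RESPOND = "respond"  # answered at the edge (e.g. from cache)
    FORWARD = "forward"  # the only case that costs origin work


def decide(path: str, blocked_prefixes: tuple[str, ...], cached_paths: set[str]) -> Action:
    """Hypothetical edge triage: block, answer locally, or forward to the origin."""
    if any(path.startswith(prefix) for prefix in blocked_prefixes):
        return Action.BLOCK
    if path in cached_paths:
        return Action.RESPOND
    return Action.FORWARD
```

Every request that ends in `BLOCK` or `RESPOND` is one the origin never pays for, which is where the running-cost difference comes from.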
Thanks for having me, Alex.
We're betting on the technology for the future, not the technology for the past. So having a broad network, having global companies now running at full enterprise scale gives us great comfort. It's dead clear that no one is innovating in this space as fast as Cloudflare is. With the help of Cloudflare, we were able to add an extra layer of network security controlled by Allianz, including WAF and DDoS protection. Cloudflare's CDN allows us to keep costs under control, and caching improves speed. Cloudflare has been an amazing partner on the privacy front. They've been willing to be extremely transparent about the data that they are collecting and why they're using it. And they've also been willing to throw those logs away. I think one of our favorite features of Cloudflare has been the Workers technology. Our origins can go down and things will continue to operate perfectly. I think having that kind of a safety net provided by Cloudflare goes a long way. We were able to leverage Cloudflare to save about $250,000 within about a day. The cost savings across the board are measurable, they're dramatic, and they actually dwarf the yearly cost of our service with Cloudflare. It's really amazing to partner with a vendor who's not just providing a great enterprise service, but also helping to move forward the security of the Internet. One of the things we didn't expect was that the majority of traffic coming into our infrastructure would get faster response times, which is incredible. Zendesk just got 50% faster for all of these customers around the world because we migrated to Cloudflare. We chose Cloudflare over other existing technology vendors so we could provide a single standard for our global footprint, ensuring world-class capabilities in bot management and web application firewall to protect our large public-facing digital presence.
We ended up building our own fleet of HAProxy servers so that we could easily lose one without it having a massive effect. But it was very hard to manage, because we kept adding more and more machines as we grew. With Cloudflare, we were able to just scrap all of that, because Cloudflare now sits in front and does all the work for us. Cloudflare helped us improve customer satisfaction. It removed the friction from our customer engagement. It's very low maintenance, very cost effective and very easy to deploy, and it improves the customer experience big time. Cloudflare is amazing. Cloudflare is such a relief. Cloudflare is very easy to use. It's first. Cloudflare really serves as the first line of defense for us. Cloudflare has given us peace of mind. They've got our backs. Cloudflare has been fantastic. I would definitely recommend Cloudflare. Cloudflare is providing an incredible service to the world right now. Cloudflare has helped save lives through Project Fair Shot. We will forever be grateful for your participation in getting the vaccine to those who need it most in an elegant, efficient, and ethical manner. Thank you. Q2's customers love our ability to innovate quickly and turn what were traditionally very static, old-school banking applications into more modern technologies and integrations in the marketplace. Our customers are banks, credit unions, and fintech clients. We really focus on providing end-to-end solutions for account holders throughout the course of their financial lives. Our availability is super important to our customers here at Q2. Even one minute of downtime can have an economic impact. So we specifically chose Cloudflare for their Magic Transit solution, because it offered a way for us to displace legacy vendors in the Layer 3 and Layer 4 space, but also to extend Layer 7 services to some of our cloud-native products and more traditional infrastructure.
I think one of the things that separates Magic Transit from some of the legacy solutions we had leveraged in the past is the ability to manage policy from a single place. What I love about Cloudflare for Q2 is that it allows us to get 10 times the coverage we previously could with legacy technologies. I think one of the many benefits of Cloudflare is just how quickly the solution allows us to scale and deliver solutions across multiple platforms. My favorite thing about Cloudflare is that they keep developing solutions and products. They keep providing solutions. They keep investing in technology. They keep making the Internet safe. Security has always been looked at as a friction point, but I feel like with Cloudflare it doesn't need to be. You can deliver innovation quickly, but also have those innovative solutions be secure. The About You fashion platform has become the number one fashion platform in Europe among Generation Y and Z. It has been tremendously successful because we built the technology stack from a commerce perspective, then decided to also make it available to leading fashion brands such as Marc O'Polo, Tom Tailor, The Founded, and many others. And that's how SCAYLE was born. What we see in the market is that attack vectors are becoming increasingly scaled, distributed, and complex as a whole. We decided to bring on Cloudflare to ultimately have the best possible security tech stack in place to protect our brands and retailers. We use Cloudflare bot management, rate limiting, and WAF as an extra layer of protection for our customers, tackling the major cyber threats that we see in the market: DDoS attacks, credential stuffing and scalping bots. What we see with scalping bots is that they target high-end products and buy them up within a few seconds. That leaves the customer dissatisfied. They turn away and purchase the product somewhere else, and we have lost the customer.
Before, it could generally take up to half an hour for a security engineer to handle a DDoS attack. Now we are seeing that Cloudflare can help us stop that automatically. Cloudflare helps us bring site performance to its best and ultimately create even more revenue with our clients.