Developer Speaker Series: Fireside chat with Troy Hunt and Alex Krivit
Presented by: Troy Hunt, Alex Krivit
Originally aired on June 21, 2023 @ 1:30 AM - 2:00 AM EDT
Join Alex Krivit, Product Manager at Cloudflare, as he sits down for a Fireside Chat with Troy Hunt, the founder of Have I Been Pwned. Hear how Cache Reserve is helping Troy serve over 99% of requests from cache.
Transcript (Beta)
Cool. Hello, everybody out there. My name is Alex. I'm a product manager here at Cloudflare for CDN and caching.
I'm joined today by Troy Hunt, who is the CEO and co-founder... co-founder?
Founder, I think. Founder, that's me. Just the founder. All right.
Of Have I Been Pwned. And we're going to talk to you about using Cloudflare services and products to scale websites that are seen and viewed by millions and millions of people.
So I think maybe as just maybe a point of introduction here, Troy, do you want to kick it off and tell us a little bit about yourself and what you do?
Yeah, sure. So I guess as it relates to the discussion here, I started a little project, or what was a little project, called Have I Been Pwned.
That was almost 10 years ago at the time of recording; it was December 2013.
And it was designed to be a data breach aggregation service, with 150-plus million email addresses in there that you could search through from various breaches, most notably Adobe.
And then it just grew over time.
And now there's about 12 and a half billion unique records in there.
There's also, I think about a billion passwords. I don't know exactly because more keep going in there.
And I guess particularly as it relates to Cloudflare, all the traffic goes through Cloudflare, including the Pwned Passwords service, the one that actually searches for known bad passwords.
Looking at my stats now, that service is approaching 5 billion requests a month.
So this little thing did grow up, I think it's fair to say.
So it's safe to say 5 billion people are requesting access to your service that shows people their leaked credentials, their leaked passwords and things online.
Well, I think we've got to give a bit of context to it.
So there are really two parts here. One is Have I Been Pwned itself.
There's a front page where you can check your email address against data breaches to see whether it has appeared there.
And then separately to that, there is a part of the service called Pwned Passwords, where you can check to see if a password has been breached.
And what's really interesting about that feature is there's an API that sits in front of it.
There's an anonymity model, which came courtesy of Cloudflare some years ago.
And that's now something that's built into the registration flow, the login flow, the password reset flow of a lot of really major online services, to the point now where almost 5 billion times a month (we're at something like 4.8 billion every month) there are requests to see whether a password has been seen in a data breach.
And what organizations are trying to do here is stop people from using known bad passwords, because that reduces account takeover attacks and all the other problems that come with that.
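The lookup flow Troy describes can be sketched in a few lines. This is a minimal Python sketch, not the service's actual client code. It assumes the publicly documented Pwned Passwords range scheme, in which a client hashes the password with SHA-1, sends only the first five hexadecimal characters, and matches the rest locally against a list of `SUFFIX:COUNT` lines. The function names and the sample response body are hypothetical, and no network call is made.

```python
import hashlib


def split_hash(password: str) -> tuple[str, str]:
    """Hash a password with SHA-1 and split the hex digest into (prefix, suffix)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    # Only the 5-character prefix would leave the client; the suffix stays local.
    return digest[:5], digest[5:]


def breach_count(suffix: str, response_body: str) -> int:
    """Return the count listed for `suffix` in a 'SUFFIX:COUNT' response, or 0."""
    for line in response_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


if __name__ == "__main__":
    prefix, suffix = split_hash("password")
    # Hypothetical response body for this prefix; real counts will differ.
    sample = f"0018A45C4D1DEF81644B54AB7F969B88D65:3\n{suffix}:42\n"
    print(prefix, breach_count(suffix, sample))
```

Because the server only ever sees the prefix, it cannot tell which of the many passwords sharing that prefix was being checked.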
That's fascinating.
So I mean, it's obviously a service that's been sort of like integrated across the web being visited and used by so many people.
When you first started going, looking back so many years to its inception, did you ever believe that we'd get here today, where it's 5 billion times that people are checking your service?
I didn't think anyone would use it, if I'm honest. I thought it would be me and some of my mates.
And part of the reason I built the service was as a bit of a hello world on a bunch of Microsoft Azure infrastructure that I wanted to play with.
So I had no expectations. I wouldn't have given it such a stupid name if I did.
And now sort of here we are. And what's fascinating about it is that every time I look at it and go, holy cow, how did I get here?
Suddenly something else happens and it just gets bigger and bigger and bigger.
Yeah.
It's sort of the same. There's this like age old piece of product information or product advice, I guess, where people are always like, you should build the product that people want, that somebody cares about, that matters to people.
In your time running Have I Been Pwned, has anything really exemplified that, like, oh, this is a service that people actually care about and really want, something that really is used by a lot of people?
Yeah, all the time.
And they're the weirdest sort of, I don't know, light bulb moments.
And to give you an example of the weirdest ones: this is a website whose entire purpose is to index data from data breaches.
So there's been criminal activity.
A lot of people use the word stolen for data taken from a website.
I'm not sure if that's the best word, because you steal it, but it still exists.
Let's say they copied it, with unauthorized access. There's a whole bunch of illegal activity.
A lot of people have gone to jail for the breaches which have later flowed into Have I Been Pwned.
And now I'm in a situation where the FBI is sending me data to load into Have I Been Pwned.
The FBI has a fire hose to be able to feed new passwords in, which ultimately end up in that Cloudflare Cache Reserve infrastructure so that other people can query it.
I mean, you're querying data from the FBI that they sent to me.
And then constantly we see all these law enforcement agencies using it and recommending people use it and lots of use by governments.
And they're the sort of, they're the ones that kind of just blow my mind the most because it's all illegally obtained data, but we're always sort of looking at it going, well, this has happened.
Now, what are the best things we can do with it to make the world better for everybody else?
Yeah, that's, I mean, that's an incredible story.
And having that information, and those relationships with the people who probably care most about password breaches, particularly because they're, you know, arbiters of some of the most sensitive information on the planet, is probably a really, really big responsibility.
And so I'm glad, you know, they found their way to you in this great service.
Yeah, now here we are. Crazy. Yeah. And so over the years since the creation of Have I Been Pwned, what sort of growth challenges have you faced on that path to 5 billion?
Well, it's interesting in many ways that the Pwned Passwords service, the one that's about to hit the 5 billion, has kind of been the easiest to fix, insofar as the nature of the service means that there are only about a million different possible requests that can be made.
And that is a very finite set of data, which we've managed to massively cache at Cloudflare so that we just don't have to worry about what's happening with origin services.
The biggest problem I've faced is the sudden massive organic increase of traffic to the email address search service.
And it's interesting, people might think, well, you know, if you're in an online publication, is that what you mean by that?
Because that gets a lot of eyeballs. But that's not it.
The biggest problem is when it is on primetime TV. And I've had one incident in particular that I wrote about in the past, where it was on a very popular show in the UK; it was, like, Sunday night.
And it's funny, I remember beforehand, the company said to me, look, we're going to put you on TV, we have a habit of crashing big websites.
And I'm like, haha, I've got the cloud, I'm fine. And this was before it went behind Cloudflare as well.
And suddenly it's on TV. And I just pictured like, I don't know, 20 million British people like all picking up their phones at the same time, and it's on TV and typing in the URL.
And my traffic escalated so quickly that the underlying infrastructure just couldn't scale, it couldn't add on instances fast enough to deal with traffic.
And that was sort of a lightbulb moment where I went, I need something else helping me out here.
That organic DDoS. Yeah, it is.
And the paradox of it is that's great, right? Like that is great to have so much interest in a service.
But it's kind of sucky then when you can't have the service respond and actually give the people what it is that they need.
Yeah, absolutely.
And so you found your way after you gained some traction, you're doing media appearances and people are checking the service, you found your way to Cloudflare finally.
How were your first impressions of the service? What did you like?
What didn't you like in those early days? Well, look, I'd used Cloudflare on a couple of other things.
And I'd been doing a bunch of training where I was saying to people, and this was interestingly in an era where there was still a lot of reticence to move to secure connections.
So the view was that certificates are expensive and they're hard and all the rest of it.
And I remember I'd run workshops and I'd go, okay, we're going to spend five minutes and you're going to get HTTPS for free and it's going to be easy.
And we just go to the Cloudflare setup, job done. So I'd used it a bunch of times.
I think I was using it on my blog as well. And the catalyst to actually rolling over into Cloudflare for Have I Been Pwned was that there was an API there that people could query.
And if they made too many requests, they'd get an HTTP 429, too many requests.
And in my naivety, I was thinking, well, that will fix it because they'll get 429 and then they'll stop.
And then they'll wait until the retry-after time passes and it will be fine.
Now, it turned out people weren't real happy with having a rate limit.
So they just hammered it hard. And the epiphany for me was that when my origin service was trying to not only serve legitimate requests, but also respond to excessive requests, and do all of this in-process in the one app, it just couldn't do it.
So that's when it went behind Cloudflare, and the rate limit got implemented before the origin.
So the only stuff coming through to the origin was an acceptable number of requests and I could sort of control that flow of traffic.
So trying to do two things in that origin service was what really killed me at the time.
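The well-behaved client behavior Troy was hoping for (stop on a 429, wait, then retry) might look like the following minimal Python sketch. It assumes the server sends `Retry-After` as a whole number of seconds (the HTTP-date form is not handled), and that the caller supplies a hypothetical `send` function returning a status code and a header mapping. None of these names come from the Have I Been Pwned API itself.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]


def call_with_retry_after(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, pausing for the Retry-After interval after each HTTP 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, headers = send()
    for _ in range(max_attempts - 1):
        if status != 429:
            break
        raw = headers.get("Retry-After", "")
        # Fall back to a fixed wait if the header is missing or not in seconds.
        sleep(float(raw) if raw.isdigit() else default_wait)
        status, headers = send()
    return status, headers
```

A client that ignores the header and retries immediately is exactly the traffic that, per the discussion above, is better rejected before it reaches the origin.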
Amazing. Yeah. I mean, it's that sort of stampeding herd, I think is what they call it.
And so that's, I mean, it's really, I mean, scary whenever it happens to a customer, but so long as there's an API and Cloudflare is serving some percentage of traffic and shielding you, I think it's generally a good situation once everything is set up and optimized on Cloudflare.
It is. And you know, sometimes it's the weirdest traffic patterns as well.
I tweeted about one very recently where I got one of these emails from Cloudflare, which is like DDoS alert on your website.
I was like, okay, I'll go on this.
It's curious. I'll go and have a look. The origin website shows no abnormal traffic whatsoever, which is great.
And then you go to the Cloudflare dashboard and to the WAF and it's like a massive spike of requests to the homepage.
And I'm like, well, why?
Like, they're just literally GET requests to the homepage. And then I kind of went, well, it doesn't matter because it's all cached at Cloudflare.
Like you can make 10 times as many requests if you want.
It will make absolutely no difference to me because there's something sitting in front, making sure that the stuff that has to do the processing and the hard work on the backend just sees normal traffic.
Yeah. I mean, that's the magic of caching. And I love that it's able to scale.
You don't really have to worry about it. You could just sort of rest easy knowing that, you know, it's going to be there and it's going to just be served from cache.
You don't have to worry about any of the unwanted things going back to your origin, being offline and any of these services, you know, not being available to customers.
So one of the reasons that I think we're talking today is your implementation of Cache Reserve.
What was your sort of transition like from using our regular CDN to using Cache Reserve?
Oh, I clicked a button.
That was basically it. That easy? Yeah, it was. And look, for context, this again goes back to this Pwned Passwords feature where we're querying passwords, nearly 5 billion a month.
We have made a little bit of a game out of just how high can we get the cache hit ratio.
Because it's a finite set of requests: for the people that want to do the maths, it's 16 to the power of 5 different possible requests, just over 1 million.
How much of that can we cache at Cloudflare?
Not so much to save cost and all the rest of it on the origin, because we were already at like 99.1% or something.
So you're serving less than 1% of requests from the origin, but with nearly 5 billion a month, that's still a substantial number.
And we wanted to make it better.
And then I think I was just literally scrolling around my dashboard that I'm back on right now.
And it's like, oh, I wonder what this is. So I had a chat to some Cloudflare folks.
They said, look, it's basically, you just turn it on.
It's going to respect all your existing cache headers. It shouldn't change anything else with the tiered caching model that we already had.
And what's going to happen is it's just going to sit there in persistent storage, I think for up to a month, if nothing is hit, and then eventually it'll get evicted.
But all that stuff is hit multiple, multiple times a day. So we turned it on and then the origin traffic immediately dropped.
And we went from like 99.1% cache hit ratio to 99.999%, which is five nines.
So different contexts of five nines, but it's awesome to be able to say five nines.
And the value proposition for us with this, probably the most useful thing, is for all these organizations that are tying this service into their registration and their login flows to have the confidence that they're hitting a Cloudflare edge node amongst the hundreds that are spread around the world.
And they're getting an almost immediate response.
That's enormously valuable for them because it takes out so much of the risk of the origin service, whether it's just the risk of additional latency or the risk of availability, because there's such a high availability on those Cloudflare edge nodes.
And of course, it does drive down our costs because we pay for requests to the origin, we pay for egress bandwidth.
And perhaps most importantly for me personally, I get that satisfaction of just seeing that number of origin hits go down and down and down.
It is such a tiny, tiny amount.
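To put the figures quoted above side by side, here is a small Python sketch of the arithmetic. It takes the numbers as spoken (16 to the power of 5 possible requests, roughly 5 billion requests a month, and hit ratios of about 99.1% and 99.999%), so the results are illustrative rather than measured. The function name is hypothetical, and `Decimal` is used only to avoid floating-point rounding.

```python
from decimal import Decimal

POSSIBLE_REQUESTS = 16 ** 5  # five hexadecimal characters


def origin_requests(total_requests: int, hit_ratio_percent: str) -> int:
    """Requests that miss the cache and reach the origin, for a given hit ratio."""
    miss_fraction = (Decimal(100) - Decimal(hit_ratio_percent)) / Decimal(100)
    return int(Decimal(total_requests) * miss_fraction)


if __name__ == "__main__":
    monthly = 5_000_000_000  # "nearly 5 billion a month", rounded up
    print(POSSIBLE_REQUESTS)                   # 1048576
    print(origin_requests(monthly, "99.1"))    # 45000000
    print(origin_requests(monthly, "99.999"))  # 50000
```

On these rounded inputs, the move from 99.1% to 99.999% takes origin traffic from about 45 million requests a month to about 50 thousand.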
You get to tweet about it, which we always love seeing. We're just like, oh, but you added another nine.
That's incredible. It's fun, right? Because everyone's looking at it going, how have you done this?
And look, we'll never get to zero because the FBI is constantly feeding in new passwords.
So when they feed that in, we need to evict from cache, whatever it was that has now changed.
So the goal is not actually to get to zero, because the only way that happens is if we don't keep getting new passwords from the FBI, and we want that.
We want the new passwords.
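The eviction step Troy mentions only needs to touch the cached responses whose contents changed. The following Python sketch shows one way to work that out. It assumes new passwords arrive as hexadecimal hashes and that each cached response is keyed by a five-character hash prefix. The URL template is a placeholder, and the actual purge call (to whatever cache API is in use) is deliberately left out.

```python
from typing import Iterable, List

# Hypothetical URL pattern; substitute the real range endpoint.
RANGE_URL_TEMPLATE = "https://example.invalid/range/{prefix}"


def prefixes_to_purge(new_hashes: Iterable[str]) -> List[str]:
    """Distinct 5-character prefixes touched by a batch of new hex hashes."""
    return sorted({h.strip().upper()[:5] for h in new_hashes if h.strip()})


def purge_urls(new_hashes: Iterable[str]) -> List[str]:
    """Cache URLs that would need evicting for this batch of hashes."""
    return [RANGE_URL_TEMPLATE.format(prefix=p) for p in prefixes_to_purge(new_hashes)]
```

Several new hashes that share a prefix collapse into a single eviction, so a batch never purges more than the roughly one million possible entries.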
Absolutely. Keep it going. You did touch on one thing previously that was about how expensive egress can be from origins.
And it's something that we blog a lot about, and I talk a lot about on a daily basis.
But from your perspective, when you are running a website with really tremendous scale, what do those numbers even look like?
What are your concerns on that front?
Well, I wrote a blog post about this. I think it was early 2022 about how I got pwned by my cloud costs.
And I shared some figures in there where, long story short, one particular item in cache had grown beyond the maximum allowable cache size that we had with Cloudflare at the time, which caused the traffic to go back to the origin.
And suddenly we were serving something very large from the origin. And by the time I realized, my bill was five figures for weeks.
And I got to pay for that myself.
Now, fortunately, a combination of Cloudflare and Microsoft actually helped me out with that.
So thank you to everybody that looked after me on that one.
But it's fascinating when you have large volumes of traffic, how much of a fundamental difference it makes when suddenly you are paying for the egress.
So bandwidth is still expensive when it comes from a lot of these cloud platform providers.
And often that doesn't factor into your calculations because I think we just get so accustomed to thinking that bandwidth is quite cheap.
But something quite cheap multiplied billions of times over suddenly becomes significant.
Yeah, absolutely.
And five figures for several weeks doesn't sound too fun to wake up to that bill.
Not when you're like one guy and then I'm going to go and talk to my wife and go, yeah, funny thing happened.
Yeah. Another thing I wanted to touch on with Cache Reserve and non-Cache Reserve was latency and any sorts of performance improvements that you've seen with it.
Have you experienced some performance benefits from implementing Cache Reserve?
Well, I think the best thing is the people that experience the performance benefits are the ones that are actually consuming the data.
I recall years ago, Cloudflare saying that, and someone will get the figures right in this and edit it later on, I'll put it up on the screen.
But it was something to the effect of having like 99% of the world's population within 10 milliseconds of a Cloudflare edge node.
So if you think about the value proposition of sub 10 millisecond latency tied into a process that might happen synchronously with other things where you're dependent on getting a response from this service, that makes a huge difference for, let's say the login page of a major online asset.
So that is just a massively beneficial thing for people using that service.
Yeah. Just getting that response back and being able to answer their questions quickly is probably much better than having to wait around for an origin response at the end of the day.
Oh, totally, totally.
In implementing Cache Reserve, generally, did you notice any sort of challenges or limitations with your implementation so far?
Are there areas that we should look to be developing more to help more entrepreneurs like yourself?
Look, I can't think of any challenges, simply because all the mechanics are abstracted away.
You've already got items coming back with cache headers that dictate how long they should sit there.
You have control over purging them from cache.
The way you do that with Cache Reserve is the same as the way you do that without Cache Reserve.
And I think the only thing that sort of gets added to this that you have to consider is that it is a service that has a cost and you're paying based on the amount of data and read operations and write operations and so on.
So I'm just looking at my stats now and I can see that I've got 32 gigabytes worth of data in there and for our current period of usage, well, that's 73 gigabyte days.
So you start to have to think about, okay, well, how much data have I got there?
How long do I keep it there? How often do I write to it? How often do I read from it?
And that, I think, is the only other thing to consider. But you're offsetting all those origin costs where you've inevitably got services not only serving that up.
So I sit on top of Azure Functions. So I pay for the number of executions and the number of megabyte milliseconds that it uses.
And then I offset that by paying a little bit, which is a tiny amount compared to the egress bandwidth from before, for the amount of data stored and the number of reads and writes.
So I think that that's a calculation, particularly for those that really want to sort of analyze costs.
That's a calculation you've got to factor in.
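The trade-off described here (pay for storage, reads and writes at the edge in exchange for avoided origin costs) can be written down as a simple calculation. This is a rough Python sketch under stated assumptions: the three-part cost model follows Troy's description, but every unit price is a placeholder supplied by the caller, not actual Cloudflare or Azure pricing, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeRates:
    """Placeholder unit prices; NOT real Cloudflare pricing."""
    per_gb_day: float
    per_million_reads: float
    per_million_writes: float


def gb_days(stored_gb: float, days: float) -> float:
    """Storage usage as gigabyte-days: volume held multiplied by time held."""
    return stored_gb * days


def edge_cost(storage_gb_days: float, reads: int, writes: int, rates: EdgeRates) -> float:
    """Total edge cost for storage plus read and write operations."""
    return (
        storage_gb_days * rates.per_gb_day
        + reads / 1_000_000 * rates.per_million_reads
        + writes / 1_000_000 * rates.per_million_writes
    )


def net_saving(origin_cost_avoided: float, storage_gb_days: float,
               reads: int, writes: int, rates: EdgeRates) -> float:
    """Positive when the avoided origin cost exceeds the added edge cost."""
    return origin_cost_avoided - edge_cost(storage_gb_days, reads, writes, rates)
```

As a sanity check on the units, 73 gigabyte-days with 32 gigabytes stored would correspond to roughly 2.3 days of storage if the stored volume were constant over the period.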
What do you take away from here? And then what do you add on there? Yeah, that's sort of swapping in that, I guess, cost benefit analysis.
You're going to be paying something, but probably less at the end of the day.
Yeah. And look, at least in my experience, it is a massive difference, largely because egress bandwidth is just so exorbitantly expensive.
And all of the egress bandwidth cost goes away.
And then the processing cost of whatever it is in the origin goes away as well.
But you get a little bit of cost on the edge. Yeah. Yeah. In sort of wrapping up here and thinking about advice that you might have for other website owners, other entrepreneurs looking to build out services on Cloudflare, what would you have them look into first?
What would you have them do if they're very new to Cloudflare?
Well, I feel it's very much like the same discussion we had a decade ago when I started with Have I Been Pwned.
And one of the reasons I started it was I wanted to build, like, Hello World in anger, because at the organization I was working with at the time, I was working at Pfizer, we were trying to drive sort of a cloud-first adoption.
And what I really wanted people there to understand is that we had new paradigms available that meant there are new technologies and new ways of building apps.
So I didn't want to just pick up the classic old ASP.NET apps that were running in our on-prem or shared hosting environment and move them over, because now we've got things like very fast key-value stores and online WAFs and everything else.
So I wanted everyone to understand there are different ways of working.
And I feel like now this is just sort of the next evolution of that where there is a lot of processing that we can do on the edge where before we would have done it on a single origin.
We can do a lot more with traffic flow than what we could before.
So I think understanding what those modern cloud paradigms are, and they're definitely different to, geez, even five years ago, let alone 10 years ago.
Understanding what they are and where the value is, is massively important.
I use things like Workers really, really extensively because I can do a lot of logic on the edge and very often answer requests or block stuff without having to go to the origin.
Learn what those things are.
I think Workers and cache are probably two of the most valuable things, along with WAF, and if you get a good grasp on those, they're going to make a really big difference to your running costs.
Yeah, amazing. And I think that's the perfect takeaway.
You should be looking at Workers, cache, WAF, and then you can, I think, dive in from there to even more products and tools across the developer platform and really go a long way.
And every time I go back into the dashboard, I see something new and I'm like, oh, that's cool.
And now I feel like it's the cloud again 10 years ago.
Every time I opened the Azure dashboard, I was like, oh, I hadn't seen that before.
Now I've got something else to learn. So I guess the other takeaway is it is evolving very quickly and there's a lot of new stuff, which I find very exciting.
Yeah, the next year or two are really going to be an incredible time of innovation and growth.
And we look forward to working with you and seeing how you use the tools to continue to build out Have I Been Pwned for the future.
Awesome. Hey, I'm going to keep sharing it too, because I love the reactions on Twitter.
And if anyone's got any questions, they want to see stats or figures, tell me, because I publish everything.
So just yell out. Yeah. Thank you so much to Troy.
And if you want to find him, definitely shout out to him on Twitter.
He's really good at engagement there. So, you know, we look forward to seeing what he incorporates and builds next.
So thank you so much.
Awesome. Thanks for having me, Alex.