🔬 MIGP and Password Study
Presented by: Tara Whalen, Luke Valenta
Originally aired on July 11, 2022 @ 4:00 PM - 4:30 PM EDT
Join our research team as they discuss MIGP and Password Study.
Read the blog posts:
Transcript (Beta)
Hi, welcome to our Cloudflare TV segment, where we're going to be talking about passwords and password security and what Cloudflare is doing in this space.
I'm Luke Valenta.
I'm a systems engineer on the research team. And today I have with me Tara Whalen, who's one of the research leads on the team and our resident expert in privacy.
So today we're going to be talking about a couple of blog posts that came out this morning about our work on passwords and password security.
And throughout this whole week, we've had a whole bunch of blog posts coming out from the research team and others at Cloudflare, and I encourage you to go take a look at all of them.
There's a lot of really interesting content, and I think they're a very good read.
So we have the two blog posts that we're talking about in this segment today linked in the video description.
So you can go take a look and you might want to read through as you're listening to this segment.
And then if you have any questions along the way as we're talking, feel free to email them in and we'll get to them at the end if there's time.
So now to just kick things off, Tara, could you talk a little bit about password security and data breaches and why we should care about them?
Sure, Luke. So we all know that our security systems are sadly quite imperfect.
I think we see news items every week, maybe at this stage it's probably every day, about some kind of data breach.
There's been a leak and a bunch of sensitive data has been exposed.
Now this can often include password data.
So usernames and passwords that can be used to access systems. And of course these can be very valuable systems that attackers want to get their hands on.
Now, when this happens, you may have gotten a notification, maybe an email from a service you have an account with, saying you need to reset your password.
So we've had a breach, we've locked down your account, please go in and reset your password.
So you go and do that and you've taken care of that one exposure.
But we know that lots of people reuse the same passwords across different systems.
You've changed it in one place, but you may have used it in a different place.
So that password is still a vulnerability. But since we're talking about password reuse, I'm going to take a moment to talk a little bit about usability.
So I have a particular interest in the usability of privacy and security and how difficult it is for people to use some of these systems.
And so we know that reusing passwords is a bad idea from a security perspective.
And we'll go into a few details on that later.
But it's really understandable from a convenience perspective.
I mean, who wants to have to manage a pile of very long, unique passwords for all of these systems?
It's too much for you to remember, and there's no connection between any of them.
It's a burden. So the reason people reuse passwords is really just human nature.
They don't want to have to deal with that burden.
But the problem is that this makes it easier for attackers to get into a lot of your accounts.
So there's this idea of credential stuffing attacks. So what happens is attackers will launch automated guessing attacks.
They've figured out that people probably reuse the same username and password on multiple systems.
So if they find a credential that's valid in one place, they figure, hmm, that's probably valid at a different service.
So they just try that. They try the same set across a bunch of different systems and they hope they get lucky, which they often do because people are reusing the passwords.
And in fact, they can be a little more clever about this.
So they don't necessarily just use the same password, but they change it just a tiny bit.
Because we know, for example, people are told they have to change their password.
Now again, if you're trying to remember it and you have to make a change, you might just make a very simple change.
You might stick a number at the end of the password or put an exclamation mark at the end to put some punctuation in.
So these are all the things people do when they're forced to change to a new password.
So they do sort of the minimum possible change to make the password checking system happy.
Well, sure, but it also makes the attacker very happy.
So you could have your Facebook password, which is like sunshine FB for Facebook.
And then, oh, you also use sunshine IG for Instagram.
So you're like, well, I can remember those. Well, an attacker can easily add those kinds of variants to their tool: little changes like adding a number, punctuation, or letters, and try those too.
So that's not really an obstacle for keeping somebody out of your accounts.
These attacks are really fascinating, but scary.
So what are some of the solutions being tried out to help users keep their passwords secure?
So there's often a lot of advice for people about passwords.
I'm gonna highlight first something we sort of consider to be a suboptimal solution.
So you've probably seen advice about using sort of uppercase, lowercase letters, digits, special characters.
Now on its own, that's not a problem.
But if that's the only thing you do, just having short passwords with a few different kinds of characters,
that is not considered to be sufficient. NIST, one of the government bodies in the US that does a lot of work around good security practice, recommends against this in their latest password guidance.
What they said was that if you're enforcing a password policy, you should require length; that's more important than this funny little password complexity part.
And the reason they give is that what you really need is a longer password.
It's not adding punctuation and those things that makes it stronger, particularly because whenever a human being makes a password,
we tend to be predictable. Human beings are very predictable.
That makes guessing easy, because we don't generate very good passwords.
So instead they just say, really what you wanna do is have the length in order to thwart these cracking attacks.
That's the most important thing to optimize for.
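As a rough illustration of a length-first policy in the spirit of that guidance, here is a minimal Python sketch. It assumes an 8-character minimum (our reading of NIST SP 800-63B's floor for user-chosen passwords) and a hypothetical blocklist standing in for a real list of common or breached passwords; the function name is made up for illustration. Note that it has no uppercase, digit, or symbol rules at all.

```python
MIN_LENGTH = 8  # assumed minimum, following our reading of NIST SP 800-63B

def password_problems(password: str, blocklist: set[str]) -> list[str]:
    """Return reasons to reject a password (hypothetical helper); empty means acceptable.

    Deliberately no composition rules: length plus a check against
    known-bad passwords does the work instead.
    """
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    if password.lower() in blocklist:
        problems.append("appears in a list of common or breached passwords")
    return problems

blocklist = {"password1", "sunshine", "qwerty123"}  # stand-in for a real breach corpus
print(password_problems("Pa$s1", blocklist))         # → ['shorter than 8 characters']
print(password_problems("Password1", blocklist))     # → ['appears in a list of common or breached passwords']
print(password_problems("correct horse battery staple", blocklist))  # → []
```

A password like "Pa$s1" satisfies classic complexity rules yet is rejected here, while a long passphrase with no symbols passes.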
And it's also a good reminder, security advice does evolve.
So you're told a lot of these things as a user but sometimes that advice doesn't hold.
Systems change, but also sometimes the advice wasn't very good advice to begin with, frankly.
So instead there are some better solutions. So again, I talked about the difficult to remember.
Well, of course there are password managers that we can deploy for this.
A password manager allows you to keep track of a lot of different passwords, so you don't have to reuse passwords because you don't have to remember them.
They will also help you generate strong passwords, so you can certainly use very long passwords.
It'll make one for you. It's not as predictable because it's just throwing down some characters rather than using a bunch of dictionary words.
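A password manager's generator can be sketched with Python's `secrets` module, which draws from a cryptographically secure random source. The 20-character default and 94-symbol alphabet are assumptions, not any particular manager's settings, and the function names are hypothetical. The entropy helper shows why length matters: each uniformly random character from this alphabet adds log2(94), roughly 6.55 bits.

```python
import math
import secrets
import string

# Assumed alphabet: letters, digits, and ASCII punctuation (94 symbols in total).
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length: int = 20) -> str:
    """Pick each character independently and uniformly with a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)

pw = generate_password()
print(pw, f"{entropy_bits(len(pw)):.1f} bits")
```

Under these assumptions a 20-character generated password has about 131 bits of entropy; a human-chosen password of the same length has far less, because its characters aren't chosen uniformly at random.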
So that's great, but they're not really universal yet.
They're absolutely getting more popular, particularly since they were integrated into browsers.
So people can just do that while they're web browsing: when they log in, many browsers will offer to save the password for them or generate a password for them.
So they're absolutely making things more convenient for people, but they're not at a hundred percent usage yet.
There's also multi-factor authentication. So the idea where, yes, you have to provide the password but you have to supplement that with some extra information.
So you might need a hardware security key. You may have to provide a special authentication code that you might get from an app, for example.
So you need two or more of these factors in order to log in successfully.
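Codes from an authenticator app are often time-based one-time passwords (TOTP, RFC 6238), built on HOTP (RFC 4226). Here is a minimal sketch using HMAC-SHA1 and 30-second steps, the common defaults; the shared secret is assumed to have been provisioned when the user enrolled, and this is a generic illustration rather than any particular service's implementation.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) with HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte picks 4 bytes
    code = int.from_bytes(mac[offset:offset + 4], "big") & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)

secret = b"12345678901234567890"     # RFC 4226/6238 test secret
print(hotp(secret, 0))                # → 755224
print(totp(secret, at=59, digits=8))  # → 94287082
```

The server runs the same computation with its copy of the secret, so a stolen password alone isn't enough to produce the code.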
Now, you might need specialized hardware in order to do this.
You need to have the particular app and people might not have those at their disposal yet.
And from the perspective of the sites, they don't really want user friction.
So again, if there's more login steps, if they have to actually register a key, some services don't want to make people do more work.
So they are hesitant to add this extra security even though it has a security payoff because they don't want to discourage users from using their systems.
But these are incredibly useful.
They're incredibly effective. They have definitely made the security ecosystem a lot better, and they provide a real barrier against breached passwords.
Breached passwords alone become pretty useless if this additional information is missing.
So it really is incredibly effective for that.
Because the attacker, for example, doesn't have your security key.
So that actually is a pretty effective barrier but it means we still have gaps.
Again, these are not universal. So we still have gaps to protect against. And I would say that, for example, one of the things we can deploy is the credential checking service.
That's another complementary approach. And I was hoping, Luke, you could tell me a bit about how credential checking services work.
Yeah, so in this world where we still have all of these weak passwords going around and password managers aren't universally deployed, users need a way to detect whether their passwords are in some data breach and are compromised, because then they know their passwords are likely a target of an attacker doing credential stuffing or credential tweaking attacks.
So that's what credential checking services do.
They make it easy for users to check if their passwords are part of a data breach.
So they basically do the same thing attackers would do: find the torrent links for these breach datasets on hacker forums, et cetera, and download them.
They do all the hard work up front, download all these datasets, parse them, and then make them available in a format that's easy for users to query to check if their credentials are in that dataset.
So sort of the canonical example of a compromised credential checking service is Have I Been Pwned?
This was really the first public service that was launched back in 2013.
And it allows users to, yeah, to check if their credentials are part of this set of, I think it's in the hundreds of millions or billions of leaked credentials.
So one way these services could work is that the user just sends their credentials to the service directly.
The service checks whether they're in its database of credentials and says, yeah, you're good, or no, I've found that your credentials are actually compromised.
But this straw man approach of sending the credentials to the server directly might actually lead to your credentials being in the next data breach if that credential checking service is itself compromised.
So instead, a service like Have I Been Pwned just requires users to send a hash of their credentials to the server.
The server checks this cryptographic hash against the database and can tell the user whether the credentials are in it.
And you can do a little bit better than that. You can actually do this with just a prefix of the hash of the credentials.
So you only send the first few bytes of the hash to the server, and then the server just sends you a bucket of all of the credentials in the database that match that prefix hash.
And then the user can, on their side, run the check over that list of entries in the bucket to see if their credentials are actually compromised.
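The hash-prefix lookup can be sketched with an in-memory index standing in for the server. It assumes SHA-1 and a 5-hex-character prefix, which is how Have I Been Pwned's password range queries are commonly described; the function names are hypothetical and no network calls are made.

```python
import hashlib
from collections import defaultdict

PREFIX_LEN = 5  # hex characters revealed to the server (assumed parameter)

def sha1_hex(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()

def build_index(breached_passwords):
    """Server side, done once up front: group breached hashes by prefix."""
    index = defaultdict(set)
    for pw in breached_passwords:
        h = sha1_hex(pw)
        index[h[:PREFIX_LEN]].add(h[PREFIX_LEN:])
    return index

def lookup_bucket(index, prefix: str) -> set:
    """Server side, per query: it sees only the prefix and returns every matching suffix."""
    return index.get(prefix, set())

def is_breached(password: str, index) -> bool:
    """Client side: send the prefix, then finish the comparison locally."""
    h = sha1_hex(password)
    return h[PREFIX_LEN:] in lookup_bucket(index, h[:PREFIX_LEN])

index = build_index(["password1", "sunshine"])
print(is_breached("password1", index))                     # → True
print(is_breached("correct horse battery staple", index))  # → False
```

The server learns only the prefix, which many unrelated passwords share, but that prefix is still derived from the password; this residual leakage is part of what MIGP avoids.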
That is very interesting.
So what can you tell us about what Cloudflare Research has been doing to advance work in the credential checking space?
Yeah, so here on the screen, I'm showing the Might I Get Pwned protocol, which is what we talk about in one of the blog posts that went out today.
It's a protocol developed by academic researchers, and we've been working with them to prototype and deploy it within Cloudflare's infrastructure.
So this service is a next-generation compromised credential checking service that has some very good privacy guarantees.
So basically, rather than sending a hash, or a hash prefix, of your password to the credential checking service, you only need to send a hash prefix of your username.
And then the service, the Might I Get Pwned server, sends you back a bucket of encrypted entries (ciphertexts) corresponding to that username hash prefix.
And the client and the server together run what's called an oblivious pseudorandom function, or OPRF.
It's a protocol by which the client can derive a key without leaking any information to the server, and then use that key to decrypt one of the entries in the bucket that the server sends back. The client doesn't learn any of the server's secrets, and the server doesn't learn any of the client's secrets.
So that's one of the advantages of Might I Get Pwned.
It has this extra privacy guarantee that only some bits of the username are leaked to the server, and nothing about the password.
And the other main advantage is that it also supports checking password variants.
So here in this diagram, we have the database of breached credentials, and the server computes a bunch of variants of each entry in that database.
The leaked password here is pass1, and the server computes variants of it.
So maybe it removes the last character, increments it if it's a number, or adds an exclamation point at the end.
And it inserts all of these entries into the database as well, and then puts them all into a bucket corresponding to the hash prefix of the username.
And then these buckets are what the client eventually retrieves from the server.
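That server-side precomputation can be sketched as follows. It assumes three tweak rules taken from the examples above (drop the last character, increment a trailing digit, append "!"); a real deployment's rules may differ. The bucket-ID length and function names are hypothetical, and entries are left in plaintext to show the layout, whereas MIGP serves each entry encrypted.

```python
import hashlib
from collections import defaultdict

BUCKET_HEX_CHARS = 4  # hypothetical bucket-ID length; fewer characters means more users per bucket

def bucket_id(username: str) -> str:
    """Bucket ID derived only from the username, never from the password."""
    return hashlib.sha256(username.encode("utf-8")).hexdigest()[:BUCKET_HEX_CHARS]

def variants(password: str) -> set[str]:
    """Illustrative tweak rules based on the examples in the talk."""
    out = set()
    if len(password) > 1:
        out.add(password[:-1])  # drop the last character
    if password and password[-1] in "0123456789":
        out.add(password[:-1] + str((int(password[-1]) + 1) % 10))  # bump a trailing digit
    out.add(password + "!")  # append punctuation
    out.discard(password)
    return out

def build_buckets(breaches):
    """Map bucket ID -> list of (username, password, kind) entries."""
    buckets = defaultdict(list)
    for username, password in breaches:
        b = bucket_id(username)
        buckets[b].append((username, password, "exact"))
        for v in sorted(variants(password)):
            buckets[b].append((username, v, "similar"))
    return buckets

buckets = build_buckets([("username1@example.com", "password1")])
for entry in buckets[bucket_id("username1@example.com")]:
    print(entry)
```

Because the bucket ID ignores the password, every query for the same username fetches the same bucket, whichever password is being checked.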
So with this variant support, clients can help protect themselves against credential tweaking attacks by essentially doing the same thing attackers would do:
modifying their credentials slightly and checking whether those variants are in the compromised dataset.
So now, since we built this, let's walk through a demo of what this service looks like.
You can find the link to it in the blog post from today.
It's migp.cloudflare.com, MIGP standing for Might I Get Pwned.
And you can input credentials to check. So here's the username we want to check: let's see whether username1@example.com and password1 are compromised.
So we submit them to the server, and the response we get is that the password is in the data breach.
So if we look at the actual messages that are sent in this interaction between the client and the server, the client sends this bucket ID, which is the hash prefix of the username.
So this doesn't depend on the password at all. And then it sends this blind element, which is part of that protocol I was talking about and helps the client derive the key with which it can decrypt one of those bucket entries.
So then the server responds with the evaluated element, which is the next step in that protocol to allow the client to derive the key, as well as a bucket of encrypted breach entries.
So the client uses that key and tries to decrypt every element in this bucket of entries to see if one of the elements decrypts.
And in that case, the client knows that the credentials it's querying are actually compromised.
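The client-side check can be sketched as trial decryption. To stay self-contained, this uses a toy SHA-256 keystream with an HMAC tag (encrypt-then-MAC) in place of the authenticated encryption a real deployment would use, and it takes each per-credential key as given instead of running the OPRF; the function names and the "exact"/"similar" labels are hypothetical.

```python
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy stream cipher: concatenated SHA-256(key || nonce || counter) blocks."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def seal(key: bytes, plaintext: bytes) -> bytes:
    """Server: encrypt-then-MAC one bucket entry as nonce || ciphertext || tag."""
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def try_open(key: bytes, entry: bytes):
    """Client: return the plaintext if this entry was sealed under `key`, else None."""
    nonce, ct, tag = entry[:16], entry[16:-32], entry[-32:]
    expected = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))

def check_bucket(key: bytes, bucket: list[bytes]):
    """Try every entry; any successful decryption means the credential is in the dataset."""
    for entry in bucket:
        plaintext = try_open(key, entry)
        if plaintext is not None:
            return plaintext
    return None

# Stand-in keys; in MIGP these would come out of the OPRF exchange.
k_exact = hashlib.sha256(b"username1@example.com|password1").digest()
k_similar = hashlib.sha256(b"username1@example.com|password").digest()
bucket = [seal(k_exact, b"exact"), seal(k_similar, b"similar")]
print(check_bucket(k_exact, bucket))    # → b'exact'
print(check_bucket(k_similar, bucket))  # → b'similar'
```

Entries sealed under other users' keys simply fail the tag check, so the client learns nothing about them.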
So now let's say that, to secure our password, we just remove one of its characters.
So this is now our password variant: just "password" instead of "password1".
If we check those against the service, it responds "similar password in breach".
So this is one of the precomputed password variants that our MIGP server has.
And so it can tell us that this tweaked password is also compromised and likely vulnerable to a credential tweaking attack.
And in this case, the client requests the exact same bucket as before.
It only depends on the username.
And so the server sends back the same bucket of entries. Only the blind element and evaluated element are different, which allows the client to derive the key for the bucket entry corresponding to the similar password.
And then if we try some other password, like this one, we can see that it was not found in the dataset.
It's not a simple variant of password1 that the server would detect.
And you can also check for just a username.
So in this case, you just check for a username and an empty password.
And we see that the username was indeed in the data breach.
We've open-sourced the code for this as well.
It's now on GitHub in the Cloudflare organization: migp-go.
So feel free to take a look through the code. You can email us at askresearch at cloudflare.com if you have any questions about it.
Feel free to file issues.
And these are a couple of issues that I filed this morning for adding extra features to the library that are described in the Might I Get Pwned paper, which is online and also linked from the blog posts.
So now that we have this implementation and demo, I guess we're making progress towards securing all of these weak passwords out there.
But Tara, could you talk a little bit more about what other things or what are future directions for this project and other things that the research team is looking at in the space of password security?
Sure, Luke.
I should highlight that these are some things we've talked about in one of the blog posts that went live today, about ongoing and future work around passwords and password security.
So MIGP is really great for giving a signal about whether a password is in a known breach or not.
But of course, this is only one piece of data.
We can actually combine this with other things that we're seeing in our systems, looking at multiple login attempts and different coordinated behaviors, to get a richer picture of what might be happening in an attack.
So to give one example to motivate this, so what if, for instance, there has been a breach, but it's not widely known yet?
So you don't know that you've been breached, but the attacker has a set of passwords and is using those.
Now, if someone were to test that against one of these credential checking services, it would say that it's not in the breach because it's not in one of their known datasets, but it's certainly exposed and vulnerable.
So you might want to look at other signals that suggest an attack might be happening, so you can get ahead of it, respond to the attack, shut things down, and secure your systems.
So there's work happening on analyzing a bunch of login data with our intern Marina Sanusi, who's doing a lot of deep analysis on login data that we have here at Cloudflare, along with a lot of the other information we collect and monitor as a very proactive security company. For example, we might want to look at things like failed login attempts.
So if there are a higher than usual number of failed login attempts, just happening across the system or happening for a range of accounts, that's probably a pretty strong signal that something malicious might be happening.
You could be looking at the timing of attempts. An automated attack, of course, doesn't necessarily look like anything a human being would do.
So again, you're trying to distinguish a real attack from a person who makes some typos while trying to log in to their account because maybe they can't quite remember their password.
Those are going to look very different. So again, you might have faster timing.
You might have repeated attempts across a huge range of accounts from the same IP address, which is unlikely to be a legitimate attempt at a login.
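One of these signals, many failed logins from one IP across many accounts in a short time, can be sketched as a sliding-window counter. This is a hypothetical illustration, not Cloudflare's detection logic; the class name, window, and thresholds are placeholders, and the IPs come from documentation ranges.

```python
from collections import defaultdict, deque

class FailedLoginMonitor:
    """Hypothetical detector: flag source IPs whose recent failed logins span many accounts.

    The window and thresholds are placeholder values, not tuned ones.
    """

    def __init__(self, window_seconds: float = 60.0, max_failures: int = 20, max_accounts: int = 10):
        self.window = window_seconds
        self.max_failures = max_failures
        self.max_accounts = max_accounts
        self.events = defaultdict(deque)  # ip -> deque of (timestamp, account), oldest first

    def record_failure(self, ip: str, account: str, ts: float) -> bool:
        """Record one failed login (timestamps assumed non-decreasing); True if the IP looks automated."""
        q = self.events[ip]
        q.append((ts, account))
        while q and ts - q[0][0] > self.window:  # forget events older than the window
            q.popleft()
        distinct_accounts = {acct for _, acct in q}
        return len(q) > self.max_failures or len(distinct_accounts) > self.max_accounts

monitor = FailedLoginMonitor()
# A person mistyping their own password a few times: not flagged.
print([monitor.record_failure("198.51.100.7", "alice", t) for t in (0, 5, 12)])  # → [False, False, False]
# One IP cycling through many accounts within a couple of seconds.
flags = [monitor.record_failure("203.0.113.9", f"user{i}", i * 0.1) for i in range(15)]
print(flags.index(True))  # → 10
```

The second source is flagged once its eleventh distinct account appears inside the window, which a human fixing a typo would never do.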
We can look at things like the bot score. So Cloudflare has, of course, bot management.
So you're looking at the likelihood that a request originates from a bot, something automated, rather than a human.
And that might be suggesting, again, this is an automated attack.
So you could take a bunch of these elements, correlate them, and look at things like hits on Might I Get Pwned.
So again, if you're seeing a bunch of hits and they're happening repeatedly, well, that maybe suggests someone is walking their way through a database, for example, and that would tell you that information.
And then looking at those other elements gives you some confidence that the other signals you're seeing are also strong indicators, almost independently of the credential checking.
And so we might want to develop a sort of attack fingerprint, and then find ways to make that information useful for someone like a security analyst doing incident response: how you can bubble up that information and make it actionable, so that they can respond appropriately and cut the attack off.
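The idea of correlating signals into an attack fingerprint can be sketched as a rule that requires several indicators to agree and reports which ones fired, so an analyst can see why. The field names, thresholds, and rule are all hypothetical; the bot score is assumed to run from 1 (likely automated) to 99 (likely human), as with Cloudflare's bot score. This shows only the shape of the correlation, not how Cloudflare scores attacks.

```python
from dataclasses import dataclass

@dataclass
class LoginSignals:
    """Hypothetical per-source features over some time window."""
    failed_ratio: float     # share of recent login attempts that failed, 0..1
    distinct_accounts: int  # distinct usernames tried from this source
    bot_score: int          # assumed 1 (likely automated) .. 99 (likely human)
    breach_hits: int        # attempted credentials found by a MIGP-style check

def attack_indicators(s: LoginSignals) -> list[str]:
    """Return which placeholder thresholds fired, so an analyst can see why."""
    reasons = []
    if s.failed_ratio > 0.8:
        reasons.append("high failure rate")
    if s.distinct_accounts > 50:
        reasons.append("many accounts from one source")
    if s.bot_score < 30:
        reasons.append("likely automated")
    if s.breach_hits > 10:
        reasons.append("repeated breached-credential hits")
    return reasons

def looks_like_credential_stuffing(s: LoginSignals, min_indicators: int = 3) -> bool:
    """Alert only when several independent signals agree."""
    return len(attack_indicators(s)) >= min_indicators

stuffing = LoginSignals(failed_ratio=0.95, distinct_accounts=400, bot_score=2, breach_hits=120)
typo = LoginSignals(failed_ratio=0.5, distinct_accounts=1, bot_score=95, breach_hits=0)
print(attack_indicators(stuffing))           # all four reasons
print(looks_like_credential_stuffing(typo))  # → False
```

Requiring agreement between signals is one way to keep a single noisy indicator, such as a burst of typos, from triggering a response.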
So we're doing a lot of analysis right now with this login data. So again, stay tuned for results as we develop this work a lot further.
I also want to mention that we're looking at work on how to not transmit passwords in the first place, or how to minimize the exposure of passwords.
So that's a bit of a different tack from the other one.
I mean, one is looking at the compromise.
The other one is much more, let's try not to have that happen in the first place.
And that is work on password-authenticated key exchange, or PAKEs, which is discussed again in the blog post by our former intern, Ian McCoy.
The basic motivation is that, as you can sort of see in this diagram, we want to protect the password in transit.
So it's sent on an encrypted channel, password over TLS, that's great.
You're preventing eavesdropping. Nobody can pick that up.
Once it gets where it's going, it's generally stored as a salted hash. So that's great for making it harder to crack the passwords if the attacker gets the password file in the first place.
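Salted-hash storage can be sketched with PBKDF2 from Python's standard library. The iteration count is an assumed work factor, production systems often prefer a memory-hard function such as scrypt or Argon2, and the function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune for your hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store; a fresh random salt per user defeats precomputed tables."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("password1")
print(verify_password("password1", salt, digest))  # → True
print(verify_password("password", salt, digest))   # → False
```

Note that `verify_password` has to receive the plaintext password to do its job; that is exactly the server-side exposure window described next.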
But once the server receives the password, somewhere between being securely transmitted and being securely stored, the server actually has to read and process the password itself in order to check it.
And that exposes it.
There's a window of exposure while it is being processed as plaintext, and during that window, a system may inadvertently log it or leak it.
And there have been examples of large-scale systems with data breach problems where it turned out passwords were being logged, even though nobody wanted that to happen.
So again, how might you deal with not having that exposure that's happening on the server side?
So can we do this? Can we do better?
Well, it looks like, yes, we can. So this is why we might look at password-authenticated key exchange, or PAKE.
So PAKEs have been around in some form, although mostly in papers, since the nineties, but haven't really been widely deployed.
And they have absolutely evolved quite a bit over the decades.
And in 2018, OPAQUE was published. I'm highlighting it because it has great security guarantees, but it's also something practical that can be implemented.
OPAQUE is OPRF plus PAKE, OPRF being the oblivious pseudorandom function that Luke mentioned earlier.
So it's a protocol by which two parties compute a function.
So it takes in a key and a value X where one party inputs the value X and another party inputs the key.
And the party that provides X learns the results of the function, but not the key.
And the party that provides the key learns nothing.
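The blind, evaluate, and finalize steps of an OPRF can be sketched in a few lines using the "blind with a random exponent" idea of the 2HashDH construction. This is a toy: it works modulo a Mersenne prime with parameters that are not secure, the function names are hypothetical, and it is not the MIGP or OPAQUE implementation.

```python
import hashlib
import math
import secrets

# Toy group: nonzero integers modulo the Mersenne prime 2**127 - 1.
# NOT secure; standardized OPRFs use prime-order groups such as elliptic curves.
P = 2**127 - 1

def hash_to_group(x: bytes) -> int:
    """Map an input to a nonzero residue mod P (toy hash-to-group)."""
    return int.from_bytes(hashlib.sha256(x).digest(), "big") % (P - 1) + 1

def blind(x: bytes) -> tuple[int, int]:
    """Client: hide H(x) behind a random exponent r that is invertible mod P - 1."""
    while True:
        r = secrets.randbelow(P - 1)
        if r > 1 and math.gcd(r, P - 1) == 1:
            return r, pow(hash_to_group(x), r, P)

def evaluate(key: int, blinded: int) -> int:
    """Server: apply its secret key to the blinded element; it never sees x."""
    return pow(blinded, key, P)

def finalize(x: bytes, r: int, evaluated: int) -> bytes:
    """Client: strip the blind to get H(x)^key, then hash it into the PRF output."""
    unblinded = pow(evaluated, pow(r, -1, P - 1), P)
    return hashlib.sha256(x + unblinded.to_bytes(16, "big")).digest()

def prf(key: int, x: bytes) -> bytes:
    """Direct evaluation with the key, used here only to check the protocol."""
    return hashlib.sha256(x + pow(hash_to_group(x), key, P).to_bytes(16, "big")).digest()

server_key = secrets.randbelow(P - 2) + 1   # known only to the server
r, blinded = blind(b"password1")            # client sends only `blinded`
evaluated = evaluate(server_key, blinded)   # server replies with `evaluated`
assert finalize(b"password1", r, evaluated) == prf(server_key, b"password1")
```

The client ends up with the same output as a direct evaluation under the server's key, yet the server only ever saw a randomized value and the client never saw the key.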
Okay, so you ask, how do we use this?
Well, the core of OPAQUE is that it's a method to store user secrets; here, you should think of passwords.
It's for safekeeping user secrets on the server without giving the server access to those secrets.
So traditionally, we'd have the salted password hash stored on the server.
Instead, the server stores basically a secret envelope,
which is locked by two pieces of information:
your password, known only by you, and a random secret key.
So that's like a salt known only by the server. So you each sort of have a piece of the puzzle.
So to log in, the client initiates a cryptographic exchange that reveals the envelope to the client, but importantly doesn't actually reveal that to the server.
So the server then sends the envelope to the user, who can then decrypt it to retrieve the keys inside.
Once they have the keys, the user's private/public key pair and the server's public key, they can use them as inputs to a key exchange, which allows the user and the server to establish a secret key that can be used to encrypt future communication.
But in brief, OPAQUE allows the client and the server to agree on a shared key if and only if the client knows the right password.
So again, for logging in, all we need is a simple check that both parties have arrived at the same key.
And this way the client can demonstrate this without having to send the password itself at all.
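The final "same key?" check can be sketched as key confirmation: the client MACs the handshake transcript with its derived key, and the server checks that tag against one computed with its own key. The key exchange that produces the keys is assumed and not shown; the role labels and function names are hypothetical, and OPAQUE's actual message formats are defined in its specification.

```python
import hashlib
import hmac
import secrets

def confirmation_tag(shared_key: bytes, role: bytes, transcript: bytes) -> bytes:
    """MAC the handshake transcript under the derived key; the role label stops reflection."""
    return hmac.new(shared_key, role + b"|" + transcript, hashlib.sha256).digest()

def server_accepts(server_key: bytes, transcript: bytes, client_tag: bytes) -> bool:
    """Server: log the user in only if the client's tag matches one made with the server's key."""
    expected = confirmation_tag(server_key, b"client", transcript)
    return hmac.compare_digest(expected, client_tag)

# Assume a key exchange (not shown) left each side with a key; they match only if the password was right.
transcript = b"hypothetical handshake transcript"
client_key = server_key = secrets.token_bytes(32)
tag = confirmation_tag(client_key, b"client", transcript)        # all the client sends
print(server_accepts(server_key, transcript, tag))               # → True
print(server_accepts(secrets.token_bytes(32), transcript, tag))  # → False (keys differ)
```

Only the tag crosses the wire, so the login check never needs the password itself.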
And there's a Cloudflare blog post about OPAQUE from last year by a former intern, Tatiana Bradley, who pushed a lot of practical work forward and developed a prototype.
There are a lot of great details in that blog post, and I encourage you to read more.
I think its title says it all: "The best passwords never leave your device."
So we continue to do a lot of work in this area.
We're excited about extending the use of OPAQUE. There are still some hard problems to tackle.
Some of them are discussed in the blog post, like how we combine credential checking and OPAQUE.
So you may be thinking, okay, we're now in a world with no access to the actual password.
I've pointed out why there are advantages, why we actually don't want to have that kind of exposure.
That's great. But then how do you check things like is the password in a breach?
Like how do you see if it passes a policy checker that wants to know the strength of the password because you have less information about it.
So it's not so easy anymore. So we're thinking about how we might combine the strengths of credential checking and OPAQUE to get the advantages of both.
But this is a really big challenge, but we're excited about the possibilities of strengthening password systems with these innovative developments.
Well, thanks for that discussion on OPAQUE. It's really fascinating to learn how we can start moving towards a passwordless world where servers never need to see the user's password at all.
Then you just entirely avoid the problem of data breaches if the server doesn't have the data in the first place.
So I guess we're getting towards the end of this segment.
If you enjoyed this segment, I really encourage you to read the blog posts from today.
They're again, linked in the video description. And there's a lot of other great content out on the Cloudflare blog from authors on the research team, as well as from all across the company.
So yeah, we definitely encourage you to take a look, take a read of all of these posts for some really great content.
And yeah, Tara, thanks so much for joining me on the segment today. And yeah.
Thank you, Luke. It was a pleasure discussing passwords with you. Yeah, and to the audience, thanks for tuning in and we'll see you next time.
Thank you.