Hacker Time
Join Evan Johnson as he speaks with security professionals about recent security news!
Transcript (Beta)
Hello, welcome to the number one security show anywhere in the world. This is Hacker Time.
I'm your host Evan Johnson coming at you from Cloudflare's product security team.
And today we have a very exciting show. I started this morning trying to plan out the episode, and I was going to do one thing really deep on modulo bias, after a blog post I rediscovered that I wrote long ago on the topic.
But then it just kind of morphed into this, and I want to take some time and show you how I look at applications people build.
So if I was an application security professional, what would I do if I sat down at the keyboard and tried to break a crypto app?
So the format I'm hoping for in this episode: I'll talk a little about each of these topics and what I look for.
It only takes five minutes. Then I'm going to load up GitHub.
I'm going to look up some crypto applications and see if we can find any bugs.
So without further ado, I'm going to get started. So when you are an application security professional, or you're doing some app sec or source code review of a service or something, it's common to run into people using cryptography to either encrypt things, hash things, do all sorts of things.
There's all sorts of reasons why people include cryptography in their applications.
And I think it's really important to have a good hit list. So the first thing I always do when I see that something is using cryptography is, if it's open source, I take the software and look at the crypto primitives it's using.
So these are the primitives they're using for encryption, for hashing, for...
Oh, wow. Is there anything else? Generating randomness, that's the next bullet point.
But starting with, oh, including password hashing. So starting with these, I did an episode a while ago on encrypting things safely with AEADs.
I usually look for the function and the type of cipher or public key crypto primitives that people are using.
So it's common to see AES, it's common to see RSA, and it's common to see things like MD5 and SHA-1 and DES, less so these days.
But it's a good thing to know what is good and what is bad.
I normally call out even things like AES as bad, because you normally don't want to be writing code that uses AES directly.
You want to see strong primitives.
So your building blocks should be strong. AEADs are good. And what's bad, I don't think it's necessarily bad using AES or RSA or any of these primitives directly, but it does mean that you have to know what you're doing a little more.
So that's important.
The first thing I do is look at this and start doing some research. So dig through the code that you're reading and try to do some research about the encrypting and hashing functions you're seeing.
Two is sources of randomness. So this is a really easy one.
I'll show you. Golang math/rand. Golang crypto/rand. I'm a Golang programmer at heart, I think.
And this is the Go package crypto/rand.
This will securely generate random bytes for you to use in crypto operations. Under the hood, when you dig in, it's reading randomness from /dev/urandom.
Oh, wrong button.
This is math/rand. I have no idea what this is. It's some user-space random number generator, but it's not built for security and cryptography.
Things reading from /dev/urandom are.
And you can kind of see what that looks like by just loading up a terminal.
And you can do head /dev/urandom, pipe it to xxd. And this is a hex dump of just random data.
There's not a lot to look at, and it can be hard to distinguish from math/rand output.
But it's important to know that when you're generating randomness, usually you want to be receiving a byte buffer back.
So you want to be receiving a bunch of random bytes.
Notice that these are all bytes between 0 and 255, like all bytes are.
But when you read /dev/urandom, it just produces bytes.
And math/rand doesn't.
So you'll see math/rand produces integers and things like that.
If you're seeing randomness functions generate integers, that's usually a sign that something has gone wrong.
And if you see any random function generating things that aren't bytes, you usually want to ask a few questions about that.
And then if you see randomness functions that aren't reading from /dev/urandom, you probably want to research that.
It's not so hard to dig into different programming languages' equivalents of crypto/rand and see whether they're doing the right thing.
So that is the second thing I always do. You'll see people generate randomness to generate passwords, if you're using a password manager, and to generate keys, symmetric keys.
And so symmetric keys are just byte strings.
And so reading from /dev/urandom to get a byte string works really well for being the byte string used for a key.
So all that is good.
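The bytes-versus-integers distinction above can be sketched in a few lines. This is a minimal illustration in Python rather than Go, assuming that Python's `secrets` module plays the role of crypto/rand (it draws from the operating system's CSPRNG) and that the `random` module plays the role of math/rand. The seed value 1234 is arbitrary.

```python
import random
import secrets

# Key material should come from the operating system's CSPRNG as raw bytes.
key = secrets.token_bytes(32)  # 32 random bytes, e.g. a 256-bit symmetric key

# The general-purpose generator hands back numbers and is fully determined
# by its seed, so anyone who learns or guesses the seed can replay it.
a = random.Random(1234)
b = random.Random(1234)
replayed = [a.randint(0, 255) for _ in range(8)] == [b.randint(0, 255) for _ in range(8)]

print(type(key).__name__, len(key))
print("same seed replays the same sequence:", replayed)
```

When reviewing code, the thing to check is which of these two kinds of generator sits behind a key, nonce, or password.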
Then I start to look at crypto implementations. So there's a few things to look at here.
And they're all kind of really in the details. So let's start here.
This is a blog. This is the blog post that started me on this track today talking about this topic.
So I wrote this blog post a long time ago. And it's so long ago that the website doesn't exist anymore.
It's only on archive.org. And it used to render like a QR code looking thing.
It was just a field of bits that were either like white or black.
And you could just kind of see the density of the pixels that were filled in.
And the whole point was you could see modulo bias when using this random function that I made.
And so one thing to look out for, this is really common.
I just mentioned generating passwords. But anytime you're trying to generate something human readable, you'll notice here, not all of these bytes from /dev/urandom are readable.
Since /dev/urandom generates bytes, you can kind of use each byte as an integer between 0 and 255.
Not all of those values are machine or human readable.
So you'll see a lot of dots here, which means it's not a printable character on the ASCII table.
But there's a lot of use cases for people being able to read their string.
So think about a password manager. A password manager might generate you a password.
And it's going to be really hard to type in the password box, the byte 8a. I'm not sure how I would do that.
I could probably copy and paste it. But if you ever have to type in your password, you're going to struggle to type in 8a if that's a part of your password.
There are these situations where you really need human readable and writable characters.
And so this function is a very good example of how to do things the wrong way.
And I'll walk you through it to describe what modulo bias is.
So basically, I generate a byte. I do this all the right way. So I generate some secure bytes here.
And then this is the part that really makes the modulo bias in the function significant.
I take away the leftmost bit. So this will be a number between 0 and 127, I guess.
Yes. So this will be a number between 0 and 127.
And then I'm trying to generate one of these characters. I am not sure how many characters this is.
Let's see the length. Luckily, we have a script running in the console.
We can see the length here. So desired chars.length.
This is 93. So we have 128 possible values, 0 through 127, and we only want to use 93 of them.
So we want to pick from a string with 93 characters. And a common way people do this is they'll generate a number between 0 and 127, or some random number between 0 and 255, and then use the modulo operator to get something within the readable range, something they can use to index this string.
Like I do here. I take the index into the desired chars string.
And what you end up with, because 128 is not evenly divisible by 93,
is that after a modulo operation you end up lopsided.
You're going to end up with way more of the first part of this string.
So A to Z, A to Z. Because you get the first 93 values as they are. And then after the modulo, all of those numbers between 93 and 127 get mapped to a number between 0 and 34, which is 127 minus 93.
So basically, you'll end up with a string that's not uniformly random. And this is a big weakness if people are really depending on the cryptography.
It's a big deal, because it can lead to breakages in your whole crypto system.
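The bias described above can be counted exactly instead of eyeballed. This is a small sketch, assuming the function masks off the top bit of a random byte and then reduces it modulo the 93-character set; the name `DESIRED_LEN` is made up for illustration.

```python
from collections import Counter

DESIRED_LEN = 93  # size of the character set mentioned in the episode

# Enumerate every possible byte instead of sampling, so the counts are exact.
counts = Counter((byte & 0x7F) % DESIRED_LEN for byte in range(256))

# Indexes that can be reached from two masked values versus only one.
favored = sorted(i for i, n in counts.items() if n == 4)
others = sorted(i for i, n in counts.items() if n == 2)

print(len(favored), "indexes are twice as likely as the other", len(others))
print("favored range:", favored[0], "to", favored[-1])
```

Under these assumptions the first 35 characters of the set come up twice as often as the remaining 58, which is the lopsidedness the picture in the blog post made visible.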
So that's a very interesting one.
It's a very common thing to see. And it's really easy to search for.
Just look for the percent sign. So if you see a percent sign, let me zoom in here by a lot.
If you see a percent sign in a function generating a string, a securely random string, that's usually a bad sign.
That's something that you want to look more deeply at.
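One common way around modulo bias is rejection sampling: throw away random bytes that fall in the uneven tail before reducing them. The sketch below is an illustration of that general technique, not necessarily what any project discussed here does. The 62-character `ALPHABET` and the function name `generate_password` are assumptions, since the episode's 93-character set is not shown in full.

```python
import secrets
import string

# Hypothetical alphabet for illustration (letters and digits, 62 characters).
ALPHABET = string.ascii_letters + string.digits


def generate_password(length: int) -> str:
    # Largest multiple of the alphabet size that fits in one byte (248 for 62).
    limit = 256 - (256 % len(ALPHABET))
    out = []
    while len(out) < length:
        for byte in secrets.token_bytes(length):
            # Reject bytes at or above the limit so every index is equally likely.
            if byte < limit and len(out) < length:
                out.append(ALPHABET[byte % len(ALPHABET)])
    return "".join(out)


print(generate_password(20))
```

A percent sign still appears here, but only after the biased values have been discarded. In Python, `secrets.choice(ALPHABET)` gives the same guarantee without hand-written sampling.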
OK. Nonce and initialization vector reuse. A lot of these crypto primitives that are secure require you to generate nonces and initialization vectors randomly, also from /dev/urandom.
And there's a lot of bad things you can do. The right thing to be doing is just generating your nonce from /dev/urandom, like your key, and generating a new one for each ciphertext you're encrypting.
And so anytime you see reuse of these things, that's bad.
And anytime you see it being generated from something other than /dev/urandom, that's also bad.
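To see why nonce reuse is bad, here is a toy demonstration. The function `toy_keystream_xor` is a made-up stand-in for a stream-style cipher, built from SHA-256 purely for illustration; it is not a real cipher and should not be used to protect anything. The two messages are arbitrary.

```python
import hashlib
import secrets


def toy_keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy counter-mode keystream built from SHA-256, for illustration only.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


key = secrets.token_bytes(32)
p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"

# Wrong: the same nonce for two messages under one key.
reused = secrets.token_bytes(12)
c1 = toy_keystream_xor(key, reused, p1)
c2 = toy_keystream_xor(key, reused, p2)

# The keystream cancels out, so an eavesdropper learns p1 XOR p2 without the key.
leak = bytes(x ^ y for x, y in zip(c1, c2))
plain_xor = bytes(x ^ y for x, y in zip(p1, p2))
print("ciphertexts leak plaintext XOR:", leak == plain_xor)

# Right: a fresh random nonce for every message.
c3 = toy_keystream_xor(key, secrets.token_bytes(12), p2)
print("fresh nonce gives a different ciphertext:", c3 != c2)
```

With a real AEAD the consequences of reuse depend on the construction, but the rule to check for in review is the same: one fresh random nonce per ciphertext.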
Correct cipher modes. This is a very famous one that I think a lot of people have seen loading up the Wikipedia page.
And this is one of the reasons you really want to use an AEAD: correct cipher modes are just not an issue you have to worry about, because they're chosen for you as part of the AEAD.
The AEAD is a couple of functions together that securely encrypt or decrypt your data.
And so AES has long been the gold standard of crypto and encryption functions.
And it's a block cipher.
But there's a lot of different ways you can actually implement the encryption.
So you can just go block by block and encrypt. Or you can do some kind of chaining,
where data from one block gets fed into the next block.
And so as a whole, the security of all the blocks gets held together.
And you can kind of see how it can be a big deal. So this is a very, very famous picture if you're not familiar with it.
They took a picture of a penguin, encrypted it with AES ECB mode.
So all the data is encrypted, very secure. However, it's leaking information about the data.
It's leaking information about the data that was encrypted because each block was encrypted individually.
And so all of these black pixels encrypt to the same thing and all the white pixels encrypt to the same thing, because the same key is used for every block and identical blocks give identical ciphertext.
And that will lead to leaks when you have a lot of the same data.
I think there's also a very famous example of this from the Enigma machines,
but that's a type of cryptanalysis and kind of unrelated.
That's a whole tangent. So I'm going to avoid talking about that.
But this is what data should look like. So when you encrypt the penguin, it shouldn't leak any information about it.
It should just look like random noise.
That's what you have over here. And this is good. This is what you want.
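The penguin effect can be reproduced without an image. In this sketch `toy_block_encrypt` is a made-up stand-in for a block cipher such as AES: a deterministic keyed function built from HMAC-SHA256. It cannot be decrypted and is only here to show what block-by-block encryption does to repeated data. The "image" is four 16-byte blocks of black or white pixels.

```python
import hashlib
import hmac
import secrets

BLOCK = 16


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for a block cipher: deterministic and keyed, but NOT AES and not invertible.
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def ecb_like(key: bytes, data: bytes) -> list:
    # Encrypt every block on its own, with no chaining and no IV.
    return [toy_block_encrypt(key, data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]


key = secrets.token_bytes(32)
image = b"\x00" * 32 + b"\xff" * 16 + b"\x00" * 16  # black, black, white, black

blocks = ecb_like(key, image)
distinct = len(set(blocks))
print("blocks:", len(blocks), "distinct ciphertext blocks:", distinct)
```

Four blocks go in and only two distinct outputs come out, so anyone looking at the ciphertext can tell which blocks were the same color.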
So this is a good depiction that the mode really matters.
And sometimes the mode can be chosen for speed reasons, like the counter mode is common in full-disk encryption kinds of implementations.
There's all types of modes.
And it's a good thing to note that it matters what mode is chosen.
And also you'll see a lot of AES-CBC. And then sometimes you'll see nonce reuse and IV reuse with AES-CBC, because that's a mode where the penguin won't be visible.
It'll look like randomness with AES-CBC. But each time you encrypt the penguin, you need to encrypt it with a different IV because otherwise you'll get the same random picture every time.
And you don't want that. So there's all types of things to consider.
Here's what CBC mode... CBC chains each ciphertext block together.
And the details matter here. But really, because I recommend only using AEADs, I usually just tell people, if I see AES, to choose a different library, because they probably just want to encrypt something and AEADs will fix that problem.
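Chaining and the role of the IV can be sketched the same way. As before, `toy_block_encrypt` is a made-up, non-invertible stand-in for a block cipher, and `cbc_like` only mimics the shape of CBC: each plaintext block is XORed with the previous ciphertext block, starting from the IV. It is an illustration, not an implementation of AES-CBC.

```python
import hashlib
import hmac
import secrets

BLOCK = 16


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for a block cipher: deterministic and keyed, but NOT AES and not invertible.
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def cbc_like(key: bytes, iv: bytes, data: bytes) -> list:
    previous, out = iv, []
    for i in range(0, len(data), BLOCK):
        # Mix the previous ciphertext block (or the IV) into this plaintext block.
        mixed = bytes(p ^ c for p, c in zip(data[i:i + BLOCK], previous))
        previous = toy_block_encrypt(key, mixed)
        out.append(previous)
    return out


key = secrets.token_bytes(32)
image = b"\x00" * 64  # four identical all-black blocks

fixed_iv = secrets.token_bytes(BLOCK)
first = cbc_like(key, fixed_iv, image)
second = cbc_like(key, fixed_iv, image)                  # IV reused
fresh = cbc_like(key, secrets.token_bytes(BLOCK), image)  # new IV

print("identical blocks now look different:", len(set(first)) == 4)
print("reused IV gives the same picture:", first == second)
print("fresh IV gives a different picture:", first != fresh)
```

The chaining hides repeated blocks within one message, and the fresh IV hides the fact that the same message was encrypted twice.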
Okay. The library. It's very important to look at the actual implementation of the crypto functions.
I would never trust myself to implement one of these crypto functions because they're very complicated.
There's a lot of numbers that have been chosen for specific reasons.
And there's a lot to consider. You want to use well-tested, battle-tested libraries that have been reviewed by experts.
And then the last thing...
I didn't really cover hashing too much, but the last thing is whole protocol.
So this kind of gets into it. There's usually a reason you're protecting data.
You're protecting it from somebody. So using encryption functions or hashing data or whatever you're doing, there's usually a reason.
And a lot of times you're sending it or receiving it or storing it.
And the protocol matters, like what you're doing with the data, who sees what, how you send and receive.
And so a common thing would be considering replay attacks. So what happens if an attacker intercepts the data?
Can they replay it later? And this kind of analysis is really difficult to do.
Analyzing security protocols is really difficult.
And you usually need a lot of time and space and freedom to research these.
It's not easy. And you really need an expert to look at it because all of the concerns are very specific to what's happening and your specific implementation.
But there's a very famous protocol that is used in the app Signal, on so many people's phones.
And that is the Signal Protocol.
It started out as something called Axolotl, I think, and then became the TextSecure protocol and is now Signal.
And there's a whole mailing list of people who all they do is live and breathe these security protocols, messaging protocols.
And so if you need to roll out something advanced, look for prior art. It's very bad to make your own protocol, because you will do it wrong, because everybody's done it wrong.
And the entire field is built on people who have slowly figured that out.
And I'm sure people will find weaknesses in our current and future protocols as well.
So this is usually what I look for in five minutes. And the last thing I'll say is signs you know something is horribly wrong.
So one, I think it's good to add to this, the percent sign.
So if you see modulo when generating a string, that's bad.
When you see the letters AES or RSA, recommend an AEAD instead. Anytime you see ECB mode. And I didn't talk about this, but anytime you see a ciphertext without an integrity hash associated with it, that's bad.
Because somebody can mess with that ciphertext and you'll never know, because it doesn't have an integrity hash associated with it.
That's solved by AEADs, but you should keep that in mind. But then the other one is, it's probably a smell anytime you see a one-time pad or XOR, which looks like this.
It looks like exponentiation, but in most programming languages it's not.
It's bitwise exclusive OR. And anytime you see claims like military grade and bank grade. The last thing I'll say is, well, military grade and bank grade don't mean anything.
It's kind of like space age. Space age was in the 60s and the 60s weren't that advanced, but we did go to space.
And then any cloudy cryptography.
So what I mean by this is crypto that's all behind the scenes. I don't want to call out AWS here, but one good example of this is AWS S3.
And I think it's really hard to summarize what benefits you're getting from encryption in S3 beyond a compliance obligation.
So that's something to keep in mind. You should encrypt your buckets and everything.
It's just a checkbox to turn on and do the right thing.
But it's really hard to wrap your head around why that's necessary and what's being encrypted at what layer.
So is it the actual hard disk that the data is being stored on?
Is the hard disk encrypted, and then is it encrypted again at the application layer?
So it's confusing. Okay. We have six minutes left. That's my rundown on everything I look for if I'm doing a five minute audit.
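Several of the red flags in this rundown are plain text patterns, so a crude first pass can be automated. The sketch below is a hypothetical helper, not an existing tool: the `SMELLS` patterns and the `scan` function are assumptions made up for illustration, and every hit still needs a human to read the surrounding code.

```python
import re

# Hypothetical patterns for the red flags in the rundown; tune them per codebase.
SMELLS = {
    "weak or raw primitive": re.compile(r"\b(md5|sha-?1|des|ecb|aes|rsa)\b", re.IGNORECASE),
    "non-crypto randomness": re.compile(r"math/rand|\brandom\.(randint|random|choice)\b"),
    "modulo (check for bias)": re.compile(r"%"),
    "marketing claim": re.compile(r"(military|bank)[- ]grade", re.IGNORECASE),
}


def scan(source: str) -> list:
    # Return (line number, label) pairs for every pattern that matches a line.
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in SMELLS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings


sample = "\n".join([
    'import "math/rand"',
    "idx := int(b) % len(chars)",
    "h := md5.Sum(data)",
    "// bank-grade encryption",
    "ok := true",
])
for number, label in scan(sample):
    print(number, label)
```

A scan like this only points at places to look. It cannot tell a biased modulo from a harmless one, or a careful use of AES from a careless one.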
Need a sip of coffee.
And so what I thought would be awesome is to go on GitHub, search for some crypto apps and look at them.
And so I think I have some. I haven't spent much.
Okay. This is a pretty good one. This is blockstack/stacks.js.
This is something cryptocurrency related and they've implemented a bunch of stuff.
I found this just a few minutes ago and I just wanted to walk through some of their implementations and talk about it.
So let's start with PBKDF2, because this isn't something we covered.
PBKDF2 is Password-Based Key Derivation Function 2.
I think that's what it means. And this is for when you have a password and you want to encrypt with it.
You want to convert a human-writable password into a key.
PBKDF2 is a good way to do that. There are other ways. scrypt is one, but this is a common thing you'll see.
And it uses an underlying hashing function.
It's generally pretty slow. And yeah, this is kind of hard for me to read because there's kind of a lot of code here, but nothing stands out as obviously wrong.
There's usually a salt. The salt should be random each time, per user, per password-based key.
And this is probably a hard one to look at live on here. But PBKDF2 is a common thing you'll see.
It's kind of like AES where you can do things improperly with it, but it's not necessarily bad.
But it is a primitive that you should be aware of.
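For reference, this is roughly what a password-to-key derivation looks like with Python's standard library. It is a sketch, not the code from the project on screen. The `ITERATIONS` value, the 16-byte salt, and the example password are assumptions; pick an iteration count that suits your own hardware and guidance.

```python
import hashlib
import secrets

ITERATIONS = 200_000  # assumed work factor, for illustration only


def derive_key(password: str, salt: bytes) -> bytes:
    # PBKDF2 with HMAC-SHA256 as the underlying hash, producing a 32-byte key.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32)


# A fresh random salt per password; store it next to the ciphertext, it is not secret.
salt = secrets.token_bytes(16)
key = derive_key("correct horse battery staple", salt)

same_again = derive_key("correct horse battery staple", salt)
other_salt = derive_key("correct horse battery staple", secrets.token_bytes(16))

print("key length:", len(key))
print("same password and salt give the same key:", same_again == key)
print("a different salt gives a different key:", other_salt != key)
```

The things to check in a review are the ones visible here: where the salt comes from, whether it is unique per password, and how large the iteration count is.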
Let's look at HMAC SHA-256.
Whoa. So create HMAC.
What's going on here? They call create HMAC with SHA-256. They give it a bunch of data and get a digest out of it.
So where does this come from? Digest this.
Digest. Here we go. So they're using SubtleCrypto. Web Crypto HMAC SHA-256 implements HMAC.
And then, huh. I'm not a very strong TypeScript developer.
That's for sure. I'm having trouble following these.
Maybe it's showing my age. So this is all like very object-oriented code.
And so things are getting, there's an implements. So there's an interface being implemented here.
You'd really have to go look at what's going on in wherever the actual HMAC digest is being generated.
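As a point of comparison for what an HMAC call is expected to do, here is a minimal sketch with Python's standard library. The function names `make_tag` and `verify_tag` and the sample message are made up for illustration; this is not the TypeScript code being read on screen.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, message: bytes) -> bytes:
    # HMAC-SHA256 over the message; the digest is 32 bytes.
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, received: bytes) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(make_tag(key, message), received)


key = secrets.token_bytes(32)
message = b"ciphertext bytes go here"
tag = make_tag(key, message)

print("untouched message verifies:", verify_tag(key, message, tag))
print("tampered message verifies:", verify_tag(key, message + b"!", tag))
```

This is also the kind of integrity check that a bare ciphertext lacks: without a tag like this, or an AEAD that builds one in, a modified ciphertext goes unnoticed.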
Let's go to it here maybe.
This is the file I'm in.
Yeah. I hope that this isn't all of them. I really struggle reading object-oriented code a lot of times; that's one of the reasons I like Golang.
Maybe this is a bad one to spend time with. Let's try a different one.
But this does seem to have a lot of the things that I would expect. Yeah.
Get random bytes. Usually you can suspect that things are going to be done properly when you see that you're getting random bytes, but you still want to verify it.
And also random bytes is here. So that seems okay.
What's keys? Get entry key from random bytes. Yeah, this is a little confusing.
So we have a minute 18. I will show you one that I wrote, some crypto code that I've written, and I would challenge anybody to go find some problems with it.
I would really appreciate that actually. And it's here in the pc file, for passgo crypto.
And I actually had to go through this whole modulo thing myself.
And you can take a look at how I got around that, but it is kind of difficult to do things the right way.
And I'm not certain that I did.
Generating strings from randomness is a pretty difficult thing when you sit down to do it, but please go here, check out passgo on my GitHub and the generate password function.
But then in all of the functions, whenever I generate a hex string, I'm using crypto/rand, and the AEADs I'm using are here: NaCl box and secretbox are great implementations. But that's all the time we have.
I really appreciate you watching and I'll talk to you next week.
Bye.