Cloudflare TV

DNSSEC and Post-Quantum

Presented by Bas Westerbaan, Roland van Rijswijk-Deij
Originally aired on 

The Domain Name System Security Extensions (DNSSEC) is a feature of the Domain Name System (DNS) that authenticates responses to domain name lookups. Unfortunately, it is threatened by quantum computers. In this segment, we will explore what we need to make it quantum-secure with the experts Roland van Rijswijk-Deij and Bas Westerbaan.



Transcript (Beta)

Good morning, good afternoon, or good evening, whatever it may be. Welcome to the third segment in our post-quantum series.

Today we're talking about how to make DNSSEC post-quantum.

I'm joined by Roland van Rijswijk-Deij. He's a professor at the University of Twente.

He has been working on DNSSEC since 2008, I believe. A very long time.

And recently, in the last three years, he's been thinking about how to make DNSSEC post-quantum.

He's known for projects like OpenIntel, which does large-scale measurement of the DNS ecosystem, and he has worked on Unbound.

I am Bas Westerbaan.

I'm a research engineer at Cloudflare, and I'm working on making the Internet post-quantum.

Okay, to start off, Roland, could you give a short explanation of what is DNSSEC, and what is at stake?

Sure, thank you, Bas. So, just to be very brief, everybody, I assume, knows what the domain name system is, DNS, which translates human-readable names into machine-readable information like IP addresses.

The problem with DNS is that it was designed in the 1980s, so there were no security provisions in the original protocol.

And for a long time, that wasn't an issue.

But then, a couple of attacks came to light that showed that it was very easy to falsify information in the DNS, an attack called cache poisoning.

And to mitigate that type of attack, DNSSEC was introduced, the DNS security extensions.

And basically, what that means is that all of the information in the DNS is signed using public key cryptography, so that when you ask for a name in the DNS and you get a response, you can verify the authenticity and the integrity of that response by checking the digital signature on the record that you get back.

That really, in a nutshell, is DNSSEC.
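To make that flow concrete, here is a minimal sketch using the dnspython library: it fetches an A record together with its RRSIG, fetches the zone's DNSKEY set, and verifies the signature. The resolver address and zone name are assumptions chosen for illustration, and this only checks the zone-level signature, not the full chain of trust up to the root.

```python
# Minimal sketch: fetch an A record plus its RRSIG and the zone's DNSKEY set,
# then verify the signature with dnspython. This checks only the zone-level
# signature, not the DS chain up to the root.
import dns.dnssec
import dns.message
import dns.name
import dns.query
import dns.rdataclass
import dns.rdatatype

RESOLVER = "1.1.1.1"                         # assumed recursive resolver
zone = dns.name.from_text("example.com.")    # assumed DNSSEC-signed zone

# Ask for the A record with the DO bit set so RRSIGs are included.
q = dns.message.make_query(zone, dns.rdatatype.A, want_dnssec=True)
resp = dns.query.udp(q, RESOLVER, timeout=5)
a_rrset = resp.find_rrset(resp.answer, zone, dns.rdataclass.IN, dns.rdatatype.A)
a_rrsig = resp.find_rrset(resp.answer, zone, dns.rdataclass.IN,
                          dns.rdatatype.RRSIG, dns.rdatatype.A)

# Fetch the zone's public keys (DNSKEY) the same way.
qk = dns.message.make_query(zone, dns.rdatatype.DNSKEY, want_dnssec=True)
respk = dns.query.udp(qk, RESOLVER, timeout=5)
dnskey = respk.find_rrset(respk.answer, zone, dns.rdataclass.IN,
                          dns.rdatatype.DNSKEY)

# Raises dns.dnssec.ValidationFailure if the signature does not verify.
dns.dnssec.validate(a_rrset, a_rrsig, {zone: dnskey})
print("A record for", zone, "verified against the zone's DNSKEY")
```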

Okay, great. And so, what is at stake?

Yeah, that's a good question. So, what is at stake? So, if you think about what we use DNSSEC for, and then if you think about when you should move to post-quantum, then DNSSEC is one of those instances where you only have to change once a quantum computer becomes an issue, right?

It's not signatures that you keep for a very long time and where you have to start worrying now already.

But what is at stake is if DNSSEC were to no longer be safe, because there's a quantum computer that can break the public key cryptography that we use, you would no longer be able to verify the authenticity of DNS records that you get back.

And that means that if somebody wants to, for instance, misdirect you to a different site than the bank you're trying to visit or a different site than the government website you're trying to visit, it becomes easy again for them to conduct attacks on you.

In addition to this, over the past, say, five years, we've started using DNSSEC for an additional layer of authentication on the Internet.

So, we use something called DANE, DNS-based authentication of named entities, to bind digital certificates to domain names.

And this is increasingly used to secure email communication, which is another of these old Internet protocols that is really hard to secure.

And for that, DNSSEC is playing an increasingly important role.
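As a rough illustration of how DANE is consumed, here is a small dnspython sketch that looks up a TLSA record for an SMTP server. The record name used is hypothetical, and in practice the answer only counts if it was validated with DNSSEC, which is what the AD flag check below approximates.

```python
# Sketch: look up a DANE TLSA record for an SMTP server and check whether the
# upstream resolver validated it (AD flag). The name "_25._tcp.mail.example.net"
# is hypothetical; substitute the actual MX host of a DNSSEC-signed zone.
import dns.flags
import dns.resolver

resolver = dns.resolver.Resolver()
resolver.use_edns(0, dns.flags.DO, 1232)          # request DNSSEC records
answer = resolver.resolve("_25._tcp.mail.example.net", "TLSA")
validated = bool(answer.response.flags & dns.flags.AD)
for tlsa in answer:
    print(tlsa.usage, tlsa.selector, tlsa.mtype, tlsa.cert.hex(), validated)
```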

So, if we were to lose DNSSEC overnight, we would lose all of these protections.

Okay. It might be helpful to speak a bit about what the post-quantum cryptography space is looking like.

So, in DNSSEC, we need signatures, right? And currently, the most widely used algorithm is still RSA in DNSSEC, right?

Yes. Yes. It's mostly RSA 2048, although ECDSA, so elliptic curve cryptography, is a good second, I think.

It's roughly 60%, 70% still RSA, but ECDSA is gaining in popularity. It's around 30%, 40%, depending on which top level domain you look at.

Yeah. This also says something about how quickly the space moves there.

So, an RSA signature is 256 bytes, which is not as small as elliptic curves, which are like 64 bytes, but not terribly big compared to what we're going to deal with.

So, NIST, the American National Institute of Standards and Technology, is holding a competition to standardize post-quantum secure schemes: key encapsulation mechanisms and signature schemes.

So, the latter we're interested in.

They'll probably announce, maybe next week already, we don't know, but our guess is that they will standardize Dilithium and SPHINCS+, which are around two and a half kilobytes per signature, and that's very big, right?

Yeah, that will be a challenge. Yeah. So, what's the primary problem there?

Why is two and a half kilobytes big? Okay. So, the majority of DNS messages are transported over the UDP protocol, and that basically is a connectionless protocol, right?

You fire off one packet as a query, and then you just keep waiting until you get a response back from the server, and that makes DNS very resilient, very efficient, and also has a very low communication overhead.

Now, with the introduction of DNSSEC, of course, digital signatures were introduced into the DNS, and this really inflated DNS messages by a lot already, right?

If you think about the classical DNS exchange in which you ask for a so-called A record, which is the IPv4 address for a name, your typical query would be in the order of maybe 20 to 30 bytes, and your typical response would only be a little bit bigger than that, maybe 40 bytes.

If you add even a single signature to that response, say, for example, RSA 2048 bits, you add 256 bytes of additional data for the signature and some stuff around that.

So, basically, you might increase the size of that response sevenfold already.

Then you have responses that contain multiple signatures, so you can see it growing and growing and growing.
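A rough back-of-envelope using the ballpark figures just mentioned (illustrative numbers only, not exact wire-format sizes):

```python
# Back-of-envelope sizes from the discussion above (illustrative only; real
# wire-format sizes depend on the name, TTLs and RRSIG metadata).
plain_query    = 30    # ~typical A query
plain_response = 40    # ~typical unsigned A response
rsa_2048_sig   = 256   # raw RSA-2048 signature in one RRSIG

signed_response = plain_response + rsa_2048_sig              # plus RRSIG metadata
print(signed_response, round(signed_response / plain_response, 1))  # ~296 bytes, ~7x

# With several signatures in one answer (e.g. CNAME chains, NSEC records):
print(plain_response + 3 * rsa_2048_sig)                     # well over a kilobyte
```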

Still, a lot of time has been spent by the DNS community to ensure that we keep DNSSEC responses relatively small.

The reason we want to do that is that we would like to fit all of the information that we need to transmit in a single packet, because that means we can keep using UDP.

Now, theoretically, you could spread out the response over multiple packets through a process called fragmentation.

The problem with that is that fragmentation is unreliable. There are many middle boxes, firewalls, et cetera, on the Internet that stop fragments from ever arriving at the destination, and that means the DNS community has actually spent maybe even the last five to ten years trying to optimize DNSSEC such that we can fit everything into a single packet.

Do you have data, rough numbers, on how much issues we see with fragmentation?

Yeah, we do actually have data on that.

There are two sides to this, whether or not you can receive fragmented responses and whether fragmentation occurs in day-to-day DNSSEC.

Now, in terms of being able to receive fragmented responses, there are studies that show, and they're maybe from 10 years ago, that show that at that time, 10% of hosts on the Internet or 10% of DNS resolvers would be unable to receive fragmented responses.

That's a huge number because think about it. At the time, IPv6 was becoming popular.

A big company like Google said, well, we're not going to deploy IPv6 if 0.1% of our customers cannot reach us anymore because of that.

Now, we're talking about two orders of magnitude more.

We're talking 10% of resolvers would be unable to receive fragmented responses.

Now, of course, they have all sorts of fallbacks to compensate for this, but it's not an ideal situation.

On the other side, fragmentation used to be very common in the early days of DNSSEC because nobody really tuned their configurations and because of the way the DNS works, you would frequently have responses that exceeded a single packet, so that would lead to fragmentation.

Nowadays, with all of the work that the DNS community has done, fragmentation has become very, very rare.

If you take a representative sample, so for example, you would record traffic on a DNS resolver for a single day, then on average, less than 0.1% of DNSSEC responses will be fragmented.

It's a very, very small number. Nowadays, I think we've almost eliminated fragmentation.

In 2020, the DNS community had what was called DNS Flag Day in which all of the open source implementers of DNS resolver software decided they were going to change their default parameters such that fragmentation would be a thing of the past.

They configured their DNS resolver not to accept responses that were larger than a single packet.

That almost eliminates fragmentation.

Okay, so fragmentation is a no-go, but then what are the alternatives? DNS doesn't only work on UDP, you can also do it on TCP, and we've got the recent new transports: DNS over HTTPS, DoH, and DNS over QUIC.

Can you tell a bit about that? Yes. Maybe it's good to say a little bit about what is the size of a packet that we could transport to give some perspective.

The configuration that all of the open source implementers adopted during DNS Flag Day was that they would accept datagrams, DNS messages, of up to 1,232 bytes.

Now, like you mentioned, the two algorithms that are likely going to be standardized by NIST have signatures that are already two and a half kilobytes in size.

That's already almost twice the size of what we are willing to fit into a single DNS packet.

And that's for a single signature, and then you might have multiple signatures in a single message.

You can see that that's never going to fly on UDP.
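For a sense of scale, here are commonly cited signature sizes next to the 1,232-byte Flag Day limit; treat the exact byte counts as approximate:

```python
# Rough signature sizes versus the 1,232-byte Flag Day limit (published
# figures for each scheme; treat them as approximate).
EDNS_LIMIT = 1232
sig_sizes = {
    "ECDSA P-256":      64,
    "RSA-2048":        256,
    "Dilithium2":     2420,
    "SPHINCS+-128s":  7856,
}
for name, size in sig_sizes.items():
    print(f"{name:15} {size:5} bytes  ->  {size / EDNS_LIMIT:.1f}x the UDP budget")
```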

Now, like you said, there are other transport mechanisms like TCP or DNS over TLS, DNS over HTTPS, DNS over QUIC.

Let's start with the simplest one, DNS over TCP, which was actually part of the original DNS standard.

This would be an acceptable means for transporting larger messages, but there are some caveats.

One of those is that despite years of effort and years of trying to convince operators otherwise, there are still people that block TCP for DNS, because there are still, and don't ask me why they do this, security auditors that claim that this is a risk.

It really isn't. DNS should work over both UDP and TCP because there are other reasons why you might want to use TCP for DNS, other than large messages.

TCP would be probably the first step, and it would be relatively simple to use TCP, albeit that the default behavior right now is to always start over UDP.

You first exchange a message over UDP, then you get a response that says, oh, this response is truncated, so you should retry over TCP.

That means that you have an extra round trip in all cases, simply for telling the resolver that is asking the question that, no, you should really be using TCP.

Then they have to do the TCP three-way handshake.

Of course, there are ways around this where you could limit the impact, but what you would really want to do is change the default protocol to TCP in this case, because why have that useless round trip to figure out, oh, I should do it over TCP, and do that every time.
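Here is a small dnspython sketch of that UDP-first, retry-over-TCP behaviour; the server address is a placeholder. Newer dnspython versions also provide dns.query.udp_with_fallback, which wraps this same pattern.

```python
# Sketch of the UDP-first / retry-over-TCP behaviour described above, against
# an assumed server address (192.0.2.53 is a documentation placeholder).
import dns.flags
import dns.message
import dns.query
import dns.rdatatype

server = "192.0.2.53"
q = dns.message.make_query("example.com.", dns.rdatatype.DNSKEY, want_dnssec=True)

resp = dns.query.udp(q, server, timeout=3)
truncated = bool(resp.flags & dns.flags.TC)
if truncated:
    # Response didn't fit the UDP budget: this is the extra round trip the
    # speakers mention, because the whole query is redone over TCP.
    resp = dns.query.tcp(q, server, timeout=3)
print(len(resp.to_wire()), "bytes; had to fall back to TCP:", truncated)
```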

Now, one catch with TCP, of course, is that you have to keep state on the authoritative name server that the resolver is talking to.

There is an extra burden on the system that is handling DNS queries. What we see from measurements is that this is manageable.

We can do that. Now, the next obvious step is to use an encrypted transport protocol, because we already do that from client to resolver where we use things like DNS over TLS or DNS over HTTPS.

To start with the latter one, we can throw that out the window. DNS over HTTPS from a resolver through to an authoritative name server doesn't make any sense.

Why would you use HTTPS as a transport protocol there? These are machines talking to machines and not end users talking to machines.

TLS might be an option, but that has a huge performance impact because suddenly I have to do cryptography on this channel, which means that both the sending and the receiving end need to do something extra in order to set up this connection.

The handshake takes more time.

It is computationally more expensive because they need to compute signatures and they need to do encryption and decryption.

Sure, we can optimize all of that.

Yes. Also, there's a lot of DNS servers that sign on the fly anyway, right?

Yes. It's not impossible. We can optimize, but you're still going to be paying a higher price.

If you look at the volumes of DNS queries that you need to handle, think, for example, of the root of the DNS or think, for example, of a top-level domain such as .com, where they're handling hundreds of billions of queries a day.

They already have huge server farms to handle this, but in order to be able to handle encrypted transports, they would have to really beef up their infrastructure, which is a huge investment because all the time, those domains have to remain available.

They are crucial for the operation of the Internet. This is not trivial.

Then if you think about something like DNS over QUIC, that's interesting, right?

DNS over QUIC might be slightly more performant than TLS because you've got things like the 0RTT handshake, which between a client and a resolver, you really don't want to do because it makes you traceable as a client, but between a resolver and an authoritative name server, who cares?

They already know that they're talking to each other.

So there, the privacy risk of 0RTT is not there. But suppose we switch to a reliable transport, be it plain TCP or QUIC or TLS.

Let's just sketch what we're looking at.

How many signatures do typically need to be fetched?

Yeah, I don't have data about that at hand, but the bread and butter of DNS, which is A queries for IPv4 addresses and the quad A queries for IPv6 addresses, you probably have a single signature in that response, right?

So the average number of signatures per response for DNSSEC-signed domains will be somewhere between one and two.

That's presuming that you already know the zone, right? That you already have the key signing key.

Yes, that's true. And that's indeed the case, right? In order to be able to validate signatures, you need to fetch the keys as well, which for SPHINCS+ and Dilithium is not an issue.

The keys are not so large. This is a problem, right?

They're large, but it's not a problem. And what you need to consider there is DNSKEY records.

So DNS records have a so-called time to live, right?

Which is the time that you can keep them in the cache before you need to fetch a fresh record as a result.

So typically DNSKEY records would have a longer time to live, which means that once you've fetched all of that information and you've validated it, you cache it, and then it stays put for maybe an hour or longer.

So the impact of that is actually quite minimal. The fetching of A and quad A records where there's a single signature, that's okay.

But what we see there is, of course, you also have to validate this signature before you can give a response to the client, right?

So you have to check the validity of that signature.

And this is, of course, also cached. But the problem is that with these A and quad A records, we see increasingly small TTLs, right?

So people want to be able, operators want to be able to change their DNS records almost on the fly.

And TTL is something that gets in the way, right? Because if you have a long TTL, then all resolvers that have already fetched your record will have to wait until that TTL expires before they fetch a fresh copy from you.

And if you want to be able to change this on the fly for load balancing or because you have an outage or you're under attack, that means that there's this window, this TTL window in which clients won't go to your new address because they're still being served the old one.

So what are operators doing? They're decreasing this TTL so that they're more agile.

But this also means that the resolver has to fetch fresh copies of the signatures more often, it has to validate them, and it has to transport these larger messages more often, right?

So this is like you're stacking things on each other that basically mean that the amount of communication that you need to do increases a lot just to achieve this effect of being able to change things quickly.

But now, so you're saying one signature, so that's reasonably manageable still, it will increase the time, right?

If it's not from cache. But the public statistics are that, if you have a big resolver, approximately 80 to 90% of all responses are served from cache, right?

Yeah. Approximately. But then there's the 10 to 20% of requests that are not cached, and suppose the zone isn't known yet, maybe because you're visiting a zone which is popular in a different country.

How many signatures do we then expect to see?

So for the 10-20% that are not in your cache, you have to do a full recursion.

And in a worst-case scenario, you need to fetch information from the root, you need to fetch information from the top level domain, you need to fetch information from the second level domain, right?

That's the worst case scenario. And you need to validate signatures on all of these.

The root is very likely cached, but... Yeah, that's very likely cached, but you still need to fetch stuff on the top level domain.

So there are multiple signatures that you, in that case, need to fetch and need to validate.

And public keys. And public keys as well. So then we're looking at, say, four signatures?

Three, four signatures, yeah, probably. So we're looking at about roughly 20 kilobytes, right?

Yes. Yeah, but that's not all in one exchange, right?

You're not fetching all of that. No, no, there's several more RTTs for all the TCP connections, and they're...

Exactly, that's the worst. That's actually the problem.

The problem is not that you have to fetch these larger messages, which, once you have the connection set up, they're sent in almost one go.

So that's not a huge issue. But the issue is that you have to potentially set up a new connection every time.

Now, resolvers will try to optimize this, right?

Resolver implementations try to keep TCP connections alive so that they can reuse the connection and don't have to do handshake every time.

But on a busy resolver, you have a limited amount of resources you need to keep state.

So... And that's just websites again, right?

It only helps the popular websites. So we're risking here, with the larger signatures, that we're making it slower.

So this is troublesome to me.

We don't want to make the less popular part of the Internet slower.

Yes, so that's a potential risk, right? In order to be able to support these kinds of increases, it becomes increasingly hard for small shops, small operators to still support this, right?

Because whereas before DNS was something very simple, you set it up and it worked.

Now you already need to do DNS signing, which is difficult for people.

Now you also need to dimension and configure your system to support TCP, which will be used much more frequently.

And then think about deploying something like an encrypted transport that puts an even heavier burden.

And what this favors is, this is all too complicated, I'm going to outsource it to somebody.

Which is fine, maybe this is a business decision that you make, but the risk that we run if we do that, and everybody starts doing that, is that we are losing skills and knowledge that people used to have themselves.

And we are increasingly relying on a smaller and smaller number of operators.

And I think that we should cherish this diversity that we have on the Internet in terms of both implementations and also operators.

There should be a place for the small guy who has a server in his shed, or the small girl who has a server in their utility cabinet, that wants to run their own website and wants to do their own DNS.

Because these are also the future engineers that we need to build the Internet.

Yeah, completely agree. So if there are cryptographers listening now, what kind of signature scheme do we want?

What would be acceptable? Fast and small, everything, that's perfect.

Yes, of course. And I want a pony too. No, so DNSSEC is, in that sense, it's actually a tricky protocol, right?

Because ideally, what you would want is small signatures.

That's the thing that you transport the most. A larger key we can live with, right?

It's a shame that both Rainbow and GeMSS went out of business.

They had really nice, small signatures. What's an acceptable public key?

I mean, I suppose... Acceptable public key size. Well, ideally, it should be...

So DNS has a maximum message size of 64 kilobytes, regardless of which transport protocol you use, right?

So you should be able to fit multiple keys in one 64 kilobyte message, because otherwise you're lost.

But otherwise, you need to take other measures to transport these larger public keys, which we can solve, right?

This is something that we could address if we wanted to.

But what you really want is small signatures such that you would still be able to use UDP, right?

Ideally, it would be the size of an ECDSA signature, ECDSA P-256, 64 bytes, that would be ideal.

However, let's say that you take something like SQIsign, which has these really small signatures.

But then the computational overhead of that algorithm is just ridiculous, right?

You don't want to spend 40, 50 milliseconds validating a single signature, because as a busy resolver, you're validating thousands of signatures per second, whereas now, an RSA signature, it's like a blink of an eye and it's validated.

But if you have to wait in computer time, that's forever, like 15 milliseconds, right?

So that's unacceptable.

We can't do that, let alone signing, which is even slower. Yeah, SQIsign was recently made twice as fast, which is fantastic.

You don't often see such jumps, but we'll need another eight jumps before it's acceptable.

At least, yes.
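A quick back-of-envelope on why tens of milliseconds per validation doesn't work for a busy resolver, using the illustrative figures from the conversation rather than measurements:

```python
# Validation throughput (illustrative numbers from the conversation).
verify_ms = 40                     # per-signature verification cost mentioned above
per_core = 1000 / verify_ms
print(per_core, "validations per core per second")                 # 25.0

needed = 5000                      # assumed signatures/second on a busy resolver
print(needed / per_core, "cores just for signature validation")    # 200.0
```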

So it needs to be on par with the worst-case scenario that we have in mind now for DNSSEC, which is ECDSA P-384, by far the slowest of the standardized algorithms to validate.

If we can get to that level, it will work with DNSSEC. So if we can get SQIsign down to that level, maybe it's an option.

But then signing is still, oh, signing would need to be faster as well, because take, for example, Cloudflare, which does online signing.

I think signing now takes about two and a half seconds for a single signature.

Yeah, that's not going to work. Okay. So if you would start, okay, maybe this is too much of a question for the last few minutes, but if you start with a clean slate, are there things you would change about DNSSEC?

Yes, if I could. That's a good question.

So let's think about what you want to do.

So in DNSSEC, you have this balance between work for the resolver and work for the authoritative name server.

And the problem that we're now kind of stuck in this race between either we have transport issues, we have validation time issues, we have signing time issues.

What if we could come up with a design where we have to validate far fewer signatures, in which case we get extra budget for validation time.

So there's this solution that you and I discussed, but also Burt Kaliski from Verisign wrote about this, which is to use something like Merkle trees for authentication paths, and then only sign the top of the Merkle tree.

So you group records in the Merkle tree, you only sign the top of that tree.

That's an interesting solution because it reduces the number of signature validations that you need to do.

But it also means that you have an algorithm that you can deploy, which we know is secure as long as the hash algorithm is secure, which we can tweak and tune so that you can still have relatively modest size signatures.

So that's promising. And that requires a complete redesign of DNSSEC, because it doesn't fit that current protocol at all.

I still think it's worth following up on.
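To illustrate the idea, here is a minimal, self-contained Merkle-tree sketch: hash the record sets, build a binary hash tree, sign only the root, and serve each record together with a short authentication path plus that single signature. This is only an illustration of the concept under simplified assumptions, not the actual proposal discussed above, and the record data is hypothetical.

```python
# Minimal Merkle-tree sketch of the idea above: hash each record set, build a
# binary hash tree, sign only the root, and serve each record together with
# its authentication path. Illustration only, not the actual proposal.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return list of levels: hashed leaves first, root level last."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        if len(prev) % 2:                      # duplicate last node on odd levels
            prev = prev + [prev[-1]]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def auth_path(levels, index):
    """Sibling hashes needed to recompute the root from leaf `index`."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append(level[index ^ 1])
        index //= 2
    return path

def recompute_root(leaf, index, path):
    node = h(leaf)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node

# Hypothetical zone data: only `root` needs a (post-quantum) signature.
records = [b"www.example.com. A 192.0.2.1", b"mail.example.com. A 192.0.2.2",
           b"example.com. MX 10 mail.example.com.", b"example.com. TXT v=spf1"]
levels = build_tree(records)
root = levels[-1][0]
path = auth_path(levels, 0)
assert recompute_root(records[0], 0, path) == root
# A response then carries the record, a few sibling hashes, and ONE signature
# over `root`, instead of one signature per record set.
```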

Okay. Well, I think the conclusion is that there's still a lot of work to do.

So either we hope on the Merkle signature scheme, or we have a lot of work to do for DNSSEC.

Or we have an uphill battle in front of us.

Yes. Well, to anyone viewing, thank you for listening in. If you're watching live, in half an hour there's the next segment, where Sofía Celi will interview Professor Tanja Lange on the post-quantum standardization process so far.

That's good. Tune in, folks.

Okay.
