Originally aired on Today @ 9:00 AM - 9:30 AM EDT
In this episode of This Week in NET, JQ Lau and Victor Hwang from our Network & Infrastructure Strategy team walk us through Cloudflare's 13th generation of servers — the machines that power a significant part of the internet across 330+ cities worldwide.
The Gen 13 program doubled compute density by jumping from 96 to 192 cores, but that came with an 83% drop in L3 cache. The team explains how a bold hardware bet, combined with Cloudflare's FL2 Rust-based software rewrite, turned that trade-off into a win across throughput, latency, and power efficiency.
From counterintuitive fan physics to credit card pen tests on chassis intrusion switches, this conversation covers the full stack: CPUs, memory, storage, networking, security, and what's next — including post-quantum readiness at the hardware layer.
Mentioned blog posts:
That one is really non-intuitive. When you think about it, adding one more fan should technically push the fan power up.
But it turns out that because the fan curve is non-linear, it's actually super-linear.
The higher the fan speed is, the more power it consumes.
And it's exponentially more. So if we use four fans to cool the CPU, it would have pulled 50 to 100 watts.
Versus if we use five fans, it turns out to be just 30 watts.
It's counterintuitive because we only need to spin the fans at 10% to 20% of the duty cycle versus 40% to 50%.
And that one was really interesting.
Hello everyone and welcome to This Week in NET.
It's May the 1st, 2026 edition. And of course, many places are celebrating a holiday, which is the case for me.
That's why I'm recording on a Thursday.
And this week we're going deep into something that looks like a hardware story, and it is in a way, but really isn't just that.
We're talking about Cloudflare's Gen 13 servers and why this wasn't simply a new generation of machines.
It only really worked because the request handling layer underneath was rewritten.
Before that conversation, let's do a check on the latest from the Cloudflare blog.
And there's quite a bit, starting with post-quantum encryption, now generally available for IPsec.
Post-quantum encryption is really more important than ever.
We had an episode only about that two weeks ago. You can check that out.
This is another layer that was added. There's also something quite new this week.
Agents can now actually create Cloudflare accounts, buy domains and deploy.
A cool blog post that you can read. We also look this week at the Q1 2026 Internet disruptions report.
That means shutdowns.
For example, the Iran one that is now over two months of complete shutdown in the country.
Just a few pieces of whitelisted equipment can access the Internet.
There are power outages, even attacks on infrastructure. There's a lot to dig into there.
Plus, making Rust workers more reliable. That's also a blog post. And a rethink of bots versus humans on the web.
A very cool blog from Thibault, from our research team.
There's also, of course, last week, a full recap of everything launched during Agents Week.
And there's also a This Week in NET episode about that.
And now, without further ado, here's my conversation with JQ Lau and Victor Hwang from Cloudflare's Network & Infrastructure Strategy team.
And as usual, I'm your host, João Tomé, based in Lisbon, Portugal.
Hello, JQ. Hello, Victor.
How are you? Doing good, doing good. How are you? I'm good. And for those who don't know, where are you based?
I'm JQ. I'm based in Austin, Texas. Victor? I'm based in the San Francisco Bay Area.
And can you give us a run-through? I always like to start here: your job at Cloudflare.
And when did you start? Victor, want to start?
Sure. I started in June of 2024. So it's been almost two years for me here at Cloudflare.
And your role? I am a hardware system engineer in the hardware team.
Yeah, I've been at Cloudflare since November 2021. Same thing, as a hardware system manager; I've been helping Cloudflare launch the Gen 11, the Gen 12, and now the Gen 13.
One of the things that makes this area important is that servers are what actually make the cloud.
And Cloudflare is really well known for its network, its global network that is always expanding.
Can you give us a run-through on the importance of the actual servers?
And Gen 11, Gen 12, Gen 13? Sure, yeah.
So Cloudflare has about 330 POPs worldwide or more. And all these serve customer requests, right?
So in order to serve these customer requests, the software stack needs to run on some hardware.
Instead of renting from the likes of AWS or GCP, Cloudflare as a security company wants to own its own hardware, to make sure we have control and that we serve our customer requests securely.
And that's when the hardware team comes in to design our server specifically for Cloudflare workload.
Our Gen 11 servers were based on AMD Milan CPUs, and they offered the best performance for Cloudflare's workload at that time.
As time went on, a CPU refresh happened with Gen 12, which is based on AMD Genoa-X.
And now in 2026, we have launched Gen 13 to better serve Cloudflare workload with more efficiency.
One of the things that some of the blog posts we wrote, two blog posts, mention is how this processor doubled the computing power, but also cut a key resource by 83%.
These numbers are relevant numbers, especially now that the AI is around and people really value compute more than ever, I would say.
What are the main efficiencies from, and also computing power from Gen 13 that we could highlight?
When we looked at Gen 13, we have a couple of options available in the market.
Out of the CPU options available to us, the one that offers the highest core density comes with a significantly smaller L3 cache compared to what we have in Gen 12.
For everyone: you can think of L3 cache as a fast memory that the processor has access to.
For context, we were looking at the AMD Turin CPUs for our Gen 13 servers, as well as the Intel equivalents.
The one that offers the highest core increase, going from 96 cores to 192 cores, comes with a significantly smaller L3 cache per core, from 12 megabytes down to 2 megabytes.
Other options have 4 megabytes per core, but with lower core counts.
What happened is that our software, the request handling layer, FL1 specifically, that's our major workload, is sensitive to the size of this fast memory, the L3 cache.
Because it's an 83% drop, one-sixth of what we had, it has a significant impact.
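As a quick sanity check, the per-core figures quoted above line up with the 83% drop:

```python
# Per-core L3 cache figures quoted in the conversation
# (Gen 12 vs. the 192-core Gen 13 option).
gen12_l3_per_core_mb = 12
gen13_l3_per_core_mb = 2

drop = 1 - gen13_l3_per_core_mb / gen12_l3_per_core_mb
print(f"L3 per core drops by {drop:.0%}")  # prints "L3 per core drops by 83%"
```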
But it turns out to be, as we design the hardware and work with performance and software team, it's not just the hardware.
It also is about the software architecture. And today we'll talk a little bit more about how the hardware and software co-design have an impact overall on the Gen 13.
One of the things that I think a general audience will value is this perspective of, of course, we are present, we have the data centers in over 330 cities around the world, but in what way the CPUs are an important part of that?
In what way the CPUs are relevant and these new generations allow to have other capabilities that were not around before?
What can we explain there?
So the new CPU actually gives us much more throughput, 50% more performance compared to our last generation, Gen 12.
And that serves our workload worldwide.
So that is the main, the heart of the server that processes all the requests that are coming in through our networks.
Of course, and if it has more computing power and is more efficient, it means we can handle more requests.
So not only more customers, but we know the Internet is usually growing in terms of usage.
It allows us to support more usage per user and per company, and with AI there's more bot traffic, more automated traffic around.
Those capabilities are quite important in this situation, right?
Exactly, as technology advances, the CPU vendors will keep pace with it.
So when we refresh the hardware, we also make sure that our hardware roadmap follows the industry and is capable of doing the latest and greatest things.
Whether it's raw CPU power to process more requests, or technology updates in terms of encryption algorithms and different types of encryption, making sure all the buses are secure, and making sure we have upgraded memory speeds to support the larger amount of processing power that the CPU has.
One thing mentioned in the two blog posts that came out, which I suggest anyone read, there are diagrams and a lot of details for those who want to understand, is this part about the trade-off between more cores and less cache, and what that actually means.
And why is that a hard decision in this situation?
What can we explain for those even that are not hardware engineers about this trade-off, about more cores, less cache perspective?
Sure, I can talk a little bit about that.
That's a very good question. I mentioned a little bit about how the Gen 13 CPU options have a smaller cache, that fast memory, compared to Gen 12.
Think of it as like a scratch pad. If you have a large L3 cache, you have a large scratch pad.
If you have a small L3 cache, you have a small scratch pad.
If you're doing work, thinking of the CPU as a worker: if it needs to retrieve information and it has a big scratch pad with a lot of notes written on it, it's very fast for it to find the information and get back to work.
Versus if you think of it like a memory, the DRAM modules on the server, think of it as like a bookshelf with books.
You have to go find a book, find information, come back and do the work.
So that's sort of like the analogy that we can think about.
CPUs operate at a much faster pace than a human does.
Cache access, the scratch pad, is about 50 nanoseconds; a nanosecond is a billionth of a second.
Memory access is seven times that: 350 nanoseconds.
So every time the workload needs to fetch information, if it can fetch from cache, it's much faster.
If it needs to go out to memory, it's a seven-times delay for each memory retrieval.
And each workload can have thousands, millions of memory accesses.
If any one of them is delayed, it impacts overall experience, right?
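To make that concrete, here is a small sketch of how the average access time degrades as more fetches miss the cache, using the 50 ns and 350 ns figures from the conversation; the miss rates themselves are illustrative, not Cloudflare's measured numbers:

```python
CACHE_NS = 50   # L3 cache hit latency quoted above
DRAM_NS = 350   # main memory access latency quoted above

def avg_access_ns(miss_rate: float) -> float:
    """Average memory access time for a given L3 miss rate."""
    return (1 - miss_rate) * CACHE_NS + miss_rate * DRAM_NS

# Illustrative miss rates: a modest rise in misses inflates latency quickly.
for miss_rate in (0.05, 0.20, 0.50):
    print(f"{miss_rate:.0%} miss rate -> {avg_access_ns(miss_rate):.0f} ns average")
```

Multiply that by the thousands or millions of accesses a single request can make, and the difference becomes user-visible, which is exactly the effect described above.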
So when we were designing Gen 13 and comparing workloads, we looked at FL1, our core request handling layer prior to 2025, which is written on NGINX and LuaJIT.
It has a heavy reliance on a big scratch pad. When we put the FL1 workload on Gen 13 CPUs, what we noticed is a significantly higher cache miss rate.
And what that means for users is higher latency. When you load a web page, it may be slower.
When you go to any link through Cloudflare, it may take slightly longer.
Through our research, Victor can share more when we look at three different AMD CPUs.
What do we see from comparing core counts and latency?
So the safer path, of course, would be to go with the 128-core Turin CPU, because that one actually has more cache per core.
It provides four megabytes of cache per core, compared to the 192-core part, which only provides two megabytes.
But overall, when we were evaluating our workload and picked the 192 core for Gen 13, the decision came from the fact that our FL2, a new Rust-based rewrite of the software stack (the FL1 that JQ mentioned earlier), was already in progress.
So as FL2 matured, we tested it on the Gen13 hardware, and we found out that the latency penalty dropped dramatically.
And it improved very, very well. So this gave the team the confidence we need that we are less reliant on the L3 cache compared to the FL1 workload.
One of the things that one of the blog posts we wrote mentions that I think is quite interesting is the Gen13 business impact.
For example, up to two times the throughput versus Gen 12, 50% better performance per watt, and also 60% higher rack throughput versus Gen 12.
Those will be impactful for businesses that depend on Cloudflare, of course, for the Cloudflare business, and for general users as well, right?
Because we mentioned latency and how users are impacted by that because things are just faster, and there's more compute power as well.
There's a balance between latency, compute power, what we want specifically, right?
That balance is quite important, right?
Yeah, the balance is very important because designing a hardware or designing a lot of things in general is all about trade-off.
There are typically multiple axes you can improve on. So for the server hardware case, it would be: can this new hardware serve a lot more requests?
Typically, that means you need to burn a lot more power, or you may have to compromise on latency, tolerating a bit more latency to serve more requests.
You typically don't get all three or all your axes perfect.
You may have to compromise on some or evaluate the trade-off.
For our case, fortunately, with the software and the hardware together, we get more requests for a little bit more power.
So we improve on both, so efficiency and throughput.
But with FL1, we compromised on latency.
Fortunately, the FL2 rewrite was already underway, for efficiency reasons, for modularity reasons, and so the software team can continue to offer new products and services to customers.
That rewrite helped improve on the axis that hardware improvement alone could not.
It helped drop the latency.
So combining both the hardware and software co-design actually helped us to achieve improvement in all three axes.
So it's about thinking through the trade-off and working with a cross-functional team, so that two workstreams working together can cover shortcomings that your own organization or your own design could not improve alone.
Yeah, and you get a win-win. We've been mentioning FL2 and FL1 in terms of software.
And that's, as you mentioned, it's a version of software that handles requests, of course, in these data centers, in these servers.
And this new version, correct me if I'm wrong, it's rebuilt to be faster, safer, and also less dependent on the huge CPU cache, right?
Am I correct? Yes, that's correct. Yeah, as Cloudflare scaled, as Cloudflare evolved as a company, FL1 served us well for 10 years, from a startup all the way through a company that does about a billion dollars of revenue on an annual basis.
As Cloudflare looks to the next 10 years, since we need to continue to scale best-in-class services for our customers and serve them efficiently, the software team put together an effort to re-architect our request handling layer to be able to scale even further and serve our customers securely.
And that comes with the Rust rewrite, switching the language from LuaJIT and NGINX to a Rust-based server.
That itself helped improve security, since Rust is a memory-safe language. But it's not just the language change.
That is everything, right?
It also means re-architecting, thinking about how we design the software pieces to be able to scale efficiently and serve our customer.
And because of that thorough thought process, the software became more efficient after the rewrite, able to work with and utilize hardware resources more efficiently.
One thing I always find interesting in this part is the marriage between hardware and software on servers, the importance of them being, in a sense, one: how the hardware and software roadmaps evolve at different paces, and how our teams work to make the best of the two.
Put together, in a sense.
Sure, of course. So going forward, the lesson is that we know we cannot make a hardware selection in isolation from the software roadmap, and vice versa.
The same thing for software when they develop it, they have to keep in mind how the hardware is changing.
So when the team evaluates new hardware against software, they have to evaluate it both against the current software stack and against anything planned for future releases.
That way, the collaboration between the two teams can surface any constraints or issues that we'll have to solve together.
Something else that's important here, and it's mentioned in the blogs, is the components.
The CPU gets most of the attention, but the team had to redesign almost every other component there.
That means memory, storage, networking, power. That means a lot of things.
In what way was that redesign important, and what lessons did we learn from it?
Sure. Memory-wise, we actually doubled our total capacity. We went from 384 gigabytes to 768 gigabytes, with all 12 memory channels populated for maximum bandwidth.
At its peak, that's a 33% bandwidth increase over our previous generation.
And we were still able to maintain a four-gigabytes-per-core ratio. Since FL2 uses memory a lot more efficiently, doubling the capacity provided a lot of headroom for future workload growth.
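Those capacity numbers are easy to sanity check (the per-channel figure below is just arithmetic from the stated totals, not a stated DIMM configuration):

```python
total_gb = 768   # Gen 13 total memory from the conversation
cores = 192      # Gen 13 core count
channels = 12    # populated memory channels

print(total_gb / cores)     # 4.0 GB per core, the ratio mentioned above
print(total_gb / channels)  # 64.0 GB per channel, implied by the totals
```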
Storage-wise, we expanded the internal storage from 16 terabytes to 24 terabytes by adding a third drive, and upgraded from PCIe Gen 4 to PCIe Gen 5, which provides lower latency and better bandwidth.
The additional storage supports growth in many of our workloads, including the CDN cache, Durable Objects, containers, et cetera.
And a note: those have been growing a lot with AI, Durable Objects included, and we just had Agents Week.
People are using them not only internally, in Cloudflare's own products, but also externally, which makes sense.
That's correct. And we also added the capability of a front drive bay.
So the Gen 13 chassis actually supports up to 10 U.2 PCIe Gen 5 NVMe drives in the front.
That means we can use the same chassis to support both compute workloads and storage workloads.
Even in the future, if we need to do a field upgrade from a compute node, we can use the same chassis: populate the drives in the front, and it very easily expands its storage and becomes a storage server.
Yeah. On the networking side, as you could imagine, as the CPU now processes more requests, it does mean that there's more input, more network traffic coming into the server.
At the same time, as the requests get processed, there's more network traffic going out of the server.
So we looked at our... On Gen 12, we have a 2x25 gig networking card on the server.
When we look at production metrics, we're seeing that even on FL1 it's already running at 50% utilization at P95, the 95th percentile.
And as you can imagine, if FL2 becomes more efficient, that utilization will go up, which would still be fine on Gen 12.
But as we think about Gen 13, Gen 13 is going to be up to two times of Gen 12 performance.
So what that means is the 25 gig port will be saturated if we stick with 25 gig.
So I looked at the industry: is 50 gig the size we upgrade to?
Or do we need to upgrade even higher for future compatibility, so that we don't have to re-evaluate every year or every generation?
When we survey the market, 50 gig is actually not a common industry standard.
A lot of companies shipping today jump straight to 100 gig ports.
So we have looked at 2x100 gig option, 2x200 gig option and beyond.
This is coming back to cost versus throughput trade-off. You can get something much bigger, but it comes with significant cost.
The sweet spot turns out to be 2x100 gig.
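The sizing logic described here can be sketched as back-of-envelope arithmetic; the 50% P95 utilization and the up-to-2x Gen 13 performance are from the conversation, and the rest follows:

```python
gen12_port_gbps = 25      # one of Gen 12's 2x25G ports
p95_utilization = 0.50    # observed P95 utilization on FL1 (from the conversation)
perf_multiplier = 2.0     # Gen 13 is up to 2x Gen 12

# Projected P95 traffic per port if nothing else changes.
projected_gbps = gen12_port_gbps * p95_utilization * perf_multiplier
print(f"Projected P95 per port: {projected_gbps:.0f} Gbps")  # saturates a 25G port

# Headroom offered by the candidate upgrades.
for port_gbps in (50, 100, 200):
    print(f"{port_gbps}G port -> {port_gbps / projected_gbps:.0f}x headroom")
```

A 50G port would leave only 2x headroom, which helps explain why 2x100G came out as the sweet spot once cost was weighed in.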
And because we changed the port on the NIC, the upstream networking gear, the TORs and the routers, has to be upgraded as well.
So we have worked with network hardware team as well as our network team in Cloudflare to make sure the entire stack is upgraded to support Gen 13 as well as the new generations going forward.
In addition to networking... Sorry, and that's networking, right?
That's the way the server does networking. Correct. That is networking.
Any requests coming in, going out that traverse the network coming to the server, that's networking.
In addition to sort of networking, so we also added additional PCIe card support.
In Gen 12, we have support for one PCIe card. That's what we used to install NVIDIA GPUs to support our Workers AI product.
In Gen 13, we expanded that support to two PCIe cards. So now you can support two high-powered NVIDIA GPUs in order to serve larger models with lower latency.
It's not just GPUs for Workers AI. It also gives us flexibility to support future accelerators we don't know of today, or to consider other options, say time cards, DPUs, or SmartNICs to accelerate our software stack, if we look into those products further in the future.
I'm showing here an image from the blog, which is the Gen 13 server specifically: storage, memory, CPU.
There's a lot of the things that we've been talking, it's all here.
Any guidance on the visual differences from Gen 12 to this one? Any changes we can spot here?
Yeah, I think the big one would be the one that Victor mentioned on the front drive bay, right?
Victor, do you want to talk a little bit more about that? Yeah, the thing I just mentioned is from the picture, you can tell the difference in the chassis.
We have the 10 front drive bays that support storage workloads, which also helps us reduce the number of SKUs we have to serve across the global supply chain, right?
We no longer have to buy a storage server and then another one for compute server.
Now it's merged into one. You can just buy the same chassis, populate what you need for your workload and then deploy it that way.
The other thing you can probably see in the picture is we added an extra fan to help cool the 500 watt CPU.
I think in Gen 12 it was just four fans. With the increased thermal design power on the CPU, we added an extra fan to increase the power efficiency of the server.
Yeah, that one is really non-intuitive.
When you think about it, adding one more fan should technically push the fan power up.
But it turns out that because the fan curve is non-linear, actually super-linear,
the higher the fan speed is, the more power it consumes, and it's exponentially more.
So if we use four fans to cool the CPU, it would have pulled 50 to 100 watts.
Versus if we use five fans, it turns out to be just 30 watts.
It's counterintuitive because we only need to spin the fans at 10% to 20% of the duty cycle versus 40% to 50%.
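The physics can be sketched with the fan affinity laws: fan power rises roughly with the cube of fan speed, so if N fans share a fixed airflow requirement, total power scales as N x (1/N)^3 = 1/N^2. The constants below are illustrative, not measurements from the Gen 13 chassis:

```python
def total_fan_power(n_fans: int, airflow: float, k: float = 100.0) -> float:
    """Total fan power when n_fans share a fixed airflow requirement.

    Per-fan power follows the cubic fan affinity law: k * speed**3.
    """
    speed_per_fan = airflow / n_fans  # each fan spins slower with more fans
    return n_fans * k * speed_per_fan ** 3

airflow = 2.0  # arbitrary units, chosen so four fans run near 50% duty
for n in (4, 5):
    print(f"{n} fans: {total_fan_power(n, airflow):.0f} W total")
```

With these illustrative constants, the fifth fan cuts total power from 50 W to 32 W, the same counterintuitive direction as the real numbers quoted above.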
It's definitely interesting. It's definitely interesting.
This is the blog we were mentioning. There's the other one also, launching Cloudflare's Gen 13 servers, trading cache for cores.
One question I think is also relevant with this type of hardware is security.
Security is often invisible to end users, but Cloudflare is protecting a huge chunk of the Internet.
So what's new in Gen 13 security story and how does the physical design of the server play into that?
We talked about that, right? Yeah, that's a very good question.
That's, like you said, Joao, that's something that we take very seriously.
Security is something we take very seriously at Cloudflare. So when we design the server, we also look at how we improve on security posture at the hardware component level.
The AMD CPUs, the Gen 13 options, the Turin CPUs, offer more protection.
That goes back to our Gen 10 platform, which was based on AMD Rome and introduced in 2020.
We already have memory encryption. Fleetwide today, Cloudflare has memory encryption.
In Gen 13, AMD added PCIe encryption to their CPUs. For those who are not experts, memory encryption means that the memory we have is fully encrypted.
That means more security, of course. Correct.
So think of it as data in transit and data at rest. Data at rest is the data stored on the SSDs.
So things that you don't access actively as you work on workload, but you do want to store it securely.
So those are data at rest with encryption, SSD encryption or storage encryption.
Data in transit, on the other hand, has many components.
Data that transits through the network is encrypted by network technology.
When it lands on a server going from the NIC through the CPU, those are still technically encrypted because those packets just pass through.
Once it's getting processed on CPU, CPU would need to look at actual data.
So it decrypts it and works on it. As it works, it may need to move some data from its cache to memory temporarily and later fetch it again to process.
That portion is taken care of by memory encryption, making sure the data is secure.
We don't want any threat actor to be able to intercept in the middle and figure out what is going between the CPU and memory.
What happens in Gen 13 is that that perimeter is extended.
Think of the days when we have GPUs, DPUs.
The traffic going from the CPU over the PCIe bus to the GPU was, in the past, not encrypted.
It was just plain data, unless the kernel had enabled support to encrypt every packet.
But in the past, there was no hardware support.
Today, AMD has enabled it. All data going onto the PCIe bus, whether it's NVMe, GPUs, or DPUs, regardless of whether the other hardware supports it, comes out of the Turin CPU encrypted.
So now the entire internal bus systems are encrypted. All data flowing within the server is encrypted.
And there's intrusion detection as well, right?
Yeah. What is that? What does that mean? That's a very good question, because Cloudflare has POPs worldwide.
Some are in very remote locations. These POPs may have different security postures.
Some have a very high standard. Some have security that is good enough.
But inherently, that's just the data center's security posture. You have threat actors, especially nation-states, that can break into even a high-security facility.
So we cannot fully trust even the workers, the techs, at those locations.
We want to know whenever any server gets opened up. If it's scheduled maintenance, we can compare it to the maintenance schedule and know this server is supposed to be opened when we get the signal.
But if it is not scheduled maintenance, we want to get that signal.
So that is why we introduced chassis intrusion in our Gen12 platform.
And in Gen 13, we improved the posture further. In Gen 12, we did a pen test, penetration testing, where a sort of white-hat hacker hacks the system in order to identify what can be improved.
One of the things identified was that the intrusion switch was too close to a flat edge of the server, where someone could slide a credit card in to keep the switch pressed down and then just open the server without triggering it.
So we moved it to an area where the cover is curved, so there's no way for a slid-in credit card to block it.
And we have it on both sides, so you cannot just open one side versus the other.
So those are discovered as we do penetration testing, as we look to the mechanical design to identify how we can secure the server further.
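The alerting logic described above, comparing an open-chassis signal against the maintenance schedule, can be sketched like this; the function name and window format are hypothetical, not Cloudflare's actual schema:

```python
from datetime import datetime

def is_suspicious(opened_at: datetime,
                  maintenance_windows: list[tuple[datetime, datetime]]) -> bool:
    """True if a chassis-open signal falls outside every scheduled window."""
    return not any(start <= opened_at <= end
                   for start, end in maintenance_windows)

windows = [(datetime(2026, 5, 1, 9, 0), datetime(2026, 5, 1, 11, 0))]
print(is_suspicious(datetime(2026, 5, 1, 10, 0), windows))  # False: scheduled work
print(is_suspicious(datetime(2026, 5, 1, 23, 0), windows))  # True: raise an alert
```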
That's definitely interesting. We already mentioned that one of the blogs notes 50% better performance per watt.
How important is that number for a company running infrastructure at Cloudflare scale?
What is the result of that in terms of improvements?
Victor, do you want to take that? You're talking about from the software perspective?
Exactly, in the sense of how important that kind of saving is for a company running infrastructure at Cloudflare scale.
Yeah, I guess I can briefly talk about it.
So when we design hardware, the reason Cloudflare wants to bring hardware design in-house is not only so we can control the hardware design, but also to make sure we have better control over the cost structure, more visibility into where we spend our money.
And as we think about cost structure for a server, there's two components.
There is capital expenditure, the cost of the server itself, and there is operating expenditure, the cost of operating the server.
This includes the data center space and power. And when we talk about power: if the server consumes, say, 300 watts to serve 3,000 requests, and you can do those 3,000 requests at half the power, say 150 watts, in a new generation, that saves Cloudflare money long-term.
So that's sort of the area we think about. When we say a 50% performance-per-watt improvement, what that translates to in the real world is that Cloudflare spends less money operating the servers to serve the same amount of requests.
So as Cloudflare continues to scale up, we need to serve more and more requests, but we make sure the operating expenditure doesn't go up linearly with the number of requests we serve; it can increase sub-linearly.
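As a rough illustration of that operating-expenditure point: the 300 W versus 150 W figures are from the conversation, while the electricity price is a hypothetical round number, not Cloudflare's actual cost:

```python
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.10  # USD, hypothetical round figure

def annual_power_cost(watts: float) -> float:
    """Yearly electricity cost of a constant load, at the assumed price."""
    return watts / 1000 * HOURS_PER_YEAR * PRICE_PER_KWH

old_watts, new_watts = 300, 150  # same request load, half the power
print(f"Old: ${annual_power_cost(old_watts):.2f}/yr")  # $262.80
print(f"New: ${annual_power_cost(new_watts):.2f}/yr")  # $131.40
```

Multiplied across a fleet spanning hundreds of POPs, that per-server halving is where the sub-linear opex growth comes from.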
And with AI, that is relevant because people are definitely doing much more with those tools in terms of requests, in terms of websites.
So that's really interesting. Zooming all the way out, what's the single biggest lesson from the Gen 13 program?
And what would you want someone outside Cloudflare to take away from this story specifically?
I suppose the biggest takeaway from designing this generation of hardware is: don't evaluate the hardware design in isolation from the software roadmap.
Because unlocking the best performance usually happens where the two intersect.
And that's when the two teams are collaborating, that's when you get the best results.
Makes sense. Yeah, neither the hardware nor the software alone would have produced this result.
It's a collaboration in order to get to a win-win situation where you can win on all axes that you're trying to improve.
It typically takes more than one team or one area of focus to work on.
And in this case, the results are quite important, right?
The way we doubled compute density in hardware, reduced cache, and also the software element.
You can see that without the software part, the hardware wouldn't be as performant or as relevant in this situation, which is interesting.
Exactly. It's not just the metrics we're tracking that improve.
As you just mentioned, it has real-world implications: Cloudflare now needs to spend less money to do more.
And so that's the big part for Cloudflare is that as the new hardware generation comes out, efficiency improves.
We can serve customers more effectively at a much lower cost basis compared to the previous generation.
One of the things I also find interesting is that while building this, we have remote teams in many parts of the world, but we also have a lab in Austin, right?
How was that lab relevant in this case? Yeah, we have a lab in Austin where we put evaluation servers.
So when we first got some servers from our vendors, we can take many samples and put it into Austin lab and bring up for initial evaluation and run some initial benchmark.
So the lab in that case is very useful for us, because we don't have to put an untested server immediately into production.
We get a sort of a staging environment where we can do initial testing, initial benchmarking, as well as just look at things that can be improved before we put it into production for final confirmation of the performance and to make the final decision.
Anything you want to add there, Victor?
Yeah, so the lab is super useful in terms of when we want to test the new hardware, we want to try things out.
We don't even know if that is something that we want.
It is a safe space for us to play around with the hardware. And it allows us to brainstorm and come up with designs we probably hadn't even thought of originally.
Ideas come when you're able to play with the hardware in the lab.
Interesting, interesting. I need to go there one of these days. Now I'm curious, more curious.
One thing before we go, maybe it's relevant in terms of the planning, design, and execution of this: how relevant was leveraging AI and LLMs? People are using more of these tools to build stuff.
So they're building more. In what way was that important? And do we have numbers in terms of growth that we could potentially share?
For the AI work, when we designed this hardware, we did use AI a little to help, because we ran a lot of experiments and needed to gather their results to know which knobs we could turn to improve performance.
AI was already effective even though 2025 is considered the baby age of AI.
At that time, it already helped us analyze data, summarizing it effectively so we knew where to focus our attention.
So it is definitely a speedup of, say, 30% to 50% of the time.
Today, with significantly more capability, better models available to us, it can definitely do a lot more.
In some cases, it's a speedup that reduces the work by 80%, right?
You can do a lot more today because it has the memory of everything you have told it to do and learned along the way.
And because it can now consume a lot more data, make sense of it properly, and has a lot more access to different tools within Cloudflare, it can understand what the data represents.
It is definitely a big time-saver.
Victor, do you have anything to share in terms of using AI for hardware evaluation?
Yeah, and AI definitely, like you mentioned, allows us to start building tools that we used to have to run very, very manually.
You have to run one step after the other.
Now with AI, we're able to develop tools for a lot of automation, with a lot less manual work.
It definitely speeds things up and helps us look through a lot more detail than we were able to before.
Makes sense, makes sense. And there's even a blog post, not related to hardware in particular, about how Cloudflare's network recently passed a major milestone.
We crossed 500 terabits per second of external capacity.
And I guess that the servers there are also playing a role in these metrics because things there have been increasing a lot as well.
And even DDoS attacks also being really big these days.
Yeah, I think as the Internet and its capabilities grow, a more powerful server is sort of a double-edged sword, right?
You can process more, but at the same time, the attacker can also do a lot more using it.
DDoS attack can be amplified much more effectively as well. But fortunately, Cloudflare always planned with that in mind.
Amazing that we have achieved 500 terabits.
I'm pretty sure the next 500 terabits will take much less time than it took us to get to 500 terabits today.
With much more capable server, we definitely always plan in mind to make sure we have enough capacity to serve customer growth globally, as well as to absorb any DDoS attack that is targeting us.
We are very confident that we have enough throughput and capacity worldwide to be able to withstand anything and be able to serve our customer effectively as we grow.
Thinking maybe just ending on the projecting the future, what in this area, may that be hardware or software is coming?
What are the next steps even for your teams here?
What can we say? Yeah, in the hardware front, there's definitely a lot of growth.
Even recently, as we looked at Gen 11, Gen 12, and Gen 13: in Gen 11, we only had two CPU options, one from Intel, one from AMD.
As we moved to Gen 12, there were three options from AMD and then two to three options from Intel.
And then there are Arm variants coming up. Similarly, Gen 13 saw a proliferation of options, because there are so many points of improvement that can be made.
So companies or vendors are coming up with ways to improve on all the axes.
So as we move to Gen 14, there will be even more options for us to evaluate, and we'll pick the one that's best for Cloudflare.
So for us, it's a very exciting time to learn which changes in CPUs actually benefit us the most, so that we can spec out what an ideal Cloudflare hardware would look like.
On the software front, there are very exciting changes as well, as Cloudflare thinks about growing to the next level.
There's a lot of architectural thought about how we scale further.
And there is discussion about rewriting stuff that is bottlenecking us and things like that.
And in the AI world as well, Cloudflare is doing a lot of work to make sure that we support the growth in agents and MCP, and that we build the tools to allow developers to build on Cloudflare and make the Internet better overall.
Yeah, and just to add another thing to what JQ mentioned: on the hardware side, we're no longer just thinking about server-level design.
We're actually starting to look at the rack scale design. We want to deploy in racks.
And so the selection logic here is going to be driven by performance per watt, throughput per rack, supply chain reliability, etc.
So that allows us to scale a lot bigger for the future growth.
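The rack-scale selection logic described above could be sketched as a simple weighted ranking. This is a hypothetical illustration: the candidate configurations, numbers, and scoring formula below are all assumptions for the sake of the example, not Cloudflare's actual selection model.

```python
# Hypothetical rack-selection sketch: rank candidate rack configurations
# by performance per watt, discounted by a supply-chain reliability score.
# All names and figures here are illustrative, not real Cloudflare data.

candidates = [
    # (name, requests/sec per rack, watts per rack, supply score 0..1)
    ("rack-config-a", 4_000_000, 12_000, 0.90),
    ("rack-config-b", 5_200_000, 17_000, 0.70),
    ("rack-config-c", 4_600_000, 13_500, 0.85),
]

def score(throughput: float, watts: float, supply: float) -> float:
    perf_per_watt = throughput / watts  # primary efficiency axis
    return perf_per_watt * supply       # penalize supply-chain risk

# Sort candidates from best to worst by the combined score.
ranked = sorted(candidates, key=lambda c: score(*c[1:]), reverse=True)
for name, tput, watts, supply in ranked:
    print(f"{name}: {score(tput, watts, supply):.1f} weighted req/s per W")
```

In this toy ranking, the most power-hungry configuration can lose even with the highest raw throughput, which is the trade-off the performance-per-watt framing captures.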
Yeah, and software modularity will play into that, making sure that when we think about rack scale, the software can be selected to run on a specific rack and specific hardware types, and that we can easily move that around and scale effectively.
There has been some discussion about this. First, you mentioned something relevant: there are apparently more operators we can count on for hardware, which is good.
And we can see that compute is really important these days.
So having more players around is important. But there are also some concerns about the availability of some components.
Any concern for the future in any of those?
There's mention of memory, for example. Any concern? Yeah, today in 2026, there's definitely a supply shortage, a lot of consumption, as companies build up their AI infrastructure.
For Cloudflare, what we typically look at is: we know this is going to happen.
We saw it with COVID; today it's the AI infrastructure build-up.
When we design hardware, when we qualify hardware, we also make sure that we have at least two vendors for each of the components or each of the design that we have.
That helps ensure supply continuity.
But beyond the two vendors, as the need comes around, we can qualify more.
We work very closely with our supply team, our logistic team, to make sure that we have a pulse of if we need server in six months, how are we sourcing those components?
We have direct relationships with the memory vendors, Samsung, Micron, SK Hynix, and the like, as well as SSD vendors, NIC vendors, and PSU vendors, to make sure that we have visibility into securing supply for our capacity.
So it's a lot of planning and a lot of collaboration with our partner teams, as well as partner vendors, to make sure that we continue to be able to fulfill our capacity demand.
So planning ahead is really important, and it's the basis of the team's work.
And COVID is definitely an example of that in many aspects, which is interesting as well, right?
Yeah, COVID, the AI infrastructure build-up. What we learned is that we need to plan further ahead and build better relationships with vendors, not just signaling what we are building now, but also what we're thinking for the future.
We would like to see technology improvement in this area, that area, or signal to them, 2026, this is expected capacity, 2027 is expected capacity.
That helps them plan better on their end as well. We can collaborate to secure supply as much as possible, or design new components that improve efficiency, not only at Cloudflare, but for all of their customers as well.
And not only that, we also have to be very involved in where the industry is going.
We have to understand industry trends. We have to know where, what's the next big thing?
What is the thing that people are talking about? Today, it's AI.
That's the buzzword. Knowing what's going to drive the industry helps us decide what we're building and what to expect in the coming years.
Exactly. And in our case, we have different products around that. And with that growing, it means Workers AI, and a bunch of products already mentioned here, Durable Objects, Sandboxes; those have specific use cases.
If people are using those more, you need to adapt to that, right? Correct. Building something that is a little bit modular, like Gen 13, where you can add a front drive if we need to, or add a GPU when we need to, is critical.
Another aspect we think about, especially with Google's recent announcement that Quantum Day is coming sooner: our Gen 14 roadmap now also includes making sure that we are quantum-compute secure, or quantum ready, on the management layer, in addition to the data plane layer, where Cloudflare has committed, since 2022 or earlier, that everything needs to be post-quantum cryptography ready.
So we are also looking at the hardware layer to make sure the hardware has native post-quantum support and the firmware is post-quantum secure, so that no threat actor in the future can break into it with quantum compute capability.
That's quite important.
I did a full episode last week with Bas Westerbaan from our team about that.
And he's a researcher, and usually not the concerned type of person.
Now he is, because quantum computers now have, in a sense, a due date of 2029, earlier than expected.
So the hardware there is quite important. And it's no longer fictional.
It's becoming real, really quickly. Exactly. This was great.
Thank you, Victor. Thank you, JQ. Thank you very much, João. It was a nice conversation.
And that's a wrap. It's done. Thank you.