Hardware at Cloudflare (Ep2)
Presented by: Rob Dinh, Brian Bassett
Originally aired on December 9, 2021 @ 9:30 PM - 10:30 PM EST
Learn how Cloudflare chooses and designs hardware.
English
Transcript (Beta)
So, thank you for tuning in to Hardware at Cloudflare. This is episode 2. I'm Rob Dinh, an SRE, and this is my co-host Brian Bassett, hardware engineer.
We're here to continue from episode 1, where we did not get to answer all of the questions that we had received.
So that's what we're going to do first and foremost in this episode.
We'll talk a little more in depth about some of the things we missed, and you're also very welcome to ask any more questions you have.
So, if there's anything you want to talk about, feel free to email us at, what is it called?
livestudio@cloudflare.tv. So, let's go ahead and start. I feel like if we keep doing these, we're going to need some theme music.
Maybe. What would be like a music that would represent Cloudflare or Hardware, right?
Something 80s themed, I think.
Yeah, I think I'm getting pretty good with the introduction. We might actually have to have our own nicknames or something.
Yeah, that'd be pretty cool.
Like Jocko, or the Joe Rogan Experience. Well, anyways, let's see. So, last time we talked about what Cloudflare looks for in hardware, right?
Now, we described that most of our servers at the edge are cache servers, and we were trying to spec those servers so that they can handle cache with minimum latency.
And, you know, some of the questions that we've had was more, I guess, tactical or more operations management for hardware, which is sort of like, you know, what are the power concerns that we're looking for?
How do we treat FRUs for each of the nodes?
You know, some decisions like, why did we go forward with the 48-core part instead of 64 cores?
So, Brian, if you could get into more detail on those questions that we had last time.
Yeah, we touched on the 64 versus 48 cores a little bit last time. You obviously do get more requests per second serviced out of a 64 core processor than a 48.
But one thing that was interesting was that it didn't scale linearly.
And I think part of the reason why, and we also talked about this in the last episode, is that the level three cache is shared among all the cores.
So the cache to core ratio is actually lower on a 64 core part than a 48 core part.
So the performance per core went down with the 64 core part.
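To put numbers on that cache-to-core ratio, here's a quick sketch. The L3 figure is from AMD's public Rome spec sheets, which list 256 MB of shared L3 on both the 48-core and 64-core parts; the per-core math then falls out directly:

```python
# Both the 48-core EPYC 7642 and the 64-core Rome parts carry 256 MB of
# shared L3 (per AMD's public specs), so more cores means less cache each.
L3_MB = 256

for cores in (48, 64):
    print(f"{cores} cores: {L3_MB / cores:.2f} MB of L3 per core")
```

So the 64-core part gives each core roughly 25% less L3 than the 48-core part, which lines up with the per-core performance drop described here.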
I think that's not exactly a surprise.
You'll very rarely see workloads scale completely linearly with the number of cores.
But up to 48, it did scale pretty linearly. We also tested the 32 core part.
And then we started to see a bit of a drop off. The other aspect of that is that there's, like with any CPU vendor, there's a price premium for the top of the stack.
The highest SKU part is always a little more expensive per core than the ones further down the stack.
And it consumes more power.
So for us, we were taking all those factors into account: requests per second per watt, requests per second per dollar.
And that's how we landed on the 48 core as kind of being the sweet spot for that current generation.
When we reevaluate the next generation of CPUs, we'll certainly test the top of the stack and the one that's equivalent to today's 48 core and probably one bin below that and go through that analysis again.
But that's why it's like the question is a really good one.
And it seems counterintuitive that you would go with fewer cores in the server, given that your costs for all the other components are fixed.
So why not get as many cores in as possible? Hopefully that kind of gives some insight into the process that we went through when we landed on the 48.
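As a sketch of that selection math, here's how requests per second per watt and per dollar might be weighed against each other. All of the numbers below are made up for illustration; only the method reflects what's described here:

```python
# Hypothetical SKU comparison: weigh throughput against power and price.
# None of these figures are real Cloudflare numbers.
skus = {
    # name: (requests_per_second, cpu_watts, cpu_price_usd)
    "32-core": (280_000, 170, 2_200),
    "48-core": (430_000, 225, 3_000),
    "64-core": (500_000, 280, 5_000),
}

for name, (rps, watts, price) in skus.items():
    print(f"{name}: {rps / watts:,.0f} rps/W, {rps / price:,.0f} rps/$")
```

With these made-up figures the 48-core part comes out ahead on both ratios, mirroring the "sweet spot" conclusion, even though the 64-core part has the highest raw throughput.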
Yeah.
If I go back further into the past, before gen 10 was happening, the understanding that we had, without as many metrics or as much testing, was that performance was just going to scale linearly based on the number of cores.
So when we went from 24 to 48, we just assumed it's going to be 2x. And yeah, it wasn't exactly 2x.
But with the lack of understanding that we had, if it was 1.7 or 1.8, we just said, well, that's just reality, right?
In theory it's 2x, but that's just how it turns out in reality.
That's how it was. And I mean, we did come from an Intel 24 core to another Intel 48 core.
And without diving too much into the details, it just seemed like it was going to be a very linear thing.
So it's great that we have figured out that L3 cache was part of the equation for what is going to affect performance.
So it's one of those where, okay, now we came from gen 9 to gen 10, 48 cores to just another 48 cores, really.
The only difference now is that we reduced the number of sockets and increased the L3 cache, and ended up with, what, theoretically 36% more RPS per watt, requests per second per watt.
So yeah, just wanted to leave it at that. A lot of people were asking, was what we did from gen 8 to gen 9 really innovative?
All we did was just doubling the amount of cores. But we also didn't double the amount of power either, right?
If you're just trying to make the math easy, the core was 2x.
The request was nearly 2x, 1.8. But the power was not 2x at all. It was more like 1.5.
So when you use that scale, then we figured, yeah, that makes sense.
That goes more towards what we're trying to do, which was to extract as much performance as we can in a server, shove as many cores as we can without having too much of a power penalty.
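Using the rough factors just quoted (2x the cores, ~1.8x the requests, ~1.5x the power), the perf-per-watt gain falls out directly:

```python
# Doubling cores gave roughly 1.8x the requests for only 1.5x the power,
# so requests per second per watt still improved.
rps_factor = 1.8
power_factor = 1.5

gain = rps_factor / power_factor
print(f"requests/second/watt improved by {gain:.2f}x")
```

That ~1.2x efficiency gain is why the doubling still "made sense" even though the request scaling wasn't a clean 2x.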
You're probably too modest to do it yourself. So I'll go ahead and plug it for you: you wrote a really good blog about this topic at blog.cloudflare.com that went through the process and has some good measurements of performance scaling and power scaling.
So if people are more interested in that topic, that's a good place to look.
Yeah, yeah, yeah. Well, I guess. Maybe I just totally forgot about the details on what we did.
But yeah, the blogs are there for record.
And anybody else, obviously, please feel welcome to read through them. Now, we did think about trying to make some demos up here, like live on a podcast.
Some of the things that we do actually might take a course of about 30 minutes to run.
So we can always fill that in. Maybe sometime in the future, it would be pretty great.
The list of benchmarks that we do is basically just a script full of open source tests that we can run.
So a lot of them are just things that you can find on GitHub.
And all the script does is basically act as a Makefile that downloads everything from GitHub, clones it, builds it, and then conducts a series of bash commands and spits out a CSV at the end, in about 30 minutes.
And we can do that for all the CPUs. And it gives us a nice sort of benchmark without using our Cloudflare stack, right?
Because it's just straight CPU, stress test, measuring power. And that's it.
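A minimal sketch of what a harness like that might look like; the real script, its repo list, and its workloads are Cloudflare-internal, so the structure and names below are assumptions:

```shell
#!/bin/sh
# Hypothetical skeleton of a clone/build/run benchmark harness.
# Real entries would clone and build open-source workloads from GitHub
# (compression, crypto, LuaJIT, ...); `sleep 1` stands in for them here.
OUT=results.csv
echo "benchmark,seconds" > "$OUT"

run_bench() {                        # $1 = name, $2 = command to time
    start=$(date +%s)
    sh -c "$2" > /dev/null 2>&1
    end=$(date +%s)
    echo "$1,$((end - start))" >> "$OUT"
}

run_bench "placeholder" "sleep 1"
cat "$OUT"
```

The appeal of this shape is exactly what's described here: no proprietary code, so the same CSV can be produced on any vendor's sample hardware.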
Maybe going back to my modesty thing: this is not my tool either. Somewhere in the company, people are saying it's Rob's CPU benchmark.
It's not at all.
I just took it from Vlad. Vlad is the guy who wrote that blog about ARM taking wing.
No. How does it go? Yeah. ARM Takes Wing. So he was the godfather of it all.
Of all the news and perceptions that we have about Cloudflare going ARM, which we still have; it's more of a, is there a solution that's out there for us?
It all started with Vlad. And he was the one that actually came up with the CPU benchmark.
And I just took it from him. Yeah. That benchmark is, it's not using any Cloudflare-specific code, but it does the sort of things that we do at Cloudflare, like compression and encryption, some Go, some LuaJIT.
And it's really useful for just to set a baseline.
You never know until you get to production really how a CPU is going to perform, but to set a baseline and compare sort of per core performance from one CPU to the next, it's really useful.
And we're using it for the gen 11 that's in design now.
We'll be using that as our first step in the benchmarking.
And then after that, get to web traffic simulations. And then whatever top candidates come out of that round of testing will go into production testing, and we'll see what they can really do as far as requests per second.
Yeah. Just to clarify something, and we might've actually talked about it in the last episode.
The CPU benchmark that we have is on GitHub; it's available.
You can always reach out to us if you're a vendor that's listening to us and wants to use it.
And it works for any kind of configuration, right?
We've had dual sockets, single sockets, Intel, AMD, and the variety of ARM vendor CPUs that we have.
It just works. So it makes things a little easy. It's one of those things where, as a hardware engineer, you don't want to trust the marketing data sheets that are out there.
You sort of look at the PDF and say, okay, that's okay. We can test that, but let's really put it through the wringer before we actually apply all of our engineering and software stack to it.
So it gives us a great set of data that goes through the same tests, publicly available, without Cloudflare's secret sauces.
And we can compare that across vendors. So it's great.
Go ahead.
Yeah, one thing that I had on my list to talk about last time that we didn't get to was part of the architecture of the Rome CPUs that we're using in Gen X.
I can share the screen, just a quick reminder of what they look like.
So that's one of the servers there. And this is a close-up of the AMD CPU that's in them.
And just to talk a little bit about the way these are architected and something that was interesting for me.
So these CPUs have what they call a multi-chip module.
MCM is the acronym. You can see it in this slide.
And what that means is that rather than having one die with all of your CPU cores on it, which is the traditional way to do it, they have multiple dies, each of them having anywhere from eight to 16 CPU cores on them.
And these are all packaged on the same physical package. That's the package that you actually plug into a motherboard.
So what that does is it increases AMD's yield because they don't have to have dies that have so many viable CPU cores as you would if you were trying to get, say, 32 or 48 cores onto the same die.
What this means from the perspective of a user like us is that you can configure these to all look like they're part of the same CPU package, or as multiple NUMA nodes.
And the setting is configurable.
It's user configurable. It's called nodes per socket.
And you can configure that anywhere from one to four. Well, one, two, or four.
So what that does is if you configure it to NPS nodes per socket equals one, then to the operating system it shows up as a single CPU with all 96 logical cores on it.
What we found in our testing in production was that we actually got around 10% better performance out of NGINX when we configured that for nodes per socket equals four.
And I think the reason why that's happening is because you're actually giving the Linux scheduler more information when you do that.
You're letting it know that, hey, these cores are on this CPU die and this is the memory that's attached to it.
And the scheduler will then try to allocate memory to processes that is attached to the same CPU die.
And so it's not having to go through these interconnects to get to the memory that is allocated to a process.
So it was just interesting to me as somebody who's been doing this a while, I didn't anticipate that result.
I thought the uplift would be pretty minimal, but actually it's a lot.
Yeah. Intuitively I see one big node. So when we say node, we're talking about just the socket itself.
Before, what we've been doing, I've always tied NUMA regions to the socket.
So one region was that socket, and then you have the second region as the second socket.
So it was a bit of mind gymnastics that you can actually have multiple nodes inside a socket.
But still, I still think just intuitively without knowing anything, shouldn't one region actually just be more responsive?
I feel like if I added more regions, like two regions or four regions, like how we figure it out now, that communication between regions, doesn't that introduce latency?
Well, so at least for the way that we're doing it, Nginx spawns a bunch of processes, which we're pinning to CPU cores to try and keep them within the same module on this multi-chip module.
And then the Linux scheduler will allocate memory to those processes that's attached to the memory controller that goes to that module as well.
So it actually has the effect, and I agree, I think it's counterintuitive, but it has the effect of reducing the amount of communication that has to happen between those multi-chip modules.
And so the memory accesses would happen anyway, but there's more latency to them if you have to go across that interconnect and we're taking out a lot of that communication.
Also, I think it reflects well on improvements that have been made to the Linux scheduler in recent years.
I think that's been a major area of focus as multi-node CPUs and multiple NUMA domains have become commonplace these days.
And it used to be that leaving things up to the scheduler, you didn't get very good results, but it's getting better as time goes on.
Now, we did try not doing the pinning of processes to cores that I just talked about and left it entirely up to the Linux scheduler.
And we did get worse results that way. So there is still potentially a benefit to pinning the different parts of your web server to specific processor cores.
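For reference, pinning worker processes to cores is a stock NGINX capability; a minimal sketch of such a configuration (not Cloudflare's actual config, which may differ) looks like:

```nginx
# nginx.conf fragment: one worker per logical CPU, each bound to its own core,
# so workers stay on the chiplet whose memory the kernel allocated for them.
worker_processes auto;
worker_cpu_affinity auto;   # automatic per-worker binding, nginx >= 1.9.10
```

The `auto` form avoids hand-writing 96-bit CPU affinity masks for a 96-logical-core box.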
Yeah. Yeah. I think there are still a lot of things we have going on where it's not like all services require the same number of cores.
And we're still trying to figure out which service needs more cores or what other service could actually just be more bundled up into this group or this region of cores.
Yeah. In this case, I'm specifically talking about Nginx because that's the one that on these edge metals, that's the one that scales up to use all 96 cores to some extent.
And that's the one that really drives our CPU utilization on the edge.
Yeah. Yeah. Yeah. And there's definitely a lot more detail that we can talk about this far.
We should bring Ivan in here or something. He'll have something to say about Nginx FL for sure, about how it behaves with cores.
Definitely somebody we need to pick his mind.
I'm just plugging Ivan right now and calling him out to come on here.
Ivan from our performance team. Yeah. One of the gurus who helps us tune these things.
Yeah. But I think it is kind of interesting. I did mess around a little bit when we had some persistent memory to sample with.
And one of the configurations you could do was to assign a channel of memory to a certain NUMA node in a CPU.
And that seemed like it was going to be something we could sort of explore, I guess.
Yeah. Yeah. So speaking of memory, one of the other things that I was going to talk about last time that we didn't get to was the fact that these CPUs in our Nginx servers support DDR4 memory up to 3200 megahertz.
But that's not what we bought for them.
We're buying 2933 megahertz memory. And this is another one of those things, just like with the number of CPU cores, where at first glance it seems like you're leaving some performance on the table.
But it was another thing where we were taking price into account.
And 3200 megahertz memory for servers was relatively new at the time we were going through the process.
Dollars per gig carried a price premium compared to 2933.
And we found in our testing that we really didn't get that much uplift from the additional 266 megahertz of memory on the faster one.
Not enough to where it was worth paying the price premium to get there.
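Back-of-the-envelope, assuming Rome's eight DDR4 channels per socket (per AMD's public specs), the theoretical peak-bandwidth gap between the two speeds is modest:

```python
# Theoretical peak bandwidth = transfers/s * 8 bytes per transfer * channels.
# Eight memory channels per socket is from AMD's public Rome specs.
CHANNELS = 8
BYTES_PER_TRANSFER = 8

for mts in (2933, 3200):
    gb_s = mts * 1e6 * BYTES_PER_TRANSFER * CHANNELS / 1e9
    print(f"DDR4-{mts}: {gb_s:.1f} GB/s peak per socket")

print(f"on-paper uplift: {(3200 / 2933 - 1) * 100:.1f}%")
```

A roughly 9% paper uplift that the workload mostly can't exploit is consistent with it not being worth the price premium.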
Yeah. Price premium is really the only thing that kind of holds us back.
You can go on Newegg or Fry's or something and just look at how much it costs for an SSD compared to memory.
And being in a cache business, we're all looking at cache. SSDs are cache for us too.
And it doesn't make any sense for us to go for memory if memory is 10 times more expensive per gigabyte than cache.
We can consider the performance and everything.
And yeah, our perfect web server would be running everything out of memory. I mean, sure.
Let's have everything as close to the CPU as possible. But we just can't do that right now.
And I'm hoping for a future that we can, because I think that'd be super cool.
Or maybe something in a lab we can mess around with. But yeah. I mean, we've been preaching it and Ivan's been preaching it too.
It's the best server is the one that we can just get rid of all of our disks and just have everything in memory.
But it is too expensive. Yeah. I think we're a ways off from that. I do see a question that just came through the chat.
Someone was asking about NVDIMMs, which is an interesting question.
Really for us today, we're not bound by the speed or latency of our cache, which is most of what we have storage on the edge for, which is for caching customers' web assets.
Especially now that we've made the move to NVMe. With its lower latency, it can handle some of those spikes better.
And as far as the just general hum of traffic, it's nowhere near its capacity.
So to go with something like NVDIMMs or Optane, there's a significant price premium on those compared to NVMe disks.
And we wouldn't see any benefit with the way we're architected today.
Yeah. And we did actually test and mess around.
We actually have some experience with Optane. So just to give some information, Optane comes in two different solutions.
And somebody from Intel should correct me if I'm wrong.
There is this dotted line between levels of caching. You have CPU on top, then memory, and then disks.
And the lower you go in that pyramid, the more your latency increases.
So you do want to have everything as much as you can in the CPU.
And that supports with our findings that L3 cache is very beneficial for us.
So Optane tries to blur that line between memory and disks. And the two solutions is that one solution is sort of a memory-based solution.
So the memory can become persistent.
So it doesn't lose anything. We're not messing around with all the cache algorithms that we have.
It just stays persistent. So it's basically a very, very fast drive is what it is.
The other solution is more SSD-based, as in the performance of it.
And what it does is it gives you memory that is not as fast as real memory, but it has the capacity.
So we're using the SSD's capacity and turning that into RAM.
And it's claimed to have RAM-like speed, but that's the difference. So you have one solution, which is more memory-based.
It's super quick. It's just like memory, but it's persistent.
And then the other one, where you take more advantage of the capacity of the disk and turn that into memory.
There's also a third configuration that I was never able to figure out.
But it's basically a hybrid of these two solutions.
And that comes more into configuration once you get into the OS, more so than it's like pure hardware.
So we've messed around with both in our lab.
And yeah, I mean, some things were actually really promising. It's actually really cool that we can have memory that's persistent.
That seems to go towards our perfect server, as in having everything all memory.
But this time we can actually also have the benefit of having persistent if we want it to be.
But yeah, in the end, it's just sort of, we weren't exactly sure what the product fit was going to be for NVDIMMs and Optane.
It was still very new. It was still very flashy, sexy, and fancy, basically.
I totally forgot what the price of these things was, but when I was pitched it, it was actually very comparable. Yeah, that's the thing with it.
And you're right about that sort of hierarchy of speeds of where you store things.
The NVDIMMs aim to sit in between memory and storage on that hierarchy as far as performance.
But they also sit in between on that hierarchy in terms of price.
And so they cost more than an SSD per gigabyte, less than memory per gigabyte.
So for us, we would have to be able to realize a performance advantage from that additional cost per gigabyte to justify paying extra to store things there.
And the way our workload is architected today, we wouldn't get a benefit from it.
I agree, it's cool technology, and it's something we'll keep an eye on if we ever identify a place for it in our hierarchy.
But the way it is today, it doesn't really fit.
Yeah, and it's also something that we have to look at as a whole system too.
So the follow-up to the question that we had about NVDIMMs is, you know, does it make sense to build our architecture around like 128 cores per node and then load it with NVDIMMs, or just increase your cache per core?
I have a shallow answer for that, and it's really very big picture.
It's like, honestly, in the end, we don't really care how many cores there are.
We don't care about the frequency or power or whatever. Just give us what works and what you promise you think will work, and we'll run it through the whole CPU benchmarking also.
We're open -minded to any kind of solution. So there could be something that is brand new technology where we'll just throw away the whole idea that cores equals performance, for example.
There could be some kind of system that makes more sense where we actually go down, maybe go down to 24 or 32, for example.
And say memory got really, really cheap, so we're talking about an external force where, you know, the market is telling us that memory is now really, really cheap.
Yeah, we could go for lower cores and then have higher memory if it makes sense to us.
But I mean, I don't know. I don't know if that's the better solution or not.
And that's what we're doing, right? This is what we're doing with the hardware team.
We're out there to test things out. Yep. Yeah, one of the other results that we got in testing Gen X that kind of surprised me was that the AMD 7642 processor that we ended up deciding to use has a rated TDP, which is thermal design power, of 225 watts.
Basically what that means is that the processor will never consume more than 225 watts in normal operation.
So what does that mean?
The extra wattage that you get determines how much it runs at turbo speeds, which is basically above its rated base frequency.
More TDP equals more turbo for longer.
And so it's rated at 225. And when it came to us, it was originally configured to run at 225 watts.
But both of our server vendors shipped us a thermal solution that is powerful enough to run it at 240 watts.
And it's actually a configurable option in the setup for these servers.
So we tried that and ran them in production that way.
And we did see an increase in requests per second, which is not really a surprise because you're giving yourself that extra thermal profile to get extra turbo boost out of it.
But we also ended up getting better requests per second per watt that way.
So we're using that additional 15 watts and we're using it efficiently enough to where we ended up running that way in production.
Yeah, yeah. I think that's an AMD-only thing, right? At least in our experience.
I have not seen that on any other manufacturers' CPUs. Yeah, yeah. Just a little weird tidbit is when we were trying to do power tests or stress testing the CPUs, we know the concept of TDP.
If anything, it's nothing more than just an indicator, right?
We just can't trust that it's really the max, basically. We think it's more of a spec sheet thing.
Yeah. I mean, it's not the max that the system will consume, of course.
It's supposed to be a limit on the amount of power that the CPU will consume.
And the reason why that result kind of surprised me was that the fans on the system are the other big thing that consumes power.
The CPU and the fans are most of what the power that a server consumes.
And when you get to the high end of the fan curve, the power consumed by a fan goes as a cube.
Yeah. It's a power-of-three ratio of power to your fan speed. So when you get up to 225 watts, you're already towards the high end of that curve.
So adding an additional 15 watts of headroom on top of that, you're adding more fan RPMs at the expensive end of the curve.
So I was expecting that we would get, yes, more requests per second, but at too much cost in wattage, so that we would end up not using the feature in production.
So I was happy to be surprised that that result actually gave us better requests per second per watt.
Yeah. Yeah. And I'm actually not sure, maybe you can confirm this for me: fan RPM is actually determined more by temp sensors, right?
Not the actual load that's happening on the CPU. Right.
So the load on the CPU drives the CPU to consume more power, which makes it heat up.
And then the baseboard management controller, the BMC on the host ramps the fans in response to that CPU temperature.
And it has like a target top temperature that you don't want to go above.
And normally they'll try to keep the fans going fast enough to where the actual CPU temperature is somewhere around 10 degrees C less than that.
So that you're not right up on the edge of what you can run at without thermal throttling.
Yeah. So, yeah. And you're right about the cubic curve.
I think it's also true if you were trying to translate CPU load or CPU performance to temperature, right?
Like, it's very easy for us to double the requests without having to double the temperature, right?
The temperature will only go up like a degree or two.
So for me, I feel like, yeah, might as well just do it.
Just run them as hot as you can. And because it's not going to go super hot and we will know if it does run hot anyways, right?
It takes an obscene amount of energy to get up to our CPU's, or even the server's, temperature limit.
It's a lot. And I don't think we'll ever reach to that just by, you know, software or just tweaking around firmware.
But yes, the fan curves are an interesting thing.
So, you know, the datasheets for the fans say that they're always going to be spec'd at like 15K or something, 15K RPM.
But we know that's not something we should try to do.
We should keep them at, you know, whatever the steady state is, which I think is like 6,000 RPM to 8,000 RPM.
That's where each fan sort of consumes about four watts or something. Now, if you go from 6K RPM to 12K, so just double that, it does not mean that the four watts consumed at 6K is going to turn out to be eight watts, right?
It's again, it's a cubic thing.
So that four watts, when you double the RPM, will very easily turn into 25 watts or more.
And there are six fans in our servers. So that's, you know, something that we always have to keep a watchful eye on.
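The cubic fan law mentioned here is easy to sketch. The 4 W at 6,000 RPM baseline is the rough figure from this conversation, and real fans only approximately follow the pure cube law:

```python
# Fan power rises roughly with the cube of speed: P2 = P1 * (rpm2/rpm1)**3.
def fan_power(rpm, base_rpm=6000, base_watts=4.0):
    return base_watts * (rpm / base_rpm) ** 3

for rpm in (6000, 8000, 12000, 15000):
    per_fan = fan_power(rpm)
    print(f"{rpm:>6} RPM: {per_fan:5.1f} W/fan, {6 * per_fan:6.1f} W for six fans")
```

Doubling from 6K to 12K RPM gives 32 W per fan under the pure cube law, the same ballpark as the 25-plus watts quoted above, which is why high fan speeds are so expensive in a per-watt budget.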
And like who wants to have fans go at 15K RPMs anyways?
It's not like we drive our cars at 8K RPM all the time anyways, right?
So, you know, let's not do that. You know, let's not try to physically abuse our servers.
Let's not run down our servers.
I was going to say, one other technology that we investigated for Gen X is memory encryption.
So this is a feature that AMD has built into their chips, starting with the previous generation.
And that is something that we're interested in because we already encrypt data when it's in transit using TLS.
We encrypt data when it's on disk using LUKS. So this is kind of the final piece of the puzzle: to encrypt it when it's in use.
And Derek from our security team and I wrote a blog about this, but the idea is that if someone gained physical access to one of our servers, there are things they can do that sound kind of science fiction-y, but there actually are practical attacks that can happen.
For example, someone could put an interposer between the DIMM slot and the DIMM and then use that to read out memory.
So the contents of RAM actually doesn't disappear immediately from a DIMM when it loses power.
It stays there, persistent, for a few seconds. And you can actually hit it with a freeze spray and extend that to minutes, or even potentially hours, by just physically lowering the temperature of the DIMM.
So just to guard against the possibility that someone could do some of these attacks, we experimented with that memory encryption.
It's super easy to turn on.
It's an option in the BIOS setup for the server. And then you have to of course have it enabled in the Linux kernel.
And you have to pass a parameter on the Linux kernel command line that says mem_encrypt=on.
And then from then on, all your memory is just magically encrypted.
We're using what they call secure memory encryption in its transparent style, which encrypts every page, including the OS and memory allocated to processes and everything.
They also have a way that you can do it page by page, but we wanted to just do it for the whole system.
So it works. We proved that it works using some custom kernel modules to try and read out memory while it's encrypted and verifying that you get back noise.
And we tested it in production, and we did see about a 4 to 5% reduction in requests per second from enabling that feature.
So we haven't rolled it out widespread in production yet, but it's an interesting feature that it's good to have available and the other CPU vendors need to match that feature.
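For reference, the knobs Brian describes look roughly like this as a config fragment (exact paths and messages vary by distro and kernel version):

```sh
# /etc/default/grub (fragment): enable AMD Secure Memory Encryption.
# Requires a kernel built with CONFIG_AMD_MEM_ENCRYPT and a CPU that
# advertises the "sme" flag in /proc/cpuinfo.
GRUB_CMDLINE_LINUX="mem_encrypt=on"
# Regenerate the grub config and reboot; dmesg should then report that
# AMD memory encryption (SME) is active.
```

The BIOS-side SME option has to be enabled as well, as mentioned above.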
Yeah. I mean, I was just going to say, why not just have it on by default? But yeah, 4 to 5% seems significant enough that you actually have to weigh the pros and cons.
But for us, I think we're pushing more towards having it on, right?
Maybe it's just a matter of what we can do to reduce the performance penalty.
Yeah. I think the 4 to 5% is enough that a bigger-picture conversation needs to happen at the company, between us and the capacity planners and those folks, to figure out how it's going to work going forward.
Yeah.
And I don't know what some of the factors would do that, whether it's something that we're going to do.
Maybe it's a tweaking of kernel. I don't really know.
I'm just trying to think, at least from a purely hardware standpoint, of slots per channel, or maybe memory speed itself, or the type of memory that we need.
Maybe it's something we can play around there. I'm not sure. Yeah.
And I think for the next gen, we'll probably experiment again with faster memory and see if it offsets some of that.
And then I would expect the performance of the next gen AMD processor to improve on that.
But we'll see. Yeah. I think that's about all we have to hash out for gen 10.
Again, gen 10 has a lot of new technologies that we introduced.
Going down to a single socket is one, going to AMD is another.
And the secure memory encryption that comes with AMD is a big thing that we have.
But we can now move on from gen 10. I think it's been hashed out.
Again, if there's more questions for us about gen 10, feel free to ask.
Let's go ahead and get through some of the questions that we have here. What types of server SKUs do you have per use case?
And can we break down those use cases?
For us, it's pretty simple. We have an edge server. And then a variety of servers that work at the core.
So you have your database, your compute. And we spec around each of those things.
There's a little bit of storage, but a lot of that storage has nothing to do with client storage.
I don't think so. Unless you can correct me on that one.
So the ones that we call storage boxes are the ones with rotational disks, which are primarily for our Clickhouse database.
And then we also have some that we call SSD boxes, which have a bunch of SSDs in them for things like Kafka.
And then there's straight up compute, which is running Kubernetes clusters for dozens of different services that make Cloudflare happen.
And those are really the big ones.
The edge is the big chunk of what we spend on servers. And then the rest is a mix of these core servers that are for customer analytics or serving up people's dashboards and things like that.
Yeah. Well, it's just a whole bunch of logs that comes from our edge servers.
They all get sent to our core database for us to take a look if we need to.
So there's a whole series of questions about that.
It's kind of a follow-up, but I think we can just bump it up too. So it says, what is the end-to-end time you usually take to qualify a SKU?
And I guess SKU is not just a server, but also maybe memory or disk or NICs too.
Usual benchmarks, steps Cloudflare uses in BOM qualifications, and BIOS tuning tips.
What do you got, Brian?
Okay. Yeah. End-to-end time to qualify a SKU is, I'd say, anywhere from six to nine months usually.
A lot of that time is spent waiting for the next rev of the server to come in.
So we'll get our initial prototype, do our qualification of the different components and try to land on what components we're going to use and identify any problems with the BIOS or the hardware itself, any mechanical changes we want made.
And then we'll be ready to move on to the next phase of the testing.
And we'll go to our vendor and say, okay, those two servers that you sent us look good.
We want these changes. Now send us eight more in the next rev.
And the time for them to implement those changes will take a few weeks.
We get in that revision and go through a similar process of synthetic testing, validation, usually some production runs at that point.
And then we'll say, okay, we're ready to test these in volume.
Send us a dozen servers and send them to colos all over the world.
There's some time for that to happen, to get them up and running.
And then we want to run them in production in various colos for a few weeks before we say, okay, these are thumbs up, start sending us production orders.
One of the things that I found interesting when I started here coming from a company that just had three data centers is that Cloudflare with data centers all over the world gets different traffic patterns depending on which data center the server is in.
So the performance increase for GenX, for example, over the previous generation ranges from around 25% up to 35% depending on which colo it's in.
So that last step where we send them to different colos internationally is super important to make sure that there's not some outlier where it doesn't perform the way we expect in Frankfurt, for example.
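That outlier check in the last step can be sketched as a simple threshold test over per-colo benchmark results. Everything below is illustrative: the colo names and uplift numbers are made up, not real Cloudflare measurements.

```python
# Hypothetical per-colo performance uplift of a new server generation over
# the previous one (fractional requests-per-second improvement).
GEN_X_UPLIFT = {
    "frankfurt": 0.27,
    "singapore": 0.35,
    "dallas": 0.31,
    "sao-paulo": 0.25,
}

def flag_outliers(uplift_by_colo, expected_min=0.25):
    """Return colos whose measured uplift falls below the expected floor."""
    return sorted(c for c, u in uplift_by_colo.items() if u < expected_min)

print(flag_outliers(GEN_X_UPLIFT))         # prints [] (no outliers at 25%)
print(flag_outliers(GEN_X_UPLIFT, 0.30))   # flags the colos under 30%
```

In practice the floor would come from the synthetic-testing phase, and anything flagged here would get a closer look before volume orders go out.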
Yep, yep. Going back to what type of server SKUs we have per use case, our use case at the edge is really about the load profile at the colo itself.
Again, we have 200 plus colos out there and they all do different things.
We can do our best to try to group them into certain profiles, but we're not trying to optimize the hardware for any of that because, honestly, we don't really know what the load profile is going to actually look like.
And that's something that just kind of grows organically too. And then you try to release and push a whole bunch of new services and products out there.
And the best we can do is to not design every use case, but at least the most general use case.
You know, we did try to send some of our Gen 10 out to production for testing in different use cases, but then we kind of thought that doesn't make any sense really, right?
The colos out there that we send to production are going to be seeing the most general use case that we see anyway.
Corner use cases aren't something you want to build too much around, especially from an engineering standpoint where you're trying to make an impact across 200 colos in 95 countries.
Dwelling on one special colo is a little hard to justify.
I think, what else do we have here?
Any performance impact for memory encryption? I think we went over that; it's around three to five percent.
How do you decide on a server which you use for testing?
And how do we test it at PCIe Gen 4? So, for the first one, we talked a little in our previous segment about how we use original design manufacturers, or ODMs, for these servers.
So, it's not ones that are necessarily generally available to the public.
We usually ask for some customization on them.
For example, on one of the ones that we put out for Gen 10, we asked for two of the fans to be removed.
We didn't have any components for those fans to cool; the slot there was empty.
So, we saved money on the fans, power on the fans, got a bunch of cables taken out that we didn't need.
So, really, the reason we work with the ODMs is that we can, rather than looking at a shopping list and saying, okay, we want that one, we can look at what they have and say, okay, we want that one, but we want you to change it in these ways to save us power and money.
So, that's kind of what we decide on. It's all about simplicity, no extra pieces, as few cables and fans as possible.
Then, we just ask them to make those changes.
For Gen 4, our Gen X hosts are PCIe Gen 4 capable, but we didn't have any Gen 4 peripherals that we were looking to use in them at the time of testing.
So, we didn't test anything.
If we end up with something in the future, like, say, a SmartNIC that's PCIe 4 capable, then we would do that then.
Speaking of cache, have you guys tried out Intel's OpenCAS for caching?
So, OpenCAS, it stands for Open Cache Acceleration Software, which I think is, well, it's open.
I don't know. I don't know if we've delved into that.
Have you guys done it? Is this their technology where you can sit an SSD in front of a rotational disk to act as a cache?
I don't know if it's specifically to sit in front of the rotational disk or more to add on more with memory.
Okay. Yeah, I probably can't take on that question. I'm not familiar with what that is.
Intel does have a technology that you can sit an SSD in front of a rotational disk and potentially get lower latency for relatively small accesses.
The only thing we're using rotational disks for is for ClickHouse, which is kind of built to use them.
So, I don't think we would see an advantage there. All of the storage that we're doing at the core is on these NVMe or on the edge is on these NVMe disks.
And the stuff like Kafka that's at the core needs to be running on all SSD to get the kind of performance that we need.
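The tiered-caching idea being asked about (a fast SSD fronting a slow rotational disk) can be sketched as a toy read-through cache with LRU eviction. This is a conceptual illustration only, not OpenCAS itself or anything Cloudflare runs.

```python
from collections import OrderedDict

class ReadThroughCache:
    """Toy model of a fast tier (SSD) sitting in front of a slow tier
    (rotational disk): reads check the fast tier first, fall back to the
    slow tier on a miss, and promote the block with LRU eviction."""

    def __init__(self, backing_store, capacity=2):
        self.backing = backing_store   # slow tier
        self.cache = OrderedDict()     # fast tier, kept in LRU order
        self.capacity = capacity
        self.hits = self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)     # mark most recently used
            return self.cache[block]
        self.misses += 1
        data = self.backing[block]            # slow-path read
        self.cache[block] = data              # promote into the fast tier
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
        return data

disk = {n: f"block-{n}" for n in range(10)}
cache = ReadThroughCache(disk, capacity=2)
for n in [0, 1, 0, 2, 0]:
    cache.read(n)
print(cache.hits, cache.misses)   # prints: 2 3
```

The win only shows up when the access pattern has enough locality to keep hot blocks in the fast tier, which is why it helps small random reads more than a streaming workload like ClickHouse.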
Yeah. Yeah, I don't know. Again, one of the things that we have – I kind of wanted to highlight a little bit more on what we were trying to do when we rolled out for Gen10 was we also had a lot of different teams involved, too.
So, when we sent samples of Gen 10 out, we sent them to different production colos.
Before that, we also sent them to different teams, so they could figure out how to retrofit whatever their stack is, or see whether the bunch of new technologies in Gen 10 was going to affect their work.
So, I think it's the first time we've actually done this. Admittedly, back in Gen7, 8, and 9, I haven't really communicated that much on how we roll those things out to SREs or security, for example.
It was just sort of like – we didn't actually pay too much attention besides making sure that the servers worked.
So, I got to give a shout out to the hardware team and Rami for doing that kind of thing.
I was definitely not able to do that kind of thing. So, yeah.
Going back to the questions here. So, what's your production lifecycle for GenX?
Three years or five years? I would just say around four years.
More just sort of in line with what our typical warranty policies are.
But I have no say for it. I think that's pretty standard in the industry is that your warranty runs for three years, and then you want to try and get at least another year out of it after that.
So, four years is kind of a ballpark standard.
Although, if we had servers that were working and, for example, needed the capacity someplace else for the stuff that we're buying today, I could see us stretching that beyond four years.
Also, it's going to depend on how much improvement there is in Gen11 and Gen12 and Gen13 over the ones we're buying today.
Because at some point, you have to take into account that newer generations keep improving on requests per second per dollar.
So, it may be cheaper over the life of the server to go ahead and decommission one that's lower on requests per watt or requests per second per dollar, and replace it.
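That replace-or-keep decision boils down to a cost-per-request comparison. Here is a back-of-the-envelope sketch with entirely made-up numbers; the rps figures, wattages, power price, and amortized capex are all hypothetical, and the model ignores rack space, smart hands, and shipping.

```python
def cost_per_billion_requests(rps, power_watts, power_cost_kwh, annual_capex):
    """Rough annual cost (power plus amortized purchase price) divided by
    annual request volume, in dollars per billion requests."""
    annual_requests = rps * 3600 * 24 * 365
    annual_power = power_watts / 1000 * 24 * 365 * power_cost_kwh
    return (annual_power + annual_capex) / (annual_requests / 1e9)

# Old box: fully depreciated (no capex), but slow and power-hungry.
old = cost_per_billion_requests(rps=50_000, power_watts=400,
                                power_cost_kwh=0.10, annual_capex=0)
# New box: big capex, but far more requests served per watt.
new = cost_per_billion_requests(rps=600_000, power_watts=350,
                                power_cost_kwh=0.10, annual_capex=2500)
print(f"old: ${old:.3f}/B req, new: ${new:.3f}/B req")
# With these made-up numbers, the newer box is cheaper per request
# despite the purchase price, so replacing wins.
```

Flip the assumed performance gap and the answer flips too, which is why the decision depends on how much Gen 11, 12, and 13 improve on what's being bought today.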
Yeah. In an operation standpoint, I can't wait to decommission Gen6, for example.
It doesn't do us any good.
You look at the graphs, and there's this number of Gen 6s, and they barely do anything to serve the load.
Gen9s and Gen10s are all over the place. It's like, why do we have those Gen6s?
But it's not like we're rushing off to do these things.
At least they're not a pain. By the time you decomm a Gen 6, we're talking about a four-year-old server, so it's right about the time we'd decommission them anyway.
There's a whole logistics and operations effort to it. When you have a server that's four or five years old, it's time to decomm, but it's located in one corner of the world.
There's a lot of fees, I guess you'd call it, a lot of work to do just to have that decommissioned.
Yeah, as an SRE, I can just log on to it and just disable it.
But when it comes to your financial spreadsheets, it has run down its course of depreciation.
It's still there. It's still in service, so there's that coordination to do.
Old servers obviously fail more, and sometimes when they fail and we look at it, oh, it's a Gen6 server, I'm not going to bother to fix it.
We're not going to push anybody to send out a five-year-old model SSD that's probably end-of-life already.
We're not going to do that. We build our stack around just ignoring it, basically.
It's always going to be forever under repair until it's decommissioned. That's the truth.
That's just how we're going to do it. But if there's going to be an RMA case that has to do with Gen10, for example, yeah, we have to turn those on.
There's a bit of a judgment there to do.
Gen7s and Gen8s, we kind of had a little bit of retrofitting things going on.
I think I talked about that last episode with the SSD retrofits, because the SSDs that we had for Gen 7 were just really, really crappy.
We were upgrading to something that was data center grade, so it was one Intel SSD swapped for another Intel SSD, and Intel worked with us really well on those.
When you talk about the fees of decommissioning a server, are you talking about actual money or are you talking about people's time?
More so people's time. There could be some fees when it comes to hardware recycling, for example.
Otherwise, it's a server that's already depreciated in your financial spreadsheets.
That's how I look at it. I could be very wrong, but I look at a server that's been depreciated for five years.
It's worth nothing. What is it up to me to fix it?
I'm not going to send a $100 disk and then have smart hands charge me another $100 for the hour to swap it.
It doesn't make any sense.
Yeah, we're closing in.
We have about a couple of minutes, so let's see if we have any more things to talk about.
There was one interesting question from last time.
It says, with PoP sites in 200 cities, how do you provision servers: Kickstart, PXE, FTP, etc.?
It's a good question.
The way it's done here is it's pretty cool compared to what I'm used to. Most of our servers and all of our Edge servers don't have a boot disk.
They PXE boot from a leader node in each colo and download a base image of Linux, just enough Linux to get Salt to run.
Salt is our tool of choice for configuring and managing servers.
The newly booted host, a Salt minion, talks to the leader and requests the rest of Linux to be downloaded.
You can actually get on the server and watch during this process and see packages getting installed and the Cloudflare software stack getting copied down.
All of that takes around 10 minutes. The advantage there is that there's no configuration on each server to manage.
They're stateless.
Any changes that we make in our central code repositories propagate down to that leader and get pushed out to each node, without having to worry about changing configuration on a boot disk.
In other words, we're not asking our vendors to do anything special for us.
Just give us a bare metal and we'll ship it out there.
We'll figure it out. Yep. Yep. Figure out how we can just wrap these things up.
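The stateless provisioning flow described above (PXE boot into a base image, then a Salt minion pulling the rest from the colo leader) can be modeled very roughly like this. The package and service names are hypothetical, not Cloudflare's actual manifest.

```python
# Toy model of the stateless-node idea: no per-server configuration; every
# node derives its entire state from what the colo leader serves.
DESIRED_STATE = {
    "packages": ["linux-base", "salt-minion", "cf-edge-stack"],  # hypothetical
    "services": ["edge-proxy"],                                  # hypothetical
}

class Node:
    def __init__(self, name):
        self.name = name
        self.installed, self.running = set(), set()

    def pxe_boot(self, leader_state):
        """Download just enough Linux to run the config agent, then converge."""
        self.installed.add("linux-base")
        self.reconcile(leader_state)

    def reconcile(self, leader_state):
        """Pull the rest of the stack from the leader. Idempotent, so the
        same call also applies later config pushes from the central repo."""
        self.installed |= set(leader_state["packages"])
        self.running |= set(leader_state["services"])

n = Node("edge-042")
n.pxe_boot(DESIRED_STATE)
print(sorted(n.installed))   # prints: ['cf-edge-stack', 'linux-base', 'salt-minion']
```

The key property the sketch captures is that re-running `reconcile` is harmless, so the leader can push updated state to every node at any time, just as a Salt highstate does.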
Our time is up now. We are done. This is going to be a wrap for episode two.
Thanks for tuning in, everybody. If you have more questions, please go ahead and send them towards our direction.
You're free to text if you want.
This is it for us. This is maybe another one. Maybe we can have another episode, but we'll see how it goes.
We're actually trying to expand the idea.
We're going to talk more about power, some power engineers. There's going to be some network or network engineers.
We'll see how we can fit that into Cloudflare TV.
I'm Rob Dinh, and this is Brian. Thanks for tuning in. Thanks for watching.