Cloudflare TV

🌐 Designing Cloud Servers for Energy Efficient Performance

Presented by Nitin Rao, Atiq Bajwa
Originally aired on 

In this segment, Atiq Bajwa (Ampere) and Nitin Rao (Cloudflare) will discuss how CPUs can be designed to consume less energy.

Read the blog posts:

Visit the Impact Week Hub for every announcement and CFTV episode — check back all week for more!

Impact Week

Transcript (Beta)

Hey everyone, welcome to Cloudflare TV. It's Impact Week. We're talking about the Internet's use of energy and a big portion of energy around the Internet is used by processors.

I'm joined today by Atiq Bajwa, who is the CTO and Chief Architect at Ampere, which is one of the most interesting processor companies in the world.

So really excited to have this conversation. Atiq previously ran product architecture at Intel, so just a terrific vantage point and has worked in several generations of CPUs.

So Atiq, thanks so much for joining us. This will be a ton of fun.

We really appreciate it. Thanks for having me, Nitin. Very excited to be joining you on this very exciting journey that you embarked on.

So Atiq, we'll talk more about Ampere, which is just a fascinating company.

I'd love to begin by just learning more about your background before that.

So you were at Intel for many years.

How does someone become the CTO of a chip company? That doesn't seem an easy thing to do.

Well, I started in the industry, Nitin, many, many moons ago, 30 plus years ago, back when the computing industry was really kind of the Wild West.

There were many architectures, many of them were proprietary to actually system vendors like DEC and Prime and others like that.

And then there were many merchant vendors as well.

You might remember Motorola, Intel was there, and National Semiconductor. I actually started out at National Semiconductor, was doing processors there, then moved to Intel, did a different architecture there, and then was involved in the first Intel out-of-order x86 processor.

And so then along the way at Intel, I did a bunch of different things, including networking and business and stuff like that, mergers and acquisitions.

But I came back to my first love, which is processor architecture.

And as you mentioned, the last 10 years or so of Intel, I was doing that.

And over, you know, after about 10 years, I basically retired.

And then, you know, you think you've done everything, you've done some very cool products and it's time to pass the baton.

But then Renee came along and said, hey, I'm going to start up a new outfit that's going to be focused on cloud and optimizing for cloud, and that really was intriguing.

And, you know, the mission was exciting: actually building something that transforms computing and tunes and optimizes it for cloud.

But then, of course, it's also about the people.

There were some great people that we started out with: Renee herself, Rohit, who was my partner on the engineering front, and other folks on the team. Just a very exciting team.

And the opportunity to go after this mission with these people and to create a new company in this area was just an opportunity that I couldn't pass up.

And so that's how I ended up at Ampere.

That's amazing. And building a silicon company, a chip company is hard. There's a reason why there aren't too many of them.

It's sort of relatively easier to build an app, but to build a chip company is really, really hard.

We'll talk more about Ampere's 128-core CPU, which we're super excited about.

I know a number of folks are really excited about it, but I'd love to start with the first processor you worked on.

Can you describe what was the first processor you worked on?

What were the specs of the CPU, if you remember? When I first came out of school, I got into the industry.

It had a huge number of cores: one. And it was multiple chips.

I think each chip had roughly maybe 100,000 gates on it. And if you contrast that with, for example, the 128 core chip we're talking about, that's tens of billions of transistors on it.

So things have advanced since I've been in the industry a little bit.

Yeah, I added a couple of zeros there. That's amazing. And it's also become a great deal more power efficient.

That's right. That's right. And I think that is a factor of both the work that has gone on at the architectural level and at the circuit and other levels of the design.

But also, I would be remiss not to recognize the folks on the silicon process side who have really advanced over the decades.

They have really allowed us to pack many more transistors in, get more functionality, but within a power envelope that's essentially maybe not shrinking, but it hasn't grown at the same rate as we've added the capability.

That's amazing, because of what processors do for our phones, for our servers, for all the devices around us, but also what they do for customers.

The world is so much more productive because CPUs are more productive.

So it's really amazing. And so when you're building Ampere, what's your vision, transforming computing?

That's a bold mission. What's your vision for Ampere?

What does success look like? Yeah, I think a large part of it for us is about building products that deliver the best performance and energy efficient performance.

Because the cloud is all about delivering capability to thousands, millions of people.

And you guys are on the forefront of that, especially with all your free services that you have.

That's all great, but you have to scale, you have to provide performance, in order to be able to provide those capabilities to all those customers.

But you have to do that while being energy efficient, both from a cost standpoint, but also, increasingly, as we know, because of our responsibility to the planet, to the rest of society, to the generations that come after us.

So energy efficiency is a big piece of what we're focused on. And in that context, because we're optimizing for cloud, as opposed to taking something that's optimized for, say, a client and then making it work in a server context, we're really optimizing for cloud data centers.

So that allows us the degree of freedom to be able to jettison stuff that might be very useful for a phone or a laptop, but isn't really the right thing and therefore is not efficient from a cloud data center standpoint.

So a lot of our effort is actually on exactly that.

Take the performance up because the more performance, the more people you can serve, but do it in a cost-efficient and energy-efficient envelope.

Now, I think everyone's seen a laptop, everyone's seen a phone, but not everyone has seen a CPU.

Can you just like help us understand, like, what does the CPU look like?

How big is it? How heavy is it? So the actual CPU chip, I actually brought an example of it, and it's, I would say, you know, about this big.

The actual chip itself is probably the size of your thumbnail.

But then it's put inside a package which is about the size of, you know, maybe an inch by two inches, that sort of range.

And then that little piece is where all the compute is.

And then that gets put onto a board, which you guys build and deploy.

And then, you know, and you add memory to it and disk and all that sort of stuff.

But the compute itself, from a volumetric standpoint, is a very tiny little piece of the whole data center.

It's amazing. The CPU is like the brain of the server and the brain of the cloud.

And it's so small and yet so, so powerful.

So that's just really, really cool. That's fascinating.

And so how do you see that changing over time? Like, does the sort of form factor change over time?

Like, how do you see that evolving in the coming years?

I think that's a great question, Nitin, and the history of computing, and especially semiconductor, microprocessor computing, is really about integration.

And that's been enabled by this thing that probably everybody's heard of, called Moore's law, which basically says that over time, about every two years, you can double the density of what you can pack into the same area.

And that's been true for many, many years, and its demise has been forecast for many, many years also, but it seems to keep going.
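That doubling cadence compounds quickly. As a rough illustration (the numbers are hypothetical, not actual process data), here is the arithmetic behind "double the density about every two years":

```python
# Illustrative sketch of the Moore's law cadence described above:
# relative density doubles roughly every two years. Hypothetical
# numbers, not actual silicon process data.

def density_after(years: float, doubling_period: float = 2.0) -> float:
    """Relative transistor density after `years`, doubling every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)

print(f"After 10 years: {density_after(10):,.0f}x")  # 32x
print(f"After 30 years: {density_after(30):,.0f}x")  # 32,768x
```

Over the 30-plus-year career span mentioned earlier, that exponential is what turns a chip with roughly 100,000 gates into one with tens of billions of transistors.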

But that has allowed us to pack more and more capability, more cores, so 128 cores.

It enables us to put more I/O into that, more memory capacity and memory interfaces into the same area. So I see that continuing over the next little bit.

I don't think that that stops. You know, there are more advanced silicon processes that are coming online, so that will continue.

We'll continue to integrate more. Power efficiency will require both work from the silicon process side, to make the transistor more power efficient, and architectural innovation, to make sure that when you're building a processor, you're not wasting work.

And so there's that efficiency architecturally that needs to come in.

From a footprint standpoint, processors might grow or shrink a little bit depending on the particular application.

But I don't think it's going to get a lot bigger. What we're doing is continuing to pack more and more stuff in there.

That's amazing. I remember, you know, entire generations have grown up seeing the Intel imagery, which helped a lot of folks understand what a CPU is.

But there are also processors in phones. What's the difference between a server chip and a phone chip?

Are they similar?

Are they different? Can you help sort of explain that? Yeah, sure. Forgive the pun, it was not intentional.

At the core of it is the CPU core. Essentially, think of it as an engine.

And it's doing all of the processing, the work, if you will.

The differences between what you would have in a phone versus a server: first of all, in phones, you might have a handful of cores. You know, today you have two cores, four cores, eight cores; sometimes on laptops you might have something a little bit more, maybe 16, and the extreme might be even up to 32.

But in the server space, as you know, it's, you know, 128. We think that's the new standard.

You really have to have 128.

Certainly the number of cores or threads that you need to be able to run is in the 100 plus range.

So that's one piece: you have to be able to have that many cores.

And then everything around that needs to accommodate that. So you have to build an infrastructure that supports connecting all of these cores together, because they'll be sending data back and forth and cooperating, and connecting them to the memory and the I/O, you know, so that you can actually do the relevant interactions with the rest of the world.

And so it's a little bit of a poor analogy, but an analogy would be: if you're building a motorcycle, with, say, a two-stroke engine, your engine is smaller, your transmission is simpler, and you're more worried about the weight of it and the balance of it.

So that's what the phone might be.

The engine itself is smaller, but I don't mean to imply that it's in any way trivial.

It's very hard work to make it that efficient.

But when you're doing a large thing that has, you know, in our case, I was going to say eight cylinders, but this is 128 cylinders.

Well, that's a different class of beast.

And the optimizations are very different. You have to worry much more about, you know, making sure that each one of those is optimized for very low power consumption while still operating.

Whereas in the phone, you know, your phone goes to sleep sometimes.

So you have to be very thoughtful about that and put it to sleep.

In the server, you're hoping that your server, your core is doing work all the time.

So you have to make it efficient even when it's doing work.

So then, as I mentioned earlier, how do you connect it together? You know, caching hierarchies are different.

All of that stuff is really a different beast. And just to complete the picture: we spoke about phones, we spoke about servers.

I guess if you were to go further along the spectrum, you have HPC and supercomputers.

What's the difference there? What's the difference between designing a server CPU and a supercomputer CPU?

Well, the interesting thing is that if you look at what's at the heart of a supercomputer, it's also typically a processor that might actually end up in a server as well.

The differences typically end up being that in supercomputers, you're much more interested in floating point performance.

So whereas a cloud server processor is much more interested in integer processing and running a normal operating system, in a supercomputer you're really running lots of vectors, very wide vectors, and floating point matters a lot.

That's not the critical optimization point for cloud workloads.

And the other part, of course, is that supercomputers are typically built out of building blocks, where each of the building blocks is essentially a server.

And then you connect all of them together through a fabric, you know, a sort of high-bandwidth, low-latency network, so they can talk to each other.

OK, well, I really appreciate your demystifying this for me and for our audience.

And this is really interesting. So Ampere's designed a 128-core processor.

And as the CTO and chief architect, you have unique visibility and you're shaping the next several years.

So, to the extent you can share, how do you think about roadmaps?

What is the time frame that different teams are working on?

Like who's working on three months? Who's working on three years?

How do you organize, I guess, all the R&D teams and who focuses on what? Yeah, so that's a great point.

And, you know, Ampere's focus is on delivering products on an annual cadence and delivering substantial improvements on that cadence.

And so it really does require teams that are focused on specific products.

And we do share a lot between the different products. But in one generation, we might say, OK, we're going to do X, and there's a team for X.

X might be a very complicated function. And so it can't get done in 12 months.

It might require more than 12 months. So the teams might actually start well before, you know, maybe a year, maybe two years.

Sometimes some of these complex ideas take actually three or four years to develop.

So what we have is essentially a team that is sort of divided up into folks who own particular IPs and are marching to the cadence that they're supposed to execute to.

And then basically we pull all those things together into a product.

And there are teams for each of those products in an annual cadence.

That's amazing. And with every generation of these improvements, it's just like we continue to be amazed by the energy efficiency improvements, both for every CPU and really across all of them because they add up.

So to the extent you can share, what has been the biggest driver of this big sort of energy efficiency improvement from chip to chip?

So that's a great question too. And I think there are kind of two or three different pieces that go into it.

Clearly the silicon process, underlying process, as you go to more advanced nodes, things shrink.

That means the capacitance is smaller. Therefore, you're not charging and discharging and wasting energy along the way.

So the silicon process is an underlying capability that we have.
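The charging-and-discharging point maps onto the standard dynamic-power relation P = a * C * V^2 * f. Here is a minimal sketch of that relation; the values are invented purely to illustrate the direction of the effect at a smaller node:

```python
# The standard dynamic (switching) power relation: P = a * C * V^2 * f,
# where a is the activity factor, C the switched capacitance, V the supply
# voltage, and f the clock frequency. All values below are made up for
# illustration; they are not real process figures.

def dynamic_power(activity: float, capacitance_f: float, voltage_v: float, freq_hz: float) -> float:
    """Switching power in watts."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

older_node = dynamic_power(0.2, 1.0e-9, 0.9, 3.0e9)  # hypothetical older process
newer_node = dynamic_power(0.2, 0.7e-9, 0.9, 3.0e9)  # ~30% less switched capacitance
print(f"older node: {older_node:.3f} W, newer node: {newer_node:.3f} W")
```

With everything else held equal, cutting the switched capacitance by 30% cuts the dynamic power by the same 30%, which is the "not charging and discharging and wasting energy" effect Atiq describes.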

But beyond that, what you really have to do is a bunch of work to characterize the workloads you're going to be optimized for.

And then, based on that, you decide how much power you're going to spend to enhance the performance of those, and how you're going to reduce the power of the areas that are not going to be used in those workloads.

So a lot of work goes into what I would call microarchitecture optimization for power.

You're spending a lot of time measuring the power as you're building the chip and then sort of figuring out, OK, well, that's more than we want.

So you figure out how to optimize it out.

And then the last part of it is that there are a lot of circuit techniques we use, where we're looking at the actual transistor-level implementation to make things more efficient when they're not doing work, something as simple as clock gating, for example, for areas that are not doing work.

But more advanced techniques as well are used in order to reduce the amount of power you're using.
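The clock-gating idea mentioned here can be sketched with a toy energy model. All numbers are invented for illustration, and real designs also pay a small overhead for the gating logic itself, which this sketch ignores:

```python
# Toy model of clock gating: a gated block stops burning switching energy
# on cycles where it has no work. Numbers are invented for illustration;
# real savings depend heavily on the design and workload.

def block_energy(cycles: int, busy_fraction: float, energy_per_cycle: float, gated: bool) -> float:
    """Energy in joules consumed by one block over `cycles` cycles."""
    if gated:
        # Only busy cycles toggle; idle cycles are clock-gated
        # (ignoring the small overhead of the gating logic itself).
        return cycles * busy_fraction * energy_per_cycle
    return cycles * energy_per_cycle  # the clock toggles every cycle regardless

ungated = block_energy(1_000_000, 0.3, 1e-12, gated=False)
gated = block_energy(1_000_000, 0.3, 1e-12, gated=True)
print(f"ungated: {ungated:.2e} J, gated: {gated:.2e} J")
```

In this made-up case a block that is busy only 30% of the time burns 70% less switching energy when gated, which is why it matters so much for server cores that must stay efficient even while actively doing work.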

It is really, as I mentioned earlier, for Ampere, it's not just about performance.

We're not building a processor that is going to go into something that requires a hydro dam right next to it to power it.

That's not the goal. Because we're going into the cloud and there are lots of our processors out there, we have to be very power efficient.

And so at every point in our design process, we are looking at performance.

There are metrics, and at every checkpoint we look at what the performance is, what power we're spending for that amount of performance, and whether that's justified or not.

So we make decisions based on that sort of stuff.

And while I'm talking about that, I just want to mention one thing that my team wanted me to mention to you guys: you have a benchmark suite called CF Benchmark, which we actually use internally within Ampere in the development and optimization of our processors.

So we use traces from those benchmarks to decide how to make tradeoffs.

Should we do option A or option B, and which one gives us more performance? But not just performance, performance per watt.

Yeah, every time I chat with performance engineers at Ampere or Cloudflare really across industry, for one, it makes me realize just how little I know.

But it just feels like there are all these secret knobs and buttons in CPUs and you're slowly uncovering more of them.

You as the CTO, of course, know every single one of the knobs.

So how do you decide, I guess, sort of which defaults you set versus which defaults the customer sets, especially for things like sort of performance versus energy efficiency tradeoffs?

Who decides that and how do you decide what to expose and what not to expose?

Well, in general, what we like to do is characterize, to the best of our ability, what I'd call the best set of knobs. As you said, there are a lot of knobs.

It's everything from what logic you will suspend to what memory refresh settings you will set, all of that.

There's a whole bunch of different knobs. And we try to come up with a setting that we think works well for the broad set of usages and customers that we have.

But then, to the extent that we work closely with customers, we are able to then identify which knobs are actually more valuable to them to be able to tune.

And then those we will expose, and you can set them as you see fit. Our general approach is that while we are cloud-optimized, the cloud itself is a broad market and different customers have different needs.

So we don't try to be the know-all and say, hey, this is the answer.

You should have it. We try to provide the knobs, but also as much explanation of how those knobs might behave, in order for customers to make their own calls.

So now I realize that different processors are different.

And of course, the software that runs it varies.

But talking about energy efficiency, are there just general themes you have seen of things developers can do, or sort of knobs they can press, to save energy?

So what's the secret that everyone should know of the buttons to press to save energy today?

If you're in a company where you want to save energy on your servers, what's the missed opportunity, the buttons we're not pressing, the things we're not doing?

That's a great question and not an easy one to answer, as you can imagine.

I think the biggest part of that would be this: if you're a user-level app, to the extent you can compile it with all the optimization knobs to get the best performance, you're probably also going to get the best performance per watt.

Because you're doing the least amount of work and our infrastructure at the silicon level is tuned to optimize the performance and the power at that leading edge.

So to the extent that you can optimize, do whatever max optimization you have available with the compilers and other capabilities, that's probably the best that I would say.

Beyond that, as a user-level app, I'm not sure that there's as much capability, especially because at a cloud service provider, you have many different tenants sitting on the same machine.

So a particular tenant can't really do much there; they can only make their product more efficient by making it more performant.

A lot of that work, I think, actually has to be done at the system level by the cloud service providers.

Yeah. And even as cloud service providers, I think we have an opportunity to provide a feedback loop. I think folks understand how many CPUs or vCPUs they're using.

I'm not sure if the average person understands how much energy they're using.

So for example, this week we launched a carbon dashboard to help customers understand that. It's sort of like, I guess, the residential version of putting your energy meter in a really, really visible place.

So you know how much power you're using. Like if folks have that feedback loop and they see how much power they're using, that can be an impetus for improvement.

And is that meter, Nitin, also available on the Ampere-based instances that you're putting into your data centers? Or is that across your entire fleet?

So for Cloudflare customers, we're helping them understand what the carbon footprint of their use of Cloudflare services is.

And what we find is common: there's a genuine interest in using more renewable energy and using less energy where possible.

I know we're coming up on time.

You have such a unique vantage point about seeing sort of really the future of the processor industry.

So what should we expect from Ampere in terms of future products?

Can you give us a sense for where you see the processor industry trending forward in terms of both performance and energy efficiency?

Yeah, in some ways, I think it's going to be more of the same, and not in the sense of just linearly more, but kind of non-linearly more, especially on the energy efficiency side.

I think, you know, as I mentioned earlier, performance is key because that's what enables people to get the work done in the cloud.

We're trying to be the leader in terms of energy efficiency, and that focus is going to continue for all the reasons we talked about.

I think, you know, we'll be packing more and more stuff into a smaller and smaller physical footprint, in about the same power footprint, which we try to drive down.

So I think it's really, there's not a non-linear thing that I see. It's just, you know, good engineering that needs to happen going forward.

We really appreciate what you do.

I think the innovation that Ampere is doing is something that we as a customer are excited about.

I think it's a real catalyst for us as an industry to become more energy efficient.

And so we're big fans, and I appreciate your spending some time with us on this show.

So thanks so much, Atiq.

Really, really appreciate it. Thanks, Nitin, I appreciate it. Good chatting with you.
