🌐 Chris Bergey, SVP/GM of Infrastructure at Arm and Nitin Rao, SVP of Global Infrastructure at Cloudflare — Impact Week Fireside Chat

Presented by: Nitin Rao, Chris Bergey

Originally aired on September 28, 2021 @ 11:30 PM - 12:00 AM EDT

Cloudflare announced improvements in performance per watt using Arm servers. In this Cloudflare TV segment, Chris Bergey (Arm) and Nitin Rao (Cloudflare) discuss the opportunity to help build a greener Internet.


Visit the Impact Week Hub for every announcement and CFTV episode — check back all week for more!


Transcript (Beta)

Hey, everyone. Welcome to the newest session on Impact Week. We're talking a lot about the environmental impact of the Internet, and we figured who better to chat about that with than Chris Bergey, who's Senior Vice President at Arm. So thanks for joining us, Chris. Hey, Nitin. It's great being here, and exciting week of content you guys have thus far already. Now, Chris, you've had a really interesting career. Before we talk about Arm, would you mind just giving us a little bit of a, if you have a unique vantage point on the computing industry, can you give me sort of some background on how you got to do what you're doing? Yeah, it definitely wasn't planned, I guess. We all kind of took our career journey, but ironically, I think two weeks ago, it would have been my 25th anniversary of actually starting at AMD after engineering school. And so I kind of started in processors, but then spent almost 10 years at Broadcom, did some startups along the way, did some memory activities with SanDisk and Western Digital, and then landed back at Arm. And boy, I got to say, you know, in my career, I'm not sure I've been as excited to get up and go to work or, you know, go to virtual work right now. But, you know, what's going on around computing, what's going on around semiconductors, and just the way that that's fueling the world. I don't know that being in chips has ever been cooler, and just really excited about the opportunity of the market, and then getting to work at a place like Arm with the level of technology, and the culture, and the people, and it's just been a tremendous opportunity, and really excited to partner up with Cloudflare as well. I think you have just a fascinating breadth. I mean, these are a number of really iconic companies that spread across storage, compute, network. The question, before we talk about Arm, Chris, I was just curious, like, how well does the average person understand silicon, and just the world of silicon? 
Because I think everyone knows what a social network is. I think folks have gotten to understand search. Like, how do you explain to sort of the average person what this entire industry is about? I don't know. I think it's, you know, it's become so specialized in so many different ways that it is difficult, and I think, you know, ironically, you kind of see some industry leaders and government leaders trying to figure out what are chips, and kind of these things that fuel a large portion of the economy. But it is, you know, like anything, it's really exciting to see the technologies evolve, and even when you go into these different separate, you know, verticals from a technology point of view, you actually can see a lot of the same trends kind of come through, and I think as a business leader or a product management leader, which is where I spent most of my time from a career point of view, you know, many of the lessons learned of kind of how does semiconductor economics work, and how do you need to think about applying those to real world problems, and just to kind of scale, and the NREs and costs associated with building these products that change our lives, but yet are, you know, quite frankly, really, really big bets, and it's just a fun place to be, and then now with the whole advent of AI and ML, and kind of all kinds of new compute paradigms, you're just seeing the innovation flywheel just going, you know, going very, very quickly. Help me understand more about Arm. So Arm is in the phones around us, the servers around us. Can you explain Arm's vision and the journey you're on? 
Yeah, so, you know, Arm was started, we just hit our 30th anniversary, and really the mission has been to provide powerful, efficient processing wherever computing happens, and that vision has resulted in, at this point in time, 190 billion Arm-based chips being shipped, and you can do the math of how many per person and all those kinds of things, but it's incredible, and it's a journey that we still have many people that started very early on on that journey in the turkey barn in Cambridge. Simon Segars, our CEO today, was one of the early, one of those early engineers, and so what we really think about is, you know, really kind of what, with that kind of scale, I think much like your impact, we think about how to build a better future, right, and building a better future using technology, and sustainability-wise, we're really focused on two things. One is closing the digital divide, and we do that by leveraging this ecosystem of, you know, our capabilities of building devices to try to close the gap between those who have full access to digital technologies and the 3.7 billion that do not, and I know Cloudflare made some commitments just yesterday around this same area, and then I think the other area we're focused on is, and this is going across the industry as well, is really how do we decarbonize compute, you know, how do we minimize the environmental impact of our technology to reduce the carbon impact of all of these devices on the planet, and we are trying to do that by leveraging our expertise in low power, and what we like to talk about is work per watt, and how do we make the world run very efficiently from a work per watt perspective. That's great. So for hundreds of millions of people, their primary computing device is Arm. Can you talk more about that? 
Well, I mean, yes, I mean, I think Arm is very well known for being involved with smartphones and IoT devices, and obviously those are, you know, essential parts of our lives, with IoT and kind of connected sensors being this next wave of a trillion devices, so that is definitely part of our DNA, and, you know, it's a big part of what we've leveraged, and we've leveraged those capabilities to look at, okay, now as we get into kind of customized compute or power sensitive compute in infrastructure, how do we leverage that scale and that capability, and as you can imagine, especially kind of one of the things I spend quite a bit of time on right now is the 5G buildout is, you know, when you're thinking about these 5G cell towers, and, you know, now we're talking about going to new spectrum that does not propagate as far as, you know, the lower bands did as we go into mid band or high band type things, you know, all of a sudden you want to start putting these compute elements in some pretty nasty environments and not something where you can have, you know, tremendous air cooling, or you may not even want to have active cooling, so really if you look at the future of infrastructure, whether it's sustainability or just being able to do compute in highly constrained devices, our DNA in that is really, you know, proving to have a significant value in the market and, you know, as well as bringing the performance at those power levels as well. Yeah, and you're able to spread the R&D cost across such a large number of devices, such a large number of people, which makes it more accessible, and the number of people also means that there's a lot of energy use and therefore carbon. Can you explain more about what Arm is doing? Arm is inherently a power efficient architecture, but what is Arm doing to further reduce the carbon impact of all the devices around us, be it phones, servers? 
Yeah, so I think that, you know, I think, you know, from a CPU point of view, we start with the cores, right, and we're making very efficient cores and yet continuing to push the computing envelope. I think the thing that we've done most recently was really show that we could have tremendous scalability, right? If you think about the Internet, it's all been about scale out, right? And basically, how do you look at workloads and how do you make them scale out, whether that's for redundancy or whether that's due to cost flexibility and really kind of the fact that computing now is a service and what does that mean? So we've, in some ways, we've kind of done the same thing where we've looked at, well, how do we stick as many cores as possible and how do we make those as scalable as possible? And I think with the announcement that you guys made this week around deploying Ampere technology, leveraging Arm's Neoverse CPUs, it's a great example of it, right? You've got 80 cores there of compute capability. You guys reported, I think, greater than 50% power savings versus your previous generation, which is frankly a great number. If we can get that generation over generation, I'm sure you'd be quite happy. And then yet Ampere has already announced that they're sampling a 128-core version. So if we can just keep driving those high core count devices with great memory scalability, performance scalability, that's what we're focused on right now in this space. And as I mentioned, you can try to fit your power envelope as needed. And so Arm is already so critical to the data center industry and also really encouraging all players across the industry to become more power efficient. Can you help paint a vision for the role you see Arm playing in the data center space over the next several years? Yeah, that's a great question, Nitin. 
So what I think about when I think about the cloud data centers is really, it's been a story of disaggregation, composability, and customization. And I think that is really what we've had to do in these computing environments to frankly keep up with this rate of computing needs. And what that means is moving from kind of general purpose compute, which was a one size fits all for the entire industry, to customizing compute and looking at how do we disaggregate storage? How do we disaggregate flash arrays for databases? And all of those kinds of things have been how do we disaggregate getting on the Internet, like Cloudflare's business. So really, to me, that's been the key. And what's powerful about the Arm business model and our ecosystem of partners is we build these very powerful, power efficient cores, right? And what we tell our ecosystem is, OK, that's what we're bringing to the party. You're an ecosystem partner of us. You're a licensee of ours. But take the things that you're really good at. And when you bring together, I don't know, my chocolate and your peanut butter, right? That's what makes the Arm ecosystem really go. So a DPU, let's say, for example: a data processing unit is kind of a new hot topic around using what's called a SmartNIC. And now we've evolved to the data processing unit, which is really providing a lot of the virtualization backbone for computing. What's in a DPU? Well, you want to put a very performant CPU in there that's power efficient, because you need to fit into a 50, 65, 75 watt power envelope because you're maybe a PCIe device. But yet, you need to do line rate at 100, 400 gigabit per second. Well, that requires a bunch of accelerators. And then, oh, wait, I want to put all these security measures in there. So really those are the building blocks that our partners are able to bring. And then we reuse that same CPU for a 5G access point or a 5G vRAN node. And so that's really what we see happening. 
And I think the fortunate thing for our partners is, for example, you talked about 40%, I'm sorry, greater than 50% gain using Neoverse in your deployment today with Ampere. Our partner Marvell has already announced that they took our next generation core, what's called N2, in 5 nanometer, and they got a 40% increase in performance in the exact same power envelope as what you're seeing today on N1. So we've got another 40% coming to that 50% that you've already realized. So just this pipeline and this flywheel is just moving very, very quickly. And I just am amazed every day by what our partners are building. And we think this is how the future gets shaped. So I'm curious, and for engineers listening in, what goes on behind the scenes, what goes into reducing the power of a chip? Well, I think there's many aspects, just like what happens in any kind of optimization. I think that one, we do still take advantage of improved lithography. So even though, yes, lithographies have slowed down, maybe from some of the earlier Moore's law trajectories we were on, and the costs are going considerably higher as we're getting to technologies like EUV, we still are seeing significant transistor gains, performance gains around these new technologies. But you also have to start designing specifically for these nodes, because maybe you don't get as much SRAM scaling, and you just get more transistor scaling. So that's one thing: I think we're really starting to do more and more design in that way. The second thing is really looking at workloads. So I think in the past, there's always been these benchmarks, and benchmarks are important. It's a nice way to compare things. And benchmarks try to be representative of real world. But of course, as we all know, the real world is changing very quickly. 
And so really starting to dissect as you look at these real world workloads and say, OK, and we work with partners like yourselves and others to really trace it, get it all the way down to the instruction level. Just look at what's actually happening when this higher level function is happening. And throughout that stack, whether it's at the compiler level, whether it's at the library level, whether it's at the transistor level, and the way we build microcode, we're picking apart everything. And you find a few percent here, a few percent there. You add the whole thing together, and it ends up being a very nice product generation. The gains continue to be harder and harder to get. The teams say, we think we've gotten everything out of this generation. We go back to the drawing board, and we squeeze more out of it. So just a pleasure to work with such an amazing set of people. And over such a large install base, I mean, it really all adds up. That's amazing. On the power efficiency side, over the long term, what are the ideas you're most excited about that would really move the needle for energy used across devices? Well, I think, Nitin, I still think there's a lot of gains to be had. And I think a lot of those efficiency gains are around how do you build things that are able to deal with the peak performance, but yet manage the fact that you don't have peaks that often. So I think that, and this has been one of the things that cloud was good at, for example, was in the past, companies had to build for their peak, and yet they did not run there very often, because it depends on what their business was and what was driving that discontinuity. To me, that's where I see a lot of the future advantages. And I think you can see it in networking to start with. 
I think if you just look at how well work from home went, and the fact that nobody had modeled that you would have the workloads or traffic that we have today, it was because of things like SD-WAN or other types of technologies that were developed, and quite frankly, with Cloudflare, to do load balancing, and to say, okay, well, yeah, I can run this here, but I might want to run it there. And being able to have the resiliency to be able to move those things around allows you to keep pushing up that nominal case to be closer to the max case. And so I think that'll be where some of the technology advantages come from. I think, as I mentioned before, we're finding that many of these workloads can be accelerated via different types of processors. So you've seen the advent of TPUs, and GPU compute, and NPUs, network processing units. So I think that's going to continue to happen as well, where you basically have this composability, where you send the workloads to the specific accelerator that that workload is looking for. Also, there's a lot of discussion around, I would say, kind of being more tightly coupled. I think at the silicon level today, we see a lot of power in IO, whether that be PCIe cards or other types of... And so there's a discussion of, well, if we made the package substrate be more the motherboard, and if you brought all these technologies closer together at the package level, what kind of power savings could we see? What kind of performance gains could we see? And then there's obviously been... I've spent a little bit of my career in silicon photonics, and I think more and more the world's moving to photonics. And as we know, light has some very interesting capabilities, but also some challenging ones. So I think also just figuring out how do we move more things optically will be another key driver for reducing power. Thanks, Chris. It's amazing how a number of these savings just add up over so many devices. 
One of the things we have seen at Cloudflare is just the efficiency of a shared network. So you just need fewer processors and therefore less power if lots of workloads run on a shared network. And it's also been really exciting to just see all the power savings with Arm. In some cases, we can actually just rip out old servers because they're consuming a lot of power, and it makes sense to replace them with Arm CPUs because you're just able to do so much more within the same energy footprint. So that's really exciting. What kind of new products are coming out, to the extent you can speak about them, from the Arm ecosystem that you're especially excited about? Yeah. So just on that last point, I think you hit it on portability. I didn't even think about it, but you're right. If you look at cloud-native workloads and the way that people are writing those for portability, and I think, for example, your Workers platform is a great example of that kind of portability, that is actually helping a lot from an efficiency point of view. Sorry, that just jogged my mind that I missed that opportunity, as you guys are playing a big part there. We've got an exciting pipeline of things happening. I think earlier this year, we announced the N2 processor that I mentioned, which Marvell has already announced they're sampling. We've got a large pipeline of customers that are planning on making product launches. We'll let them make those launches. We also have our new V-series of CPUs that you'll start seeing, which are leveraging something called SVE, or Scalable Vector Extensions, which is new to the Arm architecture, which we believe is vector processing done right, and really being able to be flexible in those workloads. Really excited about the performance pipeline that we've got and scalability and just what our customers are going to build with that. Also, the world of HPC has gotten quite interesting, whether it's the interesting things that have happened around drug discovery. 
Arm continues to maintain its number one position with the world's most powerful supercomputer, the Fugaku system. We continue to see a lot of interesting innovation going on in HPC, from differing national efforts to build their own HPC chips, which are largely based on Arm technology, as well as many more established players, customers such as NVIDIA that have announced exciting products based on Arm CPUs. I don't know. I just love building cool products. I think we're lucky with what we get to do every day, Nitin, and I think we're going to try and change the world. I think you already are and super excited for the future with Arm. What a terrific lever to move computing forward, move power efficiency forward. If you're an engineer who's mostly burnt on the x86 platform, how hard is it to run your code on an Arm CPU? Where do you start if this is your first time? It's a great question, Nitin. I think that we've gotten a larger and larger footprint of cloud providers and other industries that are now offering Arm instances, so that definitely helps. But a lot of what changes is just cloud economics, right? Most of the CI/CD loops now, the larger ones, support Arm as a first-class citizen. It used to be a big commitment to go to Arm, to be honest with you, where it would mean putting a large capital outlay to buy a cluster or a set of servers and start running your workloads on Arm. In today's world, it's take a DevOps team, take them offline for a few days, and have them run their CI/CD loop and compile for Arm and see what you can get. I can tell you that this has been transformative for many customers. Just like you've seen the numbers of 50%, we see that day in and day out, 20%, 30%, 40%, 50%, 60% price performance or performance per watt kind of improvements. What's exciting to me is for some of these smaller... Look, any CFO is going to love that when you say, hey, I just got this huge savings and all I did was recompile. 
But for some of these smaller companies, we're buying them more runway relative to their next round of VC or, hey, because you guys were able to give us all this extra compute, we were able to do these new features because we could stay in the same cost envelope. Barrier to entry is lower than it's ever been before. We have tremendous partners doing a lot of great things in the ecosystem. I think for the most part, what people see is it's a lot easier than they thought and they're pretty happy they did it after they make that jump. I think you guys are also a great example of that. I often describe it to... Well, for one, engineers don't need a lot of convincing because everyone's excited to use Arm CPUs, but we were amazed by just the support we got from the community. And Cloudflare engineers have, for instance, also contributed as we've brought it to Arm. And so there's a whole ecosystem. And I look at the numbers, like the 50% improvement in performance per watt, and I think we can optimize it so much more. In many ways, these are the early days, so it's so exciting. Well, thanks so much for joining us, Chris, and really great to get a window into what Arm does. I really appreciate it. Thank you so much, Nitin. Thank you. And good luck on your Impact Week. You guys are doing some great things. Proud to be part of it. Great. Thanks so much. Thanks, Chris. When the server crashed and all hell was breaking loose, I woke up in the morning and said, hey, what the hell's going on here? Testing one, two, one, two. Can you hear me all right? So I will begin from the top. So tell us about the COVID Symptom Study App. The COVID Symptom Study App is an easy-to-use app that people download and they self-report their symptoms every day, whether they're sick or not. 
It allows us to predict who is likely to have the virus, which areas of the country are going to get most affected, and it also allows us to predict which people are going to get sickest and need the most urgent care. It's been an amazing success story, really. The app went viral, and within 24 hours, we had about a million downloads, and we now have over 3.3 million users around the world. Julian, you must have been excited about the success of the app. How did you handle so many people using it at once? At 7.55 in the UK, which is still pretty early, the database crashed. Most of the team is actually still sleeping, and at that point, we're like, what's going on? How is it that we're crashing so early in the day? We launched the app really more in hope than anything else, and then it just took off. When the server crashed and surrounded by media, we said, hang on, this has kicked off. We can't go back now. How are you working with Cloudflare to secure the app? I already knew some people working at Cloudflare from previous experience, and just told them, this is what's happening. We're having this great success. We could really benefit from not only your expertise, but your services. Can you do something for us? And by the end of the day, we were upgraded and part of the Galileo project. We're using it mostly for managing DNS, proxying traffic, but a lot of things at small scales, managing SSL certificates. We're also using some features to protect access to various parts of the website, basically the admin pages, and making sure that people are authenticated, coming from the company. Tim, what are you learning about this virus from all the data you're seeing? Data is power. Data is knowledge. And basically, there's been no data out there on what's happening to this virus in the population. We were able to use all the symptoms that people were giving us, and we've picked up through the app, lots of symptoms that people weren't recording before. 
We were the first large group to pick out the loss of smell and taste, was incredibly important. Pretty much every day, we're finding out something new about this virus. With this amazing data. How does it feel to be part of such a successful project? This project was never about having one million people. This project is about delivering value to the population and the research. So that's really where the satisfaction comes from. We just had the best team at the right place at the right moment. That's what I'm super proud about, that every one of them did their bit, and that's why it worked.

Impact Week

Tune in for all of Cloudflare's Impact Week programming, featuring an array of CFTV episodes spanning environmental, social, and governance issues.
