🚀 Introducing Cloudflare R2 Storage

Presented by: Jen Vaccaro, Greg McKeon, Rita Kozlov

Originally aired on May 23, 2023 @ 5:00 AM - 5:30 AM EDT

Join the Cloudflare developer product team to discuss today's announcement: R2 Object Storage. The team will review how to join the early access program, and everything else you need to get started.

Read the blog post:

Announcing Cloudflare R2 Storage: Rapid and Reliable Object Storage, minus the egress fees

Find all of our Birthday Week announcements and CFTV segments at the Birthday Week hub

Transcript (Beta)

Hi, everyone. Welcome to the second day of Birthday Week. I am Jen Vaccaro. I'm a product marketing manager for Cloudflare Workers and some of our new announcements today. And I'm here with two people on the product side, and I'll let them introduce themselves. Rita, do you want to get started?

Hi, everyone. My name is Rita. I'm a director of product here at Cloudflare, working on developer experience for the Workers platform.

And everyone, I'm Greg. I'm a product manager who also works on Workers, focused on distributed storage. So Workers KV, Durable Objects, and what we announced today, which is Cloudflare R2, our distributed object store.

That's great. So yeah, today we want to focus on all things R2 and talk about what it is, how our customers can sign up and what they can expect from R2. So the first question that I wanted to ask, and I know some people were commenting about it, is our name, R2. So maybe Greg, you can kick us off and just give us a high level. Why did we choose that name and what value or meaning does it symbolize about the product?

Yeah, sure. So when we first started building this product and thinking about what we really wanted to do better than other object stores that were out there today, we really focused on egress bandwidth costs. Because we had heard from our customers the pain that they feel when they have to go out of an object store like S3 and take that data and transfer it over to Cloudflare. It ends up locking them into one provider, even though there's this common S3 interface across all providers that should drive down costs. So that was really our target, to reduce egress fees. So as we were chatting about naming, we thought about S3. T4 was thrown around, as in, what if it's the next S3? We threw around Workers Blobs or a few other ideas, but we settled on R2 because we were like, well, it's S3 minus the things you don't want, like egress bandwidth charges.
And so there's also the R2-D2 connotation, which is kind of fun. But yeah, that's really where the product came from: what is there about traditional object stores that just doesn't work for developers, and how can we change those things?

That's great. Yeah, I know we've got a lot of comments on that with the next version of object storage here. So Greg, maybe you and Rita can kind of tag team on giving us just a high level on what object storage is, or what our R2 object storage is, some of the things that it entails, and then we can do even more of a deep dive on some of the specifics.

Yeah, so I think it's helpful to understand what object storage is generally. And so object storage is for storing unstructured data. So that can be binary files, it could be large video files, it could be MP3 files, sort of anything you want to store. And that was really the great promise of S3 when it launched back in 2006, this idea that you could take any set of files you had stored on hardware in your on-premise database and data center, and upload that to the cloud and store it there instead, more cheaply than you actually could on premise. That premise has sort of weakened, right? S3 hasn't had a cost reduction in six years at this point. And there's sort of a number of use cases that are coming out for object storage. There's the backup use case, where you just want to have your data in another data center that's not your own. You're really infrequently going to access that data and it's just going to kind of sit forever. And there actually are a large number of providers who are part of the Bandwidth Alliance today who sort of sit in that tier of the market, where they can store your data really cheaply, more cheaply than AWS. And accessing that data at high request rates really isn't something you care about at that price point.
And then there's sort of the high end of the market, like where Amazon and Google sit, where you have support for really high request rates and the ability to drive a ton of traffic to your objects and to access out of your system. It tends to be really expensive though. That's where you see no cost reductions in six years and the really high egress bandwidth fees that we've been talking about, but support for those higher request rates. And so we sort of saw this and we were like, well, there's sort of this space in the middle, right? Can we do what Amazon and what GCP are doing with a really reliable, really performant object store that can, at the high end, scale up to whatever people need to do? And then at the lower end, can we get closer to the low costs of those other providers and sort of sit in that middle? And so that's where we've announced that we're going to zero-rate operations below a specific threshold. So we were saying, if you're really infrequently accessing your objects, so say you have an object that you're just going to request once per second or something like that, that usage is actually basically free for the providers to support if you think about it, because they have these customers that burst and need to do millions of requests a second. They have all this capacity built in. Your one request a second really doesn't matter to them at all, but you end up paying the highest per-operation rate when you consider how enterprises get to negotiate down their costs and you don't. And so we said there's really a space here to serve that mid-tier developer who on the low end doesn't have very many requests at all and shouldn't pay for them, but then wants to be able to scale up on the high end and actually have a performant object store. And so that's really where this product sits. And I think there's been a lot of excitement around it because it's something that's not well served today.

That's great.
Yeah, go ahead Rita.

I think a really big part of this too, looking at it from a different angle, is we announced Cloudflare Workers I think four years ago on the dot. It was also a Birthday Week announcement. And what we've learned from developers over the past four years is more and more developers are trying to move their entire applications to Workers. And then you start to unpack, well, what do you need in order to build an application? And basically, for every application, one way to think about it is that there's a part that runs logic, right? That's the compute. And then there's another part of it where you need to store data somewhere. And so one area in which you'll continue to see us making additional announcements is around the data part of it. But the ability to store especially large objects is something that we've heard many, many customers ask us about. And there's a really unique offering here too in terms of being able to run that compute layer just in front of the object storage, right? With other providers like AWS, for example, there are things that are possible, but you have to do a bunch of maneuvering in order to set it up. This is something that with Workers works really, really seamlessly out of the box. And yeah, when we think about AWS, the things that they started with are S3 and EC2, right? Storage and compute, and then queues came down the road, which may or may not be a hint about something that we'll probably announce at some point. But yeah, it seemed like a very natural...

Yeah. And I think I want to build on that a little bit, to what Rita said about the deep integration with Workers here. We've gotten a lot of questions today of like, well, does R2 support pre-signed URLs where you're able to give a non-authenticated user access to a bucket? Do you support TTLs?
And the truth is, yes, we support many of those features, but the tight integration with Workers also means that you have full customizability. So where we've started is we've said, we're going to allow you to bind R2 into a Worker, and that really addresses one of the biggest pain points with S3, which is how you do authorization and authentication. And we're using the binding system that we have from Workers to address that point today. And then we'll expose a full S3 API on top of that, but developers can build their own S3 APIs on top of this as well. So they can go out and build whatever they need into the platform itself. And that's really the power of combining the storage primitive with Workers. We got a question today about lifecycle rules. And it was like, hey, can you trigger on a request to a bucket? It's like, yeah, just put a Worker in front of it. And you can see every time a request has been made to that bucket and trigger an action. Down the road, we might provide support where we'll actually go in and trigger a Worker for you when the bucket itself updates, but you don't even need that today. That's almost a nice-to-have on top of that because of the power of the Workers platform running right next to R2. So it's a really great announcement, both from what we talked about at first, the cost perspective and the pain of using S3, but also just the usability of this product from day one is going to be really high.

Yeah, that sounds great. Thank you for that context. We did get a lot of questions around how we compare to Amazon S3. And I know, Greg, you talked a bit about this in your blog. So I'm wondering if you can talk through maybe some of those pieces of comparison with S3, and then what that migration path might be like for those who are using S3 or S3-compatible products today.

Yeah. So we support the full S3 API today. So you're able to migrate those applications over just by changing your endpoint.
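The "put a Worker in front of the bucket" idea from this answer can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions: the `Bucket` interface, the `MemoryBucket` stand-in, the shared-secret `token` field, and `handleRequest` are all hypothetical names, not the R2 binding API, and the stand-in is synchronous where a real bucket would be asynchronous.

```typescript
// Hypothetical, simplified bucket interface for illustration only.
// It is NOT the R2 binding API; a real binding would be asynchronous.
interface Bucket {
  get(key: string): string | undefined;
  put(key: string, value: string): void;
}

// In-memory stand-in so the sketch runs without any Cloudflare runtime.
class MemoryBucket implements Bucket {
  private objects = new Map<string, string>();
  get(key: string): string | undefined {
    return this.objects.get(key);
  }
  put(key: string, value: string): void {
    this.objects.set(key, value);
  }
}

interface BucketRequest {
  method: "GET" | "PUT";
  key: string;
  token?: string; // assumed shared-secret scheme, purely illustrative
  body?: string;
}

interface BucketResponse {
  status: number;
  body?: string;
}

// The code "in front of the bucket": every request passes through here,
// so custom authorization and per-request triggers are ordinary code.
function handleRequest(
  req: BucketRequest,
  bucket: Bucket,
  writeToken: string,
  onRequest: (req: BucketRequest) => void,
): BucketResponse {
  onRequest(req); // runs on every request, e.g. to log or start an action

  if (req.method === "PUT") {
    if (req.token !== writeToken) {
      return { status: 403 }; // writes require the secret
    }
    bucket.put(req.key, req.body ?? "");
    return { status: 200 };
  }

  // Reads are public in this sketch.
  const value = bucket.get(req.key);
  return value === undefined ? { status: 404 } : { status: 200, body: value };
}
```

Because the handler sees every request, access rules and request-triggered actions are decided in application code rather than in bucket configuration, which is the customizability described in the answer.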
There are some caveats around the headers and things like that, but we'll have those documented. The bigger piece here is really around the ability to do automatic migrations. So it's always been possible to migrate out of S3, but it's sort of an error-prone and dangerous process. What we're announcing alongside R2 is the ability for R2 to migrate records automatically for you. So you give us an S3 bucket configuration, and the first time an object is requested from that bucket, we'll go to your S3 bucket, egress it once, and actually store it in R2. And then future requests for that same object will be served out of R2. So you only pay that egress bandwidth charge once, and then you get the benefit of R2's zero egress from then on. So that's super exciting, because I think it's going to take this sort of error-prone, fraught process for people and really simplify it down and make it as simple as putting a bucket configuration into Cloudflare to start saving money.

I think what's really cool about that is that it kind of naturally prioritizes your highest-traffic objects, right? So yeah, you kind of know that the things that are requested most frequently will definitely be there. And if you ever want to fully cut over, you can worry about just migrating over the rest, but you're not spending your bandwidth on just the migration. You're also serving customers at the same time.

Right. Yeah. So just deep diving a little bit more now into R2. One thing I want to start talking about a little bit more is some example use cases. Greg, I know you mentioned a few that people were asking about. Rita, I know that you've obviously been on the Workers platform for a long time and have gotten a lot of questions about what people can and can't build with Workers.
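The migrate-on-first-request behavior described in this answer is essentially a read-through copy. A minimal TypeScript sketch follows, assuming hypothetical names throughout (`ObjectStore`, `CountingStore`, `MigratingBucket`) and synchronous in-memory stores in place of real, asynchronous object stores. It is not how R2 implements migration, only the shape of the idea.

```typescript
// Hypothetical, simplified store interface; real object stores are asynchronous.
interface ObjectStore {
  get(key: string): string | undefined;
  put(key: string, value: string): void;
}

// In-memory stand-in that counts reads so the "egress once" effect is visible.
class CountingStore implements ObjectStore {
  reads = 0;
  private objects = new Map<string, string>();
  get(key: string): string | undefined {
    this.reads += 1;
    return this.objects.get(key);
  }
  put(key: string, value: string): void {
    this.objects.set(key, value);
  }
}

// Serves from the new store when possible; otherwise fetches from the
// origin once, copies the object over, and serves it.
class MigratingBucket {
  constructor(
    private readonly target: ObjectStore,
    private readonly origin: ObjectStore,
  ) {}

  get(key: string): string | undefined {
    const migrated = this.target.get(key);
    if (migrated !== undefined) {
      return migrated; // already migrated: no origin egress
    }
    const fromOrigin = this.origin.get(key); // the one paid egress for this key
    if (fromOrigin !== undefined) {
      this.target.put(key, fromOrigin);
    }
    return fromOrigin;
  }
}
```

In this model the origin is read at most once per existing object, and frequently requested objects are copied first simply because they are requested first.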
So I'm wondering if you can both talk a little bit more on those use cases and maybe what this opens up just for Workers users in general.

Yeah, sure. We'll be happy to chat through that. So I think going back to the power of Workers, where you can build new things that were never really possible before or would have been complex and required chaining multiple cloud provider services together, I'm really excited about pipelines and ETL pipelines and being able to transform data as it's written to a bucket. Right. So the ability to have a Worker that runs on an endpoint, you upload some data and it reformats that data and then stores it in R2, and then maybe egresses it out of R2 to a database as needed. What sort of flips the whole model on its head is that because you have zero egress from R2, you can store your data in R2 and then stream it into a cloud provider as needed. Right. So a lot of people today will have a database that sits inside of some infrastructure and they'll pipe all of their data into object storage on that one provider. They might want to take advantage of some other features on some other cloud provider, like say you want to use TensorFlow on Google or something like that. You would then have to egress all that data out. So you end up not actually accessing the underlying cloud provider's feature. If you have your data in R2, the egress, the ingress into that cloud provider is free. So you can sort of stream the data in, as long as the data is a reasonable size, and get to access the best of all the different cloud providers. So it's really opening up this promise of multi-cloud, I think. And that specifically has great applications to ETL and data transformation and those sorts of things. I think we're going to see a lot of customers start to store log data here. It's something we're going to start to do ourselves internally, letting you channel Workers logs out to R2.
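The transform-on-write pipeline mentioned in this answer (upload some data, reformat it, then store it) might look like the following TypeScript sketch. The log line format `<timestamp> <status> <path>`, the `LogRecord` shape, and the use of a `Map` as a stand-in bucket are all assumptions made for illustration; nothing here is an R2 or Workers API.

```typescript
// Hypothetical record shape and log format, chosen only for illustration.
interface LogRecord {
  timestamp: string;
  status: number;
  path: string;
}

// Parse one raw line of the assumed form "<timestamp> <status> <path>".
// Returns undefined for lines that do not match.
function parseLine(line: string): LogRecord | undefined {
  const parts = line.trim().split(/\s+/);
  if (parts.length !== 3) return undefined;
  const status = Number(parts[1]);
  if (!Number.isInteger(status)) return undefined;
  return { timestamp: parts[0], status, path: parts[2] };
}

// Transform an uploaded text blob into newline-delimited JSON,
// dropping malformed lines, before it is written to storage.
function toNdjson(raw: string): string {
  return raw
    .split("\n")
    .map((line) => parseLine(line))
    .filter((r): r is LogRecord => r !== undefined)
    .map((r) => JSON.stringify(r))
    .join("\n");
}

// Stand-in for a bucket: a Map from object key to stored text.
function ingest(bucket: Map<string, string>, key: string, raw: string): void {
  bucket.set(key, toNdjson(raw));
}
```

The reformatting happens before the write, so whatever later reads the stored object, for example a job streaming it into another provider, sees only the cleaned, structured form.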
I think there's sort of a very large number of use cases, but those are the ones that come to mind right now.

I think another way in which I think about it, one thing that we talk about a lot at Cloudflare is dogfooding, something that's really important to us, feeling the pain of our customers and building things that are actually meaningful, right? Because we have a very tangible problem to solve. And there have been so many products over the past couple of years that we've needed object storage for, that it's really exciting to see more and more of them actually be built on R2. So Cloudflare Images, which was announced during Speed Week, is one really exciting example of a use case that is going to be built out on R2. And then even thinking back to when we announced Pages last year, we ended up using KV for storing the assets because that was the closest thing to it at the time. But if we were to do it all over again today, and probably at some point in the future, we will start storing files in R2 instead. And I think what's really interesting about the Workers platform is that we actually encourage customers to build platforms on top of it themselves, right? So you can totally see the next Cloudflare Pages or Netlify or some similar type of provider come along and build their own platform on a combination of Workers and R2.

Interesting point, Rita, as well, talking about Workers KV and how we've been able to use it as maybe a substitute in some instances. So maybe between the both of you, you can speak for a second on the rest of our platform. Greg, I know we have Durable Objects that was just released, obviously Workers KV, and then now R2 and our distributed data. So maybe you can just talk for a second about how people can think about each piece of the platform, and when there might be a better reason to use one or the other, or how they'll be used in conjunction with one another.

Yeah, sure.
So I think KV used to be used for a lot of use cases that really are object storage use cases, like serving larger assets. KV is a bit more expensive than R2. It's designed for sort of a different use case, for really heavy, really frequent reads and writes. And it was built at a time when we didn't have other storage options on the platform. I think KV still has a niche for small values that are accessed from within a Worker, but this will be much easier for our customers to use and handle much larger volumes of data. So a bit of a difference there. Durable Objects are a database primitive. So we actually use Durable Objects in our implementation of R2, but sort of separate, much smaller key value sizes in that product. And I think I want to touch on quickly what we're seeing even internally with these products: having Durable Objects to build on top of let us build R2 incredibly quickly. And now that we have R2, we're going to be able to build a large number of features that we've wanted to build for a while, like Rita was mentioning, incredibly quickly. And I think we're seeing the power even internally of having a set of primitives that are designed to work together, right? So if you go look at how other cloud providers design services, they're all sort of independent. They don't even necessarily follow like consistent models when it comes to R2. And the difference on Cloudflare is that we've really designed each one of these products with the developer experience at the core, looking for a cohesive developer experience across them, right? And so that means that Workers just works by default with R2, right? It means that Durable Objects work with R2 by default. It means Cloudflare Cache works with R2 by default. And all of those are sort of major feature requests that we get, where people go, there's no way that Cloudflare Cache can just automatically work with R2. And it's like, yes, it's all part of the integrated platform.
And I think that's where you're seeing Matthew talking earlier today about being a fourth cloud provider and our approach there. It's really around figuring out what these actual pain points are for developers and how we solve them in a way that no one else really can, by integrating our offerings. And so I think that's really exciting.

That is very exciting. And one thing, just thinking of Durable Objects, we've started talking about jurisdictional restrictions with Durable Objects. Will there be any sort of that capability with R2?

Yeah. So similarly to how we chatted about the fourth cloud provider and being able to provide developers the easiest, the best experience, I guess, from the ground up, we really believe that all of our primitives should be global by default and then restricted down where necessary. And so that's what we've done with Durable Objects. We've said Durable Objects handle regional distribution for you. And then you can restrict back down if you have specific restrictions for given objects. Similarly with R2, we're letting you restrict on a per-object basis. We're letting you say this object needs to live in the following region, and having that level of granularity. So you can actually go implement an application that handles data from multiple different regions. And so we don't want to get to the level of granularity where we're giving you actual regional controls, because we've heard from developers that that's actually a huge burden. You have to manage all these different regions across different cloud providers. And you have to manage which data lives where. We'd rather it just be something your application passes in to say, hey, this actually needs to go here because I know in the application context that this user is from this area. So we will support those eventually.

Yeah. That's great. I know we've gotten a lot of questions and interest with Durable Objects around that. And I'm sure we'll start seeing the same here with R2.
So one thing I want to talk about, Greg, you've already touched on it, is the egress charges. You mentioned Cloudflare's Bandwidth Alliance. But can you talk a little bit more about that? What is Cloudflare's Bandwidth Alliance and how does this fit into the picture?

Yeah. So we have some great partners in the Bandwidth Alliance. Basically, we've committed to zero rating egress traffic to Cloudflare. And these are great companies. Backblaze is in there, DigitalOcean, some other providers. And then we have other providers who give half-rate egress. And this is a great benefit to our customers, who are able to access our shared network and reduce their costs. I will say R2 isn't designed to compete with the Backblazes or the DigitalOceans of the world. We're really looking at that middle niche to carve out for ourselves between S3 and the high-end, sort of crazy-request-rate providers on the top end, and then the backup solutions and the long-term storage solutions on the other end. And so that's sort of where we sit. And the Bandwidth Alliance is going to continue to be critical, I think, for customer choice and for the ability for developers to be able to say, hey, I want my data to live in this provider, and for that to be an option for them. Part of zero egress from R2 means if we're not doing our job, you can take all your data out and leave. We're not going to lock you in. And yeah, I think that's really exciting for the one product on the Internet that has an interoperable API. Pretty much everything else does not have an interoperable API. You could get me on containers, maybe, and Kubernetes, but it's much harder to move data around, despite the fact that S3 is technically supported by a large number of providers. And I think this actually goes a long way towards making that sort of dream a reality.
Yeah, I know a lot of people will be interested in that and curious about how that continues with R2 and the whole Bandwidth Alliance. The other piece, Greg, that you talked about in your blog, and Rita, I'm sure you have some thoughts to share on it as well. And you mentioned this, but just to be clear, there's going to be a cheaper price, hopefully, with R2 compared to those in the market. But let's just clarify, does this come with any reduced scalability or anything like that that our users should be aware of?

I think the short answer is no. The longer answer is we're constantly optimizing for different use cases and we're going to work very closely with the customers we have during the beta period to figure out what they need and where they need storage to scale to. One thing that we've heard a bunch even just today is that people care a lot about latency out of object storage. That wasn't really something that object storage was designed with in mind. And we've been thinking about that internally as well, this trade-off between throughput and latency. We have some options here and we have some different knobs we can tweak and tune depending on what customers actually need access to. But I think we're building on top of a great architecture with Durable Objects. And I think we'll have a really strong offering here. So I don't think we have any concerns that we know of around scalability.

Greg, can you elaborate actually on why building on Durable Objects would help with latency?

Yeah, sure. So to start, R2 is globally distributed by default. We don't have regional selection to start. We've designed the system as if data was being stored across many regions. And so what you need in order to do that is a record of where each object is actually stored. So you need a global metadata store, essentially a global database.
And so where Durable Objects really helps is that we can shard that database down and say that there's a Durable Object representing a given bucket or a given object. We can put that close to where the end user actually is. So the lookup to go figure out where that object might live doesn't add any additional latency. And having that metadata layer already built for us, not having to go build it ourselves, makes object storage actually much, much easier to go and implement, because a lot of the effort goes into building a performant metadata layer.

Yeah. I think that's so interesting, right? Because there are kind of two overarching use cases, I would say, for object storage, right? One is serving assets to live eyeballs effectively, right? I go to an e-commerce website and an image loads, and it loads from somewhere. And so ideally, you kind of want that image to be stored as close as possible to me. And then you have this very different use case of, you know, you talked a lot about ETL jobs and transformations and all of that kind of stuff, or maybe pulling that data into a database or into a logging service, right? And so in those instances, you want it to be as close as possible to the source that's pulling the data into somewhere else, right? It might be another cloud provider. And yeah, I think what's really powerful about this is the user experience that you get from not having to really think about that, right? Cloudflare will just kind of move it, locate it wherever it makes sense, depending on the access patterns.

I think that's right. I think there is something with object storage because the buckets tend to be so large, right? Like you get petabytes of data here. And if we guess wrong, that could be really costly for you just from a time perspective, not from a bandwidth perspective, because bandwidth is free. I'm going to plug that again.
But I think really what that will come down to is maybe some sort of region hinting system where regions aren't required by default, but you're able to hint to us that this is where this thing should live. We're still thinking about that with Durable Objects as well. We don't have our fully fleshed-out strategy here, but it'll probably be similar across the two products. But yeah, I think for the default case, making it global, making it intelligent is certainly the direction we want to head.

Great. So just off of the scalability and the low latency that you both touched on, can you talk a little bit more about the reliability that we're going to be seeing with R2?

Yeah. So part of what's great is actually replicating data across regions. When you look at other providers, they generally replicate data within a region for you. We're going to be replicating across regions. And by doing that, we get significantly improved reliability out of the box, right? If you're in AWS and you use standard S3, you're replicated across availability zones. But if the region itself goes down, you go down, whereas we're cross-regional by default. So we feel really good about the reliability and the availability we're going to be able to show. We haven't published an SLA or anything like that today. And then under the hood, we're using erasure coding. So on the durability front, we're offering 11 nines of durability, just like other major providers.

All right, Jen, I don't think we can hear you.

Okay. Thanks, Greg. Yeah, that makes a lot of sense. We only have a few more minutes, and I know people are dying to know, how can they access it? How long are they going to have to wait? What are we seeing as the next steps in the timeline with all of this?

Yeah. So I think right now, we're in a phase of working with some early initial customers and gathering feedback and continuing to build out.
Rita mentioned the internal use cases we've already launched, but I think that's going to be where we'll be for a little bit. And then we're going to open it up to an open beta where anyone can join, anyone can sign up and start using the product. At that phase, we're probably only going to be accessible through a Worker, where you'll be able to bind a bucket to a specific Worker and then make requests from within that Worker. And then based on our learnings there, we'll expose a full S3 API that you can hit via an endpoint and that doesn't need to be called from within a Worker, but can be called from within a Worker. So that's sort of the long-term path for us to get there. There's a lot of other features that are sort of implicit in there. With the migrator piece I talked about earlier, we'll be able to automatically migrate you out from an existing provider. That will have support for S3-compatible object storage to start, but it will come closer to the GA timeline rather than in the open beta period. We have some other products internally that we're looking to build on this that we're super excited about that fit in a similar space. And those will be coming in the next few months as well. So there's a lot going on.

Yeah. And if people haven't read the blog yet, at the very end there's the signup page where you can go and request more information. So Greg, if people do go through that process, just the first step in all of this, do we have any sort of idea on how long it might take for them to hear anything back about additional information? Are we talking, it sounds like, months, maybe weeks?

Yeah, sure. It sort of depends on the process, right? So we're working with those few internal customers to kind of figure things out and figure out what we can actually launch at, how many people we can let in. We'll be letting some people in soon. Some people will take a bit longer.
Again, it sort of depends on the use case, the amount of data you want to store and whether it's a good fit for us to partner early and figure out the future of the product there. That's really our goal, right? To take as much time as we need to build something that delivers on the promises we've made and make sure it's polished and ready for use before we go to that open beta period where anyone can get access. The open beta period will be more about performance testing and figuring out the reliability of the system and stuff like that. But while we're still in this phase of finalizing the API for certain things, we want to work with a small number of customers.

That makes sense. Do we have a sense of what might make for a good use case, or something that we would want to be focusing on early, or do we just still have to wait and see what comes?

Yeah, it's really case by case. I mean, I don't want to turn anyone away at this point. We're looking for things that run the gamut, right? So, we're looking for people who want to store large amounts of data with infrequent access and are looking for that free zero-rating piece for infrequent operations. We're looking for people who need heavy performance, right? We want to test the limits of the system and we are looking actively for use cases that scale to hundreds of thousands of requests a second. So, I think it's sort of across the board, and everyone will eventually get in. That's the one thing I can promise.

Yeah. Great. We did just have a question come in. So, the question says: hello, I was hoping you could answer the question of rate limits for R2. Are there any rate limits that developers should be aware of right now?

Yeah. So, it's a good question.
I'm assuming you're talking specifically about rate limits for doing operations versus, like, limits around... you know, the real thing that happens is that at some level of requests, and we aren't sure of the exact number, we migrate your object. We migrate where we're serving your object from. And so, at that threshold, you might see a blip in performance until we level things back out. But the system is designed to scale anywhere from zero requests a second all the way up to hundreds of thousands. So, we can handle both of those. It's sort of like S3's automatic tiering, if you're familiar with that. But we do tier the objects between different performance levels of storage. So, there will be that sort of shift over in the middle, but we're hoping to make that as seamless as possible. So, rate limit wise, I think anything up to hundreds of thousands of requests a second is fine for us.

That's great. All right. We only have a few seconds left. So, thank you everyone for joining. Stay tuned. We have a lot more announcements coming up in Birthday Week.