💻 Developer Week: What's New with R2
Presented by: Dawn Parzych, Phillip Jones, Abhi Das
Originally aired on October 2, 2023 @ 12:30 PM - 1:00 PM EDT
Welcome to Cloudflare Developer Week 2023!
Cloudflare Developer Week, May 15-19, is our week-long series of new product announcements and events dedicated to enhancing the developer experience to fuel productivity!
Tune in all week for more news, announcements, and thought-provoking discussions!
Read the blog posts:
- Cloudflare R2 and MosaicML enable training LLMs on any compute, anywhere in the world, with zero switching costs
- The S3 to R2 Super Slurper is now Generally Available
- Use Snowflake with R2 to extend your global data lake
Visit the Developer Week Hub for every announcement and CFTV episode — check back all week for more!
Transcript (Beta)
Hi, I'm Dawn Parzych. I am the director of product marketing for the developer platform here at Cloudflare and we're here today to talk about what's new with R2.
I'm joined by two of my colleagues, Phillip and Abhi, and I'm going to hand over to them to introduce themselves.
Abhi, you want to kick things off? Yeah. Hi, everyone.
Thank you, Dawn. I work in the strategic partnerships team under special projects at Cloudflare.
I've been here two years, two and a half years and help support developer week partnerships.
Looking forward to the session. Great. And Phillip?
And yeah, I'm Phillip. I'm a product manager here at Cloudflare focusing on the R2 product.
Excellent. R2 is one of our newer products within the developer platform.
We went GA with that back in September of last year. So eight months, I think it's been.
Phillip, for those people that don't know, can you give us a quick overview of what R2 is?
Sure.
So if you're not familiar with R2, R2 is Cloudflare's distributed object storage platform.
We'll be talking about a few of the new things that we recently shipped today, but at a high level, that's what it is. One of the benefits, and one of the things that we focus on, is the fact that we don't charge any egress fees.
So yeah. Saving money is good.
I'm a big fan of that. And I know we have a lot of announcements out today, but Phillip, you snuck an announcement in last week as well about object lifecycles on R2.
What do object lifecycles do for R2 users? Sure. So yeah, last week we announced general availability of object lifecycle management for R2.
So talking about sort of reducing costs, saving money, obviously it's great not to pay these egress or data transfer fees, but a lot of times in reality, objects that you store may not be intended to live forever.
So you may have temporary log files or other sort of ephemeral data that may not need to be around for more than 30 days or 90 days, et cetera.
And what object lifecycle management allows you to do is define policies that say, hey, in a given prefix of my R2 bucket, these objects should expire or be deleted after 30, 60, 90 days, or any time that you specify.
And then another aspect of that is multipart uploads. If you have unfinished multipart uploads and you're not careful, those can be sort of hanging around and taking up storage space, and object lifecycle management also gives you the ability to expire these after a period of your choosing as well.
So it's all about getting rid of data that doesn't necessarily need to be there and saving on costs.
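The two kinds of policy described above (expiring objects under a prefix, and cleaning up unfinished multipart uploads) can be sketched as a configuration document. This is a minimal illustration, assuming the S3-style lifecycle configuration shape; the helper name `build_lifecycle_rules`, the rule IDs, and the `logs/` prefix are hypothetical and were not mentioned in the session.

```python
import json


def build_lifecycle_rules(prefix: str, expire_days: int, abort_multipart_days: int) -> dict:
    """Build a lifecycle configuration in the S3-style shape (an assumption).

    The rule IDs are invented for illustration only.
    """
    return {
        "Rules": [
            {
                # Delete objects under `prefix` once they are `expire_days` old.
                "ID": f"expire-{prefix.strip('/') or 'all'}-after-{expire_days}d",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Expiration": {"Days": expire_days},
            },
            {
                # Abort multipart uploads that were started but never completed.
                "ID": f"abort-multipart-after-{abort_multipart_days}d",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": abort_multipart_days
                },
            },
        ]
    }


if __name__ == "__main__":
    # Example: temporary logs live 30 days, stale multipart uploads 7 days.
    print(json.dumps(build_lifecycle_rules("logs/", 30, 7), indent=2))
```

A document like this would typically be passed to an S3-compatible client, for example as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`. Whether that is the best way to apply it to a given bucket is something to check against the R2 documentation.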
I could use that. I always go through my emails or docs and I have a bunch of like things in draft or untitled documents that I started and never finished.
So I need something to go through and clean all those up for me.
Okay. Abhi, I want to ask you, we made a couple of announcements today about some partnerships we've created along with R2.
I want to kick things off talking about our Snowflake partnership.
What can you share with the audience about who Snowflake is and what this partnership is about?
Perfect. Yeah. Thank you, Dawn. So basically, with the Snowflake partnership, joint customers can now use Snowflake to query data stored inside their R2 data lake and to load data from R2 into Snowflake.
What does it mean? So if you take a step back, we often talk about data being the lifeblood of every organization, increasingly so.
One issue is when you have a large organization and there are different vendors within that one organization, it's very difficult for different departments to share data and collaborate on that data.
And the first problem is you have different vendors within an ecosystem, and every company can use their own vendor to store data, and that creates vendor lock-in.
Now you might have like vendor two coming in with a better quality product.
So that creates a data segmentation in a way, and you can't really share data easily within an industry or even within a company.
So that creates a data segregation in a way.
Problem two is you have different types of data coming in. Initially it was databases and storage, with SQL being the connector through JDBC and ODBC.
But these days you have audio files, video files, images, and various different kinds of unstructured and semi-structured data, where Python has really become sort of the standard as a result of this change.
So what we really needed is, one, to access all the data stores and pull in data in one place.
And second is do that for all data types very easily.
And this is where Snowflake comes in. So Snowflake basically solves that data sharing problem, where customers using a specific vendor solution can push data to Snowflake for analytics or any such purposes.
That's what they solve very well. However, one challenge with loading data into Snowflake database tables, or querying an external data lake, is the cost of data transfer.
That cost is sort of exponential in a way, as you load data over and over.
And what happens with large clouds, if you have data and compute in same cloud and same region, that's good.
But in this case, your customer can have different vendors in different regions, with data sitting in different regions, and that creates egress charges as data is loaded back and forth in the data lake.
So pairing R2 with Snowflake lets you focus on getting valuable insights from your data without having to worry about egress fees piling up.
And Phillip can talk in much greater detail on this. R2 is sort of the ideal object storage platform to build data lakes on.
It's infinitely scalable, extremely durable, and has no egress fees.
So that's kind of the essence of this partnership, where Snowflake and R2 customers can jointly use this integration to build out their data lake.
What good is your data if you're paying excessive fees to get it every time you need to analyze it?
The purpose of storing data is to be able to analyze it.
Phillip, I think Abhi made a good point. Where do you see this Snowflake partnership being able to help our customers build out their data lakes and analyze their data?
Yeah, I think Abhi sort of nailed it exactly.
So it's really about helping folks who have their data lakes backed with R2 storage get the most value out of their data and be able to choose the tools and platforms that they feel are best in order to get that value.
So I think Abhi did a great job of that.
We also have a great blog post that went out this morning talking about exactly how to add R2 as an external stage, showing even with some examples how you can use Snowflake to query your new R2 external stage and load data into Snowflake and even load data out of Snowflake as well.
So I think those are kind of the key functionalities.
It really helps folks because they don't have to worry about, hey, is this data that I'm querying or loading into Snowflake coming from a different cloud, or is it even coming from a different region within a cloud?
It just gives folks sort of more freedom to get value out of their data in their R2 data lakes.
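The external stage workflow mentioned above (add R2 as a stage, then load data into and out of Snowflake) can be sketched by rendering the SQL statements involved. This is a rough sketch, assuming Snowflake's syntax for S3-compatible storage (`s3compat://` URLs with an `ENDPOINT` parameter); the helper names, stage name, bucket, and table are hypothetical, and the credentials are placeholders that must never be hard-coded in real use.

```python
def create_r2_stage_sql(stage: str, bucket: str, account_id: str, path: str = "") -> str:
    """Render a CREATE STAGE statement pointing at an R2 bucket.

    Assumes Snowflake's S3-compatible stage syntax. Identifiers are
    interpolated as-is, so only pass trusted values.
    """
    return (
        f"CREATE STAGE {stage}\n"
        f"  URL = 's3compat://{bucket}/{path}'\n"
        f"  ENDPOINT = '{account_id}.r2.cloudflarestorage.com'\n"
        "  CREDENTIALS = (AWS_KEY_ID = '<r2_access_key_id>' "
        "AWS_SECRET_KEY = '<r2_secret_access_key>');"
    )


def load_from_stage_sql(table: str, stage: str) -> str:
    """Render a statement that loads staged files into a Snowflake table."""
    return f"COPY INTO {table} FROM @{stage};"


def unload_to_stage_sql(table: str, stage: str) -> str:
    """Render a statement that writes a table's rows out to the stage."""
    return f"COPY INTO @{stage} FROM {table};"


if __name__ == "__main__":
    print(create_r2_stage_sql("my_r2_stage", "my-bucket", "<account_id>", "events/"))
    print(load_from_stage_sql("events", "my_r2_stage"))
    print(unload_to_stage_sql("events", "my_r2_stage"))
```

The blog post referenced in the session is the better source for the exact statements; this only shows the overall shape of the three steps.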
So I think Abhi summed it up pretty well. Great. Yesterday, if you're following along with our Developer Week announcements, we made a number of announcements around AI.
AI is kind of like the hot new thing, so of course we needed to talk about the things we're doing with AI.
And not just yesterday, but we followed up today with some additional announcements and partnerships in the AI space, the first one being Mosaic.
Abhi, you want to share with us some information about our Mosaic partnership?
Yeah. So on the AI case, we are partnering with three total companies, Mosaic, Lambda Labs, and CoreWeave.
But the background to that is that building the LLMs, the large language models that power generative AI, requires massive infrastructure.
So the most obvious component is compute, which means thousands of GPUs.
But an equally critical and often overlooked component is the data storage infrastructure.
And this is just the beginning of the whole era that we're seeing.
But training data can be terabytes or petabytes in size, and it needs to be read in parallel across thousands of processes.
And in addition, you have to have model checkpoints that need to be saved frequently throughout the training run.
To manage storage costs and scalability, many ML teams have been moving to object storage to host their data sets and checkpoints.
But most object storage providers, mainly the large clouds, use egress fees as a kind of technical lock-in.
So for example, without naming any names, all these large cloud providers have a compute element and a storage element, and egress between them is free.
But if you want to take the data out or use GPUs from some other cloud provider, the data egress would be $0.08 or more.
So that's where R2 comes in. R2's zero-egress pricing lets ML teams keep their storage in R2 but move their compute from provider to provider based on pricing and based on demand and availability, because it's such a new space.
Every now and then you have some new provider coming up with a better product or cheaper options and things like that.
It just gives ML teams, who are very cost conscious at this point, considering how expensive GPU runs are, the flexibility to keep data in R2 and move compute from provider to provider as they see fit.
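To make the cost argument above concrete, here is a small back-of-the-envelope calculation. It is only a sketch: the session quotes "$0.08 or over" without a unit, so the per-gigabyte unit is an assumption, as are the function name `egress_cost_usd`, the decimal TB-to-GB conversion, and the example dataset size.

```python
def egress_cost_usd(dataset_tb: float, transfers: int, rate_per_gb: float = 0.08) -> float:
    """Estimate the egress bill for pulling a dataset out of a cloud repeatedly.

    dataset_tb:  dataset size in terabytes (1 TB = 1000 GB assumed here)
    transfers:   how many times the full dataset leaves the provider
    rate_per_gb: egress price in USD per GB (assumed unit; 0.0 models zero egress)
    """
    dataset_gb = dataset_tb * 1000
    return dataset_gb * transfers * rate_per_gb


if __name__ == "__main__":
    # Hypothetical: a 10 TB training set read by GPUs at 5 different providers.
    print(f"with egress fees:  ${egress_cost_usd(10, 5):,.2f}")
    print(f"with zero egress:  ${egress_cost_usd(10, 5, rate_per_gb=0.0):,.2f}")
```

The point of the comparison is that the first figure grows with every additional provider or training run, while the second stays at zero regardless of how often compute moves.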
Okay, you know, Phillip, you and I talk quite a bit and we talk a lot about the AI partnerships and the AI companies building on us.
What is the thing that excites you the most about AI and R2?
Well, yeah, there's a lot there.
There are a few things that are interesting to me and that make me excited.
So, especially with a lot of these new and exciting generative AI companies, one aspect of it is just folks who are storing the things that they're generating, these images, videos, multimedia on R2 and being able to distribute that quickly, performantly around the globe to their users.
So, kind of that aspect of fostering creativity and using R2 to store that data, that's something exciting to me.
But then the second thing that's exciting to me, and this is something that Abhi highlighted and that the MosaicML partnership really speaks to, is the multi-cloud aspect of AI.
I think what we're seeing with AI is really pushing multi-cloud architecture to the forefront because GPUs are scarce.
There's a wide range of price and sort of quality out there.
And folks who are maybe tied into a given cloud ecosystem want the freedom and want the benefits to go explore.
So, I think that's one of the things that's exciting for me is folks who are able to use R2.
And many of our customers say that R2 is the glue of that multi-cloud ecosystem because maybe they have their compute in cloud provider A and maybe they have GPUs in provider B.
And using R2, they're able to get the data that they need for training their generative AI models, and they're able to load it into GPUs wherever they want.
And that's really huge. This allows folks to get the best prices.
But not only that, price doesn't matter if you can't get the GPUs.
So, this also enables folks to have better availability. And the last point I would make on that is that even if it's not multi-cloud, even if your GPUs are just in region A or region B, R2 can be that glue so that you can get that data across regions, within a cloud provider, without egress fees.
So, I think those are a couple of things that are really exciting to me. And I know today we made some announcements and we focused on telling the stories, not only of the partners, but also some really amazing customers who are building really fantastic things.
Yeah. I don't know if this is one of the stories or something that we've just had a conversation about, but I think one of the quotes that's really stuck out to me recently is one of our customers just said they wouldn't be able to be in business if it wasn't for R2, that this is what allows them to build their applications and run their businesses.
And that just sounds really fun and exciting to me.
So, Abhi, you talked a lot about the partnerships we have right now.
What do you see coming next? And if somebody wants to be partners with us and R2, what do they have to do to become a Cloudflare partner?
Great.
Thank you, Dawn. So, I just wanted to add one anecdote before that, to Phillip's and your point about R2. Back in 2015, I saw a banner that said 90% of the data is going to get created in the next 10 years.
When I saw that ad, I thought it's probably an advertisement by the data providers to sell more data.
But actually, it's like looking back 10 years, eight years almost, it's actually true.
I mean, I don't know the exact number. So, if that's the case, then looking forward to the next 10 years, with what's happening in the AI space and the data that's being used to train these models, it's going to go exponential, the way I see it.
So, I think egress is a core component in that equation, and keeping that friction as low as possible for ML teams, to lower their cost and enhance the experience, is the best thing you can do for the community.
Going forward to answer your question, I think there's a lot coming up from the partnership standpoint.
There's a lot of exciting inbounds.
If any company wants to learn more, we have a tech partner page where you can go in and submit a form and explain what your company's use case is and what partnership use case you're thinking of. That comes to our partner team, who assess it and then reach out about potential opportunities.
We have over 200 partners at Cloudflare today, tech partners, very different kinds of partners, but we're always excited to learn more about how we can benefit joint customers by creating integrations that enhance the customer experience, lower their cost, and enhance the efficiency of our own products or our partners' products.
So, we're always looking forward to learning more about what's out there and how we can do better.
All right.
So, let me briefly touch on this. Some of our press releases and announcements talked about some of the AI customers that are building on R2.
Can you share one of those stories with us? Yeah. I mean, I think one of the really, really cool customers that comes to mind as part of that announcement that you're talking about is Character AI.
They're an amazing customer and I think they really exemplify some of the multi-cloud conversations that we've had.
So, they're able to use R2 as that sort of glue between their multi-cloud environments, with compute in one place, GPUs in another, and so forth.
And R2 is the glue, or the connecting tissue, so to speak, between those providers. It allows data to move freely and lets them train their AI models, etc.
So, I think that's one example, sort of exemplifying some of the things that we've talked about, but a really exciting one and a really exciting company.
So, really happy about that.
And speaking of moving things, one of the announcements that we haven't talked about yet is the migrator.
People, when they're moving to R2, they have storage in other places and they need to move that or migrate that data to R2.
We've had a migrator called the Super Slurper in beta since September, November?
I don't know, time is a blur. It's November. We're going to go with November for when we got the migrator out there.
It went GA this week. What can you tell us about the migrator and customers' experience with that?
Yeah. So, we announced general availability for it just today, earlier this morning.
And like you said, we announced the private beta a few months ago, and then we transitioned to open beta and got more feedback from folks.
Today, we finally made it generally available.
And just to talk about some of the progress that we made in beta: during the beta period, we were fortunate enough to work and partner with hundreds of companies out there who are moving their data from S3 to R2.
And we learned a lot and made some big improvements over that time. First and foremost among those improvements is performance, obviously.
So, if you're someone who's moving data between cloud providers, you want to do that quickly, right?
If it takes a long time, that slows down the rate at which you can get your data into R2 and start saving on egress, right?
So, time is of the essence. And we made some pretty substantial performance improvements over the last few months, partnering with these amazing companies who we were working with.
And then we also made some other improvements with reliability.
And then, I guess another exciting thing is that not only can you move data from S3 to R2, but we're starting to add more sources.
So, more places where folks can move data from. And we started off with allowing folks to move data within R2 itself.
So, if you have a given R2 bucket, you can actually copy the data to another R2 bucket.
And that's just the beginning.
But yeah, I think those are a few exciting announcements. And we shared a couple of stories in the blog post, just talking about how fast and easy and seamless the migration experience is.
I think one of the examples is a customer who expected a migration to take days, and it was done in a matter of a few minutes, right?
So, that's the kind of simplicity and performance that we're looking to enable, and it's just the beginning.
So, happy to kind of talk about our future plans if that's interesting as well.
Of course, I want to hear about the future. I know this segment is the what's new in R2, but I'm going to switch it up.
What's next with R2?
What can you tell us about what's coming? Yeah. So, the migration tool that we have today allows folks to essentially in one motion copy data from a cloud provider bucket to your specified R2 bucket.
And that happens all at once. And there are things you can specify like prefix, like you only want to specify a given prefix to migrate a subset of data.
You can do that. And that's amazing. But what's next is, we're looking to do incremental migration.
So, if you want to start migrating your data to R2, but maybe it doesn't make sense to just copy all of it at once, we want you to be able to do that incrementally and start seeing value today.
So, it's even easier, even quicker time to value. And then the big thing there is cost because egress doesn't just necessarily apply when a customer is requesting data or viewing images or video, but it does apply when migrating as well.
And one of the goals that we want to accomplish with incremental migration is to do migration in a way with as little cost overhead as possible, ideally zero.
And that's one of the exciting things we're working on.
So, quick time to value, really minimal engineering effort.
We think that's something that we as a platform could do better for folks so they don't have to worry about that themselves.
And then the third is no cost overhead.
So, that's something that we're excited to build on. And I would just say, if anyone is interested in participating in the open beta for that and working just as closely as we worked with folks on the one-time migrator, I would encourage them to either reach out to me or fill out the little survey in the blog post that we released today.
So, we'd love to partner with folks and, you know, that's what's next.
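The ideas of migrating only a given prefix and of migrating incrementally, as described above, can be illustrated with a small planning function. This is a generic delta-copy sketch and not how the Super Slurper or the planned incremental migration actually works; the name `plan_migration`, the use of ETags to detect changes, and the sample keys are all hypothetical.

```python
def plan_migration(source: dict[str, str], destination: dict[str, str],
                   prefix: str = "") -> list[str]:
    """Return the object keys that still need to be copied.

    source / destination map object key -> ETag (a content fingerprint).
    Only keys starting with `prefix` are considered, and a key is skipped
    when the destination already holds an object with the same ETag.
    """
    return sorted(
        key
        for key, etag in source.items()
        if key.startswith(prefix) and destination.get(key) != etag
    )


if __name__ == "__main__":
    # Hypothetical bucket listings.
    source_bucket = {
        "images/a.png": "etag-1",
        "images/b.png": "etag-2",
        "logs/2023-05-15.txt": "etag-3",
    }
    r2_bucket = {"images/a.png": "etag-1"}  # already migrated earlier

    # Only the images prefix, and only what is missing or changed.
    print(plan_migration(source_bucket, r2_bucket, prefix="images/"))
```

Running the planner repeatedly copies less each time, which is the sense in which a migration can proceed in steps rather than all at once.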
I'm going to make a shameless plug here as well.
In addition to the blog and the survey and all of that, we also have a Discord for developers.
So, if you want access to PMs, they come in there. They have AMA sessions.
We do little live demos. I know that Phillip spent some time in the Discord talking about R2 and hearing about what customers are building there.
If you want to join that, you can join that at discord.cloudflare.com. We've got about five minutes left.
So, in those five minutes, Abhi, anything you want to share, any shameless plug you want to make, this is your last chance.
I will take that last chance.
So, yeah, from the partnership standpoint, I think it's really exciting for us to get into this Snowflake, you know, data lake kind of space, where customers can use our product right now and where we're helping customers lower their cost.
You'll see more and more kind of partnership type opportunities going forward to benefit customers in this use case.
Second use case is the AI ML use case where it's just the beginning. The industry is sort of expanding pretty rapidly.
We have figured out a niche where it makes a lot of sense and it adds value for the customer.
And we are constantly listening to our existing partners and new partners to expand this use case.
But regardless, the shameless plug here is: if you see R2 as a value proposition for you, with zero egress and better, faster storage, but you can't use it today for some of your use cases because some integration is missing and we need to partner with some other team, please reach out to Phillip through Discord or through the partner page at cloudflare.com/partners.
Fill out the form, or even fill out the blog form at the end, to give us ideas. We can think things through top-down, but if you can give us feedback on the kinds of use cases you're working through, we can move faster.
That's a great point. We know only so many companies out there.
We need to know who you want us to be partnering with and who's going to improve your developer experience if we build integrations with various companies.
Phillip, same question to you. Last chance, shameless plug. Yeah. What do you want to say?
I think for my shameless plug, I want to just focus on some of the customers that we were able to sort of highlight this week that we're really excited for.
I already talked a little bit about Character AI, but I want to say their name again.
We appreciate them and I think they're a really, really great company. Then some of the other ones that we've been working closely with and that have shared their stories with us as well are Leonardo.ai, Lexica, and SiteGPT.
Those are just a handful of folks, but really, my shameless plug is just to them.
We appreciate them and they make me really excited and glad that we were able to sort of highlight and share the stories together.
Yeah. Our customers are really why we're doing all of this.
I love hearing customer stories. Phillip and I geek out over these all the time, at least once a week.
We're like, this customer's doing this.
This is really exciting. In the Discord as well, we have a What I Built section.
It's really fun to see the things that people are building. This week with our Developer Week, we're seeing people already sharing on Twitter the things that they're building with the announcements that we're making so far.
Go out there, build things.
If you have questions, join us on the Discord or find ways to find us on Twitter or elsewhere.
We love to hear from you and to see what you're building and to see how we can help you.
Abhi, Phillip, thank you so much for your time today and for sharing information on what's new with R2.
Stay tuned throughout the week for more segments on what launched during our Developer Week.
Thank you all very much.
Perfect. Thank you, Dawn. Thank you, Phillip.