Developer Week Day 4: Delivering Data at Global Scale
Presented by: Craig Dennis, Alex Graham, Pranshu Maheshwari
Originally aired on April 10 @ 12:00 PM - 1:30 PM EDT
Welcome to Cloudflare Developer Week 2025!
Cloudflare Developer Week April 7-11, our week-long series of new product announcements and events dedicated to enhancing the developer experience to fuel productivity!
Tune in all week for more news, announcements, and thought-provoking discussions!
Read the blog posts:
- R2 Data Catalog: Managed Apache Iceberg tables with zero egress fees
- Just landed: streaming ingestion on Cloudflare with Arroyo and Pipelines
- Sequential consistency without borders: how D1 implements global read replication
- Making Super Slurper 5x faster with Workers, Durable Objects, and Queues
Visit the Developer Week Hub for every announcement and CFTV episode — check back all week for more!
Transcript (Beta)
Hello everybody and welcome to day 4 of Developer Week. This is a live stream and we are doing our data day today.
We're trying to name it. Should this be the data experience or what should we do?
So you'll see later as the email comes out exactly what we ended up calling this.
But here I am. I am so happy to be joined today with Alex Graham and Pranshu.
We are going to talk today about data. I don't know if you've seen the blogs yet.
I know they're kind of hard to keep up with. I'm in London right now, so I've had all day to look at these blogs.
But there are some people on the West Coast who are just now seeing these blogs.
We've got some really big announcements.
And if you haven't had a chance to read it, we're going to walk through it right here, because I have so many questions for you, Alex, about what we launched today. What did we launch today?
Yeah. So what we did is we launched R2 Data Catalog, which is a managed Apache Iceberg catalog that you can just turn on for any of your R2 buckets.
Right. Any R2 bucket, it could be an existing one, could be a new one.
You just go to the dash, click activate and boom, you got an iceberg catalog.
OK, for the people out there like myself: I know what an iceberg is, but I don't know what an Iceberg catalog is.
What does that mean?
Yeah. So Iceberg is an open table format used for managing a massive analytical database.
It's something you'd use if you're trying to build a data warehouse on S3 or R2.
Right. That's OK. It's R2. Yeah. Right. You want to store gigabytes, terabytes, petabytes of data and you want to wrangle all that data.
You want to build tables over it. You want to have, you know, a schema around it.
You want to transactionally update those tables. Iceberg gives you all of that.
If you want to query the data, you can hook it up with Spark.
You can hook it up with DuckDB. You can hook it up with Databricks or Snowflake. There are a lot of different query engines you can use because it is an open table format.
All of those query engines know how to query the data. So that gives you the ability to decouple storage from compute.
But it also gives you that accessibility from many different platforms.
So you just choose the platform you want to use and go ahead and use it.
Awesome. And who's going to benefit most from this? Like, who should we be shouting to right now?
Yeah. I think the simple answer is data scientists, data scientists at a startup or even an established enterprise company who are thinking, I want to build a data warehouse, and maybe I'm a little bit worried about cost.
Right. OK. Maybe certain cloud providers charge quite a bit, not only for the object store, but for the egress fees.
Right. They charge. Right.
Sort of. We were working on a good metaphor for this. Yeah. It's sort of like you have to pay ATM fees to access your own money.
Right. Right. Right.
Because that's what egress fees are. Right. You stick something someplace.
You're like, I'm going to go get my thing out, and I'm going to pay for it.
Why? It's my thing. Exactly. You put this data into an object store.
And then when you try to access it outside of that cloud vendor, maybe from another cloud vendor or even just from a browser or your own laptop, you're paying egress fees to get it out.
Right. In most cases. In those cases, for R2, though, we're not charging. You can stick this in R2, you activate the catalog.
You can run a like a notebook on your own laptop or you can run a spark cluster on Databricks and we are not charging for those egress fees that are coming out.
And that's a pretty big savings, from what I understand. I've talked to people who pay those egress fees and they're like, wait, what, you don't charge?
That's pretty big, right?
And if you're storing terabytes, that's quite a bit of data. Awesome. Awesome.
So people are going to be super stoked about this. I think so. And I think they're going to be stoked because, if they already want to use R2 for building an Iceberg warehouse, the fact that we host the catalog simplifies their infrastructure.
We already have customers who are using R2 for iceberg.
And one of their pain points is the fact that they have to manage a catalog externally.
Now, what a catalog is: the catalog is sort of the controller.
It controls the metadata layer.
So the metadata is what's helping to understand, OK, what are the tables that I have in my iceberg catalog?
What are the namespaces? And this is one of the things that provides transactionality, because typically you'll have something like Spark, a query engine, or a data pipeline streaming data into your warehouse.
Right. But it needs to communicate with the catalog and tell the catalog, hey, I'm updating, like I'm updating the warehouse right now.
I'm adding data. So the writer writes the objects in there, and then it asks the catalog, hey, is it OK for me to commit this?
Right. And that's what provides the transactionality. If it fails to commit, it can reread the metadata, reapply its changes, and try again.
And that's what provides the ability to make updates without overwriting something else.
So if you have multiple teams writing to the same table, they won't step on each other's toes and lose data.
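To make that commit flow concrete, here is a rough sketch of the optimistic-commit loop a writer runs against the catalog. The client and method names are hypothetical, not the actual Iceberg or PyIceberg API; they only show the shape of the protocol described above.

```python
# Illustrative sketch of the optimistic-commit loop described above.
# The catalog client and its methods are hypothetical, not a real API.

def append_to_table(catalog, table_name, new_data_files, max_retries=5):
    for attempt in range(max_retries):
        # 1. Read the table's current metadata pointer from the catalog.
        snapshot = catalog.load_table(table_name)

        # 2. The data files are already written to object storage (R2);
        #    build a new metadata version that references them.
        proposed = snapshot.with_appended_files(new_data_files)

        # 3. Ask the catalog to atomically swap the metadata pointer.
        #    This only succeeds if nobody else committed in the meantime.
        if catalog.commit(table_name, expected=snapshot, new=proposed):
            return proposed

        # 4. Someone else committed first: re-read, re-apply, and retry.
    raise RuntimeError("could not commit after retries")
```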
Gotcha. And you said pipelines there with a lowercase P, but we're going to talk here in a bit about the uppercase P Pipelines.
Is that right, Pranshu? Right. Spoiler alert. That's what we're going to be talking about here later.
And that goes to R2, right? So we've got write and read out of that together.
Working together beautifully. That's exactly right.
It's like you pipeline stream data into R2 and set up an iceberg catalog.
So the querying becomes really easy. Awesome. This is a big step forward. This is a new capability, right?
This is new. People have been wanting this.
Yeah, I'd say both of them are new capabilities. But it's not just a capability.
Today we're announcing a big step forward for the dev platform, because we're entering the analytics space.
Right. You can already use, you know, Cloudflare's dev platform to build your application.
Right. We have workers. We have D1. We have workflows.
We have Hyperdrive, and we just announced new support there, right?
So if you're trying to build your application, you have your database there.
We provide that. But that's just for your product, right?
There's a whole other section of building a business, right, which is your analytics, right, which is not just, you know, observability about your product, but observability about the business itself.
The business itself is like a machine.
And that's one of the dimensions you get from building an analytics warehouse: you can start to build tables that answer questions about your business.
You know, there's other use cases as well.
Like, this is also a place you can dump IoT streaming data. If you have a company with IoT devices, you can dump all that sensor data.
Just sensor data.
Yeah. Sensor data. Right. Or if you're building an ML model, some model of your own, not just consuming someone else's model.
You're building models where you need lots of training data.
And so building the training data, and building the metadata that backs it, is another massive use case.
What did people do before we had this new era?
And what did you say? You said it's more than a capability, that it's new?
Yeah, it's like a whole new product vertical, right?
A new dimension. Right. So we already have the ability to build products.
But now we're building the ability to create analytic workflows. We're entering that big data space on top of R2.
I think we already had R2 as the object store.
And now this is an opportunity to provide customers more capabilities over the data they're already storing in R2.
Awesome. And what were they using before this?
Like, if I had to line up a parallel, what is this like, product-wise?
Give me some product names that this is similar to.
Yeah. I mean, if you're building something over in S3, you could be using AWS S3 to create a data lake and EMR to run queries.
We don't have the query processor right now, although we're very much aware that's something customers may want on our platform as well.
So we're, you know, definitely mindful of that. No promises. Right, right.
So it's open beta. We're in an open beta phase. We're going to see what people want.
And I think what we're recognizing is that we're building out this platform, the suite of products that allows people to build this.
Right. So over in AWS, you have S3, you have EMR, you have Athena to run queries, you have Glue as the catalog, and Kinesis, I think, is the parallel to Pipelines.
Okay. Kinesis Firehose, right, where you're just, I mean, I like that imagery, but like you're just firehosing that data in there.
But with pipelines, you're piping it in, right? You're piping that data into R2 as a way to like to start building.
And, you know, so you have that pipeline to move the data in and then you have Iceberg to provide structure and transactional updates around it and the ability to hook in your query engine so that you're eventually able to sort of actually query and make use of the data.
I think what people were doing, you know, what you could do before is just try to use maybe open source tools or, you know, manage your own tooling in order to sort of get it to work.
Right. Like, again, like customers who wanted to use R2 for analytics or use it for Iceberg were having to manage that themselves.
And this is a step forward for us to say, no, no, we're offering a product.
We're going to make this easier for you.
We're going to manage as much of this, you know, more of this for you so that it's just easier for you to build this out.
Awesome. That makes sense.
That lines up with the dev platform. Constantly doing that, right? That's awesome.
Do you have a demo or something we could we could check out, Alex?
Do you mind sharing your screen? Awesome. So I'm going to share my screen. I have a demo that's based off of an existing warehouse.
It's actually pretty easy to get set up.
You pretty much just go to the dash,
go to any R2 bucket, and click the activate button.
When you click the activate button, you get two pieces of information, and I'm going to show you how you can hook those up to create a warehouse.
So this is a notebook, right? Like this is a Marimo notebook that's running PySpark.
Let's just walk through it a little bit.
So here, if you're going to use something like PySpark, you just import it into your Python app.
Right. And then you configure it to run with Iceberg.
So here you can see that we are pulling in the Iceberg Spark runtime and we're going to be setting the Spark config to use Iceberg's configuration.
Now, let me explain what some of these things are, because they're pretty key to how you hook this up with your R2 data catalog.
Remember, this is running on my laptop.
And so I'm telling it, hey, when you go query, go query my R2 Data Catalog up in Cloudflare.
So the thing you do is you set the type of catalog to a REST catalog.
That's kind of the magic behind what we're doing: we've set up a catalog that exposes the Apache Iceberg REST interface.
This is, again, part of the open nature of Iceberg: it has an open API specification for its REST catalog API.
Right. So this, you know, what we're implementing and what we're providing is something that other catalogs also provide.
Actually, just to briefly digress from the demo, I want to touch on that point and double-click on the open table format aspect of it.
Because it's open, right?
You can use any query engine against it, but you can also use other catalogs.
You know, you're not bound to using our system to run, you know, an Iceberg warehouse, right?
I mean, I've already mentioned that customers already do that.
We're hoping you want to use us because we provide the best, most integrated product, but you're not locked into anything.
And I think that's nice. That's a key point, right?
Yeah, because there's no vendor lock-in here, right? If you want to migrate, if you decide, hey, I want to go use something else, you're not locked in.
And I think that long-term ability to move away if you need to
actually increases trust in us, right? One of the key principles of Cloudflare is transparency.
And I think this open table format fits that principle really well.
Yeah, that's awesome.
That's awesome. I mean, maybe you miss your egress fees and you're like, I miss that budget line item. I miss fees at my ATM.
I hate free withdrawals.
So, yeah, yeah, they want to go back to that. So, no, but that's cool.
I mean, that's great. I think that's awesome. And I love that it's called an open table format.
That's a really cool thing. Yeah, that's really, really cool.
Basically, again, we talked about how this decouples compute from storage, and the open table format lets you move between the different pieces and the different dimensions.
Just go back to the demo.
I'm sorry. All right. I'm just dragging this along. So when you create or activate your catalog on a bucket, you'll get two pieces of information.
One, you're going to get your warehouse name and you're going to get this URL.
And so this URL is very similar to like, for instance, a URL that you would get when you create your R2 bucket.
And so you take this URL, you stick it in the URI configuration, you take your warehouse and you stick it in the warehouse configuration.
Also, you need an R2 token. I'm not going to display mine, but I have it.
Why not? Environment variable. I don't want people watching the live stream, you know, I don't know, dumping gigabytes of data into my bucket.
So, but you will.
So that's, I guess, the third piece of information: you just need to get a token, and you can use the R2 API tokens API to create it.
We also have some documentation.
Does that come from the dash, Alex? You build a new token from the dash?
Yes. Yeah. And if you have an old R2 token, let's say an admin R2 token that you created a couple of months ago,
make sure you create a new one.
We've added new permissions for access to the R2 Data Catalog.
So just go ahead and create a new admin token, and it will have those permissions.
And we also support more specific permissions that follow the principle of least privilege,
so maybe a little bit tighter.
Say you don't want to give admin access, you only want to give access to a specific bucket.
We have instructions on how you can do that. OK, great.
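For reference, the Spark session setup the demo describes looks roughly like this. The catalog URI, warehouse name, and token are placeholders read from environment variables, and the property keys are the standard Iceberg REST catalog options; check the R2 Data Catalog docs for the exact, current values.

```python
# Minimal sketch of hooking PySpark up to an Iceberg REST catalog.
# The URI and warehouse come from the dash when you enable the catalog;
# the token is an R2 API token with catalog permissions. All values
# below are placeholders, and the property keys are the standard
# Iceberg REST catalog options.
import os
from pyspark.sql import SparkSession

CATALOG_URI = os.environ["R2_CATALOG_URI"]
WAREHOUSE = os.environ["R2_CATALOG_WAREHOUSE"]
TOKEN = os.environ["R2_CATALOG_TOKEN"]

spark = (
    SparkSession.builder
    .appName("r2-data-catalog-demo")
    # Pull in the Iceberg Spark runtime matching your Spark/Scala version.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register a catalog named "r2" that talks to the REST endpoint.
    .config("spark.sql.catalog.r2", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.r2.type", "rest")
    .config("spark.sql.catalog.r2.uri", CATALOG_URI)
    .config("spark.sql.catalog.r2.warehouse", WAREHOUSE)
    .config("spark.sql.catalog.r2.token", TOKEN)
    .getOrCreate()
)
```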
Awesome. And the docs are out. This went live today. Open beta docs are awesome.
Sweet. Oh, yeah. So we got those up. All right. So then there's some other config here. I don't actually know if it's all necessary; I've just had this sitting around for a while.
But at any rate, I will update it.
I'm going to put this notebook on our demo repo. So we have another repo that has examples of how to run Iceberg.
So we have one for running a Spark application in Scala.
Like if you wanted to build like a like an actual spark application that's running, you know, on a cluster and you want to hook it up to this.
We have docs on that, and I'll add instructions on how to get set up with this notebook.
And I'll double-check these configuration values.
So anyway, you get that set up and we can just go ahead and run it.
Now, this is going to print a scary red exception. Oh, you know what I need to do?
I just need to turn off WARP. That is what I need to do.
I hope that doesn't kill my live stream. Unfortunately, I should have checked this before.
Fortunately, it's just you're just proving it's live. That's that's what.
Yeah. See, if it went smoothly... Yeah, I know. The fact that it's working at all is a surprise.
That's right. You haven't truly tested your application until you run it in front of hundreds of people.
OK, so let me just describe what we did right here. We set up our configuration.
This is telling PySpark, hey, this is the application I want.
This is my configuration. And then we ran some SQL commands: create a namespace if it doesn't already exist, and then show me all the namespaces in my Iceberg catalog.
So this is talking to the R2 data catalog.
And it's talking to the metadata layer, saying, hey, what are the namespaces?
So that's what it fetched.
Let's now do a little bit more. We're going to load some really simple data into the warehouse, just five rows, just to show you that we can do it.
So we're going to go ahead and write.
So this is like uploading into R2. Give it a second.
Now it's done, and now let's just query.
Let's read back all the rows we just wrote. Boom, there we go.
Nice. Now we have our data. Hooray. And so we can do, you know, I mean, it's Spark SQL, so we can do more complicated queries.
So let's just do a query right now
where we only fetch the rows where the values in the floats column are negative.
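The notebook cells described so far amount to something like the following Spark SQL. The namespace, table, and column names are made up for illustration.

```python
# Roughly the sequence of Spark SQL statements the demo walks through.
# Namespace, table, and column names are illustrative only.
spark.sql("CREATE NAMESPACE IF NOT EXISTS r2.demo")
spark.sql("SHOW NAMESPACES IN r2").show()

spark.sql("""
    CREATE TABLE IF NOT EXISTS r2.demo.hello_world (
        greeting STRING,
        floats   DOUBLE
    ) USING iceberg
""")

# Write a handful of rows; the commit goes through the R2 Data Catalog.
spark.sql("""
    INSERT INTO r2.demo.hello_world VALUES
        ('hello world', 1.5), ('foo', -0.3), ('bar', 2.0),
        ('baz', -4.2), ('qux', 0.0)
""")

# Read them back, then filter down to the negative values.
spark.sql("SELECT * FROM r2.demo.hello_world").show()
spark.sql("SELECT * FROM r2.demo.hello_world WHERE floats < 0").show()
```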
OK, I didn't run this. We got Hello World. Yeah, it is.
Yeah. Nice bar. Yeah, I know those guys. All right. I mean, that's a super simple like use case, right?
Like, I mean, five rows is pathetically simple, considering you should be able to upload terabytes of data.
So just to make the demo a little bit more interesting,
and it's still only 56 megabytes,
So I'm going to use the New York City yellow taxi data and we can run a few queries there.
I've already done this. I've already uploaded it. It takes about 20 seconds to do,
so I don't want to do it on the live stream, because we've got to get rolling.
Yeah, but even at 56 megabytes, you could easily just download all of it. So the New York City taxi data, just to take a step back, is a free, open data set that you can download from their website, and they break it down by year and month.
So you could just download all of this data, stored in Parquet, upload it into your Iceberg warehouse, and just start running queries.
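One way to do that bulk load, assuming you've downloaded a monthly Parquet file locally, is sketched below; paths and table names are placeholders.

```python
# Read a downloaded taxi Parquet file with Spark and write it into an
# Iceberg table managed by the R2 Data Catalog. The path and table name
# are placeholders.
taxi = spark.read.parquet("data/yellow_tripdata_2025-01.parquet")

(
    taxi.writeTo("r2.demo.yellow_taxi_trips")
        .using("iceberg")
        .createOrReplace()
)
```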
Right. I have another question, but I'm going to play dummy here because I am a dummy when it comes to this.
So Parquet's its own format?
It's its own thing, sort of like a CSV or JSON, if you want.
However, when you're running massive analytics queries, those are typically aggregations of data rather than lookups of individual rows.
Right. So it's querying in a different way. In your application you'll tend to query row by row, whereas when you're doing these aggregations, you're going to tend to query along columns.
Parquet is a columnar format.
Right. So it's storing the data by column, allowing it to run efficient queries along those columns.
It also has optimizations built in:
it precalculates certain stats, like min/max stats,
so it can help with pruning the data and with query optimizations.
Parquet is definitely one of the preferred formats for storing data in Iceberg.
There are a few others, like ORC, but Parquet is pretty popular and it's the one I was using.
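If you want to see those per-column min/max statistics for yourself, PyArrow can read them out of a Parquet file's row-group metadata; the file path below is a placeholder.

```python
# Peek at a Parquet file's row-group metadata to see the per-column
# min/max statistics mentioned above (the stats query engines use to
# prune files). The file path is a placeholder.
import pyarrow.parquet as pq

meta = pq.ParquetFile("data/yellow_tripdata_2025-01.parquet").metadata
rg = meta.row_group(0)
for i in range(rg.num_columns):
    col = rg.column(i)
    stats = col.statistics
    if stats is not None:
        print(col.path_in_schema, "min:", stats.min, "max:", stats.max)
```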
OK, awesome.
Thank you. Yeah, definitely. So I've already loaded this data, so I'm not going to do it, but let's go ahead and.
Just as an example, let's get the first 10 taxi rides of 2025. So these are the people who, you know, the ball has dropped in New York City and they jumped straight into a taxi, right?
Yeah, I've got to get home, got to take off the hat, get those glasses off, jump in a cab.
So, I mean, this person, boom, two seconds in the beginning of the year.
Nice. Yeah. So, yeah, kind of by themselves.
A couple. There's a couple in there, somebody, somebody.
This guy just kissed his gal and got the heck out.
Yeah, yeah. No bubbly. Just straight, look straight ahead, straight into the cab.
Is that right? One of those might be me. I think the 0.41 miles one.
That just sounds like something I would do. Yeah, 0.4. It was too bad, you know.
Get me out of here. Same location. Yeah, I pulled in the location data as well,
so we can see where that is. No, just Manhattan.
So they did a quick trip within Manhattan. Nice. OK, nice. OK, so, you know, just a little bit funny.
Like, I wanted to see, OK, which taxi ride had the most passengers in it?
Oh, that's a good one. Yeah. Like the uncomfortable, the uncomfortable ride.
Like nine in a cab. Nine in one! Oh, they have minivans there now, though, right?
And in Manhattan. But I prefer... So the logical answer is that people just crammed in.
And I prefer the mental image of nine clowns kind of cramming themselves.
That's the mental image I want to live with.
It's going to be a good year. It's going to be a good year for those clowns.
That's their resolution: we're going to use cabs, no more of these small clown cars.
And then one final query.
We can see what are the top ten largest taxi fares in January?
Yeah. And so this one's fun. Yeah. Oh, wow. Two hundred and sixty three thousand dollars.
Again, probably there's a reasonable rational explanation why they recorded almost a million dollars as a taxi fare.
I prefer just to live in a world where someone actually did pay eight hundred and sixty three thousand.
I mean, I think maybe like sometimes from the airport, it's pretty expensive.
This one, you know, I did look into.
So this was... yeah, I'm not going to do the SQL live.
Yeah, I have a rule. Yeah, that never works out well on a live stream.
That's why I'm doing it. But yeah, I looked this up. It is from LaGuardia to Alphabet City, but it was only a five-minute taxi ride.
I looked up the distance.
I'm not from New York. I've only been there once, but I looked it up on Google Maps.
It's like a 30-minute drive. But I think it's a billionaire giving a really nice tip to taxi drivers on New Year's.
They felt bad for those helicopter taxis.
I don't know. Right, right. Yeah. You didn't filter by car only.
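For reference, the three taxi queries discussed above look roughly like this in Spark SQL. Column names follow the public NYC yellow-taxi schema (tpep_pickup_datetime, passenger_count, total_amount), so double-check them against the files you actually download.

```python
# The kinds of aggregation queries run against the taxi table in the demo.
# Table and column names are illustrative; the columns follow the public
# NYC yellow-taxi schema.

# First ten rides of 2025.
spark.sql("""
    SELECT tpep_pickup_datetime, trip_distance, fare_amount
    FROM r2.demo.yellow_taxi_trips
    WHERE tpep_pickup_datetime >= TIMESTAMP '2025-01-01 00:00:00'
    ORDER BY tpep_pickup_datetime
    LIMIT 10
""").show()

# The ride with the most passengers.
spark.sql("""
    SELECT MAX(passenger_count) AS max_passengers
    FROM r2.demo.yellow_taxi_trips
""").show()

# Ten largest fares in January.
spark.sql("""
    SELECT tpep_pickup_datetime, PULocationID, DOLocationID, total_amount
    FROM r2.demo.yellow_taxi_trips
    WHERE tpep_pickup_datetime < TIMESTAMP '2025-02-01 00:00:00'
    ORDER BY total_amount DESC
    LIMIT 10
""").show()
```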
Yeah. So that's super fast. That's crunching through a bunch of data.
Fifty-six megabytes. I mean, I'm saying a bunch of data...
But yeah. You know, for a single 56-megabyte file, this is kind of a contrived example.
I'll be completely honest: if I were an actual data scientist, or someone who actually wants to crunch this data, I probably could do so with DuckDB, which is a technology that's really good for running queries locally on your laptop against smaller data sources.
It's more for when you want to crunch hundreds of gigabytes, starting to get to terabytes, or when you're trying to dump lots of data into your object store and then query across all of that data.
Yeah. That's where this starts to shine, and our catalog is going to shine a little bit brighter for that use case.
I want to just briefly preview.
So like we're in beta right now. When we go GA, one of the features that we're planning to provide is a sort of managed table maintenance.
So one of the pain points customers have had is that you had to manage an external catalog.
Now we have our catalog, right?
You no longer have to leave Cloudflare to have an Iceberg catalog.
So Iceberg tables, as you're writing data to them, can suffer from something called the small file problem as you're loading data.
So this is something, you know, with like data pipelines, you might only have like 30 megabytes per file, right?
And you're just uploading hundreds and hundreds and hundreds of files with, you know, 30 megabytes.
So then when you run a query processor, that query processor now has to go download hundreds and possibly thousands of files that are only 30 megabytes large, right?
It's not the most effective.
There are multiple reasons why, but it's not efficient.
Right. The network I/O, just the overhead of managing all of those files.
So one of the ways around that is you can run compaction on your table.
Iceberg Spark, for instance, provides a compaction procedure that you can run.
But that requires you to set up a Spark cluster and periodically run this compaction process.
That's money you're spending. That's infrastructure you're managing.
You've got to stay on top of it.
You also have to constantly optimize it, right, and make sure that it's running correctly for your use case.
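The manual route described here is Iceberg's standard Spark compaction procedure, rewrite_data_files, run from a cluster you operate yourself; the table name and target file size below are placeholders.

```python
# The manual route: run Iceberg's data-file compaction procedure from a
# Spark job you schedule and operate yourself. rewrite_data_files is a
# standard Iceberg Spark procedure; the table name and target size are
# placeholders. Requires the Iceberg SQL extensions configured earlier.
spark.sql("""
    CALL r2.system.rewrite_data_files(
        table => 'demo.yellow_taxi_trips',
        options => map('target-file-size-bytes', '536870912')
    )
""").show()
```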
And so what we're going to be building is a managed compaction system, and not just compaction.
Eventually, we'll be working on things like removing orphaned files. So when you have writes, sometimes those writes fail,
but they've already written those objects.
Right. And they might not necessarily clean up the campground after they leave.
Right. They might just leave those files lying around.
So this is kind of like a campground janitor going through and cleaning up those files, getting rid of them to remove that metadata, but also to reduce your storage costs.
Right. You're not paying for extra objects that aren't actually part of your warehouse.
So stuff like that. What this does at the end of the day is keep your Iceberg catalog optimal, both for cost purposes and for performance.
When you're running queries, queries are that much more performant when you've run compaction, because it doesn't have to read all the data.
There's less metadata it has to read, and it can do more effective pruning of files so that it doesn't even need to read files like if it doesn't have to.
Yeah. So, yeah, that's what we're going to be working on next.
And I think also that's cool. Even more value for customers.
Awesome. We we had a question from the community here. Is there any cost in running a SQL query?
Oh, that's a good question. So I do want to, again, highlight two aspects of this.
One, we don't provide the query engine, right?
So that's, you know, I'm running on a notebook right now that I just installed locally on my laptop.
For larger data sets, you might need a compute cluster.
And so that's something you would need to sort of provision. And so there's costs there.
But those aren't additional costs we're charging. During the beta, we're not charging any fees for using the R2 Data Catalog.
So have fun with it. In our blog post, we mentioned the pricing we're thinking about.
And the pricing we have is that for the data you're storing in R2 with the catalog, you don't pay any more.
Right. We are not charging anything additional. There's no special R2 Data Catalog pricing for the data.
Just standard R2 pricing. When you make a request to R2, we also charge for those requests.
So similarly, we are going to charge for requests to the catalog.
But it's fractions of a penny for each request.
OK, then awesome. And then when we build compaction, we will charge per gigabyte, a certain amount.
I can't remember the exact amount, but it's per gigabyte compacted.
And it's comparable to other products that also have managed compaction.
OK.
Yeah, because that's a common problem across all the platforms that are using this style.
OK. Awesome. Great. I have a question that always comes up when somebody says big data.
I know, Pranshu, you talk about this too. You'll say big data.
What what is the separation there? How should I think about that?
Like, when should I go this versus a database myself? Like, how should I think about that separation?
Yeah, I think that's a great question. And it kind of goes to the essential difference between what big data is versus anything else.
OK. Like, you know, if I'm reaching for a database like D1, Postgres or MySQL, which, again, you can use hyperdrive to connect to.
Right. And you should, right?
And you should. It's data day. It's OK. You can talk about it. Yeah, exactly.
You know, if you're using that... those are great for when you're building your application.
Right. When you're building the data access patterns for your product, the ones customers use when they're interacting with the product on a day-to-day level.
That's really what you want. That's where you're going to want to use that database.
I say it this way because one of the distinctions people draw is OLTP versus OLAP.
And with that, and I'll define it.
So online transaction processing versus online analytics processing.
The reason I'm trying to avoid that is because Iceberg provides transactions.
Right. You can do transactions in one, and you can do analytics and transactions in both.
I don't really think that they're very good terms for describing the difference.
I think it's more of the use case.
The use case of building data access for your product
is different from the use case of building something like analytics.
You know, IoT streaming data storage, ML data processing.
These are things that happen kind of on the back end, right? They're not they're not the product, but they're critical parts of your business that feed into the product.
Gotcha. OK, so when somebody says, we're a data-driven org, you might be looking at data about how users are using the insides of your application.
But you have this whole other set of data that will help drive decisions that you might make.
And this is where that goes.
OK, thank you. That has been on my mind. Big data is kind of a strange name.
It's kind of like, this is a big deal, not, this is a lot of data.
It's more like... yeah. It's like big data, but does that really describe the problem?
I feel like it's...
Right. That, you know, was in fashion back when we were all using Hadoop, like 10 or 15 years ago.
Right. And it's like, oh, big data is here, you know?
Oh, yeah. And, you know, I really like Alex's description of it. Right.
It's like, I think when most people talk about big data, they're talking about how we make analytical decisions off of large streams of data that we're going to store separately from the application.
Right. This doesn't matter to our users.
Our users don't really care about this data: user clicks on a website, or interactions with each other.
Right. Yeah. That's kind of that's internal to our application and our data scientists and our product managers are going to use this data to make better decisions.
You know? Yeah. And you don't want your users paying for that, too.
Right. You don't want to fill up their database with stuff they're not going to use.
That's cool.
I like that separation. That's really nice. Yeah, exactly. Because I imagine back in the day, the very first way of doing this was: you hire your first data scientist and you ask them, hey, can you help us figure out what our user retention rate is?
You know? Yeah. And the data scientist runs the query against the production database.
Yeah. A long list of users. Yeah. I don't know. I may or may not be guilty of doing that.
And the query patterns are different. Right. When you're trying to run these,
the day-to-day product operations are row-based,
whereas these analytical queries are aggregations of data. The classic query is: what are the user clicks over the course of this period of time?
Right. Time based queries.
And that's columnar-based. So it's completely different. Having two separate technologies helps to optimize for each case.
Right. Yeah.
Yep. And then back in the day, in that early data science era, somebody would show up like, why is the site so slow?
Because the data scientist is trying to get you the numbers.
Yeah, exactly. OK, cool. That makes a lot of sense.
Let me do one more, because the other one that always gets me is: what's this lake life, data lake?
What are we at?
Why a lake? What's the deal? Yeah, that's another great question.
I mean, I think to answer that, we have to go back in time, back to that first moment in early history where you hire a data scientist and they query the production database and it slows everything down.
So then what we did, what folks decided, OK, we need a separate database.
And Alex is right. They have different types of query patterns.
So let's build a complete let's build a database that is optimized for analytics.
So we built data warehouses. Right. I can't think of an open source data warehouse off the top of my head.
But, for instance, AWS Redshift is a data warehouse: a compute cluster that's able to scale to petabytes of data and run these queries.
The problem with like data warehouses, the problem people ran into is it couples compute and storage at the end of the day.
So if you need to scale your storage, but you don't necessarily need to scale your compute, you're still paying for the resources because each node is both a storage node and a compute node.
Right.
Gotcha. Or you need to scale compute but not storage. The compute resources and the storage resources scale along different dimensions.
And so, especially when you had Apache Spark and S3 coming along, people realized, hey, you know what?
I can kind of put these two together.
I can run. I can store my data in S3. I can just dump it all there.
And then I can use something like Spark because Spark will just like automatically figure out the schema.
And then it infers the schema from my JSON files or my Parquet files.
Well, Parquet already has a schema, but whatever.
You're able to just run these massive queries on the data there and use it.
So now you don't really need to maintain a data warehouse.
You can just have S3 and run ad hoc queries when you want. But there are problems there too, and this is where we get to the data lakehouse, I suppose, which is a combination.
This is where Iceberg is kind of the next generation.
And it's not just Iceberg. There are others; Delta Lake and Hudi are comparable.
But like the idea is, OK, now what we're going to try to do is pull some of the benefits we got with data warehousing, which was transactional updates, schemas, you know, for my tables so that I can actually like know what the columns are and I can actually introspect and also enforce the schema.
So that way I don't have a giant mess, even within the same table, while keeping that decoupled compute and storage.
And so, you know, the way that Iceberg solves it is with like a managed, you know, with the catalog, which manages the metadata layer and provides, you know, provides that metadata, provides the ability to sort of control the schema, evolve the schema over time even and transactionally update the table.
Nice. I still don't get the analogy of the lake, but I'm going to research that.
Well, OK, so there's a lake and then there's a lake house.
It's yeah. Yeah. Yeah. And there's an iceberg. Is the iceberg in the lake?
I think it's because you're like there's a hose and you're filling up a lake with the hose or like a pipe.
Oh, we have the pipeline. I like it. I like that.
Lake. I like that. And then anyone can go to the lake and pull some water out.
OK, I like that. I'll take that. Thank you. Appreciate that, Alex. That's great.
You know, if I'm going to tell myself a story, that's the best one that works for me.
I like it. And you're definitely telling yourself stories with the taxi cabs.
You know, taxis and people, you know, spending almost a million bucks for a helicopter taxi.
I mean, this is the. Yeah, exactly.
It's the world. That's your world, Alex. Well, awesome. Well, I'm going to go ahead and give you this.
This compliment that came out of a wonderful presentation.
Thank you for what you presented there. So we talked R2 Data Catalog and Iceberg, and we were talking about reading. Pranshu,
Let's talk about how we get the data in there.
What do we do? What did we launch today?
Well, we launched Cloudflare Pipelines. I think Alex mentioned a little bit about this.
So let's say you want to get your data into R2, right?
You want to use these data catalogs. You want to be able to query information from a data lake that has zero egress like R2.
But you actually got to get your data into R2 to begin with.
Right. Right. How do you do that? And so that's where pipelines fits in.
Right. So what pipelines lets you do is ingest real time data streams, think clicks on a website or stock market ticks from financial data.
And we will basically take all of that real time data, convert it into files and then load it into an R2 bucket in your account so that you can create a data lake basically with like very little effort without managing the infrastructure, just set up a pipeline, point data at it and you're good to go.
The metaphorical hose that Alex was talking about to fill the lake up.
That's exactly it.
That's exactly it. Yeah. Yeah. The problem with hoses is that they get stuck and they get blocked.
Sure. Yeah. Sometimes the hose needs to get bigger and sometimes it needs to get smaller.
Yeah. You can stretch this metaphor quite a bit, but it really works. Right.
Those are some of the complexities and issues that we're trying to tackle with Pipelines.
Right. Ideally, as a data engineer or even an application engineer, you shouldn't have to worry about: will my data make it?
Like is it actually going to show up there? Right. Like ideally, what you want to do is just put your data into the pipeline and trust that it'll show up in the destination.
Yeah. Yeah. Because does that not happen? That that's a hard thing to make happen.
Really hard. Yeah, it's really hard. Take the example here of real-time data pipelines.
Right. Like this is a good example, which is, you know, imagine you're trying to ingest website clicks from your website.
Right. You're clicking on your website and you've got to keep track of all this data so that like later down the road, your data scientists can query the data to understand whether users are shopping the way you want them to shop or what's going on with the website.
Right. So you need this raw data of website clicks, your clickstream data.
All right. Let's say you run a promotion for your website and you get a huge spike in traffic.
You know, all of a sudden, all of your infrastructure that's ingesting this raw data has to be able to keep up with that spike in traffic.
And if you don't keep up with that spike in traffic, this data is gone forever.
Yeah, for sure. That's a challenge with real time data, right?
Like you have to have durability. You have to be able to bear these like spikes in traffic.
And this is really, really difficult to do. And that's actually one of the problems we're trying to solve with Pipelines.
Right. Like you shouldn't have to worry about this.
We'll manage it. We'll manage durability. Just send data into your pipeline and you're good to go.
Yeah. And I know what that feels like. I've worked at places, not here, but I've worked at places where you're like, hey, what happened that week?
And like, oh, we don't have any data for that week. Like, why don't we have data for that week?
Because of this, because there's a kink in the hose or something along those lines or it's scaled too much or things like that.
Awesome.
That's great. That's super cool. And it's a bit of a fire and forget.
Is that what it feels like? You just kind of... Yeah, I mean, that's the beauty of Pipelines, right?
Like it's very much when you create a pipeline, you can connect sources to your pipeline.
Right. So one of the key concepts of pipelines is sources and sinks.
Right. So you connect sources to your pipeline and then you connect to sink.
Right. And the pipeline makes sure the data goes from source to sink.
So on day one, since we're launching today, open beta, your sources are HTTP clients which can push data over HTTP or a Cloudflare worker if you're already on our platform and building workers.
And the sink is an R2 bucket. Right. So load data from an HTTP client or from a Worker, send it to a pipeline, and the pipeline will get it into an R2 bucket, with files generated, compressed, and partitioned by time to make them efficient to query.
And over time, there's going to be a lot more coming here.
So a lot more sources, a lot more sinks and a lot more happening in the middle of that pipeline as well, like transformations, real-time processing.
How did people do this before?
Like, let's say... you said Worker, and I was thinking, I've got some Workers, and I've made some very popular educational videos that have repos that people click on.
I would love to track clicks. How would I do that normally?
Yeah, yeah, for sure. So let me take the general case here of: how do I consume real-time streaming data?
Right.
Sure. I think one of the dominant ways to do it for a long time has been Kafka.
Okay. You're probably familiar with Kafka. Right. And it's a great tool, I think, in a lot of cases.
Right. Like really low latency. So you can push data to it very efficiently.
It's a really simple product when you think about it.
Right. Because it's just a distributed log. Every time you add a row of data, it's like having a text file and appending a new row to that text file.
So it's really conceptually simple. I think the problem with Kafka is that scaling it is really hard and operationally challenging.
And the way that you have to use Kafka traditionally is a reserved compute model.
Right.
And so what that means is you're paying by the hour. And what's hard about this is, let's go back to that example I mentioned, receiving a burst of traffic.
Right.
Yeah. You have to now provision a Kafka cluster large enough to handle bursts in traffic.
So let's say normally you get, you know, 100,000 events per second.
You still have to make sure your cluster is large enough to be able to handle a million events per second, just in case you get that burst.
Right. So what that means is that it's expensive.
Your utilization is really low and it's kind of inefficient.
Right. And I think that's what makes Kafka a little hard. Right. And even though it's a very powerful tool, you know, the cost model of it doesn't quite work out very well.
And this is not even including some of the other things you have to take care of from using Kafka, like replication and oftentimes replication across multiple availability zones.
So now you've got all of the storage costs, like I just mentioned, and you've got data transfer.
Yeah. Because you've got to replicate this data across multiple regions to make sure it's available.
Oh, there's that egress again. And normally you're paying for that. And I think so.
So with Pipelines, what we're actually trying to do is go serverless here.
Right. With Pipelines, we're trying to solve a few of these problems.
First, we want to make it easier for you to scale these sorts of streaming ingestion systems without manual overhead.
The second is no data egress. Right.
So you get replicated, durable data with no egress fees. That's something we handle for you.
And then the third thing is that, you know, we just want to make it possible for you to be able to build some of these like real time streaming applications.
But don't pay by the hour. Pay for data. Pay according to how much data is actually going into the pipeline and out of the pipeline.
So that's what we're trying to solve with pipelines today. Cool. I love it.
And I love that it's open beta. We're going to get feedback. You said we've got some things coming, and I can tell that you're cooking.
I could tell from the way you said that that you're cooking some other things.
We don't need to leak anything here, but I know that you're thinking about that.
I know both of y'all are watching how people are going to use this.
So please, people watching at home, make sure you let us know, loudly, because we love that.
We love to get feedback about what you feel is missing from the product and what you like about the product.
So please make that happen. Can you can you do you have a demo project?
Do you have something you could show? I know it's kind of like you do.
I'll send you a demo. Let's do this. Give me a second. I'm going to share my screen.
I'm going to walk you through what the process is of creating a pipeline.
OK. We're going to send some sample data to it. Right. All right. So I've got a little terminal window.
I've currently got it a little small. Let's enhance that.
Can we enhance that a little bit? All right. Zoom and enhance. There we go.
Is this better? Maybe let's go three more. Three more of those. That feels good.
OK. Oh, sweet. So it's in Wrangler. It is indeed in Wrangler. So I'm going to run.
So it's in open beta, as you can see here.
Right. So anyone watching this, if you have a worker's paid plan, like you can start using this right away.
What we're going to do is actually create a really simple pipeline here.
Right.
So I'm going to copy and paste the command here just to make this a little easier for myself.
I should say, this needs the Workers paid plan. The Workers paid plan is how much?
Five dollars a month. Five dollars a month. OK. All right. I was making sure that's known there, because sometimes it's like, oh, well, great.
Now I've got to pay. Oh, it's only five bucks. OK, great. So we're going to create a pipeline, right?
It's called clickstream-pipeline. OK, we're going to specify a destination for this pipeline now.
Right. So I'm going to say R2 bucket. We'll just say my bucket.
Right. So what we're going to do now is hit create. Right.
So I'm going to hit enter. And I start to see something here. Well, we need our pipeline to have permissions to write data into this R2 bucket.
You can't see this because this is opening up in my browser, but you can probably tell here what's happening now is we're exchanging credentials via OAuth.
Right. So I've got a link open in my browser that I'm going to approve.
Just done. And as you can see, my pipeline is being generated, and it's getting access to this R2 bucket.
We'll give it another second. All right, and we're good to go. So there's a lot of text here, right?
I'm actually excited about what we're doing because I think this is really cool, what's neat about pipelines, right?
You'll notice right off the bat, we have an HTTP endpoint over here.
Yeah. And what's really nice about pipelines is this endpoint comes out of the box.
So as soon as you have this pipeline, you can just start pushing data to that endpoint.
You don't have to set up any other infrastructure, you don't have to set up a worker or another server to accept requests.
Yeah, you got an existing application.
Here's this pipeline. There we go. And you can set authentication here if you want to, right?
Like you can even set cross-origin domains. And what that means is if you want to collect clickstream data directly from a user's browser, going straight to the pipeline, like you can do that as well.
Nice. There's a lot of flexibility here, right?
And the idea is like we want to make it dead simple for you to set up a streaming injection service, right?
Like just one click and you've got this going.
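As a sketch, pushing events at that generated endpoint can be as simple as an HTTP POST with a JSON array of records; the endpoint URL below is a placeholder for the one Wrangler prints, and authentication is omitted.

```python
# Minimal sketch of pushing a batch of events to the pipeline's HTTP
# endpoint. The URL is a placeholder for the ingestion endpoint printed
# when the pipeline is created; the payload is a JSON array of records.
import requests

ENDPOINT = "https://<your-pipeline-id>.pipelines.cloudflare.com"  # placeholder

events = [
    {"event": "click", "path": "/pricing", "ts": "2025-04-10T12:00:00Z"},
    {"event": "click", "path": "/docs",    "ts": "2025-04-10T12:00:01Z"},
]

resp = requests.post(ENDPOINT, json=events, timeout=10)
resp.raise_for_status()
print(resp.status_code)  # a success response means the records were accepted
```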
I want to highlight another thing over here, which is you see a destination here, right?
So this is the R2 bucket that I just defined. The file format right here is newline-delimited JSON.
What that means is we're going to take JSON records,
and one line in your file will represent one record. Okay.
We're going to compress those files, so that reduces your storage bill. And now what's interesting here are the batch settings, right?
This is really critical. Because what the batch settings do is define the cadence at which your pipeline will write files into R2.
Take a quick step back here, you know, pipelines is meant for ingesting, you know, hundreds of thousands of records per second.
Imagine you're creating one file for every single record; you would have an R2 bill that skyrockets.
And even though R2 isn't that expensive, it still costs money to write data to it.
Right. It's really inefficient and really slow. It's like- You'd have that compaction problem.
Yeah. You'd have that compaction problem. Yep. So pipelines is going to fix that for you by just basically batching all of the inbound records, right?
So as records are ingested, pipelines is going to batch those records.
And then once a batch is filled up, it's going to write. Right. So what this is saying is we're going to wait a maximum of 300 seconds to fill up a batch,
Okay. Or 100 megabytes of ingested data, or 10 million rows of data.
Sweet. Okay. So, and that's out of the box.
You didn't set that. That's the default. That's the same default there.
Yeah. Okay. Yeah, yeah. So as soon as one of those batch definitions is met, the pipeline will like take all of the data that's been batched up, generate a file, and then deliver it into your R2.
Awesome. Awesome. So that's dealing with the problem that you were talking about, where you don't know the scale of what things are.
You're saying this is the max of where you're going to go. Yeah. Yeah. Yeah.
This is really about generating query-efficient files, right? So what this is doing is helping me generate large files, basically.
Yeah. Because large files are more efficient.
Right. Okay. Cool. So the pipeline is ready, so you can actually start to send data to it.
So I've got a little helper command here.
I'm actually just going to copy paste this in. Craig, your old friend Fubar is back.
Fubar is back. There it is. There you go. And so I want to say a couple of things here, right?
So the first is I've just got a bunch of data here, which is honestly kind of trivial.
Pipelines are built to ingest a lot of data, right?
So you can ingest up to a hundred megabytes per second per pipeline. Wow.
You're up to a hundred thousand requests per second, right? So this is meant to be high throughput.
That's a lot of clicking. It's a lot of clicking. Yeah, yeah, yeah.
It's a lot of requests. So this like single request is kind of silly, right?
But it's a nice example. You notice it says committed here, right? What this means is the pipeline has accepted the data.
And at this point we actually guarantee this data is going to make it to your R2 bucket.
Gotcha.
So you can rest assured like at this point, you're good. Awesome. Wow. And it's just there.
It's there at that URL. And it looks like there's also a binding. You could also do it from the worker.
Yeah, there's a binding. Yeah. From the worker.
Yeah. And if you've used queues, like the interface is pretty similar, right?
You set up a binding, and then there's a send method exposed on the binding.
So it would be env, then your pipeline binding, then .send(). Awesome. Cool.
And you don't wait for it. You just let it go. Yeah. You let it go. And if you get this result back, like that means the pipeline has accepted this record.
It's going to store it till those batch definitions are met.
But you know, like the rest of your application can keep doing what it needs to do, right?
Like. Right, right, right.
You're not getting in the way of the user. Yeah. Awesome. And then we're going to go query it with our iceberg stuff.
We're not going to get in the way of the user.
Awesome. I got it. I think I understand. Y'all, I think I get the big data concept here.
And it's a new era. I've got... we've got some videos to make,
sounds like. Some educational videos on using this stuff.
This is really awesome. Really, really awesome.
I think that people have probably wanted to build this. And in fact, I know that I've talked to some startups.
I'm like, would you just please let us do this on you?
And it's here. This is us listening to those requests. This is awesome.
Super exciting. I have a couple of special guests that I'd like to bring on.
Do you guys have anything that you want to say before you close out here? Any big thoughts from the big data crew here?
Nothing other than there's a lot more to come from us in this space.
I mean, there's a whole world of tools and products that we want to build, and workloads that we want to support.
So querying, real-time stream processing, there's a lot coming soon.
Awesome. Awesome. Alex, any last words? We're going to be, you know, building out those managed features, but continue to, you know, use us and give us your feedback.
Let us know how you want to use us.
Like what are some other features that are important to you, that are important to your organization?
Yeah, just let us know. Yeah, be noisy. Go to discord.cloudflare.com.
We hang out there. We listen to all the things that you're thinking about.
So come let us know what you're thinking, what you're building and any sort of question there.
These are in open beta, and we're pretty generous in open beta, I believe, with lots of free stuff too.
Like come play, right?
Awesome. Thank you guys so much for coming on here. I'm going to bring on a couple of other folks here.
So I'm going to send you, if I can figure out how to do it, I'm going to send you backstage.
Thank you, Pranshu, for coming, and thank you, Alex.
And thank you for all your analogies. I learned a ton from you today on stuff.
Thank you. All right, so here we go. I have the privilege of inviting the newest, I know these guys already.
You might know these guys already, but the newest Cloudflare folks this week that I'm aware of.
I am so excited to have you both here.
Outerbase is here in the house at Cloudflare. Brayden made a tweet earlier, responding to somebody, and he said something like, I believe that they...
And I said, no, Brayden, it's we, we do that.
So welcome you guys. Welcome both of you.
So good to see you. Thanks, Brad. You did correct me and thank you for correcting me.
It was, I deserve that. Awesome, so for those of you who missed the announcement, because there's a lot of announcements going on, what happened?
What happened this week with y 'all? So, you know, minor news, minor news.
We joined Cloudflare. Yeah. Congratulations. Thank you. Congratulations, yeah.
And y'all come with some amazing tools. I know you from these gorgeous tools that were out there, like, wait, who, what is this?
What are you guys building over here?
Like incredible ways of like inspecting data deeper than I've ever seen.
I've seen some stuff that you've done on the durable object level where I'm like, oh my gosh, that is, I need to know what's happening here.
So really, really clever tools that you all were building.
And I heard, I heard that we have another announcement today about D1 read replication.
And somebody said that y'all already built something.
You already made something, a demo of the read replication, that you built the week you joined.
Is that true? Well, one thing, you know, being a startup, we like to ship.
That's why we're excited about Cloudflare because Cloudflare likes to ship.
Yeah. Yeah, that's so cool.
We put together a little demo. If you go to replicas.pages.dev, you can actually see, you know, read replicas just launched.
You can see how much faster your queries are.
One of our engineers is actually based in Cambodia. One of his biggest complaints was how slow everything felt for him all the time.
With read replicas, we would never have to hear those complaints because it all would have been fast.
So yeah, it actually, it takes your current location. So wherever you're, you know, hitting the website from, and then basically does a, you know, shows you a primary database that would be on basically the other side of the world and shows you how much speed you're saving.
Oftentimes it's 200 milliseconds plus of just latency being saved.
You're getting like two millisecond queries versus what would be 207 milliseconds.
It's really cool to see. Do one of you want to share that?
One of you want to click that share screen button there? I know. Yeah. Let me pull this up.
Brandon historically has a hundred tabs open at a time. So he's got to separate his tabs out.
Yeah. Okay. I roll the same. That's my big data in my tabs.
I keep my big data that way. I need to compact this. Okay. Oh, gorgeous. So yeah, you can come down and you can see, first thing it shows you is how much time you're actually saving versus the primary, you know, that would be in Melbourne.
So you can highlight two and you can see, you know, it takes you to different parts of the globe.
I like that fun. My background is design. So I spent too much time on things like this, but you know, I like it.
Oh, it's moving the map to where it's at. That's gorgeous.
Yeah. That's awesome. Sweet. So, how do I do these replicas?
I'm glad you asked. Great. If you scroll down, you'll see two things.
There's two ways to actually enable this. There's the API. So you can copy this or bring in your own account ID and your database ID that you want to enable replicas for and just, you know, run this curl and it'll turn it on.
If you don't want to just run a random curl, oftentimes you want to do a bit more and understand what's going on.
There's a UI for it as well. There's an affordance in the UI.
So you can actually go visit your database and you'll see this button down here called enable read replicas.
And it's a deep link right into it, so you can just click on that and turn those on.
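For anyone following along later, here's a rough sketch of what that enable call could look like from TypeScript, mirroring the curl shown on replicas.pages.dev. The path, HTTP method, and body shape are assumptions here; treat the demo page and the D1 docs as the source of truth.

```ts
// Sketch only: the endpoint, method, and payload are assumptions mirroring
// the curl described above, not a verified API reference.
const ACCOUNT_ID = "<your-account-id>";
const DATABASE_ID = "<your-d1-database-id>";
const API_TOKEN = "<api-token-with-d1-edit-permission>";

async function enableReadReplication(): Promise<void> {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/d1/database/${DATABASE_ID}`,
    {
      method: "PUT",
      headers: {
        Authorization: `Bearer ${API_TOKEN}`,
        "Content-Type": "application/json",
      },
      // Assumed payload: turn read replication on for this database.
      body: JSON.stringify({ read_replication: { mode: "auto" } }),
    }
  );
  if (!res.ok) {
    throw new Error(`Failed to enable read replication: ${res.status}`);
  }
}
```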
Awesome. So the person who's in Cambodia would turn on a read replica, or if they started the database there, they would turn it on for the other places where the users might be.
So that we can make this speed happen faster.
Exactly, exactly. I can let Brayden talk to this next part too, which is really cool, super technical.
So I'll let, I'm going to pass that over to him, like what the read consistency does here.
So yeah, the great thing about this is not only are there read replicas all around the world, but you can use the sessions API that's now part of this to have sequential querying.
So let's say you insert something into the database and then you immediately want to make another query.
I'm already losing my voice. It's been a big week.
I bet. You know, then you want to make a second request to fetch the records from that database.
Well, you want to make sure that you're fetching the records with your new write as part of it.
So the sessions API allows you to do that. So when you make a request, you get this bookmark back and the opportunity to save it and send it with the next request.
That next query is sequenced at least after that past request has gone through, which makes it really nice.
So you're making sure you're getting the right data after you make a write.
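As a sketch of that flow, assuming a Worker with a D1 binding named DB and a made-up comments table, read-your-writes with the Sessions API might look something like this; round-tripping the bookmark through a header is just one way to hand it back to the client.

```ts
// Sketch only: table, column, and header names are assumptions; withSession()
// and getBookmark() are the Sessions API surface described above (assumes
// @cloudflare/workers-types for the D1 types).
export interface Env {
  DB: D1Database;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Resume from the bookmark the client sent back (if any), so reads in
    // this request are sequenced after the client's previous write.
    const previousBookmark =
      request.headers.get("x-d1-bookmark") ?? "first-unconstrained";
    const session = env.DB.withSession(previousBookmark);

    // Write, then immediately read: the session makes sure the read
    // observes the write, even with replicas around the world.
    await session
      .prepare("INSERT INTO comments (body) VALUES (?)")
      .bind("hello from the edge")
      .run();
    const { results } = await session
      .prepare("SELECT body FROM comments ORDER BY rowid DESC LIMIT 5")
      .all();

    // Hand the latest bookmark back so the next request stays sequential.
    return new Response(JSON.stringify(results), {
      headers: { "x-d1-bookmark": session.getBookmark() ?? "" },
    });
  },
};
```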
And this is obviously fully optional.
You don't have to use this, but it's a really- But if you did it, it gets weird, right?
If you're like, wait, I wrote this. I know that I wrote this. Yeah.
Exactly. It's helpful for a lot of use cases. Yeah. Okay, sweet. Awesome. So we didn't have this before.
This is new. Yes. This is brand new. Well, thanks to, like- It predates us being at Cloudflare, obviously.
The team- I think you can take some credit.
You can take some credit. My team. Say my team. Do that.
Just, I want to hear you say it. You can say it, Brayden. Just gotta- Just Craig's been giving us trouble week one.
Yeah. I think he's- You've already taken credit.
These blowhards. Someone else did all the hard work. We just come in, put a fancy layer on top of it, make it look shiny.
No, but I think this is important, right?
This is important to think through, and I appreciate that. So again, that's replicas.pages.dev.
Come and see what that feels like, where these different things are.
Come set up your replicas. And I love that the code's on there, too. I love that you're able to go and start thinking about that as well.
One other thing to call out, if you don't have a database, they just launched the Deploy to Cloudflare button.
You can actually click this, and it will deploy a demo for you, so you can see how they work.
So you click this button and check it out. Oh, awesome. That also launched this week.
That's super rad. Yeah. Super cool. Awesome, y'all. Well, I wanted to say welcome.
I wanted to say welcome live on a stream. So happy to have you here with us, and thanks for already- I mean, I know that you're always cooking, but thank you for cooking for us already, too.
That's super cool. Super cool. I really appreciate it.
Yeah, yeah. And thanks for jumping on, too. It was, like, about three minutes before this live stream that I was like, y'all wanna come hang out?
Like, oh, sure. So thank you. That's huge, too. Anytime, Frank. Anytime for you.
Awesome, guys. Awesome. So it's been a big developer week. What's been your favorite thing so far?
So day one, early on in day one, there was this announcement of the Outerbase acquisition, and I just have to say, that is the coolest thing that has happened all week.
I literally air-punched. I literally punched in the air when I saw it.
It was so cool. Because they kept that a secret. I didn't know.
I didn't know that it was coming. Yeah, yeah. So super excited. Yeah, but besides that, I mean, there's so many cool things launching every day.
I mean, the Deploy to Cloudflare button, like, we're literally already using it.
It's awesome. Replicas are amazing.
Like, outside looking in before this, dev week was always so intimidating. It's like, what is Cloudflare gonna launch?
How can we keep up? Even as a startup, there's so much shipping, but now we get to be part of it, and it's amazing.
Awesome.
Awesome. I'm gonna echo the sentiments. Outerbase acquisition number one.
You can't top it. Yeah. That's awesome. Yeah, congrats. Congrats, both of y'all.
Congrats. So awesome. And then the second one is gonna be the read replicas right now.
We're database nerds. Outerbase is all databases, right? So this speaks our language really well.
So I love it. Yeah, awesome. MySQL's a good contender getting that in there.
Oh, that's right. That's right. The Hyperdrive MySQL support, right?
Yeah, we talked about that. That's awesome. That's such a, such a reach to everybody, right?
And making it easier. I think just making it easier to do this and using our global network, right?
Of like, boom, now you can do the things that you wanna do with it.
So super cool. Super cool. Awesome, guys. Thanks so much. Developer Week is not over yet.
There is one more day and some more stuff coming. So make sure that you tune in to the blog, and we're gonna probably do a replay of everything because it was a lot.
There is a lot going on, and I'm sure y'all watching at home didn't catch it all.
I know you all were also air-punching, so you probably missed some things while you were excited about joining.
So thanks, everybody, for being here, and thank you, Brandon, for coming and enjoying it.
I really appreciate it, and welcome to the team.
So glad to have you here. Awesome. Thanks, everybody.
We'll see you soon. Thank you.