🚚 R2 GA + Store & Retrieve Cloudflare Logs on R2

Presented by: Aly Cabral, Cole MacKenzie, Tanushree Sharma, Paulo Costa

Originally aired on May 4, 2023 @ 9:30 AM - 10:00 AM EDT

Join Cloudflare Product Manager Tanushree Sharma, Product Manager Paulo Costa, Systems Engineer Cole MacKenzie and VP, Product Management Aly Cabral to learn about R2 GA and Store & Retrieve Cloudflare Logs on R2!

Visit the GA Week Hub for every announcement and CFTV episode — check back all week for more!

GA Week

Transcript (Beta)

Cool. Well, everyone, welcome to Cloudflare TV. We are on our third day of GA Week. And if you're just tuning in, we've split things up. We traditionally have Birthday Week every year towards the end of September, and we've split it up into GA Week and Birthday Week. GA Week is where we announce all of our products that are ready for our customers to get their hands on. And then Birthday Week is sort of the more aspirational one: here's what we're currently working on, giving you a sneak peek into the insides of Cloudflare. But I'm super excited to talk about two exciting GA Week announcements today. Firstly, R2 goes into GA. And with R2 GA, we're also announcing support for storing and retrieving logs on R2. We'll start off with some intros from the team here. My name is Tanushree and I'm on the product management team. And I'm excited to be joined by Aly, Cole and Paulo. We'll turn it over to them for intros. Do you want to start off, Aly?

Sounds good. Hi, I'm Aly Cabral. I run the product team for our developer platform. And to us, actually, when we serve developers, that involves a lot of pieces of Cloudflare. So under me is product management, product marketing, and developer advocacy, all with the aim of making our users successful on the platform. All right. Cole, I'll pass it off to you.

Yeah, I'm Cole MacKenzie. I'm a systems engineer on the data team. We're the ones responsible for getting your logs all the way from the edge into your R2 bucket.

That is a lot of work. I also should mention that I've been acting PM for R2 directly, too. So that's why I'm relevant to this specific conversation. Left the important nugget out there. But Paulo, we'll pass it off to you to close out the intros.

Yeah. Hi, everyone. I'm Paulo. I'm the product manager for Cloudflare Images, which is a product that runs on the developer platform, which is why it's relevant here for the R2 world and the R2 announcements. And yeah, I'll share a story about it in a few minutes.

Awesome.
Yeah, let's dig into things. Let's start with R2. So R2 was announced last year during Birthday Week. There's been lots of excitement about it, both from individual developers who are excited to start using it and from really big enterprise customers that want to get their hands on R2. Personally, it's one of my favorite announcements from last year, so I'm super excited to see it going GA and released to all of our customers. Aly, do you want to tell us a little bit about the journey over the past year from announcing it to GA? How did that go?

Yeah. No, I think it's been a big journey for us as Cloudflare getting into durable state. The durable state business, the durable state game, is a little bit different than the caching game, and we've learned a lot along the way. But let's start by talking about Cloudflare's mission, right? To help build a better Internet. And we noticed a problem here, especially when it came to object storage, with people who were even putting cache in front of S3 buckets or putting cache in front of other object storage providers. Every time they were pulling that data into cache, or they were actually making use of data they had stored, they were paying both a request fee and an egregious egress tax. And that was a substantial part of their bill. And it really caused interesting or strange architectural decisions around optimizing for minimal data transfer. There was a lot of engineering time spent on trying to work around the reality of egress fees across these platforms. And we noticed that the Internet, websites, web applications were eager and hungry for an object storage provider that didn't have that egress tax. And really, the first innovation here is in the model and how we're serving users: charging on requests and storage and, very simply, not requiring optimization or architectural decisions based on egress.
And it fundamentally comes down to this question: does anybody store data with the intention of never reading that data? No, not really. Because you want to get value out of the data you store. That's why you're storing it. So, that's where the origin story started, right? And how it's relevant to Cloudflare's overall mission. We learned a lot in the beta program. We've had over, you know, 13,000 users storing data with us, which is really awesome traction. Overwhelming traction.

That's really, yeah. It's only been a few months since the beta started as well, right?

Yes, it's been four months. A little under four months. And it's been quite overwhelming to see the usage pick up across that time. And actually, we are joined by one of our users on this call today: Cloudflare Images. And Paulo, do you want to speak to your experience moving over to R2?

Yeah. So, Cloudflare Images, for those who don't know it yet, is a product where you can store, resize, transform, and then serve all of the images that you have on your website or application. And storing is a big part of it, because it allows a customer to not pay an egress fee to a different provider where they have to store an image and grab an image every time it needs to be resized and served. And so, we created this product, and it launched exactly one year ago, in Birthday Week last year. And we started by using a different provider because there was no R2 at the time. So, we had to select an external provider. And while we told our customers, you don't need to pay an egress fee, the truth was that we were actually having to deal with that problem ourselves. And that doesn't make sense. We're not doing what we're preaching. And as soon as R2 became more than an idea, we realized this is too good to be true for us. We just need to join forces. And to be completely transparent, everyone was a little bit afraid in the beginning.
Like, okay, we have, you know, a provider that has years and years of experience, and we're going to change to something in-house that is being built right now. And the timeline for the entire migration of Cloudflare Images to R2, to be powered by R2, was a little bit over two months. From the moment the team sat down together and said, let's do it, to the moment where we said, shut off the external provider. All the new images are being stored in R2. All the images that we need to fetch will be fetched from R2. And this is what we have. So, we actually did it before R2 went GA. This is how much we trust the product. And just today, we also launched a feature much requested by our customers for Cloudflare Images, which is the ability to store and serve SVG files. An interesting statistic says 50% of the websites on the Internet already serve at least one SVG file. And so, we had this demand. We didn't support SVGs before. And not only do we expect new customers to come in because now we support the file type that they need, we also expect our existing customers to start, you know, storing with us the existing SVGs that they had to store somewhere else. So, yes, we're expecting an extra load of uploads and fetches from R2. And we're 100% confident with that. It's like, yeah, it's business as usual, because it just works. And in fact, during these last two months, we had a number of sync calls with statistics and performance comparisons. And we reached every target way before the end of these last two months. So, we just kept it safe for a while until we decided, yeah, it doesn't make any sense to keep this waiting for the GA, let's do it right now. And so, we did. Cloudflare Images is 100% powered by R2.

Wow, that's awesome. We're big advocates for dogfooding at Cloudflare. And part of it is internal teams using products that customers are using in production. So, it's super awesome to see that.
And also, I'm sure there's a lot of feedback between R2 and Images, and that just helps make the product better for everyone. I'm curious, Paulo, was there any customer feedback specifically about the migration? Were customers able to tell it happened? Did it change that experience at all?

We had the goal of having zero impact on our customers, in the sense that we'd have zero feedback saying, hey, this is throwing an error, or this is slower in my particular region. We had that goal. And we did the entire migration with zero impact on the customer. Nobody noticed. In fact, we told a few customers. We knew the statistics were on our side, but we wanted to get customer perception. If we told them, hey, we're changing the storage provider to an internal one, what would they think about it? And what would be their reaction? Would they start complaining about things where we actually had the statistics on our side to see if they were right or not? Nobody batted an eye about it. Nobody said, oh, it's worse. And actually, we're getting very interesting numbers around performance improvements. So, we surpassed all our targets by a large margin.

That's awesome.

That is fantastic.

That was fantastic. And to be honest, I've been doing product management for a few years. It's probably the collaboration between teams that I saw being the most fruitful of my entire product management career. It was incredible to see how both teams worked together. And the R2 team, like you were saying in the beginning, the R2 team said, well, this is a customer that we're onboarding. And we were like, okay, R2 is going to be our vendor. So, we're going to act like that. We're going to push these guys. And they were like, okay, we're going to have our roadmap for these guys too. And things went so smoothly. It was incredible to see, actually.

That's super great. Aly, do you want to tell us a little bit about the GA announcement? New features? Highlight some stuff.
Yeah, absolutely. When we went into open beta about four months ago, you had to access R2 through a Worker directly, and that was some of the biggest feedback that we got. So, you would write a Worker. That Worker might just be a very simple proxy to the R2 service. And you would do your puts, your gets, all of that through the Worker itself. Now, there are really great benefits to that, because you get to do a lot of custom logic on the hot critical path of an R2 request, which is really powerful. But not everybody wants to do that. Some people want to take their Cloudflare cache, put it in front of an R2 bucket, and then just use the same Cloudflare cache config they're used to and just get going. Put bot management in front of it and start hosting their website on a custom domain. So, in the beta program, we just rolled out public buckets to everybody, which allows you to open up access directly to your R2 buckets. You can then CNAME a Cloudflare-configured domain, maybe one that has bot management rules or your cache config information, and put that directly in front of your R2 bucket, just taking advantage of what you already configured, very simply, so that it's a drop-in replacement for what people might already be doing for their other object storage buckets. So, that's in place. Big project. Took a while to make sure that we got it right. But very excited about that effort. Definitely our most requested feature in the beta program, for sure. The second piece is pre-signed URLs. Sometimes people want to delegate their own permissions on a specific object to a user. An end user, maybe. So, like, if I am an application that allows people to upload images to a forum, well, I don't want to give every user of that forum the ability to upload images or delete images across the entire bucket. But if they go in and want to upload a specific file, I can name that file and give them the permission to directly upload that file via a URL.
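The delegation idea behind pre-signed URLs can be illustrated with a small conceptual sketch. It is not R2's actual signing scheme (R2 exposes an S3-compatible API, and real pre-signed URLs are produced with S3-style request signing). The secret, the URL layout, and the function names below are all hypothetical; the sketch only shows how a signature can bind a URL to one HTTP method, one object key, and an expiry time.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical secret, held only by the application server (not a real key).
SECRET = b"demo-secret-not-a-real-key"


def _signature(method: str, key: str, expires_at: int) -> str:
    """HMAC over the method, object key, and expiry (illustrative scheme only)."""
    message = f"{method}\n{key}\n{expires_at}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, method: str, key: str, expires_at: int) -> str:
    """Return a URL that grants `method` on exactly one object until `expires_at`."""
    query = urlencode(
        {"expires": expires_at, "signature": _signature(method, key, expires_at)}
    )
    return f"{base_url}/{key}?{query}"


def verify_url(url: str, method: str, now: int) -> bool:
    """Accept the request only if method, object key, and expiry all match."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    expected = _signature(method, key, expires_at)
    return hmac.compare_digest(signature.encode(), expected.encode())
```

In the forum example, the application would call `sign_url(..., "PUT", "forum/avatar-123.png", expiry)` and hand the result to the end user, who can then upload that one file but cannot delete it or touch any other key in the bucket.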
These pre-signed URLs just allow for a URL path. You just set a pre-signed kind of, like, permission structure for either uploading or sharing a file, and then you can just delegate your permissions on that specific object without needing to open up the entire bucket permission structure. So, it's really good for people who have their own end users and don't want to give them super permissions to a bucket wholesale. Those are the big ones that we've worked on. Like, big user functionality. We are continuing, and will always be, on our obsessive path of performance optimization. That will always continue. We did a lot of great work, as Paulo alluded to, with the Images team to make sure that we had our statistics in place and we were tracking the right numbers. We will continue to be obsessive about that because that's the kind of company we are from a performance perspective.

Yeah. Always optimizing. And can you give us a sneak peek at what's next? What are you hearing from customers? What is the team going to be focusing on over the next couple of quarters?

Yes. So, one of the big things is around object lifecycles and object versioning. So, of course, as people make new uploads or new modifications to objects and buckets, they want a versioned history of those objects so that they have some protection to kind of roll back if they need to, right? The other piece is around object lifecycles. People tend to have systems in place where they might want to delete an object if it hasn't been accessed for 90 days or for a year, or set some kind of lifecycle management policy at the object level. And if you wanted to, say, delete an object 30 days after it's been inserted, you can kind of set these custom policies that the system is then responsible for managing, and not the user.
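The two kinds of policy described here (delete N days after insertion, or delete after N days without access) can be sketched as a small rule evaluator. This is only an illustration of the idea; the speakers describe lifecycle policies as upcoming work, so the class names, fields, and rule shape below are hypothetical and do not reflect any R2 API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class StoredObject:
    key: str
    created_at: datetime
    last_accessed_at: datetime


@dataclass
class LifecycleRule:
    """Hypothetical rule: applies to keys under `prefix`."""
    prefix: str
    max_age: Optional[timedelta] = None   # delete this long after insertion
    max_idle: Optional[timedelta] = None  # delete after this long without access


def should_delete(obj: StoredObject, rule: LifecycleRule, now: datetime) -> bool:
    """True if the rule applies to the object and one of its limits has passed."""
    if not obj.key.startswith(rule.prefix):
        return False
    if rule.max_age is not None and now - obj.created_at >= rule.max_age:
        return True
    if rule.max_idle is not None and now - obj.last_accessed_at >= rule.max_idle:
        return True
    return False


def expired_keys(
    objects: List[StoredObject], rules: List[LifecycleRule], now: datetime
) -> List[str]:
    """Keys the system, rather than the user, would clean up on this pass."""
    return [o.key for o in objects if any(should_delete(o, r, now) for r in rules)]
```

A rule such as `LifecycleRule(prefix="logs/", max_age=timedelta(days=30))` expresses the "30 days after it's been inserted" case, and `max_idle=timedelta(days=90)` expresses the "hasn't been accessed for 90 days" case.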
So, on our continued path of taking burden away from the user and making it a system burden, that is something that we want to offer next at the object layer.

Got it. Sweet. I know that that's super important for customers that are storing their logs as well. So, yeah. Any other last things to mention, or can we switch over to the log side of things?

I'd love to switch over to logs.

Okay. Cool. Well, so, following the R2 announcement, we're excited to announce that log storage on R2 is also GA for enterprise customers. Historically, we've supported other object storage destinations like S3 or Azure or Google Cloud buckets, but we're super excited to add R2 to this family and to have a direct integration. I can give a little bit of background about logs and why they're important. A lot of our customers are security customers. In the tech industry, it's super important to have logs for observability, debugging, auditing, and security, and sometimes there are even legal requirements that you have to store logs for X amount of time in case a legal case comes up and, in retrospect, you need to look through the logs to figure out what happened. And some of our customers have tens to hundreds of Cloudflare zones. They have multiple accounts. They use products from all across our suite, from CDN, WAF, Spectrum, and our layer three and layer four products, and we provide logs for all of those and are always continuing to bring logs for more products. So, it's super important to get this visibility into, hey, what are my products doing? If something is acting weird, you need to be able to go in, look through logs, and use them for debugging purposes. I've heard quite a bit from customers that they have to make tradeoffs between the amount of data they store and how much it costs. So, as any business, you have to think about what's the value of this information.
And sometimes it comes down to either sampling logs, using shorter retention for logs, or omitting logs for a product entirely, which can lead to gaps in visibility. And so, R2 storage makes the decision a lot easier for our customers because of the cost savings that come with R2. We're cheaper than the competition. And so, we'd strongly encourage any customers that are facing this dilemma to shift over to using R2 and try it out. And we're continuing to build things on top of just storage as well. So, we've got lots of good things coming, stemming from that. Yeah. So, storage itself on R2 is 30 to 40% cheaper when compared to S3, which is really huge for our customers since, as I mentioned, some customers send a lot of logs to their object storage destination. And just the storage piece is super important. But as Aly mentioned earlier, R2 also doesn't have any egress fees. So, if you're a customer and you have lots and lots of terabytes of logs being stored, using R2 means you don't get charged for bytes in and bytes out, which is super useful. That stuff really adds up. To give a customer scenario of when you'd be using R2, and specifically logs on R2, I've heard customers say that sometimes they have incidents, they have outages, they want to be able to have more granular visibility into things, and they want to see exactly what's happened. But they don't want to send all of their data to a SIEM, because that can add up. SIEMs are really, really expensive. So, what they end up doing is that they have a preferred analysis tool. Maybe they use Kibana or Grafana Loki, sort of pick your poison there. And they pull logs based on certain time ranges in an ad hoc manner. So, if there was an incident last week, they might pull logs for the entirety of the week and pipe them over to their analysis tool. And we have customers following this model, again, because it's cheaper and it's sort of just ad hoc analysis, a one-time thing.
And as I mentioned before, if you're a high-traffic customer, you have a lot of logs. So, if you want to scan logs for a week or even a few days, the egress fees that come from that could really add up. And with R2, you don't have to worry about that. So, that's sort of the incentive to switch over. And also, Cole has been working on a retrieval API to make searching for logs on Cloudflare much easier. Cole, do you want to tell us a little bit about what you've been building?

Yeah, sure. So, with logs going to R2, we support all the same datasets that we supported for the previous object stores: HTTP requests, firewall events, access logs, DNS, you name it. But one of the other problems that customers have, once those logs are in their bucket, is getting insights from them. So, Tanushree mentioned that they have to either load them into a SIEM provider or continually index them using Kibana, and they need some other process to basically gather insights from their logs. But what it comes down to a lot of the time is that customers only need to look at a specific slice of their logs within a time range, a start and an end, like the week's worth of logs Tanushree mentioned. So, there's no reason to index all of your logs if you're just going to look at a week's or a few days' or an hour's worth. So, what I've been working on, on the data team, is basically allowing users to come in, store their logs on R2, and then also come in and say, I want to see an hour's worth of logs for this dataset. Let's see them. And what we'll do is we'll go fetch those logs from your R2 bucket and display them to you. And this is really great too, because previously we had an older product called Logpull where we would store the logs on your behalf. And that means if you had Logpush enabled, we were sending one copy of the log to you and then one copy of the log to ourselves, so you could do this querying.
But now we just send one copy of the log to you, and you can let R2 and its upcoming features, like lifecycle policies, retention periods, and so on, handle all that management. And you can also have finer-grained control, like the pre-signed URLs. You can share logs with other team members, guests, or external parties without having to worry about sharing too much information with them.

That's super cool. Tell us a little bit about the process of building this out. What went well? What were the challenges?

Yeah. So, with this new API, we hemmed and hawed over whether we were going to build it on Workers or internally in one of our data centers. And it turns out that Workers was actually a pretty good fit for this. It's the same technology that R2 builds upon, and a lot of other Cloudflare projects, like Images, I think, as well, right, Paulo? We were a little bit hesitant because with logs, it can be very unknown how much data you're going to get back when you ask for a specific time range. So, some customers may only have a couple million records for an hour. Some customers will have trillions of records for an hour. And it's very hard to basically serve that much data without knowing how much data to serve. So, luckily, though, on R2, there's a streams API. So, using the streaming API, we can start returning results to the user while, in the background, we're continually concatenating the rest of your log files that match. So, this lets you just start streaming and consume your logs in any format you want. If you've seen or are familiar with the other product, Instant Logs, you kind of get the same experience, where you'll get a stream of HTTP logs and you can pipe it into any program you want. You can use angle-grinder, you can use bash, you can build your own system. And this way, like Tanushree mentioned, you don't need to necessarily use Kibana or Loki or anything like that if you're just doing a simple search.
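Consuming a streamed response without a SIEM can be as small as a line-by-line filter. The sketch below assumes the stream is newline-delimited JSON, one record per line, and uses `EdgeResponseStatus` and `RayID` as example field names; which fields are present depends on how the log job was configured, so treat them as assumptions.

```python
import json
from typing import Iterable, Iterator


def filter_records(lines: Iterable[str], min_status: int) -> Iterator[dict]:
    """Lazily yield records whose response status is at least `min_status`.

    Works on any iterable of lines (a file, a pipe, an HTTP response body),
    so records are processed as they arrive rather than after a full download.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between chunks
        record = json.loads(line)
        if record.get("EdgeResponseStatus", 0) >= min_status:
            yield record
```

Passing `sys.stdin` as `lines` would let the same function sit at the end of a shell pipeline, printing each matching record with `json.dumps` as it arrives.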
You can just pipe it through any kind of command-line utility that you have on hand.

Got it. That's super cool. You mentioned the indexing piece a little bit. I wanted to dig into that some more. I know the team had not used that in the past, so it's sort of a new addition. What were the performance differences there, and how does that make the experience better for our customers?

Yeah. So, one of the other features that our previous Logpull offering had was the ability to look up a request by Ray ID. So, oftentimes, you'll have a customer where you're debugging an issue and they're saying, hey, this request is always failing. Here's a Ray ID. Can you figure out what happened with it? And then you have to go to your SIEM or your S3 bucket, and you need to write a query and scan through all your data. And then eventually, you hopefully find the Ray ID, or if you use our Logpull API, it can hopefully find the Ray ID in those logs for you. But the challenge is you're going to have to index all of your data. You're going to have to ingest everything. And that can get really expensive as well. So, with this retrieval API, we also have a Ray ID indexing feature. This is also backed by Workers, using our Durable Objects. And you can specify a time range just like you do in querying, but you can also index that time range by Ray ID. And then you can come back and say, I know a customer, you know, within the last hour was having issues connecting to our site, and they gave us this Ray ID. You head over to the retrieval API. You say, index that last hour. And now, what happened with this Ray ID? And it will actually serve you the log line directly from your R2 bucket. This is what happened with that request. You can see where it went upstream, how many bytes were sent, or any other logging fields you configured, which is really powerful.

Got it. That's awesome. And I love that all of this is using Cloudflare products, again, with that dogfooding story.
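The index-then-look-up flow described here can be sketched in memory: index only the files in the chosen time range, mapping each Ray ID to the file and line that hold it, then fetch a single line on lookup. The real feature is described as running on Workers and Durable Objects; this sketch stands in a plain dictionary for the bucket and assumes records are JSON lines carrying a `RayID` field.

```python
import json
from typing import Dict, List, Optional, Tuple

# Stand-in for a bucket: object key -> the newline-delimited JSON lines it holds.
Files = Dict[str, List[str]]
# Ray ID -> (object key, line number within that object).
Index = Dict[str, Tuple[str, int]]


def build_index(files: Files) -> Index:
    """Index only the files passed in, i.e. those for the chosen time range."""
    index: Index = {}
    for key, lines in files.items():
        for line_no, line in enumerate(lines):
            ray_id = json.loads(line).get("RayID")
            if ray_id is not None:
                index[ray_id] = (key, line_no)
    return index


def lookup(index: Index, files: Files, ray_id: str) -> Optional[dict]:
    """Return the full log record for a Ray ID, or None if it was not indexed."""
    location = index.get(ray_id)
    if location is None:
        return None
    key, line_no = location
    return json.loads(files[key][line_no])
```

Because the index covers only the requested hour, its cost scales with that slice of logs rather than with everything stored in the bucket.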
Cloudflare on Cloudflare. I hear you have a demo for us as well, Cole. I'd love to see that.

Yeah. So, I can give a quick demo of the API. Just some background around it. We already have a website set up sending logs to an R2 bucket. And the API exists underneath the account, because R2 buckets are underneath the account. We can say we want to list our logs, and we specify a start time and an end time. And these are using a pretty standard date-time encoding, RFC 3339, for anybody who's familiar. We specify where our logs are, which bucket we're storing them in, and then also basically where inside that bucket. So, we can put multiple datasets, multiple different zones, and all that in one location to kind of make management easier as well. You don't have to track down where your bucket is. You can just say it's in the log bucket. Go find it. So, using that, I'm just going to show an example. We specified a really large time range. This is the whole day's worth of logs. And it's very hard to stream because this is an unknown amount of data. So, we actually try to provide positive feedback in our error messages, saying, oh, your time range returned too many results. Let's look at a smaller slice of time. So, if I just update this to be the same day, we're looking at about 10 seconds' worth of logs. We go and ask R2 and it says, oh, we found all these matching files that will contain logs within that time range. And this is useful if you want to just know which files contain them and ingest them into some other system that has a direct integration with R2. But sometimes you might just want to get those logs from exactly those files without having to go fetch them yourself. So, we also provide another endpoint, retrieve, where we can specify all the same parameters. We can go ahead, send the request, and we'll actually get back all that data in the response.
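The request shape from the demo (an account-scoped endpoint taking a start, an end, a bucket, and a location inside the bucket) can be sketched as URL construction, together with splitting a too-large range into smaller windows after a "too many results" error. The base URL, the `/logs/list` and `/logs/retrieve` paths, and the query parameter names below are assumptions for illustration and should be checked against the current API reference; authentication is left out entirely.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple
from urllib.parse import urlencode

# Assumed API base; verify against the official documentation before use.
API_BASE = "https://api.cloudflare.com/client/v4"


def rfc3339(ts: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_url(account_id: str, action: str, start: datetime, end: datetime,
              bucket: str, prefix: str) -> str:
    """Build a list or retrieve URL; `action` is "list" or "retrieve" (assumed)."""
    query = urlencode({
        "start": rfc3339(start),
        "end": rfc3339(end),
        "bucket": bucket,
        "prefix": prefix,
    })
    return f"{API_BASE}/accounts/{account_id}/logs/{action}?{query}"


def split_range(start: datetime, end: datetime,
                step: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive [start, end) windows no longer than `step`."""
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        yield cursor, upper
        cursor = upper
```

Issuing one retrieve request per window from `split_range` keeps each response small and bounded, rather than asking for a whole day of logs in a single call.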
So, now you can do this type of request in your terminal or any other program, and you'll get this data back that you can process however you need. So, here we see all the fields we got back for those requests. It was actually a pretty sizable amount for only 10 seconds of logs. And from here, you can basically do whatever you need with the logs. If you do too large of a time range, you'll get the same error as before. So, it's really important to know, with this new API and with building on Workers in general, that smaller but more frequent units of work are usually better than trying to bulk things up and do it all at once.

Got it. This is really great.

Sorry. Sorry to interrupt you, Tanushree, at all. But this is really awesome to see. This is my first time even going through this experience. So, really cool. Really cool work, Cole. I'm sorry. Pronunciation there. I've not heard that one before. So...

This is great. I always get requests from customers that want to do Logpull functionality, but with more than just the HTTP request dataset. So, this really opens things up. And with that, that's a wrap for today's Cloudflare TV segment. Super awesome getting to share new products with everybody watching. And stay tuned for the rest of the hour. So, yeah, you'll see my face again. Bye, everybody.

Bye, everybody. Thank you.

GA Week

Welcome to Cloudflare's first-ever GA Week, where we'll announce the general availability of many exciting products, and learn how customers are already using them. Find every announcement on the GA Week Hub!