Latest from Product and Engineering
Presented by: Jen Taylor, Usman Muzaffar, Patrick Donahue
Originally aired on August 29, 2021 @ 2:30 AM - 3:00 AM EDT
Join Cloudflare Head of Product Jen Taylor, Head of Engineering Usman Muzaffar, and Director of Product Management Patrick Donahue for a quick recap of everything that shipped in the last week. Covers both new features and enhancements on Cloudflare products and the technology under the hood.
Transcript (Beta)
Hi, I'm Jen Taylor, Chief Product Officer at Cloudflare and I am thrilled to welcome you to another installment of Latest from Product and Engineering.
Awesome, thanks Jen.
I'm Usman, Head of Engineering at Cloudflare and we have a very special guest this time, one of our favorite teammates.
Pat, why don't you say hi?
Great to be here, long-time listener, first-time caller.
Thanks for having me on the program.
I'm Patrick Donahue, on the product management team, focused on security products.
Excellent. We wouldn't be able to do that without cracking up, but it's awesome.
So great. I think part of the reason we wanted Pat here is to talk about security products.
Jen, what are some of the first questions we want to ask Pat?
Yeah, so Pat, one of the things I've been spending a lot of time thinking about over the past couple of weeks is what we've been doing with our web application firewall, or WAF.
I've seen a lot of stuff come off the factory floor for that.
And I'd love to just understand, stepping back, what is a web application firewall?
Sure. That's a great question. So a web application firewall, it's a piece of software that sits between a browser and our customers' applications.
And what it tries to do is it looks at those requests and that traffic and tries to identify requests that are potentially bad and potentially dangerous for the application.
And so as those requests come from the browser, it'll break it down into different parts and look at that and try to identify, is this a malicious request coming in that we want to not let through?
Something that maybe is a specific hole in the particular application that the request is headed towards?
Or maybe it just looks generically like something that is bad that we want to block.
And so it sits in front of those applications, wherever they may be, whether they're in the cloud, SaaS applications, or potentially on our customers' servers.
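As a rough illustration of "break the request into parts and look for something bad," here is a minimal Python sketch. The `Request` type and the two regular-expression signatures (a classic SQL injection pattern and a path traversal pattern) are hypothetical examples, not Cloudflare's managed rules, which are far more extensive and also decode and normalize input before matching.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Request:
    method: str
    path: str
    query: str = ""
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


# Hypothetical signatures for two well-known attack classes.
SIGNATURES = {
    "sqli-union-select": re.compile(r"union\s+select", re.IGNORECASE),
    "path-traversal": re.compile(r"\.\./"),
}


def request_parts(req: Request) -> Dict[str, str]:
    """Break the request into the individual parts the firewall inspects."""
    parts = {"path": req.path, "query": req.query, "body": req.body}
    for name, value in req.headers.items():
        parts[f"header:{name.lower()}"] = value
    return parts


def matching_signatures(req: Request) -> List[str]:
    """Return the names of all signatures that match any part of the request."""
    parts = request_parts(req)
    return [
        sig_name
        for sig_name, pattern in SIGNATURES.items()
        if any(pattern.search(value) for value in parts.values())
    ]


def should_block(req: Request) -> bool:
    return bool(matching_signatures(req))
```

For example, `matching_signatures(Request("GET", "/search", query="q=1 UNION SELECT password FROM users"))` returns `["sqli-union-select"]`.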
Now, how does that fit in with DDoS and rate limiting?
When we talk about our security solutions, we often talk about them as a suite or a host of solutions.
How does the WAF work with these other things?
Yeah, no, absolutely. So we like to think of the web application firewall as the tentpole that holds up all these other solutions.
And so it's where you go in the Cloudflare dashboard and the Cloudflare interface to configure all those other security products.
And so we've got a couple of different parts of it.
So we have what we call the managed rule sets, which are the rules that our security researchers are researching, developing, and writing, oftentimes immediately, as soon as something's announced.
And then we have what we call the custom rules in there, which are the tools that we give people to write their own rules and craft their own rules.
It used to be that you'd have to go and fill out a form and wait a couple of days for someone to get back to you and fill something out.
And I saw one in an old mockup the other day, and I was like, whoa, where did that come from?
And so it used to be you'd have to wait, but our customers obviously want to be able to self-serve without anybody having to jump in.
And so we've given all these different tools and we're packaging those together in what we call firewall rules.
And so you could write a rule that says, don't let something above a certain number of requests per minute, for example, come in.
And so that would be a rate limiting rule.
But all of these layer 7, application-layer security tools together form our web application firewall.
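A rule like "no more than N requests per minute" can be sketched as a fixed-window counter keyed by client. This Python sketch is illustrative only: the class name, the 60-second window, and keying on the client IP are assumptions, and it is not how Cloudflare's rate limiting is implemented.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client within each fixed time window."""

    def __init__(self, limit: int, window_seconds: int = 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        # client key (e.g. an IP address) -> (window index, requests seen in it)
        self.counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, client: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self.counters.get(client, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the count resets
        count += 1
        self.counters[client] = (window, count)
        return count <= self.limit
```

Each request would call `limiter.allow(client_ip)`; a `False` result is where the rule's action, such as block or challenge, would apply. Fixed windows are simple but let a client send up to twice the limit across a window boundary, and this sketch never evicts old counters; sliding windows and expiry address both.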
You know, when I was first invited to join Cloudflare by our CTO, John Graham-Cumming, an old boss and friend of mine, he sent me an email.
I kept that email. It said, you've got to join this new company called Cloudflare.
They're doing some really cool firewall stuff.
And when I asked John what that meant, he said, well, actually, we're trying to patch the Internet, which is a line we used to use a lot at Cloudflare before it became, actually, we're trying to help build a better Internet, full stop.
Patching is a small scope.
But if you go back to that idea of patching, what is it that we're actually doing there?
Firewall rules, after all, have been around for a long time. I remember, you know, reading about ModSec and Apache.
And like, as soon as, as soon as there were web servers, there were firewall rules.
So what's some of the evolution that Cloudflare has gone through here?
As we've gone from literally providing Apache ModSec rules to some of the really advanced stuff we've got going on?
Yeah, great question. And when I was looking, early on, I was writing a document trying to position some of the strategy on the product side.
And I came across, I think it was a YouTube video by our CEO, Matthew.
And he was talking, and I think Michelle was in it as well, about building a firewall in the cloud.
And that was, I think, one of the earliest mission statements of the company.
Yeah, there you go.
And so, as you mentioned, early on it was a lot of individual signatures matching against things, ModSec, you know, some of the technologies that were used.
We've been spending a lot of time trying to build a really sophisticated engine that I like to think of as a toolbox, where you can plug in a bunch of different things.
And some of those things, you know, the engine itself, we've spent a lot of time improving from a performance perspective, because one of the things that, you know, we take very seriously is anytime we build a security solution, we don't want to have any impact on performance, right?
It's no good if we block a request, but every request, you know, takes five seconds to get through.
That's not a great user experience. And so, we spent a lot of time upgrading that engine, and we're using something called Rust now to rebuild that engine.
And Rust is really cool. And you can probably tell me more about why that's cool.
But it definitely gets a lot of great discussion going when we mention it.
Yeah, let's talk about that for a second.
You know, I think one of the things that appeals to a lot of engineers, and part of the reason they join Cloudflare, is that it's not scared of new technology.
When we see a technology that we think could help us solve a problem in a more efficient or smarter way, we can be among the early adopters.
We were one of the first companies that really leaned into Go, which has now become a very popular systems programming language.
But at the time Cloudflare started, it was still pretty new.
And in the same way, I was in a meeting where a bunch of engineers said, yeah, I think this new firewall should be written in Rust.
And part of me was like, that's really cool. And the other part of me, the grown-up manager part, was like, are you guys picking that because it's cool?
Or is there actually a good technical reason for why you're picking it?
And they're like, no, let us explain something to you. The challenge is we need to make it easy for people to author these rules and test them.
And that's in our control plane.
That's a completely different part of the Cloudflare stack than what happens at the edge.
And the edge was based on a different programming language called Lua.
So we wound up with half of it in Lua and half of it in Go.
And that's how you wind up in a situation where the engine in which you define a rule and the engine in which you execute that rule are different.
And that can lead to exactly the kind of problems that Cloudflare was trying to solve for people.
So the solution was to use the exact same library, implemented in Rust, in both places. And we picked Rust because Rust is fast and it's memory-safe, which is the other thing we have to be very careful about here.
Because if we write in a language that gives us fantastic performance but puts the onus on the programmer to never make a mistake, then we open ourselves up to other kinds of problems: okay, fine, we have awesome performance, but we want to be careful that we don't end up in a world where we have security issues.
And so Rust gave us both.
It gave us that great performance, it gives us that guarantee of memory safety, and we can run it in our control plane as well as at the edge.
And that was exactly the genesis.
We're like, all right, we're going to be a Rust shop. And so that immediately means continuous integration, dev tools, and all the rest of the stuff that shows up in those chat channels inside Cloudflare where people are learning Rust, and that's all great.
I want to go back to something, Pat. Let's just make this concrete for our viewers here.
What are the components of a firewall rule?
So when you want to define one, you know, we alluded a second ago to customers calling in wanting to write a rule.
What does that mean? Do you have to write a program to do that? What are the inputs and the fields that go into a rule?
Sure. Yeah, that's a great question.
So the engine itself is fed by a language that we've actually come up with.
And we based it on Wireshark, which is a tool that network administrators are probably familiar with.
I used to use it back when I did networking.
A long time ago. Yeah, it had some different names early on.
That's right. It used to be called Ethereal. Yeah, that used to be its name.
That's right. Yeah, showing our age here. So we would use that, and it would give you this really expressive syntax where you could go in and you could say, you know, I want to look at this TCP port, or, you know, part of the TCP packet.
In the same way, we can do that now. We can say, I want to look at the HTTP request headers or the body itself.
And so if you think about a web request that your browser's making, there are a couple of different components to it, right?
So there's the target where it's sending it. What is the application? What is that host name?
What is that port? The path itself, you know, /v1, /v2, whatever it may be.
So there's kind of the basic, the request attributes there.
There's also the headers that are in the request, right? So your browser, you know, I'm sitting here using Chrome, I sometimes use Brave or Safari or other different browsers.
The browser's identifying itself and saying, this is, you know, who is making the request or what is making the request, as well as other different, you know, headers that give information.
And we didn't want our customers to have to write programs to match against this.
They absolutely can if they want to use a Worker to do really sophisticated things with the WAF.
But what they can do is simply use that expressive language that we've created to match those patterns.
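The shape of such a language can be illustrated with a toy evaluator. The Python sketch below parses Wireshark-style filters and evaluates them against a dictionary of request fields. The field names (`http.host`, `http.request.uri.path`, `http.user_agent`, `cf.bot_management.score`) are modeled on the kinds of fields described here, and the grammar (operators `eq`, `ne`, `lt`, `gt`, `contains`; keywords `and`, `or`, `not`; parentheses) is a simplified assumption. This is not Cloudflare's engine, which, as mentioned earlier, is written in Rust.

```python
import re
from typing import Any, Dict, List, Tuple

# Tokens: parentheses, "double-quoted strings", integers, and bare words.
TOKEN_RE = re.compile(r'\s*(?:([()])|"([^"]*)"|(\d+)|([A-Za-z_][\w.]*))')

OPERATORS = {
    "eq": lambda a, b: a == b,
    "ne": lambda a, b: a != b,
    "lt": lambda a, b: a < b,
    "gt": lambda a, b: a > b,
    "contains": lambda a, b: b in a,
}


def tokenize(expr: str) -> List[Tuple[str, Any]]:
    tokens, pos, expr = [], 0, expr.rstrip()
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if not m:
            raise SyntaxError(f"unexpected input: {expr[pos:]!r}")
        paren, string, number, word = m.groups()
        if paren:
            tokens.append((paren, None))
        elif string is not None:
            tokens.append(("str", string))
        elif number:
            tokens.append(("num", int(number)))
        else:
            tokens.append(("word", word))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent: `or` binds loosest, then `and`, then `not`."""

    def __init__(self, tokens: List[Tuple[str, Any]]) -> None:
        self.tokens, self.i = tokens, 0

    def peek(self) -> Tuple[str, Any]:
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self) -> Tuple[str, Any]:
        tok = self.peek()
        self.i += 1
        return tok

    def binary(self, keyword: str, operand) -> tuple:
        node = operand()
        while self.peek() == ("word", keyword):
            self.take()
            node = (keyword, node, operand())
        return node

    def parse_or(self) -> tuple:
        return self.binary("or", self.parse_and)

    def parse_and(self) -> tuple:
        return self.binary("and", self.parse_not)

    def parse_not(self) -> tuple:
        if self.peek() == ("word", "not"):
            self.take()
            return ("not", self.parse_not())
        return self.parse_comparison()

    def parse_comparison(self) -> tuple:
        kind, field = self.take()
        if kind == "(":
            node = self.parse_or()
            if self.take()[0] != ")":
                raise SyntaxError("expected ')'")
            return node
        (op_kind, op), (lit_kind, literal) = self.take(), self.take()
        if kind != "word" or op_kind != "word" or op not in OPERATORS:
            raise SyntaxError(f"expected '<field> <operator> <literal>' at {field!r}")
        if lit_kind not in ("str", "num"):
            raise SyntaxError("expected a string or number literal")
        return ("cmp", field, op, literal)


def evaluate(node: tuple, fields: Dict[str, Any]) -> bool:
    if node[0] == "and":
        return evaluate(node[1], fields) and evaluate(node[2], fields)
    if node[0] == "or":
        return evaluate(node[1], fields) or evaluate(node[2], fields)
    if node[0] == "not":
        return not evaluate(node[1], fields)
    _, field, op, literal = node
    return OPERATORS[op](fields[field], literal)


def matches(expr: str, fields: Dict[str, Any]) -> bool:
    parser = Parser(tokenize(expr))
    tree = parser.parse_or()
    if parser.i != len(parser.tokens):
        raise SyntaxError("unexpected trailing tokens")
    return evaluate(tree, fields)
```

The design point is the one Pat describes: the rule is data (a string), not a program, so it can be checked for syntax errors before it is ever evaluated against traffic.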
I want to go back and mention one thing that I left out before.
The engine that we've built, we initially released for customers to write their own rules, and we call that firewall rules.
The team had the managed rule sets, as I mentioned before, and those rules were built in that old Lua technology and actually originally written by your boss, our CTO.
My boss, that's right.
Yeah, yeah. And so... "JGC was here" was a joke on one of the slides, in fact.
Yeah, yeah. We have a nice counter of how many lines... Yeah, how many lines of CTO code.
That's a good metric for all startups out there as you're growing.
How many lines of founder code are you deleting? That's a great metric.
Exactly. And so, you know, what we're doing now is we're porting those rule sets and we're kind of taking the opportunity to reinvent what that user interface looks like.
If you think about, you know, Cloudflare going back a number of years, we largely were managing, you know, companies that would bring their entire domain to us and they would have, you know, a kind of a common rule set across that entire domain.
We now have really large customers managing many hundreds, if not thousands, of applications.
And so we want to make it so you can say, I want to apply these particular rules for this set of applications and these rules for that set of applications.
And so we're making it a lot easier to do that as we bring those rule sets into that new engine.
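One way to picture "these rules for this set of applications" is a table mapping hostname patterns to rule sets. In this Python sketch, the shell-style wildcard matching (`fnmatch`), the hostnames, and the rule-set names are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical deployment table: hostname pattern -> rule sets applied to it.
DEPLOYMENTS: Dict[str, List[str]] = {
    "*.shop.example.com": ["managed-core", "ecommerce-bots"],
    "blog.example.com": ["managed-core", "wordpress"],
    "*": ["managed-core"],
}


def rule_sets_for(hostname: str) -> List[str]:
    """Collect every rule set whose pattern matches, in order, without duplicates."""
    selected: List[str] = []
    for pattern, rule_sets in DEPLOYMENTS.items():
        if fnmatchcase(hostname.lower(), pattern):
            for name in rule_sets:
                if name not in selected:
                    selected.append(name)
    return selected
```

A company with hundreds of applications then edits one table instead of hundreds of per-domain configurations.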
And the reason that's really important goes back to that sort of virtual patching you mentioned.
A lot of the applications, you know, you might have hundreds of applications.
If you're a company today, you don't write all those applications, right?
You may use them.
I'm using WordPress or whatever, right?
Yeah, exactly. And so in a lot of cases you don't have the ability to go in, you know, as great a coder as you are.
I'm sure you can fix them all, but you don't have the ability to go in and do that.
And so if you can put Cloudflare, a web application firewall, between the browsers, attackers, and scripts on one side and those applications, wherever they may be, on the other, we can respond a lot faster.
And we've got, you know, 25 million plus properties using us.
And so it's inevitable that somebody is going to have the same application that you do.
And so we can deploy that once, and everybody can take advantage of it; everybody's protected.
Well, one of the things that I think has been really interesting, Pat, as you talk about that journey: at the inception, when we first created the WAF, you typically had a single customer with a single zone, a single domain, on us.
And now you've got customers who are managing really complicated portfolios, many of them managing lots of different lists of IPs and a lot of complexity.
Can you talk a little bit about some of the things that the team has done and is doing to kind of simplify this level of management and this flexibility?
Sure. Yeah. And I want to key on one of the things you said, which is lists.
So we have customers that may maintain a list of IP addresses that they've previously had either problems with, or maybe lists of IPs that are monitoring their systems or whatever that may be.
And historically you could write these rules that say, on a one-off basis, allow this, block this, challenge this, et cetera.
But we didn't really have a data structure at the edge for people to maintain those lists.
And so what we're doing now is we're allowing a list to be created and we can store that and we can replicate that to all of our data centers around the world.
And the web application firewall runs on every machine in every data center, so it can maintain that list and then reference it.
So to give you an example, you might have a list of, you know, bad guys or something that you've seen that have been sending you bad traffic, and it might start out with, you know, a hundred items.
And over time, you might be adding items to that list.
We can manage that for you. And so we've recently given people the ability to create custom lists.
One of the things I'm really excited about that we're going to build on that is something that we call managed lists.
And so those are lists where we take care of figuring out, you know, which IPs make sense to go in here.
And so if you're using Pingdom, for example, or Catchpoint or one of these monitoring tools, you might want to bypass a bunch of different rules.
And if we can maintain that for one customer, then another customer can kind of use those same rules.
And we only have to do it once and save everyone that burden. We've started with IP addresses, but we're also going to be letting people specify lists of countries and ASNs, which are autonomous system numbers that identify networks, that they'd like to take action on in some way.
And so that is a recent launch for us.
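A list like the ones described above can be modeled as a named set of CIDR ranges with a membership check, with country and ASN lists as plain sets. This Python sketch uses the standard `ipaddress` module; the `IPList` class, `in_any_list` helper, and list name are hypothetical, and replication to data centers is out of scope. The addresses and ASN come from ranges reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Set, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass
class IPList:
    """A named, customer-maintained list of addresses and CIDR ranges."""
    name: str
    networks: Set[IPNetwork] = field(default_factory=set)

    def add(self, entry: str) -> None:
        # A bare address becomes a single-host network (/32 or /128).
        self.networks.add(ipaddress.ip_network(entry, strict=False))

    def __contains__(self, address: str) -> bool:
        ip = ipaddress.ip_address(address)
        # An IPv4 address is never "in" an IPv6 network, and vice versa.
        return any(ip in net for net in self.networks)


def in_any_list(ip: str, asn: int, country: str, ip_list: IPList,
                asn_list: Set[int], country_list: Set[str]) -> bool:
    """True if the request's IP, ASN, or country code appears in the given lists."""
    return ip in ip_list or asn in asn_list or country.upper() in country_list
```

This sketch scans every range on each lookup; an edge implementation handling large lists would more likely use a prefix tree.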
Sometimes when I'm explaining firewall rules to people who are not used to being web administrators, I use the analogy of email filters.
Because there's actually a fair amount of homology here, right?
Like it's the same kind of thing.
You've got this ocean of stuff coming at you and you want to automatically delete anything that's got, you know, some keyword in it or, you know, automatically label something with family if it's from your parents or your siblings or whatever.
And I think, you know, in the same way that you could start to see a world where, yeah, I know what I need to do, but it would be so tedious to create every single rule and list every single, you know, Muzaffar who has an email address.
Wouldn't it be great if I could just say, if the sender is one of my family members, and I just keep that list separately? That makes the rule writing so much easier.
And the analogy extends further: that list might be useful to other people.
And so a list of Pingdom IP addresses or a list of common blocks or whatever almost becomes a library in and of itself.
Absolutely. And so there's those very basic lists.
Like this is a list that is relatively static and it changes a little bit over time and we're providing that convenience factor.
But then there's another set of lists that I'm really excited about, which are more kind of the threat intelligence based lists, right?
So, lists of IPs that we've seen perhaps attacking other customers, or, in one case, something we're getting ready to give some access to: what's called a list of open proxies, right?
And so a proxy is something that you could run on your machine, or maybe you don't even know it's running and attackers might, you know, run that traffic through it.
And so it's just a place where bad guys can collect and do bad things.
Exactly. And so if we can scan the Internet and we can find those lists and we can assemble that for our customers, then they can reference that in a firewall rule.
And so the beautiful thing about lists is that they can be referenced in that Wireshark-like language you mentioned.
And so you can combine a list with other intelligence in there.
So you can say, you know, is this coming from behind an open proxy?
And does it have a bot management score of less than 30?
And does it have a user agent that's X or Y or whatever?
And so that's the power of putting all those things together.
And everything you add makes the existing things in that engine more powerful.
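Combining a list with other signals amounts to AND-ing several predicates. The Python sketch below composes three: membership in a hypothetical open-proxy list, a bot score under 30, and a user agent containing one of a few strings. The 1 to 99 score scale (lower meaning more likely automated) and the expression in the comment are assumptions modeled on Cloudflare's bot management and rules language, not verified syntax.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RequestInfo:
    client_ip: str
    bot_score: int   # assumed scale: 1 (likely automated) to 99 (likely human)
    user_agent: str


Predicate = Callable[[RequestInfo], bool]


def ip_in_list(networks: Iterable[str]) -> Predicate:
    nets = [ipaddress.ip_network(n, strict=False) for n in networks]

    def check(r: RequestInfo) -> bool:
        ip = ipaddress.ip_address(r.client_ip)
        return any(ip in net for net in nets)
    return check


def bot_score_below(threshold: int) -> Predicate:
    return lambda r: r.bot_score < threshold


def user_agent_contains(*needles: str) -> Predicate:
    return lambda r: any(n in r.user_agent for n in needles)


def all_of(*predicates: Predicate) -> Predicate:
    return lambda r: all(p(r) for p in predicates)


# Hypothetical open-proxy list (a documentation range, not real data).
OPEN_PROXIES = ["192.0.2.0/24"]

# Python analogue of an expression such as (illustrative syntax):
#   ip.src in $open_proxies and cf.bot_management.score lt 30
#   and (http.user_agent contains "curl" or http.user_agent contains "python-requests")
suspicious = all_of(
    ip_in_list(OPEN_PROXIES),
    bot_score_below(30),
    user_agent_contains("curl", "python-requests"),
)
```

Each new signal is just another predicate, which is why every addition makes the existing ones more useful.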
The other thing is, it still looks like Wireshark, and let's be clear, Wireshark is to network administration what a stethoscope is to healthcare.
Like it is a universal tool.
Everybody uses it. And so really what we're saying is that even when we use these advanced features, it still just looks like a Wireshark filter.
Like, even if someone has never logged into Cloudflare before and just learned what Cloudflare was five minutes ago,
if they understand how to read a Wireshark filter and you show them a firewall rule definition, they'll probably look at it and go, yeah, I get it.
I know what that's trying to do.
Yeah, absolutely. And as someone who used to have to administer, you know, a whole bunch of different systems, it's nice to have kind of a single system, regardless of where my application is.
I used to have to manage a Cisco firewall, a Check Point firewall, and a few other things.
And I was always in my mind trying to switch back and forth.
It's like switching programming languages.
You're like, yeah, how do I catch exceptions here? How do I do this?
What's the way to do logging? And so that's kind of the spirit of what we do.
We try to introduce things in that way to make that easy.
Awesome. Yeah. Well, the thing I also really like is that, again, it leverages Wireshark, but your email filter analogy holds too.
I mean, part of what we do when we build product is, you know, we try to focus on making it accessible for, for everyone.
And so a big part of what the team did with firewall rules is build a really intuitive UI.
And so you can get in there and, if you know your Wireshark, you can use your Wireshark, but you've also got a really intuitive point-and-click interface that puts all of these different primitives, like lists, right there at your fingertips.
So you don't actually have to know the language itself in order to get up and running, which I think is, is critical for, for especially people who are just getting started or for teams that are in the process of scaling.
Yeah. And as someone who's tried to build something like that before, it's really hard to build something where you can go back and forth, right?
Usually you can kind of click and, and add some Boolean logic and it'll create the expression.
But then if you edit that expression, you really can't go back.
So personally, when I'm learning it, I'll use that interface.
Then, once I get accustomed to it, I'll switch over to the expression, and you're a little bit quicker there.
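The easy direction of that round trip, turning builder rows into an expression, can be sketched in a few lines of Python. The `Condition` type and the JSON-style string quoting are assumptions; going the other way (editing the text and reflecting it back into the builder) needs a full parser, which is why it's the hard part.

```python
import json
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Condition:
    field: str
    operator: str           # e.g. "eq", "contains", "lt"
    value: Union[str, int]


def quote(value: Union[str, int]) -> str:
    # Strings are double-quoted with JSON-style escaping (an assumption);
    # numbers are emitted as-is.
    return json.dumps(value) if isinstance(value, str) else str(value)


def build_expression(conditions: List[Condition], joiner: str = "and") -> str:
    """Turn rows from a point-and-click builder into one filter expression."""
    if joiner not in ("and", "or"):
        raise ValueError("joiner must be 'and' or 'or'")
    clauses = [f"{c.field} {c.operator} {quote(c.value)}" for c in conditions]
    return f" {joiner} ".join(clauses)
```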
We've talked a lot about the matching part.
If you think of a rule as an "if this, then that" kind of construct, what is the "that"? What are the legal things you can tell Cloudflare to do, and how has that evolved over time?
Sure. So we, we think about separating a match on one side, all of those parameters you mentioned, then like an action on the other.
And so if that matches, what sort of action do you take?
And the easiest thing to understand is simply to block: serve an error page saying, you know, you're not getting in here.
You sent something bad. Right. And, and, you know, we have a default page for that, or you can kind of customize and brand that and see what that looks like.
But then there's, there's other perhaps, you know, less drastic approaches that you might want to take.
And so maybe it's the case that you are reasonably confident that this is a bad request, but you're just trying to block, you know, random scans on the Internet.
And you want to make sure that this is actually being sent by a legitimate, you know, person and, and somebody that's actually sitting there behind the keyboard.
And so you could elect to serve what we call a CAPTCHA, which is a test to try to determine whether this is an actual human.
And everyone knows, you know, clicking on those boxes and trying to identify the bicycles or sidewalks or whatever.
Yeah. Yeah.
So that would be another option. You could also redirect to a different origin.
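The match/action split can be modeled as a list of rules in which each rule pairs a predicate with an action. This Python sketch assumes first-match-wins evaluation and a default of allowing the request; the rules, the redirect target, and the `Action` names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional

Request = Dict[str, str]


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"          # serve an error page
    CHALLENGE = "challenge"  # e.g. serve a CAPTCHA to check for a human
    REDIRECT = "redirect"    # send the request to a different origin


@dataclass
class Rule:
    description: str
    match: Callable[[Request], bool]   # the "if this" side
    action: Action                     # the "then that" side
    target: Optional[str] = None       # only used by REDIRECT


DEFAULT = Rule("default", lambda _: True, Action.ALLOW)


def decide(rules: List[Rule], request: Request) -> Rule:
    """Return the first rule whose match side is true, else the allow default."""
    for rule in rules:
        if rule.match(request):
            return rule
    return DEFAULT


# Hypothetical rules for illustration.
RULES = [
    Rule("known scanner", lambda r: "sqlmap" in r.get("user_agent", ""), Action.BLOCK),
    Rule("admin area", lambda r: r.get("path", "").startswith("/admin"), Action.CHALLENGE),
    Rule("legacy API", lambda r: r.get("path", "").startswith("/v1/"),
         Action.REDIRECT, target="https://new-origin.example.com/"),
]
```

Ordering matters under first-match semantics: a scanner hitting `/admin` is blocked outright rather than challenged.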
And so we, we recently released something. So we have multiple teams working on this engine, which I think is really cool.
And so our FL team in London actually built new functionality here called URL rewriting.
It's kind of in test mode now.
We're going to release it quite soon. Say a request comes in and you say, you know what, I know this is a bot.
I want to send them to a different page than I would send a legitimate human to, and maybe serve some bogus data back, or change the price.
You think they're trying to scrape your page. And so you can do a whole bunch of different things.
And so just like we're building up that, that matching capability, we're also building up the, the action capability.
And so rewriting, actually running a Worker if you detect a certain thing, redirecting to a honeypot, slowing the response down: there's a whole bunch of actions that we're going to put in your toolbox.
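A URL rewrite differs from a redirect: the edge changes the URL forwarded to the origin, and the visitor never sees it. The Python sketch below shows only the path-rewriting step for requests with a low bot score. The `/decoy` prefix and the threshold of 30 are hypothetical choices for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

DECOY_PREFIX = "/decoy"  # hypothetical origin path that serves bogus data


def rewrite_for_bots(url: str, bot_score: int, threshold: int = 30) -> str:
    """Return the URL to forward to the origin for this request.

    Likely bots (score below the threshold) get their path prefixed so the
    origin serves decoy content. Unlike a redirect, a rewrite happens at the
    edge, so the visitor's own URL would not change.
    """
    if bot_score >= threshold:
        return url
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=DECOY_PREFIX + parts.path))
```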
This is all headache that used to have to live on the origin and is now handled exactly the way the customer wants it at the Cloudflare edge.
And it ties in, you know, it ties into the whole vision of serverless.
It really is. It's so powerful, and it just simplifies everything that the origin has to worry about.
Absolutely. Yeah. I think one of the other things that I think has been really interesting about the work that the team has done is, you know, as we've talked about it, you know, it's a fairly complicated and sophisticated engine.
And it's often difficult for, for people as they're looking at this to understand what's going on.
You know, I think one of the most powerful things the team has done is really the robust analytics that they've put on top of it.
Pat, can you talk a little bit about kind of the journey around those analytics and kind of what you were, what you were trying to solve and kind of where we're at with that now?
Sure. So when I joined Cloudflare a number of years ago, everyone was writing their own analytics engine for individual products.
And you would go from, you know, one zone-based view to a different product view, and you'd have these very disparate interfaces.
And it would be tough to sort of reconcile between them. And so we, we worked very closely with the data and analytics team to try to standardize and streamline what these, these analytics look like.
And so if you go in today and look at what we call firewall events, and then you go look at cache analytics, the proficiency you're building up is useful on those other products.
And so we spent a lot of time working with the design team, as well as the front end team to implement this.
And there's a single interface now that gives a vantage point into all the things that we've blocked for you or challenged for you or done this or that with, where you're getting attacked from, what the IPs are, what the top user agents are.
And so we provide a whole bunch of information now that you can go in and see more or less in real time.
And you can see, you know, what is actually happening on your, on your domain, for example.
You can kind of slice and dice it however you want.
There's a filter component. And so you can actually, you know, drill into the particular data.
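Slicing and dicing firewall events is essentially filter-then-group-by. This in-memory Python sketch uses a hypothetical `FirewallEvent` record; a real dashboard would query an analytics backend rather than a Python list.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FirewallEvent:
    action: str       # e.g. "block", "challenge"
    client_ip: str
    country: str
    user_agent: str


def top_values(events: Iterable[FirewallEvent], dimension: str, n: int = 3,
               action: Optional[str] = None) -> List[Tuple[str, int]]:
    """Optionally filter by action, then return the n most common values of one field."""
    counts = Counter(
        getattr(e, dimension) for e in events if action is None or e.action == action
    )
    return counts.most_common(n)
```

Drilling in is just adding another filter before the grouping step.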
And then one thing that is somewhat recent is after you've drilled into that data, what do you want to do with that?
So you might want to feed that back into a rule.
And so we built a feedback loop there to say, okay, I know this is, you know, problematic.
I want to, maybe I was challenging it before, but I want to outright block it.
And so we've kind of tightened that feedback loop.
The other thing that's really cool, and I think it's one of the coolest features: you know, for a long time, the mission used to be actionable analytics.
Like it's not enough to show a report, like make it clear what you're supposed to do about it.
Not only do we make it clear, we put the UX affordance right on the graph.
It's such a tight feedback loop: I want this thing to go away, or I want to redirect these, like you said, to whatever actions are available.
It's really nice.
And even before you actually apply the rule, you can preview what it would do.
And so I know as somebody that, you know, used to log into routers and make changes, that sinking feeling when, you know, you enter a command and then the terminal doesn't respond, right.
You've shut down the interface or you've done something.
What we've done is build something where you can actually write a proposed rule and then say, show me what this would have done to traffic historically, looking back over a number of days, typically a 30-day period.
And you can see the result, giving yourself confidence before you do it.
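Previewing a rule amounts to replaying logged traffic through the proposed match without acting on it. In this Python sketch, the `LoggedRequest` record and the 30-day default window are assumptions that mirror the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class LoggedRequest:
    timestamp: datetime
    path: str
    user_agent: str


def preview_rule(match: Callable[[LoggedRequest], bool],
                 history: Iterable[LoggedRequest],
                 now: datetime, days: int = 30) -> Dict[str, int]:
    """Replay the last `days` days of logged traffic through a proposed rule.

    Nothing is blocked; the result only reports how many requests would have matched.
    """
    cutoff = now - timedelta(days=days)
    window = [r for r in history if r.timestamp >= cutoff]
    matched = sum(1 for r in window if match(r))
    return {"evaluated": len(window), "would_match": matched}
```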
The other really cool thing that we did with the analytics engine is build it on top of GraphQL, a technology from Facebook that makes it really easy to write expressions that pull data back from a number of different sources.
And what it's allowed is this: we've built what we think is a really great user experience and interface on top of it.
But by no means do you have to use that, right? You could write your own UI if, you know, if you're bored one weekend and you want to build up your own analytics.
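For anyone building their own UI, a GraphQL request is just an HTTP POST whose JSON body carries a query and its variables. This Python sketch builds, but does not send, such a request with the standard library. The endpoint is Cloudflare's GraphQL API URL; the dataset name (`firewallEventsAdaptive`), its fields, and the filter type name are our understanding of the published schema and should be treated as assumptions to verify against Cloudflare's documentation.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.cloudflare.com/client/v4/graphql"

# Dataset, field, and type names below are assumptions; check the schema.
QUERY = """
query RecentFirewallEvents($zoneTag: string, $filter: FirewallEventsAdaptiveFilter_InputObject) {
  viewer {
    zones(filter: {zoneTag: $zoneTag}) {
      firewallEventsAdaptive(filter: $filter, limit: 10, orderBy: [datetime_DESC]) {
        action
        clientIP
        userAgent
        datetime
      }
    }
  }
}
"""


def build_request(api_token: str, zone_tag: str, since: str, until: str) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST asking for recent firewall events."""
    payload = {
        "query": QUERY,
        "variables": {
            "zoneTag": zone_tag,
            "filter": {"datetime_geq": since, "datetime_leq": until},
        },
    }
    return urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
        method="POST",
    )
```

With a valid API token, `urllib.request.urlopen(build_request(...))` would send it; GraphQL responses typically carry `data` and `errors` keys, shaped exactly like the fields the query asked for.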
Yeah, it's interesting: Jen, I think we've done five of these, and GraphQL has come up in four or five of them.
It's not just that it unlocks so many opportunities for our customers.
It's the thing that unlocked the analytics revolution inside Cloudflare.
That's how we were able to build all those cool interfaces: this fantastic world where the client engineers can literally tell the server, give me this information and put it in this format.
I don't need to buy cupcakes for the server team.
I can tell the server, I want exactly this combination of fields and this information; package it up in this structure and send it back to me.
It's, it's really great.
Well, and also that standardization and the kind of the components that we built on top of it, it's just accelerated our innovation, right?
I mean, like one team starts doing one thing on analytics, like zoomable analytics.
One team's like, I'm going to make my analytics zoomable. And another team's like, I'll take that now too.
And all of a sudden, boom, we've been zoomified across the entire dashboard.
Pat, I think you even tweeted this once: one of the cool things about being a product manager at this point in Cloudflare's evolution is that there are so many great components we can all use. Like, the bots team is putting its signal in.
That's another field for the firewall, and it applies to rate limiting too.
So because it's a great extensible framework, we've been able to pour all these great new features on top of it.
Yeah. I mean, it's a, it's a really fun time to be a product manager here.
You can go in and kind of raid what other teams have built, package it together, add some additional stuff on top of it, and really deliver a lot of value.
And one of the areas we're focused on right now: we've been talking about web applications, right?
Web applications are powered by APIs, right? Yeah.
And so on the backend, that's a request that is supposed to be coming from some sort of automated system, right?
So it might be your mobile application, or it might be an IoT device.
We've got a lot of customers with some really cool IoT use cases.
And so what we've really been focused on and going deep on is protecting that API traffic.
And so making sure that, you know, only legitimate IOT devices, for example, are connecting or only legitimate mobile applications.
And so, on the server side, we previously did something called Universal SSL, where we generate a certificate for the server, the browser connects, and the server says, you know, I'm really this domain, and here's how I can prove it.
We're doing the same thing now on the client side.
So the server is saying, Hey, hey, client, prove to me that you are who you say you are.
An Internet of Things device that should be talking to my server, and not just some random person.
Yeah. How do I know you're my refrigerator, you know, versus my neighbor's refrigerator.
And so we've got some really cool stuff in the works there to help secure that, using that same cryptography to make sure it's protected.
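The client-side half Pat describes is generally known as mutual TLS: the server also requires a certificate from the client and checks it against a certificate authority it trusts. This Python sketch builds such a server-side context with the standard `ssl` module; the file-path parameters are placeholders, and nothing here reflects how Cloudflare implements the feature.

```python
import ssl
from typing import Optional


def make_mtls_server_context(ca_file: Optional[str] = None,
                             cert_file: Optional[str] = None,
                             key_file: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that refuses clients without a trusted certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # The server proves its own identity with its certificate...
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    # ...and demands that the client do the same, with a certificate signed by
    # a CA we trust (for example, one that issued certificates to our devices).
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

After the handshake, `SSLSocket.getpeercert()` returns the verified client certificate's fields, which a server could use to tell one refrigerator from another.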
The other thing we're doing is flipping the model on its head: instead of only describing what's bad, how do we describe what's good?
So you'd be able to tell us, hey, these are the parameters and structure of a good request; only let that stuff in and nothing else.
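Flipping to a positive model means describing what a good request looks like and rejecting everything else. This Python sketch checks one hypothetical endpoint (`POST /v1/telemetry`) against a hand-written field/type map; a real deployment would more likely derive that description from an API schema.

```python
from typing import Any, Dict

# Hypothetical description of the only acceptable body for one endpoint.
TELEMETRY_SCHEMA: Dict[str, type] = {
    "device_id": str,
    "temperature_c": float,
    "door_open": bool,
}


def is_allowed(method: str, path: str, body: Dict[str, Any]) -> bool:
    """Positive model: allow only requests matching the expected shape exactly."""
    if (method, path) != ("POST", "/v1/telemetry"):
        return False
    if set(body) != set(TELEMETRY_SCHEMA):
        return False  # missing or unexpected fields
    # Exact type match, so True is not accepted as a number and "3.5" is not a float.
    return all(type(body[k]) is t for k, t in TELEMETRY_SCHEMA.items())
```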
I know we're kind of closing in on our time here, but I could go on and on.
I know you could. As I was joking with Jen, we need to get Pat on for multiple sessions, for sure.
There's a ton to talk about here. And it's been, it's just been great to have you on Pat, just to get kind of a quick update on some of that.
And we'll definitely have you back to drill in, especially as some of these other things hit the market.
But thanks so much for coming and spending time with us on a Friday afternoon.
Thank you, Usman, for another pleasurable Zoom afternoon, gathered virtually around my lava lamps.
Excellent. Thanks again, Pat.
It was great talking to you. All right. Thanks so much.