Latest from Product and Engineering
Presented by: Jen Taylor, Usman Muzaffar, Patrick Donahue
Originally aired on June 4, 2021 @ 11:00 PM - 11:30 PM EDT
Join Cloudflare Head of Product Jen Taylor, Head of Engineering Usman Muzaffar, and Director of Product Management Patrick Donahue for a quick recap of everything that shipped in the last week. Covers both new features and enhancements on Cloudflare products and the technology under the hood.
Transcript (Beta)
Hi, I'm Jen Taylor, Chief Product Officer at Cloudflare and I am thrilled to welcome you to another installment of Latest from Product and Engineering.
Awesome, thanks Jen.
I'm Usman, Head of Engineering at Cloudflare and we have a very special guest this time, one of our favorite teammates.
Pat, why don't you say hi?
Great to be here, long time listener, first time caller.
Thanks for having me on the program.
Patrick Donahue on the product management team focused on security products.
Excellent. We wouldn't be able to do that without cracking up, but it's awesome.
So great. I think part of the reason we wanted Pat here is to talk about security products.
Jen, what are some of the first questions we want to ask Pat?
Yeah, so Pat, one of the things I've been spending a lot of time thinking about over the course of the past couple of weeks is what we've been doing with our web application firewall or WAF.
I've seen a lot of stuff come off the factory floor for that.
And I'd love to just understand, stepping back, what is a web application firewall?
Sure. That's a great question. So a web application firewall, it's a piece of software that sits between a browser and our customer's applications.
And what it tries to do is it looks at those requests and that traffic and tries to identify requests that are potentially bad and potentially dangerous for the application.
And so as those requests come from the browser, it'll break it down into different parts and look at that and try to identify, is this a malicious request coming in that we want to not let through something that maybe is a specific hole in the particular application that the request is headed towards?
Or maybe it just looks generically like something that is bad that we want to block.
And so it sits between those applications, wherever they may be, whether they're in the cloud, SaaS applications, or potentially on our customer servers.
Now, how does that fit in with DDoS and rate limiting?
When we talk about our security solutions, we often talk about them as a suite or a host of solutions.
How does that work with these other, how does the WAF work with these other things?
Yeah, no, absolutely. So we like to think of the web application firewall as kind of the tentpole that holds up all these other solutions.
And so it's where you go in the Cloudflare dashboard and the Cloudflare interface to configure all those other security products.
And so we've got a couple of different parts of it. So we have what we call the managed rule sets, which are the rules that our security researchers are writing and researching and developing oftentimes, immediately as soon as something's announced.
And then we have what we call the custom rules in there, which are the tools that we give people to write their own rules and craft their own rules.
It used to be that you'd have to go and fill out a form and wait a little bit. I saw them in an old mockup the other day and I was like, whoa, where did that come from?
And so it used to be you'd have to wait, but our customers, obviously they want to be able to kind of serve stuff themselves without anybody having to jump in.
And so we've given all these different tools and we're packaging those together in what we call firewall rules.
And so you could write a rule that says don't let something above a certain number of requests per minute, for example, come in.
And so that would be a rate limiting rule, but all of these kind of layer seven application layer security tools together form our web application firewall.
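As a rough illustration of the "certain number of requests per minute" idea, here is a minimal fixed-window counter in Python. It is only a sketch: the class name, the per-client keying, and the in-memory dictionary are assumptions made for this example, not a description of how Cloudflare's rate limiting is built.

```python
from collections import defaultdict


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each `window_seconds` window."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client, window index) -> number of requests seen in that window.
        # A real system would expire old windows; this sketch never does.
        self.counts = defaultdict(int)

    def allow(self, client_ip: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        key = (client_ip, window)
        self.counts[key] += 1
        return self.counts[key] <= self.limit


limiter = FixedWindowRateLimiter(limit=3)
# Four requests in the first minute, then one in the next minute.
decisions = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3, 61)]
print(decisions)  # [True, True, True, False, True]
```

The fourth request exceeds the limit for its window and is refused; the counter starts fresh once the next window begins.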
You know, when I first was invited to join Cloudflare by our CTO, John Graham-Cumming, an old boss and friend of mine, he sent me an email.
I kept that email and said, you've got to join this new company called Cloudflare.
They're doing some really cool firewall stuff.
And when I asked John, what does that mean? He's like, well, actually we're trying to patch the Internet, which is a line that we used to use a lot at Cloudflare before it became, actually we're trying to help build a better Internet, full stop.
Patching is a small scope. But if you go back to that idea of patching, what is it that we're actually doing there?
Firewall rules, after all, have been around for a long time.
I remember reading about ModSec and Apache and as soon as there were web servers, there were firewall rules.
So what's some of the evolution that Cloudflare has gone through here as we've gone from literally providing Apache ModSec rules to some of the really advanced stuff we've got going on?
Yeah, great question. And when I was looking early on, I was writing a document trying to position some of the strategy on the product side.
And I came across, I think it was a YouTube video by our CEO, Matthew.
And he was talking about, and I think Michelle was in it as well, talking about building a firewall in the cloud.
And that was the earliest, I think, one of the mission statements of the company.
There it is, yeah. Funny to, yeah, there you go.
And so the form, as you mentioned, early on, it was a lot of individual signatures, patching against things, ModSec, some of the technologies that were used.
We've been spending a lot of time trying to build a really sophisticated engine that I like to think of as a toolbox where you can kind of plug in a bunch of different things.
And some of those things, the engine itself, we've spent a lot of time improving from a performance perspective, because one of the things that we take very seriously is anytime we build a security solution, we don't want to have any impact on performance, right?
It's no good if we block a request, but every request takes five seconds to get through.
That's not a great user experience.
And so we spent a lot of time upgrading that engine, and we're using something called Rust now to rebuild that engine.
And Rust is really cool, and you can probably tell me more about why that's cool, but it definitely is something that gets a lot of great discussion going when we mention that.
Yeah, let's talk about that for a second.
I think one of the things that's very appealing about, that appeals to a lot of engineers, and part of the reason they joined Cloudflare, is because it's not scared of new technology.
When you see something, they would think, actually, this tech could help us answer a problem that we're trying to solve in a more efficient or smarter way.
We can be some of the early adopters.
We were one of the first companies that really leaned into Go, which has now become a very popular system programming language.
But the time that Cloudflare started, it was still pretty new.
And in the same way, I was in a meeting where a bunch of engineers said, yeah, I think this new firewall should be written in Rust.
And a part of me was like, that's really cool. And the other part of me, like the manager grown-up part was, are you guys picking that because it's cool?
Or is there actually a good technical reason for why you're picking it?
And they're like, no, let us explain something to you. The challenge is, we need to be able to make it easy for people to author these rules and test them.
And that's in our control plane.
That's a completely different part of the Cloudflare stack than what happens at the Edge.
And the Edge was based on a different programming language called Lua.
So we wound up with half of it Lua, half of it Go.
And that's where you wind up with the engine in which you're defining a rule and the engine in which you're executing that rule are different.
And that can lead to exactly the kind of problems that Cloudflare was trying to solve for people.
So the solution was, if we use the exact same library that is implemented in Rust, and we pick Rust because Rust is fast and it doesn't have memory leaks, which is the other thing we have to be very careful about here.
Because if we write in a language that gives us fantastic performance, but puts the onus on the programmer to make sure that there's no world in which you could ever make a mistake, then we open ourselves up to the possibility of other kinds of problems, which are, okay, fine, we have awesome performance, but we want to be careful that we don't land up in a world where we have security issues.
And so Rust gave us both.
It gave us that great performance, and it gives us that guarantee of memory safety, and we can run it in our control plane as well at the Edge.
And that was exactly the genesis. We're like, all right, we're going to be a Rust shop.
And so that immediately means continuous integration, dev tools, all the rest of the stuff that shows up, chat channels inside Cloudflare where people are learning Rust, and that's all great.
I want to go back to something, Pat. So when you talk about, let's just make this concrete for our viewers here.
What are the components of a firewall rule?
So when you want to define one, we alluded to a second ago, the customers would call in and want to write a rule.
What does that mean?
You have to write a program to do that? Is that just what are the inputs in the fields that go into a rule?
Sure. Yeah, that's a great question. So the engine itself is fed.
We've actually come up with a language for it. And so we kind of based it off of Wireshark, which is a tool that network administrators are probably familiar with.
I used to use it back when I did networking. You did too, a long time ago.
Yeah. It had some different names early on. That's right. You used to call it Ethereal.
Yeah, exactly. Yeah. Showing our age here. So we would use that and it would give you this really expressive syntax where you could go in and you could say, I want to look at this TCP port or part of the TCP packet.
In the same way, we can do that now.
We can say, I want to look at the HTTP request headers or the body itself.
And so if you think about a web request your browser's sending, there are a couple of different components of it.
So there's the target where it's sending it. What is the application?
What is that host name? What is that port? The path itself, slash V1, V2, slash whatever it may be.
So there's kind of the basic, the request attributes there.
There's also the headers that are in the request. So your browser, I'm sitting here using Chrome.
I sometimes use Brave or Safari or other different browsers.
The browser's identifying itself and saying, this is who is making the request or what is making the request, as well as other different headers that give information.
And we didn't want our customers to have to write programs to match against this.
So they absolutely can if they want to use a worker to do really sophisticated things with the WAF.
But what they can do is simply use that expressive language that we've created to match those patterns.
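To make the idea of matching on request attributes and headers concrete, here is a small Python sketch of a declarative rule evaluated against a request. The `Request` class, the operator names, and the `header:<name>` convention are all hypothetical and invented for this illustration; they are not the syntax of Cloudflare's rules language.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    host: str
    path: str
    headers: dict = field(default_factory=dict)  # header names stored lowercase


# Hypothetical operator set; a real rules language offers many more.
OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: expected in actual,
    "starts_with": lambda actual, expected: actual.startswith(expected),
}


def field_value(request: Request, name: str) -> str:
    """Return a request attribute, or a header when written as 'header:<name>'."""
    if name.startswith("header:"):
        return request.headers.get(name[len("header:"):].lower(), "")
    return getattr(request, name)


def matches(request: Request, conditions: list) -> bool:
    """A rule matches only when every (field, operator, value) condition holds."""
    return all(
        OPERATORS[op](field_value(request, name), value)
        for name, op, value in conditions
    )


rule = [
    ("host", "eq", "api.example.com"),
    ("path", "starts_with", "/v1/"),
    ("header:user-agent", "contains", "curl"),
]

scripted = Request("api.example.com", "/v1/users", {"user-agent": "curl/8.0"})
browser = Request("api.example.com", "/v1/users", {"user-agent": "Mozilla/5.0 Chrome/90"})

print(matches(scripted, rule))  # True
print(matches(browser, rule))   # False
```

The point of the design is that the rule is data rather than a program: the person writing it only names fields, operators, and values.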
I want to go back and mention one thing that I left out before.
The engine that we've built, we initially released that for customers to write their own rules, which we call firewall rules.
The team has the managed rule sets, as I mentioned before, those rules they're creating.
They had built those in that old Lua technology and written actually originally by your boss, our CTO.
My boss, that's right. JGC was here, was a joke on one of the slides, in fact.
Yeah, we have a nice counter of how many lines of CTO code. That's a good metric for all startups out there as you're growing.
How many lines of founder code are you deleting?
That's a great method. Exactly. And so what we're doing now is we're porting those rule sets and we're kind of taking the opportunity to reinvent what that user interface looks like.
If you think about Cloudflare going back a number of years, we largely were managing companies that would bring their entire domain to us and they would have a common rule set across that entire domain.
We now have really large customers managing many hundreds, if not thousands of applications.
And so we want to make it so you can say, I want to apply these particular rules for this set of applications and these rules for that set of applications.
And so we're making it a lot easier to do that as we bring those rule sets into that new engine.
And the reason that it's really important, you mentioned that sort of virtual patching.
A lot of the applications, you might have hundreds of applications.
If you're a company today, you don't write all those applications, right?
You may use them. I'm using WordPress or whatever, right? Yeah, exactly.
Exactly. And so you, in a lot of cases, don't have the ability to go in as great a coder as you are.
I'm sure you can fix them all, but you don't have the ability to go in and do that.
And so if you can put a web application firewall between the browser and the attackers and the scripts and things like that, and those applications, wherever they may be, we can respond a lot faster.
And we've got 25 million plus properties using us.
And so it's inevitable that somebody is going to have the same application that you do.
And so we can deploy that and everybody can take advantage of those attacks.
Everybody's protected. Well, one of the things that I think has been really interesting, Pat, as you talk about that journey, like at the inception, when we first created the WAF, you had kind of single customer, kind of single zone, single domain on us.
And now you've got customers who are managing really complicated portfolios and many of them managing lots of different lists of IPs and managing a lot of complexity.
Can you talk a little bit about some of the things that the team has done and is doing to kind of simplify this level of management and this flexibility?
Sure. Yeah. And I want to key on one of the things you said, which is lists.
So we have customers that may maintain a list of IP addresses that they've previously had either problems with, or maybe lists of IPs that are monitoring their systems or whatever that may be.
And historically, you could write these rules that say, on a one-off basis, allow this, block this, challenge this, et cetera.
But we didn't really have a data structure at the edge for people to maintain those lists.
And so what we're doing now is we're allowing a list to be created and we can store that and we can replicate that to all of our data centers around the world.
And the web application firewall runs on every machine and every data center, and they can maintain that list and then reference that.
So to give you an example, you might have a list of bad guys or something that you've seen that have been sending you bad traffic, and it might start out with 100 items.
And over time, you might be adding items to that list.
We can manage that for you. And so we've given recently the ability to create custom lists.
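A named list that rules can reference might be sketched like this in Python, using the standard `ipaddress` module so that entries can be single addresses or CIDR ranges. The `IPList` class and the list name are hypothetical, and the sketch says nothing about how lists are stored or replicated at the edge.

```python
import ipaddress


class IPList:
    """A named, growable list of addresses and CIDR ranges."""

    def __init__(self, name: str, entries=()):
        self.name = name
        self.networks = [ipaddress.ip_network(entry) for entry in entries]

    def add(self, entry: str) -> None:
        # A bare address such as "203.0.113.9" becomes a single-host network.
        self.networks.append(ipaddress.ip_network(entry))

    def __contains__(self, address: str) -> bool:
        ip = ipaddress.ip_address(address)
        return any(ip in network for network in self.networks)


bad_actors = IPList("bad_actors", ["198.51.100.0/24", "203.0.113.9"])

print("198.51.100.42" in bad_actors)  # True
print("192.0.2.1" in bad_actors)      # False

# The list grows over time without any rule that references it changing.
bad_actors.add("192.0.2.0/28")
print("192.0.2.1" in bad_actors)      # True
```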
One of the things I'm really excited about that we're going to build on that is something that we call managed lists.
And so those are lists where we take care of figuring out which IPs make sense to go in here.
And so if you're using Pingdom, for example, or Catchpoint or one of these monitoring tools, you might want to bypass a bunch of different rules.
And if we can maintain that for one customer, then another customer can kind of use those same rules.
And we only have to do it once and save the burden there.
And so that's something that we've started with IP addresses, but we're also going to be letting people specify lists of countries and ASNs, which are network numbers that they'd like to take action on in some way.
And so that is a recent launch for us. Sometimes when I'm explaining firewall rules to people who are not used to being web administrators, I use the analogy of email filters.
Because there's actually a fair amount of homology here, right?
It's the same kind of thing. You've got this ocean of stuff coming at you, and you want to automatically delete anything that's got some keyword in it or automatically label something with family if it's from your parents or your siblings or whatever.
And I think in the same way that you could start to see a world where, yeah, I know what I need to do, but it would be so tedious to create every single rule and list every single Muzaffar who has an email address.
Wouldn't it be great if I could just say, if the from is from one of my family members, and I just keep that list separately, then that makes the rule writing so much easier.
And the extended thing that the analogy can extend here is that list might be useful to other people.
And so this list of Pingdom IP addresses or a list of common blocks or whatever becomes a library almost in and of itself.
Absolutely.
And so there's those very basic lists. This is a list that is relatively static, and it changes a little bit over time, and we're providing that convenience factor.
But then there's another set of lists that I'm really excited about, which are more the threat intelligence-based lists.
So lists of IPs that we've seen perhaps attacking other customers, or in one case, we're getting ready to give some access to something that is what's called a list of open proxies.
And so a proxy is something that you could run on your machine, or maybe you don't even know it's running, and attackers might run that traffic through it.
It's just a place where bad guys can collect and do bad things. Exactly. And so if we can scan the Internet, and we can find those lists, and we can assemble that for our customers, then they can reference that in a firewall rule.
And so the beautiful thing about lists is that they can be referenced in that Wireshark-like language you mentioned.
And so you can combine a list with other intelligence in there.
So you can say, is this coming from behind an open proxy? And does it have a bot management score of less than 30?
And does it have a user agent that's X or Y or whatever?
And so that's the power of putting all those things together. And that's what everything you add makes the existing things more powerful in that engine.
The other thing is- It still looks like Wireshark. Anybody who's used Wireshark, and let's be clear, Wireshark is to network administration like a stethoscope is to healthcare.
It is a universal tool. Everybody uses it. And so really what we're saying is that even when we use these advanced features, it still just looks like a Wireshark filter.
Even if I've never logged into Cloudflare before, I just learned what Cloudflare was five minutes ago.
If you take someone who understands how to read a Wireshark filter and show them a firewall rule definition, they'll probably look at that and go, yeah, I get it.
I know what that's trying to do.
Yeah, absolutely. And as someone who used to have to administer a whole bunch of different systems, it's nice to have a single system regardless of where my application is.
I used to have to manage a Cisco firewall, a Check Point firewall, a few other different things.
And I was always in my mind trying to switch back and forth.
It's like switching programming languages. You're like, yeah, how do I catch exceptions here?
How do I do this? What is the logging way to do it? And so that's the spirit of us.
We try to introduce things in that way to make that easy.
That's awesome. Yeah. Well, and the thing I also really like is, again, it leverages Wireshark, but back to your email filter analogy, it was fun.
Part of what we do when we build product is we try to focus on making it accessible for everyone.
And so a big part of what the team did with firewall rules is build a really intuitive UI.
And so really, you can get in there and you can use your Wireshark, but you've also got a really intuitive point-and-click interface that pulls in all of these different, what I would call primitives, the lists and stuff, right there at your fingertips.
So you don't actually have to know the language itself in order to get up and running, which I think is critical for especially people who are just getting started or for teams that are in the process of scaling.
Yeah. And that's really hard as someone who's tried to build something like that before, to be able to build something where you can go back and forth, right?
Usually you can kind of click and add some Boolean logic and it'll create the expression.
But then if you edit that expression, you really can't go back.
Yeah. And so I personally, when I'm learning it, I'll use that interface and then, okay, once I get accustomed to it, I'll switch and you're a little bit quicker there.
I've talked a lot about the matching part. Again, if you think of a rule as an if this, then that kind of construct, what is the that?
Like what are the legal things you can tell Cloudflare to do and how has that evolved over time?
Sure. So we think about separating a match on one side, all of those parameters you mentioned, then like an action on the other.
And so if that matches, what sort of action do you take?
And the easiest thing to understand is simply just block that traffic.
Serve an error page saying, uh-uh, you're not getting in here.
You sent something bad. And we have a default page for that, or you can kind of customize and brand that and see what that looks like.
But then there's other, perhaps less drastic approaches that you might want to take.
And so maybe it's the case that you are reasonably confident this is a bad request, but you're just trying to block random scans on the Internet.
And you want to make sure that this is actually being sent by a legitimate person and somebody that's actually sitting there behind the keyboard.
And so you could elect to serve what we call a CAPTCHA, which is a test to try to determine, is this actual human?
And everyone knows, clicking on those boxes and trying to identify the bicycles or sidewalks or whatever.
Yeah. And so that would be one other. You could also redirect to a different origin.
And so we recently released something. So we have multiple teams working on this engine, which I think is really cool.
And so we have our FL team in London actually gave a new functionality here called URL rewriting.
And so this is kind of in test mode now. We're going to release this quite soon where that request might come in and you say, you know what, I know this is a bot.
I want to send them to a different page rather than I would send a legitimate human to.
And you want to serve maybe some bogus data back, or you want to maybe change the price.
You think they're trying to scrape your page. And so you can do a whole bunch of different things.
And so just like we're building up that matching capability, we're also building up the action capability.
And so rewriting, actually running a worker, if you detect a certain thing, redirecting to a honeypot, slowing the response down.
There's a whole bunch of actions that we're going to put in your toolkit.
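The split between a match and an action can be sketched as follows. This is a simplified Python illustration under stated assumptions: the action names, the first-match-wins ordering, and the request dictionary keys (`ip`, `bot_score`, `path`) are hypothetical choices for the example, not Cloudflare's actual evaluation model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    CHALLENGE = "challenge"
    REWRITE = "rewrite"


@dataclass
class Rule:
    description: str
    matches: Callable[[dict], bool]  # the "if this" half
    action: Action                   # the "then that" half
    rewrite_to: str = ""             # only used by REWRITE


def evaluate(request: dict, rules: list) -> tuple:
    """Return (action, path to serve) for the first rule that matches."""
    for rule in rules:
        if rule.matches(request):
            if rule.action is Action.REWRITE:
                return rule.action, rule.rewrite_to
            return rule.action, request["path"]
    return Action.ALLOW, request["path"]


rules = [
    Rule("block known-bad IPs", lambda r: r["ip"] in {"203.0.113.9"}, Action.BLOCK),
    Rule(
        "send likely scrapers to decoy prices",
        lambda r: r["bot_score"] < 30 and r["path"].startswith("/prices"),
        Action.REWRITE,
        rewrite_to="/decoy/prices",
    ),
    Rule("challenge other suspicious traffic", lambda r: r["bot_score"] < 30, Action.CHALLENGE),
]

print(evaluate({"ip": "198.51.100.1", "bot_score": 12, "path": "/prices/today"}, rules))
print(evaluate({"ip": "198.51.100.1", "bot_score": 12, "path": "/login"}, rules))
print(evaluate({"ip": "198.51.100.1", "bot_score": 90, "path": "/"}, rules))
```

Because the action is just another field on the rule, adding a new kind of response does not require touching the matching side.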
This is all headache that used to have to live on the origin and is now coming in exactly the way the customer wants it at the Cloudflare edge.
And it ties into the whole vision of serverless. It really is so powerful, and it just simplifies everything that the origin has to worry about.
Absolutely, yeah. I think one of the other things that I think has been really interesting about the work that the team has done is, as we've talked about it, it's a fairly complicated and sophisticated engine.
And it's often difficult for people as they're looking at this to understand what's going on.
I think one of the most powerful things the team has done is really the robust analytics that they've put on top of it.
Pat, can you talk a little bit about the journey around those analytics and what you were trying to solve and where we're at with that now?
Sure. So when I joined Cloudflare a number of years ago, everyone was writing their own analytics engine for individual products.
And you would go from one zone-based view to a different product view, and you'd have these very disparate interfaces.
And it would be tough to reconcile between them. And so we worked very closely with the data and analytics team to try to standardize and streamline what these analytics look like.
And so if you're today going in and you're looking at the firewall, what we call firewall events, and then you're going to go look at cache analytics, your proficiency that you're building up is useful in those other products.
And so we spent a lot of time working with the design team as well as the front-end to implement this.
And there's a single interface now that gives a vantage point into what are all the things that we've blocked for you or challenged for you or done this or done that, where are you getting attacked from, where are the IPs, where are the top user agents.
And so we provide a whole bunch of information now that you can go in and see more or less in real time.
And you can see what is actually happening on your domain, for example.
You can slice and dice it however you want.
There's a filter component. And so you can actually drill into the particular data.
And then one thing that is somewhat recent is after you've drilled into that data, what do you want to do with that?
So you might want to actually feed that back into a rule.
And so we built a feedback loop there to say, okay, I know this is problematic.
Maybe I was challenging it before, but I want to outright block it.
And so we've kind of tightened that feedback loop. The other thing that's really cool about- That's so great, by the way.
I think that's one of the coolest features.
For a long time, the mission used to be actionable analytics, actionable analytics.
It's not enough to show a report. Make it clear what you're supposed to do about it.
Not only do we make it clear, we put the UX importance on the graph.
It's such a tight feedback loop. Like this thing, I want this thing to go away.
Or I want to redirect these, like you said, to whatever actions are available.
It's really nice. And even before you actually apply the rule, you can preview what it would do.
And so I know as somebody that used to log into routers and make changes, that sinking feeling when you enter a command and then the terminal doesn't respond, right?
You've shut down the interface or you've done something.
What we've done is something similar where you can actually write a proposed rule and then say, show me what this would have done historically to traffic based on looking back over a number of days, typically a 30-day period.
And you can see giving yourself confidence before you do it.
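The preview idea, replaying a proposed rule over past traffic before turning it on, can be sketched in a few lines of Python. The log format and field names here are made up for the illustration; the real preview works over Cloudflare's own analytics data.

```python
from collections import Counter


def preview_rule(matches, historical_requests):
    """Report what a proposed rule would have matched, without acting on anything."""
    matched = [request for request in historical_requests if matches(request)]
    return {
        "total": len(historical_requests),
        "would_match": len(matched),
        "by_country": Counter(request["country"] for request in matched),
    }


# Hypothetical sample of logged requests.
history = [
    {"ip": "203.0.113.9", "country": "US", "path": "/login"},
    {"ip": "198.51.100.4", "country": "DE", "path": "/login"},
    {"ip": "198.51.100.4", "country": "DE", "path": "/"},
    {"ip": "192.0.2.77", "country": "US", "path": "/login"},
]


def proposed(request):
    return request["path"] == "/login" and request["country"] == "US"


report = preview_rule(proposed, history)
print(report["would_match"], "of", report["total"], "requests would match")
```

If the count is far higher than expected, that is the signal to tighten the rule before it ever touches live traffic.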
The other really cool thing that we did with that analytics engine is we built it on top of GraphQL, which is a technology from Facebook, makes it really easy to write expressions, to pull data back from a number of different sources.
And what it's allowed people to do is we've built what we think is a really great user experience and interface on top of it.
But by no means do you have to use that, right? You could write your own UI if you're bored one weekend and you want to build up your own analytics.
Because you're that good of a coder, you might just do it.
Maybe, maybe. Anyway. Yeah, no, it's interesting how many times I think, Jen, we've done five of these and I think four out of five times GraphQL keeps coming up.
It's not just that it unlocks so many opportunities for our customers.
It's the thing that unlocked the analytics revolution inside Cloudflare.
That's how we were able to build all those cool interfaces was this fantastic world where the client engineers can literally tell the server, give me this information and put it in this format.
I don't need to buy cupcakes for the server team. I can tell the server, I want exactly this combination of fields, this information, package it up in this structure and send it back to me.
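Here is a small Python sketch of what "tell the server exactly which fields I want" looks like from the client side. The `firewallEvents` query and its field names are a hypothetical schema invented for this example and should not be read as Cloudflare's actual GraphQL schema; the code only builds the request body and does not contact any server.

```python
import json


def build_query(fields, limit):
    """Build a GraphQL query that selects only the requested fields."""
    selection = "\n    ".join(fields)
    return (
        "query {\n"
        f"  firewallEvents(limit: {limit}) {{\n"
        f"    {selection}\n"
        "  }\n"
        "}"
    )


# One dashboard widget might only need three fields...
table_query = build_query(["action", "clientIP", "userAgent"], limit=10)
# ...while another asks the same endpoint for a different shape.
map_query = build_query(["clientCountry"], limit=1000)

# GraphQL requests are conventionally sent as a JSON body with a "query" key.
payload = json.dumps({"query": table_query})
print(table_query)
```

Both widgets talk to the same endpoint; only the selection set changes, which is why no server-side work is needed for a new view.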
It's really great. And also that standardization and the components that we built on top of it has just accelerated our innovation.
One team starts doing one thing on analytics, like zoomable analytics.
One team's like, I'm going to make my analytics zoomable. And another team's like, I'll take that now too.
And all of a sudden, boom, we've been zoomified across the entire dashboard.
Pat once, I think you even tweeted this, Pat, you said one of the cool things about being a product manager at this time of its evolution is that there's so many great components that we can all use.
That bots team is also using, is putting signal.
That's another field in the action for the firewall, did it with rate limiting.
So because it's a great extensible framework, we've been able to pour all these great new features on top of it.
Yeah. It's a really fun time to be a product manager here. You can go and raid what other teams have built and package it together and add some additional stuff on top of it and really deliver a lot of value.
And I think one of the areas where we're focused on right now, which we've been talking about web applications, right?
Web applications are powered by APIs, right? I was just about to ask about APIs.
Yeah. And so on the backend, that's a request that is supposed to be coming from some automated system, right?
So it might be your mobile application.
It might be an IoT device. You've got a lot of customers using some really cool IoT use cases.
And so what we've really been focused on and going deep on is protecting that API traffic.
And so making sure that only legitimate IoT devices, for example, are connecting or only legitimate mobile applications.
And so one way to do that is what we did before on the server side is we did something called Universal SSL, which is we generate a certificate for the server and the browser connects and the server says, I'm really this domain and here's how I can prove it.
We're doing the same thing on the client side. So the server is saying, hey client, prove to me that you are who you say you are.
An Internet of things device that should be talking to my server and not just some random person.
Yeah.
How do I know you're my refrigerator versus my neighbor's refrigerator? And so we've got some really cool stuff in the works there to help secure that and make sure using that same cryptography that it's going to be protected.
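The "server asks the client to prove who it is" step is generally known as mutual TLS. As a minimal sketch using Python's standard `ssl` module, a server can require a client certificate like this. The function name and the file-path parameters are hypothetical, and this is a generic illustration rather than how Cloudflare implements the feature.

```python
import ssl


def make_server_context(client_ca_file=None, cert_file=None, key_file=None):
    """Build a server-side TLS context that refuses clients without a valid certificate."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Require every connecting client to present a certificate.
    context.verify_mode = ssl.CERT_REQUIRED
    if client_ca_file:
        # Only certificates issued by this CA (e.g. the device fleet's CA) are accepted.
        context.load_verify_locations(cafile=client_ca_file)
    if cert_file:
        # The server's own certificate, which proves the server's identity to the client.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


# File paths are omitted here so the sketch runs without any certificates on disk.
context = make_server_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
```

In practice the CA file would contain the authority that signed the devices' certificates, so a neighbor's refrigerator with a certificate from some other CA fails the handshake.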
The other thing we're doing is we're kind of flipping the model on its head and saying, how do we make sure that not just what's bad, but what's good?
And so being able to tell us, hey, this is, this looks exactly like, these are the parameters and structure of a good request.
Only let that stuff in and not others. And so I know we're kind of closing in on our time here, but I could go on and on and talk.
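Describing what a good request looks like and rejecting everything else is often called a positive security model. A minimal Python sketch, assuming a hypothetical endpoint and body schema invented for this example:

```python
# Hypothetical description of the only request shape this API accepts.
ALLOWED_ENDPOINTS = {
    ("POST", "/v1/readings"): {"device_id": str, "temperature_c": float},
}


def is_allowed(method: str, path: str, body: dict) -> bool:
    """Allow a request only if it matches a known-good endpoint and body shape."""
    schema = ALLOWED_ENDPOINTS.get((method, path))
    if schema is None:
        return False  # unknown endpoint: deny by default
    if set(body) != set(schema):
        return False  # missing or unexpected parameters
    return all(isinstance(body[name], kind) for name, kind in schema.items())


print(is_allowed("POST", "/v1/readings", {"device_id": "fridge-1", "temperature_c": 4.5}))
print(is_allowed("POST", "/v1/readings", {"device_id": "fridge-1", "temperature_c": 4.5, "admin": True}))
print(is_allowed("GET", "/v1/debug", {}))
```

Only the first request passes; the second carries an unexpected parameter and the third targets an endpoint nobody declared.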
I know you could.
And like, as I was joking with Jen, like, yeah, we need to get Pat on for multiple sessions for sure.
Definitely. There's a ton to talk about here. And it's been, it's just been great to have you on Pat, just to get kind of a quick update on some of that.
And we'll definitely have you back to drill in, especially as, as some of these other things kind of, kind of hit the, hit the market.
But thanks so much for coming and spending time with us on a Friday afternoon.
Thank you Usman for, for another pleasurable Zoom afternoon over in front of a big virtual Lava Lamp.
You gather around my Lava Lamps. Excellent. Thanks again, Pat. It was great talking to you.
All right. Thanks so much. Bye everybody. Bye.