Originally aired on February 11, 2021 @ 12:00 PM - 12:30 PM EDT
Join Cloudflare's Head of Product, Jen Taylor, and Head of Engineering, Usman Muzaffar, for a quick recap of everything that shipped in the last week. Covers both new features and enhancements to Cloudflare products and the technology under the hood.
Hi, I'm Jen Taylor, Chief Product Officer at Cloudflare. Very excited to be here with you, Usman, doing another episode of Latest from Product and Engineering. Hi, Jen. And I'm Usman Muzaffar, Cloudflare's Head of Engineering. Yes, we have Michael and Andre joining us today. Andre, why don't you introduce yourself, and then Michael, you say hi too. Sure. Yeah. Hi, I'm Andre. I am the Engineering Manager for the Managed Rules team here in London, despite my accent and the San Francisco logo on my hat. And we are the team that owns what is externally known as the WAF, the Web Application Firewall. Cool. And hello, everyone. My name is Michael and I'm the Product Manager for the Web Application Firewall. So I'm Andre's partner in crime to some extent. I speak to customers and make sure we're building the right features for our WAF. Well, I appreciate you guys joining us live from the UK. We're totally taking it global once again. The sun never sets on the Cloudflare empire when it comes to building a phenomenal product, and you guys are a shining example of that. So thank you. So what's new? I mean, we just talked to you guys pretty recently. What's going on in WAF land? Yes. In a nutshell, we're super busy, and we're still working hard. I think we mentioned this last time: the Web Application Firewall is one of the core products of Cloudflare, and we're trying to make it really easy to protect our customers' applications. Over the last year or so, we've been focusing a lot of our effort on a new version of our WAF, essentially, with a lot of nice features and goodies that customers will be able to use. And over the last couple of months, we've actually been testing some of these novelties with some of our early access customers. We're basically iterating on feedback, getting good insights into what we can tweak to make it even better.
And we're looking forward to actually starting to release some of these new features to the broader set of Cloudflare customers pretty soon. So that's what we've been working on for the most part, really. And there's a lot of other things, of course, that we're looking at building further down the line. But I guess, Andre, would you agree it's been hard work with the new engine so far? Absolutely. I mean, there's a reason it's taken a full year for us to build this. I joined the Managed Rules team right about this time last year, and they had already started planning and thinking about what kind of shape it would be, what it would look like, what kind of features it would have, and really the scope of it, even before I joined. So a lot of the work that's happened in the last year has really come to fruition: we've started putting it in customers' hands, getting feedback, and iterating on what that will eventually look like for a more general availability launch. Let's start from the... What is an example of a problem, Michael, that was hard to do with the original? Because the Cloudflare WAF has been around for a very long time. It was one of the original features. In fact, I think we mentioned on the show before that our CTO, John Graham-Cumming, an old boss of mine, was trying to convince me to join Cloudflare. And we've now validated this by looking back at old emails from before he joined. So before he joined Cloudflare, he was telling me, go join this company, they're working on cool WAF firewall stuff. That was actually the exact email. So this was from way back; the cool firewall stuff was what Cloudflare was trying to do. So we've been doing this for a long time, and quite successfully, I might add. So what was an example, Michael, of a problem that got us to send Andre all the way across the ocean to help with this new effort that was already underway, and is just wrapping up here?
What's an example of something that was hard, that we needed to put all this gray matter on to try to make better? Yeah, no, good point. First of all, the WAF is one of the core products, right? It's been around since the beginning of Cloudflare. It's been protecting customers from potential attacks from the beginning. Over the years, of course, we've had more and more customers use the Web Application Firewall and find better and more complicated use cases for it. And as we get even bigger customers behind the platform, they've been seeking better ways to configure and fine-tune the WAF for different use cases, not only for websites and your blog or your WordPress site, but also for API use cases, for example, for automated traffic endpoints. Or maybe some applications which are very old have now been onboarded onto the Cloudflare WAF, and they need to be able to configure exceptions in the engine so that we don't block traffic that may look malicious but is actually intended. The only reason we might block it is because the application is just very old and needs to support some traffic that would otherwise be blocked by the WAF. So there are kind of two things there, right? One is customers have started demanding greater and more fine-grained control as they've started using our capabilities more. And then they just need legitimately more capabilities to match the additional surface area and innovation that has happened in the way that they build applications today. Yeah, correct. So those are two of the many problems, or improvements, we're trying to bring with the new WAF. Number one is the ability for customers to fine-tune and configure the WAF for specific traffic filters. Historically, anyone who's used Cloudflare knows the WAF is a zone-based system. You can now actually split your traffic however you wish and basically fine-tune the WAF for that portion of your traffic.
And you can have different settings across your different endpoints, for example. Another common use case we're actually trying to solve: we want to make it still very, very easy to use. That's one of our core principles at Cloudflare. Although with Cloudflare historically you just had a website, and you configured each website individually, we actually have customers that want the same configuration across all of their websites, and they want to keep it simple. That is something we couldn't easily do with the old system. You could still do it, you know, we have an API, so you could program the same settings across all your applications, but now it's literally down to a click. You can deploy a single configuration, for example, across your entire account. That is cool. So, you know, Andre, that sounds pretty compelling. Why did it take a year? Yeah, that's a real good question. Okay, asked like a true product manager. That's my favorite start to a sentence: couldn't you just? Yeah, so there were a couple of different moving pieces to this. One of them was our current engine, well, if you can call it an engine. The way that our WAF works currently is that we take an industry-standard format called ModSecurity and convert it through a couple of different layers. We first convert it to JSON, and then we convert it to Lua, and that Lua code is what actually runs on the edge, on our actual servers. And we didn't want to do that anymore. So we decided to build this whole new engine in a language called Rust, which we've embraced very heavily here at Cloudflare generally. It's fantastic, it's very performant, it's memory safe. We have a lot of community involvement with Rust here at Cloudflare as well. So we're very invested in Rust, and we figured it would be a fantastic choice for this new engine.
So we had to build an entirely new engine from scratch that supported all the things that we previously needed to do and was at least no worse in performance, right? Like, you don't want to start off with a whole new engine that's performing worse. We actually had the engine complete, up to par with the same set of features and things that you could do in the existing WAF, towards the end of last year. And what we've been doing since then is enabling our customers to be able to get to that point. So after we have the engine that's stable, then we need to put APIs in front of it and we need to put a UI on top of it. And that's really been the last few months: putting the finishing touches on it before we can give it to more customers. And it still has to let you write in ModSec, right? So that classic old format, ModSec comes from ModSecurity for the real old school people in the audience, the Apache module where those regular expression rules first showed up. And the new engine is 100% compatible and had to be able to take that whole format and then run it, execute it at the edge. Well, compatible, yes. But we don't actually execute or interpret ModSecurity anymore. The equivalent of it, right? Exactly, yeah. So we have an open source library, a language called wirefilter, and it's based on the Wireshark filtering language, hence the name. That's also built in Rust, and we built our engine on top of it. So the language, the DSL that the engine understands, is wirefilter. And then in our engine, we take things from NGINX, the context variables and things like headers and the body, and put them into that engine so that it knows what to execute against.
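To make that concrete, here is a minimal sketch, in Python, of the idea of evaluating a wirefilter-style expression against request fields supplied by the web server. This is purely illustrative: the real wirefilter is an open-source Rust library with a full parser and type system, and this toy only handles `contains` clauses joined by `and`.

```python
# Toy evaluator for a wirefilter-style expression. NOT the real wirefilter:
# it supports only `<field> contains "<value>"` clauses joined by `and`.

def evaluate(filter_expr, ctx):
    """Return True if every clause matches the request context."""
    for clause in filter_expr.split(" and "):
        field, op, value = clause.split(" ", 2)
        value = value.strip('"')
        if op != "contains":
            raise ValueError(f"unsupported operator: {op}")
        if value not in ctx.get(field, ""):
            return False
    return True

# Request context the engine would populate from the server (path, headers...).
ctx = {
    "http.request.uri.path": "/admin/login",
    "http.user_agent": "curl/7.68.0",
}

assert evaluate('http.request.uri.path contains "/admin"', ctx)
assert not evaluate(
    'http.request.uri.path contains "/admin" and http.user_agent contains "Mozilla"',
    ctx,
)
```

The field names mirror the ones Cloudflare's filter language exposes, but the evaluation logic here is an assumption made for illustration only.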
And because we have so many WAF rules today, and we want to continue to support things like the OWASP core rule set, which is written in ModSec, we want to continue to support them in the future. So what we did is we built a converter to take the ModSec format and convert it into wirefilter. We've run that against all of our own rules, and then it gets executed by the new engine. What we also did: there was a Cloudflare feature way back in the day, which we have now converted to using firewall rules, where you could request that we write a rule for you, and it would run on your zone and your account. We had many thousands of those rules. So what we've been doing is running that converter, converting them, and giving them to users, because it's the exact same syntax that our existing firewall rules use. It's just a different engine that powers it. And so we were able to convert all of those rules for our customers and really give them that control. It's really remarkable what a long way we've come from the time when you literally had to write in to support, and an engineer would individually handcraft a rule for you, then test that rule carefully, and then push it out, effectively as part of a software release, almost. Yeah, we had to do a WAF deploy to do that. We're past that now, and it's great how much control we've given users. Yeah. I mean, that's just so antithetical to how we operate at Cloudflare, right? Putting a human in the middle? No, our main focus here is ease of use and empowering customers to be able to do these things for themselves. And Michael, I'm sure that was a big part of the consideration you had. Yeah, no.
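As an aside, the converter idea mentioned above can be sketched in a few lines. This is a deliberately tiny stand-in: the real converter has to handle the full SecRule grammar (chained rules, transformations, dozens of variables and operators), while this toy maps one variable and one operator, just to show the shape of the translation. The variable and operator mappings are assumptions for illustration.

```python
import re

# Toy ModSecurity SecRule -> wirefilter-style expression converter.
# Handles exactly one rule shape: SecRule <VAR> "@<op> <arg>" "<actions>".

VARIABLE_MAP = {
    "REQUEST_URI": "http.request.uri",
    "REQUEST_HEADERS:User-Agent": "http.user_agent",
}
OPERATOR_MAP = {"@contains": "contains", "@streq": "eq"}

def convert_secrule(secrule):
    m = re.match(r'SecRule\s+(\S+)\s+"(@\S+)\s+([^"]+)"', secrule)
    if not m:
        raise ValueError("unsupported rule shape")
    var, op, arg = m.groups()
    return f'{VARIABLE_MAP[var]} {OPERATOR_MAP[op]} "{arg}"'

rule = 'SecRule REQUEST_URI "@contains /etc/passwd" "id:1001,deny"'
print(convert_secrule(rule))
# http.request.uri contains "/etc/passwd"
```

Run once over the whole rule corpus, this kind of translation is what lets the old rules keep working while a different engine executes them.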
Making sure, because we do still have a couple of features which can only be configured by our internal Cloudflare teams from our backend, but we really want to give all of that power to our customers. And I think the old ModSecurity rules are the perfect example of that. Now there is a full-blown editor in the interface. Sure, you can still reach out to support and ask for help, but you can actually just write the rules on your own and test them. And you can actually test them before you deploy them, because we have a nifty little feature that allows you to run the rules on historical traffic. Then you can submit them in what we call logging mode, so you can see how they would behave before they start blocking or challenging traffic. And that is fully self-service from the dashboard. We can help, but we want to empower users to do that, right? So another question I had is: now you've got this whole new way that you can write rules. And, you know, Andre, you alluded to the fact that they used to be in categories and sets, as we were letting people turn on whole sets of them. But now a ruleset, Michael, is becoming an actual primitive, an actual thing that you can talk about. And with that, that pulls in a whole bunch of other just pure data management issues: how you can categorize rules, how you can collect them and turn them on and off and combine them in different sets. And now you've talked about how they're being managed at the account level, so they can easily be shared between zones. So talk a little bit about some of the use cases our customers were seeing. What are some of the things that you wanted customers to be able to do easily, in that balance between ease and flexibility? How did we approach that? Yeah, no, that's a really, really, really good question.
First of all, with the current engine, customers would write rules as individual objects, and they would write rules on every single zone and might have to duplicate those rules across their zones. From speaking to customers, though, we found that many, many times there's actually a single set of rules that can be considered one configuration that they want to deploy across their traffic. So that's why we added this concept of a ruleset. A ruleset is essentially just a group of rules tagged with a single name, so that customers can deploy a single ruleset across arbitrary traffic. But another really good feature we're adding with the new engine is the ability for these rulesets to be versioned. Changing your WAF config has a very high risk, potentially, of breaking something or blocking traffic you don't want. We want to make it really easy to go back to your previous configuration without having to remember what you changed. So by combining the ruleset concept and the versioning, we're going to make it really easy, essentially, for customers to roll back to the prior configuration, and to manage the rules at a higher level without having to fine-tune every single rule individually. For example, we actually provide two managed rulesets as part of the WAF. One is the OWASP core ruleset, which is what Andre mentioned earlier, plus an internal Cloudflare managed ruleset. Any user can now deploy one of those rulesets with a single override of log on the entire ruleset, making it really easy to test on your own traffic, something that before you would have had to do on a per-rule level. So this abstraction is going to help us create a bunch of shortcuts to make it quicker and easier to test the WAF. Yeah, and another thing that we had in the current WAF is these groupings of rules.
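Before moving on, the ruleset-plus-versioning model described above can be sketched as a simple in-memory data structure. The class and method names here are illustrative assumptions, not Cloudflare's actual API; the point is just that every deploy stores an immutable version, so rollback never requires remembering what changed.

```python
# Sketch of versioned rulesets with rollback, using a toy in-memory model.

class Ruleset:
    def __init__(self, name):
        self.name = name
        self.versions = []        # each version is an immutable list of rules

    def deploy(self, rules):
        """Store a new immutable version; return its 1-indexed number."""
        self.versions.append(list(rules))
        return len(self.versions)

    def current(self):
        return self.versions[-1]

    def rollback(self):
        """Drop the latest version, restoring the previous configuration."""
        if len(self.versions) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.versions.pop()
        return self.current()

waf = Ruleset("account-wide-waf")
waf.deploy([{"expr": 'http.request.uri.path contains "/admin"', "action": "block"}])
waf.deploy([{"expr": 'http.request.uri.path contains "/admin"', "action": "challenge"}])
assert waf.current()[0]["action"] == "challenge"
waf.rollback()  # bad change? return to the prior config in one step
assert waf.current()[0]["action"] == "block"
```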
So there's Cloudflare Specials, there's WordPress, there's Drupal, and there's OWASP. And those actually correspond, kind of showing how the sausage is made a little bit, to actual files that we have all of those rules grouped into. So it was a predetermined grouping of those rules, and it wasn't very flexible. In the new ruleset engine, we didn't want to need to fiddle with any of that. What we did was add the ability to attach a thing called a category to a rule, and it's a many-to-many relationship. Rules can be in many different categories, and categories can have a bunch of different rules in them. And we exposed that in the UI: you can take this operation and, as Michael said, override the action. Say you want to just take the action of log on this entire category, you can do that. And the categories are way more flexible now. So we can have categories not only emulating what we had before, WordPress or Drupal or things like that, but we can also do PHP, or a certain category of CVEs, or a certain category like an OWASP top 10, like XSS or something like that. It makes it way more flexible for us to express what these rules are doing, but also for the user to configure them. Yeah. It's funny. And just to share with folks a little bit of how the sausage is made here: we spent a lot of time really debating what the user experience should be. Because you want to make sure you create a great deal of power and flexibility and tools around categorization, but it can very quickly just become this sticky flypaper that is almost impossible for users to navigate and parse through. How did the two of you work together on collaboration around the evolution of the data model and the evolution of the user experience? I think Andre and I probably have five or six meetings a day together. We basically make sure we keep in touch.
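The many-to-many category model with per-category action overrides can be illustrated with a small sketch. The rule IDs, category names, and override semantics below are assumptions for illustration; the idea is that tagging rules with multiple categories lets one override (say, log-only for everything WordPress-related) cut across the old file-based groupings.

```python
# Sketch of many-to-many rule categories with per-category action overrides.
# Illustrative model only, not the actual rulesets engine.

rules = {
    "100": {"action": "block",     "categories": {"wordpress", "php"}},
    "200": {"action": "block",     "categories": {"sqli"}},
    "300": {"action": "challenge", "categories": {"wordpress", "xss"}},
}

def effective_actions(rules, category_overrides):
    """Return rule_id -> action after applying category overrides."""
    out = {}
    for rule_id, rule in rules.items():
        action = rule["action"]
        for cat, override in category_overrides.items():
            if cat in rule["categories"]:
                action = override
        out[rule_id] = action
    return out

# Put the whole WordPress category in log mode before enforcing it.
actions = effective_actions(rules, {"wordpress": "log"})
assert actions == {"100": "log", "200": "block", "300": "log"}
```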
No, but jokes aside, we have been iterating a lot. One thing we really care about here is getting the engine, even when we had just the first version of it, into the hands of customers, and we've been doing this for several months now. Iterating on that feedback is super, super important. I've actually put Andre in a few difficult spots a couple of times when we've had to rename some of our primitives in the engine to something that made more sense. But luckily, as long as we catch those things sufficiently early on, if it results in a much better user experience at the end of the day, it's worth doing. So I try to speak to customers as much as I can, get feedback from them, and then keep the engineering team up to date, and we also work with product design. We're lucky here in London: we also have a user experience team member who basically helps us throughout the process. But communication, at the end of the day, is core to making sure we're looping quickly on whatever customers are saying to us. Yeah. And it wasn't just our team. We had a lot of help from other internal teams, the firewall team, my colleague Richard, who really helped drive the design of this as well. So it wasn't just our team in isolation, and not only other teams, but our customers too. It was a very collaborative design to make sure this can scale. And we have bigger visions for this ruleset engine, for what it can do, and for helping other internal teams adopt it because of this very expressive and consistent way of selecting traffic and building filters around traffic. It's awesome. One of the other things that I know we've worked on more recently is payload logging. So Michael, what's the problem with payloads? What do we mean by that? And what is not so obvious about logging payloads? Why is there a bit of a minefield there where you have to be paying attention? Yeah.
We briefly talked about this in the last session, and the nice thing here is that we built this feature quite a while ago. The problem we're trying to solve: whenever a web application firewall blocks some traffic, the first question that comes to mind is, why was that traffic blocked? Right. What did you see that tripped the alarm? Exactly. And that's the first thing people were writing to support about, probably many, many tickets every day: why was that traffic blocked? To answer that question, we can optionally, of course, log the entire request that was blocked. But the challenge with that is that sometimes we would be logging sensitive data. We could be logging passwords. We could be logging social security numbers, whatever the application is transmitting. If we block and log the entire request, there's a liability there in holding sensitive data. What we did to solve this problem is, in December actually, we announced encrypted payload logging, which essentially allows us to log the data but encrypt it with a public key provided by our customer. So we at Cloudflare have no way of seeing what was blocked; only our customers have control of their own data. And that makes it a lot easier for them, of course, to debug and figure out why the request was blocked, without us being able to see it. We basically cannot see the data, which is the best position for us to be in. Now, we built that feature on the current WAF engine. However, for historical reasons, you know, because of the sensitivity of it, you need to contact the team, and then we would enable it on behalf of our customers.
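The shape of that flow can be sketched with a toy example: the edge holds only the customer's public key, so it can log the matched payload without ever being able to read it, and only the customer, holding the private key, can decrypt. The textbook RSA below (tiny primes, byte-at-a-time) is deliberately insecure and stands in for whatever real public-key scheme is used in production; the field names are likewise illustrative.

```python
# Toy illustration of the encrypted payload logging flow. The textbook RSA
# keypair below is insecure and for demonstration only.

PUBLIC_KEY = (3233, 17)      # (n, e) -- shared with the edge
PRIVATE_KEY = (3233, 2753)   # (n, d) -- stays with the customer

def edge_encrypt_payload(payload, public_key):
    """Edge side: encrypt each byte with the customer's public key."""
    n, e = public_key
    return [pow(b, e, n) for b in payload.encode()]

def customer_decrypt_payload(ciphertext, private_key):
    """Customer side: decrypt locally with the private key."""
    n, d = private_key
    return bytes(pow(c, d, n) for c in ciphertext).decode()

# At the edge: a rule matched, so log the offending field, encrypted.
log_entry = {
    "rule_id": "100015",
    "matched_payload": edge_encrypt_payload("' OR 1=1 --", PUBLIC_KEY),
}

# Later, the customer decrypts to see exactly why the request was blocked.
assert customer_decrypt_payload(log_entry["matched_payload"], PRIVATE_KEY) == "' OR 1=1 --"
```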
The nice thing now with the new engine, going back to what Andre was saying, is you can specify your filter of traffic and deploy the ruleset, and then customers can self-serve and turn on payload logging on that specific ruleset, which basically allows them to be more specific about when to turn this additional logging on, whenever needed, essentially, which is a big plus compared to the de facto current status of it. Well, and again, it's just this kind of constant journey we're on, and especially this product, where it is and what the team is focusing on, really highlights this significant focus we have on ease of use and empowering users to be able to do things, and just giving them that sort of flexibility. You know, most customers, in my experience, experience security events and problems in the middle of the night and need that sort of flexibility to be able to immediately take action. So I've been really excited about that. But, you know, I think the other thing that's been really interesting is where you guys are with the team and how you're continuing to evolve the way that we conceptualize our role, and basically how we're continuing to build out and round out the team with some of the work we're doing around data science. Andre, how have you been approaching the opportunity and the challenge around data science? Yeah, so as we've been investing more in really leveling up our WAF, one of the things that we wanted to focus on was becoming more proactive instead of being reactive. For a long while, the WAF team was in a place where it needed to react to customers writing in and saying, hey, why did you block this thing? Why didn't you block this thing? Or responding to CVEs that were coming in. So we pay attention to various feeds about the newest alert or vulnerability or something like that.
And we made a conscious decision that we wanted to invest more in being proactive about those kinds of things. We have a ton of data that we can set some data scientists on and say, you know, what are we missing? Or what is some new permutation of a thing that we already know exists? So that's what we're doing right now: we're getting started with really understanding what a request is. What makes it good or bad? And can we determine not only whether it's binarily good or bad, I'm pretty sure I just made up a word there, but also what kind of good or bad it is? Is it an XSS? Is it a SQLi? Is it credential stuffing, some kind of attack like that? And really understanding and picking that apart, because we get those kinds of questions too, especially for things like OWASP. Our OWASP ruleset is kind of a shotgun approach: does it trip some kind of alarm? Okay, cool, we'll block it. But it's not really transparent about those kinds of things. And so, learning that lesson, we want to give people the ability to figure those kinds of things out. From a data science standpoint, this is brand new on the team: we don't yet have any kind of pipeline for actually collecting that data, or for making sure we store it in a proper manner and in a format that our data analysts can get at and actually operate on. So that's basically where we are right now: we're setting up the data pipeline and the data lake and the processing pipeline for all of that data. When you think ahead, how is it all going to come together? How does that flow back into this? What's the proactive connection here, as we're envisioning connecting this new kind of analysis to the signal? So actually, Michael, you can speak to how this will feed into the product and how it will make the firewall even better.
We're looking at quite a few really interesting projects, actually. I'm really excited about the data science aspect of the Web Application Firewall. And we've actually had a lot of learnings from our bot mitigation team, which has been doing some similar techniques for quite some time now. By applying various data science-related activities to the traffic that's flowing through the Cloudflare platform, we can probably detect new attack vectors without us having to write specific signatures or rules to block them, which is what the current WAF does, right? That's the ultimate goal, really. That's where the proactive part of it is: having the ability to deploy a model that says, okay, I've seen something like this before, it's probably phishing. And we're wanting to potentially provide that, as an example, as a score to the customer, very similar to how our bot mitigation product works today, where we give our customer a score that says, we think this specific request is very likely to be a bot. That would probably be one way we expose some of the data that comes out of this: we think this request is very likely an XSS attack. Customers can then decide what their threshold is, in terms of how aggressive they want to be or not. That's one really good use case we're looking at. And we can do this for XSS attacks, for SQL injection attacks, for cross-site request forgery attacks, the whole spectrum. And we could also combine it into a single score that customers can then make decisions on. I just love the journey this team is on, right? I mean, we started the conversation by talking about the fact that the replacement of the engine is really about getting the people out of the way of having to write in, write a ticket, and then deploy a rule, right? And enabling customers to be able to do it themselves.
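The threshold idea described above, letting customers pick how aggressive to be against a model-produced attack score, can be sketched like this. The per-category score names, the 1-99 scale, and the lower-score-means-more-suspicious convention (mirroring how bot scores work) are all assumptions for illustration.

```python
# Sketch of acting on per-request attack scores with customer-chosen
# thresholds, analogous to bot scores. Lower score = more likely an attack.

def waf_action(scores, block_below=20, challenge_below=50):
    """Pick an action from the worst (lowest) per-category score."""
    worst = min(scores.values())
    if worst < block_below:
        return "block"
    if worst < challenge_below:
        return "challenge"
    return "allow"

# Hypothetical per-category scores for three requests.
assert waf_action({"sqli": 85, "xss": 12, "rce": 90}) == "block"
assert waf_action({"sqli": 35, "xss": 80, "rce": 95}) == "challenge"
assert waf_action({"sqli": 99, "xss": 99, "rce": 99}) == "allow"
```

Exposing a score rather than a binary verdict is what lets each customer tune the trade-off between blocking real attacks and avoiding false positives.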
Kind of a similar theme here: how do we employ some intelligence on top of our huge corpus of data and use that as a way to scalably identify things? So again, it's about getting the people out of the middle. Rather than having data scientists manually combing over the data, or waiting for a human to say, I've got a problem, I think we should look at this, we're proactively looking at that data. It's a really cool moment to be in. Yeah, definitely. And to plug our new engine: it's all built on top of our new engine too. So back to my snarky comment earlier, why'd you have to do that? Why did it take so long? Well, actually, it turns out it's not like you went in and just added a new muffler, right? It's literally like you went from an engine that uses gas to one that's electric, or from an engine in a car to one in a race car or a plane. And an engine that can be shared between all the different models in the factory. We could have another Cloudflare TV session on great metaphors. I'm always amazed how fast the time goes when we talk to our colleagues about all these interesting products and technologies. Andre and Michael, thanks so much for joining us. And Jen. Absolutely. It was a lot of fun. And we'll see everybody next week on the Latest from Product and Eng. Thanks so much. Thanks all. Awesome. Thanks guys. Bye. Bye.