1️⃣ Intrusion detection
Presented by: Annika Garbers, Chris Arges, Jordan Griege
Originally aired on August 4, 2023 @ 3:00 PM - 3:30 PM EDT
Join our product and engineering teams as they discuss what products have shipped today during Cloudflare One Week!
Visit the Cloudflare One Week Hub for every announcement and CFTV episode — check back all week for more!
Transcript (Beta)
Hello, welcome to Cloudflare TV. My name is Annika. I'm on the product team here at Cloudflare and I'm so excited to be with you with Chris and Jordan to talk about Cloudflare One Week and specifically our intrusion detection capabilities that we have announced updates to as part of this week.
Cloudflare One Week is one of our signature innovation weeks where we announce new products, features, partnerships, enhancements to our service around a specific theme.
And this week we're talking about how Cloudflare One is bringing networking and security together and helping customers build really the next generation of their network architecture on ours.
So this segment is mirroring a blog post that we published today with updates on our IDS or intrusion detection system capabilities.
But before we talk about what we've built, let's back up for a second and figure out what is an IDS in the first place.
And I'd love to start with Chris. If you could introduce yourself, tell the audience a little bit about who you are, and then maybe help answer that question.
What is an IDS and why does someone need one?
Sure. Thanks, Annika.
I'm Chris Arges. I'm the engineering manager for the Magic Firewall team.
And yeah, we've been looking quite a bit at IDSs or intrusion detection systems, and specifically looking at network intrusion detection systems.
So what is this?
These are systems that can inspect traffic against known signatures. So instead of just having a traditional firewall inspect various characteristics and creating rules, we're looking for known characteristics, known signatures, known anomalies that can tell us that something bad is happening.
So this is a really powerful tool to detect these things in your network.
There's also another concept called IPS. So there's IDS, which is detections.
IPS stands for intrusion prevention systems. So essentially this is not only detecting these anomalous behaviors and bad signatures and giving you alerts, but also blocking that traffic proactively.
So we think that that's also a very powerful thing to look into.
Now, IDS can also mean a lot of things. There are lots of functionalities and features kind of wrapped up in that terminology.
So this could be looking for anomalous signatures.
Let's say somebody's trying to port scan your network.
Maybe there's some known malware that reaches out with a certain set of bytes or things like that.
But there can also be systems that look at hash signatures of files for malicious content.
There could also be various data loss prevention functionality too.
So IDS is a very broad set of characteristics. So we've been really trying to just focus in on the network intrusion detection system and specifically what can we do to match on various signatures?
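To make the port scan example concrete, here is a minimal Python sketch of that kind of detection. It is an illustration only, not how Cloudflare's IDS works: the function name, the `(src_ip, dst_port)` input shape, and the threshold are all assumptions, and a real detector would also bound the count by a time window.

```python
from collections import defaultdict

# Hypothetical threshold: how many distinct destination ports one source
# may touch before we flag it. Chosen arbitrarily for illustration.
PORT_SCAN_THRESHOLD = 10


def find_port_scanners(packets, threshold=PORT_SCAN_THRESHOLD):
    """Return a sorted list of source IPs that hit >= threshold distinct ports.

    `packets` is an iterable of (src_ip, dst_port) tuples.
    """
    ports_by_source = defaultdict(set)
    for src_ip, dst_port in packets:
        ports_by_source[src_ip].add(dst_port)
    return sorted(
        src for src, ports in ports_by_source.items() if len(ports) >= threshold
    )
```

A source that touches twenty different ports is flagged, while a source that only repeats requests to port 443 is not.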
Cool. So I'd love to ask Annika a little bit more about how customers use IDS technology today and why they're looking at that functionality.
Yeah, sure. I love that you brought up how broad an IDS can be because I think that specific aspect of this space made the product research for building this functionality really interesting because when we were building the initial versions of Magic Firewall, which is our network firewall delivered as a service, it was pretty easy to get feedback from customers about what they were looking for.
We'd say, tell us about your firewall config today.
Show us your policies. What are you doing? And they would say something like, oh, well, we're implementing a positive security model.
And so we're blocking everything except for TCP on ports 80 and 443.
And then we are just enabling outbound traffic to websites on the Internet, for example.
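The positive security model described here (block everything except TCP on ports 80 and 443) can be sketched as a default-deny check. This is a simplified Python illustration; the `Packet` type and `decide` function are hypothetical names, not Magic Firewall's rule format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    protocol: str  # e.g. "tcp" or "udp"
    dst_port: int


# Allow list taken from the example in the conversation.
ALLOWED = {("tcp", 80), ("tcp", 443)}


def decide(packet: Packet) -> str:
    """Default-deny: allow only traffic that is explicitly listed."""
    if (packet.protocol, packet.dst_port) in ALLOWED:
        return "allow"
    return "block"
```

Under this model `decide(Packet("tcp", 443))` is allowed, while SSH on port 22 or DNS over UDP is blocked without needing a rule of its own.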
But IDS, the requirements get a lot more complicated.
And a lot of times teams don't necessarily know what they need, but they just need protection from things that are out there on the Internet that are trying to target them.
And that includes both known signatures and types of attacks, which there are tons and tons of that are well-documented and out on the Internet and available as part of open source feeds.
But it could also be unknown things as well. People don't know what they don't know.
And it's hard to know what you need to put in place in order to protect your network at all of these different levels from different threat vectors.
So the customers that we talked to today that are hoping for a new way to accomplish IDS, which we'll talk about our approach here in a minute, what they're doing today is using either a dedicated hardware appliance or an IDS function that exists in one of their security appliances that they already have on-premise in a data center or an office.
That could be probably something like a firewall or next generation firewall that includes IDS capabilities.
But some of the challenges that customers brought up with these approaches, and part of the reason that they started talking to us about, could you do this a different way, are really what we hear from customers about network hardware and managing network hardware in general.
I think we saw this really interesting shift over the last five to 20 years, depending on how big your organization is and where you are in your transition, with storage and compute, where these functions moved to the cloud, but the networking functions haven't yet.
They've, in a lot of ways, stayed in the data center. And what that's meant is that all of these traditional challenges of managing capacity, how big of a box do you need?
How big of a box are you going to need in five years?
Where are you going to deploy that box? How are you going to make sure that all your traffic, regardless of where it's sourced from and destined to, can route through that box?
How are you going to install new software and maintain it if there's upgrades?
What do you do if there's patches? Especially important for security devices where new vulnerabilities and things like that can emerge every day and you need to be able to respond fast.
Customers are dealing with all of this stuff at once with these really traditional hardware approaches.
And so they kind of come to us and say, is there a way that you could do this differently?
Take an approach like we have for network firewalls, for DDoS mitigation, web application firewalls, all these other security functions that we've helped customers move from on-premise devices to the cloud and do that for IDS.
And we said, yeah, actually, we're really excited about that.
Let's figure out how to build it.
So I'd love to talk to both of you guys more a little bit about that. How did we build our IDS or even start thinking about the approach that we wanted to take here?
I remember in the early days of some of these conversations, that was a lot of Chris hacking around on your home network.
Can you tell us a little bit about that process?
What are some of the things that we looked at early? What did we experiment with?
And what did we learn from those experiments?
Sure. Yeah. Initially, we looked at existing open-source IDS software like Suricata and Snort, which are amazing pieces of software, and at whether we could run something like this on our edge network.
And as we started looking into it, we realized just how much functionality is in this software, but also how much of that functionality we are already developing within Cloudflare itself.
We have a lot of amazing products and a lot of amazing teams really pushing the boundaries of their own products and their own use cases.
And a lot of those use cases and products also implement some of these IDS capabilities today.
So I think as we went down this road, we started realizing it's better to just increase the IDS capabilities in those various products, so that teams can really focus in on that and on making their products better and better.
So we really were looking for that. And in doing that, we decided to actually build our own technology to do the signature detection at the packet level.
And we felt like that would help us really tune really well and solve the problems while still allowing teams to solve their problems and make their products better.
Cool. I'd love to pass it. Jordan's been definitely looking at this stuff for a while.
So Jordan, why don't you go ahead and introduce yourself and maybe talk a little bit more about how we built this?
Yeah, sure. My name is Jordan Griege.
I've been a systems engineer here at Cloudflare for coming up on three years.
Worked on Magic Firewall, I guess, now for the majority of that, and then previously touched on our Spectrum product a little bit.
But with what we've been looking at for the IDS, I think one of the first things we realized we were going to have to design around was how to fit this into an anycast network as well.
So obviously, what's a little bit unique about how Cloudflare processes traffic is that with our architecture, we expect traffic for any customer to land pretty much at any server around the world.
And so that makes our IDS a little bit challenging as well, because if we think about wanting to differentiate one customer's packets from the next, that's something that's already built into our metals, but it could be pretty challenging to integrate with an existing open source solution, where a lot of times they just see a single flow of packets coming into the system and out the other end.
So that was also kind of one of the early challenges that we faced.
And as we started working on our own product that could have a smaller or more focused scope, since we can integrate with other software systems within Cloudflare, the first thing was just to see what was possible, right?
So there's plenty of, like we said, open source systems, but also a few different rule syntaxes.
So that way, like security researchers can publish a certain specification.
This is like, this is what you're looking for.
And these open source systems can kind of consume it. And so we kind of produced a piece of software that was capable of just operating on like a single packet at a time and saying like, okay, evaluate this packet against the set of signatures and that kind of thing.
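As a rough picture of what "evaluate this packet against the set of signatures" means, here is a minimal Python sketch. It assumes each signature is just a byte sequence to find in the payload; real rule syntaxes express far more than that, and the signature IDs and patterns below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    sig_id: int
    description: str
    pattern: bytes  # byte sequence to look for in the payload


# Made-up signatures, for illustration only.
SIGNATURES = [
    Signature(1001, "example malware beacon", b"\xde\xad\xbe\xef"),
    Signature(1002, "example suspicious command", b"cmd.exe /c"),
]


def evaluate_packet(payload: bytes, signatures=SIGNATURES):
    """Return the IDs of every signature whose pattern occurs in the payload."""
    return [s.sig_id for s in signatures if s.pattern in payload]
```

Each packet is evaluated on its own, with no state carried between packets, which matches the single-packet-at-a-time scope described above.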
And we spent a little bit of time tuning it as well.
What was interesting is that rather than just treating all the signatures the same, you start to see lots of logical groupings between them.
So there might be six signatures where the sequence of bytes you're looking for in the payload is off by just like a single byte.
And so those kinds of things, we start to build on some kind of preliminary early optimizations for it, because what we really want to control for is reducing the impact of putting our IDS system in place.
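One way to exploit that kind of grouping, sketched in Python below, is to search for the shared bytes once and then check only the byte that differs. This is not the algorithm the team used, which the conversation does not describe. The sketch also assumes the simplest case, where signatures in a group differ only in their final byte, and that every pattern is non-empty.

```python
from collections import defaultdict


def group_by_prefix(patterns):
    """Map each shared prefix (pattern minus its last byte) to {last_byte: pattern}."""
    groups = defaultdict(dict)
    for pattern in patterns:
        groups[pattern[:-1]][pattern[-1]] = pattern
    return groups


def match_grouped(payload: bytes, groups):
    """Return the set of patterns found, scanning for each shared prefix only once."""
    found = set()
    for prefix, endings in groups.items():
        start = payload.find(prefix)
        while start != -1:
            nxt = start + len(prefix)
            # One dictionary lookup decides between all signatures in the group.
            if nxt < len(payload) and payload[nxt] in endings:
                found.add(endings[payload[nxt]])
            start = payload.find(prefix, start + 1)
    return found
```

With six near-identical signatures, this does one prefix search per group instead of six full searches per packet.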
As an IDS, the impact is already going to be kind of low because we don't actually have to do anything with the packet.
We just get it inspected and notify through analytics if there's a problem.
But if we think about moving to an IPS world, one of the hardest things to answer is like, well, what's going to be the impact?
And so are we talking single digit nanoseconds or tens or hundreds or, heaven forbid, any more than that?
And so those early proof of concept and optimizations on the algorithm we used to run our detections was a pretty important part of it.
And then where we have really just started looking over the last month or two is how to fit this in with the rest of the architecture, because our first pass at it was really only something that functioned locally.
You could submit a PCAP or send a local packet stream through the system, and it would tell you if it found anything.
But that was only just a first step. And so now we're having to solve the even harder problem of plugging into Cloudflare's architecture.
And so within that, our servers essentially function like Linux routers.
We run a bunch of software to get packets off a network card and then do some processing and stick it back on the card on the way out.
And so within that, we've got, because we're Magic Firewall, we've got integrations with products like Magic WAN and Magic Transit, and even those products have other integrations that have to figure out how to handle all those use cases.
And so what we've been toying with really is essentially how do you get packets out of a Linux kernel, which is a pretty fun problem.
And we've come up with, oh man, maybe half a dozen at least different functional solutions for different cases.
And this thing fits in cleanly here, but maybe we're going to miss a certain packet stream of private traffic within a Magic WAN flow or something like that.
And so we've been working really hard to make sure that our initial design is going to function for all those flows, even if we end up going with a pretty slow rollout.
And then the other thing that's been, I guess, really important for us as we think about our design is how to make the system safe.
So I had talked a little bit about what the latency implications of sticking another service in the critical path of a packet flow is.
So not only do we want to reduce latency, but should anything go wrong with the system, we don't want it to affect other connections or other customers, again, because we live in a little bit of a multi-tenant world here.
And so we've been looking at ways to make the system capable of monitoring itself a little bit, or failing open, so that worst case, we'd rather get packets to the customer than have them be stuck in an inspection queue if we're unable to keep up with the load.
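The fail-open idea can be sketched with a bounded queue: when the inspection queue is full, the packet bypasses inspection rather than waiting. This is a minimal Python illustration under that assumption; the class name, the return values, and the `skipped` counter are hypothetical, not part of Cloudflare's design.

```python
import queue


class FailOpenInspector:
    """Bounded inspection queue that bypasses inspection when it is full."""

    def __init__(self, max_pending: int):
        self.pending = queue.Queue(maxsize=max_pending)
        self.skipped = 0  # packets that bypassed inspection; useful to monitor

    def handle(self, packet: bytes) -> str:
        """Return "inspect" if the packet was queued, "bypass" if we failed open."""
        try:
            self.pending.put_nowait(packet)
            return "inspect"
        except queue.Full:
            # Fail open: deliver the packet rather than stall it in the queue.
            self.skipped += 1
            return "bypass"
```

Tracking how often the bypass path is taken gives the kind of self-monitoring signal mentioned above: a rising `skipped` count means inspection is not keeping up with load.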
And so again, in a future world, this thing will be perfectly tuned and we'll be able to handle everything.
But as we think about our first launch, we're really trying to make sure that we've got a limited and understandable impact.
And we've got to set ourselves up for success where we can do some performance tuning, understand the statistics of the live system.
But yeah, we're super excited to keep making progress.
It's been a ton of fun to build, and it's also been a ton of fun to start to understand more and more about how the other Magic products work, because there's a lot in there that's truly magic.
Yeah, I remember starting at Cloudflare and in my second week, I think I had this orientation session or walkthrough with Eric, who's on one of the Magic engineering teams, where he talked about the packet flow.
And there's this awesome Linux networking diagram with a whole rainbow of colors and this packet flow and all these crazy acronyms.
And really what we've been trying to do in IDS world is, okay, now that we have this engine that can detect patterns based on all these signatures that we can pull in from different places, where do we put that in that crazy looking diagram?
And the trade-offs and the decision making around that, I think is really fascinating.
I also love that you, and Chris too, talked a little bit about this build-versus-buy decision for us, or really build versus deploy open source, which is what it would look like.
And I think you mentioned, we already have these detection engines that are doing similar things in lots of places within Cloudflare.
So if you think of IDS as really a capability that can stretch across layers of the OSI stack, that helps us insert it in places where customers are maybe already using those products or have traffic flowing through them.
I also think the other cool thing about it is that, while maybe there's more upfront work for us, in the future it'll make us more agile and more able to make changes when we have any kind of challenge, right?
If it's scale, with more people using it, we're able to scale up more easily.
If it's a fundamental architecture change, that's easier when it's our own code, and so is integrating new types of threat intel information.
We made a similar decision around this for developing our own Internet key exchange implementation for IPsec that we talked about in our blog recently.
And I think we're seeing that already pay off. It's going to pay off similarly here.
So cool to hear about that process. Oh yeah, go ahead.
I was going to say, Eric gave me the same presentation and that's what we've been really staring at is like the Netfilter flow diagram, right?
And there's a few different places in there where you can get packets out and there's a few other places where you can get packets in.
And what's been kind of the fun part of this journey is figuring out what makes sense, again, in the context of the other Magic systems.
Because again, we want to take a lot of care and do our due diligence to make sure this thing performs or can perform at least as fast as possible.
And so how do we get something out and then put it back in without introducing another routing loop even?
We're really heavy users of Linux as an operating system. And so doing something like partial kernel bypass doesn't make sense for us because we know that we're going to put the packet back in the network stack anyways.
And so maybe there are some cases where you can do that efficiently, but if we have a little bit of extra knowledge of what came before our service and then what comes next, we can make more intelligent decisions about how we get packets out of that flow.
So yeah, but it's been really fun to work with.
Cool. Yeah, I feel like we need a version of that diagram, annotated or something, on our blog somewhere with like, you are here.
This is where the Magic systems live in this wild, exciting mess that is the Linux kernel.
Cool. So we talked a little bit about the architecture, what we experimented with so far.
What's exciting about this? So I talked about customers, how they're solving this problem today, and then how we've approached it.
Jordan touched on that. Chris, what do you think if you were a customer, based on all the folks that we've talked to that are in different security organizations and what you've heard from them, why is the way that we have architected our solution exciting?
Why should customers choose Cloudflare over just enabling the IDS functionality on their on-prem firewall?
Yeah, I think what it comes down to is that we can think outside the box.
There's only so much processing you can do in an appliance and you can build really amazing appliances that do all sorts of things, but we're actually able to distribute that processing across tons of machines in our data centers, which means that that processing can scale.
So I think one is just our architecture at Cloudflare really allows for that distribution of processing.
So that's one thing that's really exciting, because something that's scary about a lot of IDS systems is worrying about performance: are you having to sacrifice performance for security?
And we'd love to make that case that you can have your cake and eat it too.
So another thing is just the speed of software. So maintaining appliances and upgrade cycles and this and that is tough and challenging and we want to provide a really seamless experience where we're able to push new updates and new features as we get them and you're automatically able to use the latest thing that we cook up and make really solid.
So I think that's a huge benefit. Yeah, also we can put things together really well too.
So you can use these products together in concert to give you even more power.
So the IDS capability to give you detections is there, but you can also use our visibility tools for looking at your traffic and your PCAPs, and use traditional Magic Firewall rules to enforce positive security models, all of these things together.
So I feel like there's a huge benefit to choosing Cloudflare.
And one more thing is just also how we curate our threat feeds.
So not only are we able to use various signatures through other open source feeds and other feeds, but also leveraging what we learn at Cloudflare throughout our network and providing those signatures to users as we discover those and as we make those better.
So I feel like that's a huge benefit to using our solution.
Jordan, did you have anything you'd like to add to that?
Yeah, I mean, I think just to touch a little bit more on how the system can integrate with other products.
I mean, we're really just getting started, but the IDS is not an isolated system.
It fits in a chain of other systems that have potentially also touched the packet.
And so the kinds of things that we've got the ability to do is to take something like an IDS alert and give you like a one quick button to deploy a new firewall rule that's going to block that kind of traffic, right?
Because maybe you're nervous about putting the device in IPS mode right off the bat.
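The alert-to-rule flow could look something like the Python sketch below. Everything here is hypothetical: the alert fields, the rule dictionary, and the expression string are illustrative stand-ins, not the actual Magic Firewall rule format or API.

```python
def rule_from_alert(alert: dict) -> dict:
    """Build a block rule from an IDS alert.

    Assumes (hypothetically) that an alert carries `signature_id`, `src_ip`,
    `protocol`, and `dst_port`. The expression syntax is illustrative only.
    """
    expression = (
        f'ip.src == {alert["src_ip"]} and '
        f'{alert["protocol"]}.dstport == {alert["dst_port"]}'
    )
    return {
        "action": "block",
        "description": f'Block traffic matching IDS alert {alert["signature_id"]}',
        "expression": expression,
    }
```

The point of the design is that the customer stays in detection mode and opts in to blocking one alert at a time, rather than switching the whole system to IPS mode at once.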
And so those are the kinds of integration we get. And then one thing I still get really excited about internally is all the progress we're making on our network analytics, right?
Where we're giving customers the ability to see all the different decisions that we're making on a packet throughout that flow.
So where did our DDoS systems kick in? Where did Magic Firewall kick in? Where did our IPS potentially kick in?
And can you see those things happening in order all at the same time and the same view?
And that gives customers really great insight into like, okay, how is Cloudflare helping protect my network?
Yeah, that's another blog post I think that we actually published today around Cloudflare One observability.
We have analytics that exist in the dashboard today where you can go and see your Magic Firewall rules and actions and things like that across the traffic.
But then we're also building more and more of these tools and bringing more of this stuff together so that in a single view as a customer, you can see the end-to-end packet flow, everything that's happening on one of those packets or requests that's flowing through our network, which is really exciting and something unique compared to traditional approaches where you have maybe four different boxes sitting in your data center.
So you need to log into four different places to understand what's happening to your traffic and kind of piece that picture back together.
I think also around what you were mentioning, Chris, about performance, I think we talked to a customer recently that said something that really resonated with me around their decision to not enable the IDS capabilities on a firewall that they already have.
I think this individual was on an IT team and they said something like, my security team totally hates me and it sucks to be that person, but I can't turn this on because if I do, then I'll get a long list of IT tickets being like, why is this thing that I need to access slow?
And so we're almost looking at kind of like a Maslow hierarchy of networking and security needs here.
First, you need to be able to connect to the thing in the first place at a speed that makes it possible for you to do your work.
And then on top of that, that you can think about adding those additional security detections.
And for us, I think the threat intel piece of having these 20 million Internet properties on Cloudflare's network and feeding all the information, what we learn back into the product is one of those additional kind of like upper level needs that gets really exciting when you're able to look at that from the perspective of the Cloudflare IDS.
So yeah, lots to be really excited about.
We've talked a lot today, I think so far about maybe current state mostly. We have just a couple of minutes to touch a little bit on what we're working on next.
If you're not familiar with Magic Firewall, you can Google it; again, it's our network firewall delivered as a service from Cloudflare's network.
You get a standard version of it built in for free if you're a Magic Transit or Magic WAN customer.
And then there's a whole host of advanced features, things like packet captures on demand, integration with threat intel feeds, and now this private beta of our new intrusion detection system capability.
So if you are already purchasing Advanced Magic Firewall and are interested in getting access and giving early feedback on this, you can do that now by talking to your account team.
And then in the near term, we're planning on building on top of this MVP, introducing more capabilities, customer configurability, analytics, and different knobs and buttons for you to tune aspects of the IDS yourself.
And then in the future, there's a lot of runway for us here.
We've talked a lot today about signature-based matching as kind of the primary mechanism, but we are also considering things like anomaly detection, being able to look for events that are the sort of most unknown of unknowns, right?
Just like weird activity that's happening on your systems that you might want to look into further or place a rule to take some action on.
So tons to be excited about in this space. Reach out to your account team if you are curious to get access to what we have today and give feedback on what we're looking to build in the future.
Hearing from you is the best way that we have to make good decisions about where to focus our engineering and product efforts and what to build in order to make this platform even more exciting and valuable to all of you.
I think that's about as much as we have. Chris or Jordan, any closing thoughts, maybe something that's been exciting to you about this process of building an IDS or something you've learned?
I think just getting too obsessed over the details.
I mean, I think we take seriously the responsibility of giving customers the ability to turn on IDS without having to deal with all the complicating factors of it.
And so, we're digging into how many syscalls is it going to take us to inspect the packet, right?
What is the maximum throughput that we can get and all this kind of crazy stuff.
I mean, it's because we care about making a really awesome product that's as easy as just dropping it in and turning it on.
And then as time goes on, giving as much power as we can to customize different parts of the system.
Yeah, I'll just add a big thanks to everybody that's helped us get this far in designing the IDS and thinking about all these systems.
So, we have a lot of smart people looking into this, thinking about it, a lot of great teamwork.
So, super excited about what I've seen so far and what we're going to accomplish in the future.
Awesome. Totally agree. And again, reach out to Cloudflare if you are interested in learning more about any of this stuff.
Please keep an eye out for more Cloudflare TV segments, blogs, everything through the rest of Cloudflare One Week.
There's so much happening and we are excited that you are on this ride with us.
Have a great rest of your day. Thanks for watching.
The real privilege of working at Mozilla is that we're a mission-driven organization.
And what that means is that before we do things, we ask what's good for the users as opposed to what's going to make the most money.
Mozilla's values are similar to Cloudflare's.
They care about enabling the web for everybody in a way that is secure, in a way that is private, and in a way that is trustworthy.
We've been collaborating on improving the protocols that help secure connections between browsers and websites.
Mozilla and Cloudflare collaborate on a wide range of technologies.
The first place we really collaborated was the new TLS 1.3 protocol, and then we followed that up with QUIC and DNS over HTTPS, and most recently the new Firefox private network.
DNS is core to the way that everything on the Internet works.
It's a very old protocol and it's also in plain text, meaning that it's not encrypted.
And this is something that a lot of people don't realize. You can be using SSL and connecting securely to websites, but your DNS traffic may still be unencrypted.
When Mozilla was looking for a partner for providing encrypted DNS, Cloudflare was a natural fit.
The idea was that Cloudflare would run the server piece of it, and Mozilla would run the client piece of it, and the consequence would be that we protect DNS traffic for anybody who used Firefox.
Cloudflare was a great partner with this because they were really willing early on to implement the protocol, stand up a trusted recursive resolver, and create this experience for users.
They were strong supporters of it. One of the great things about working with Cloudflare is their engineers are crazy fast.
So the time between we decide to do something and we write down the barest protocol sketch, and they have it running in their infrastructure, is a matter of days to weeks, not a matter of months to years.
There's a difference between standing up a service that one person can use, or 10 people can use, and a service that everybody on the Internet can use.
When we talk about bringing new protocols to the web, we're talking about bringing it not to millions, not to tens of millions, we're talking about hundreds of millions to billions of people.
Cloudflare has been an amazing partner on the privacy front.
They've been willing to be extremely transparent about the data that they are collecting and why they're using it, and they've also been willing to throw those logs away.
Really, users are getting two classes of benefits out of our partnership with Cloudflare.
The first is direct benefits. That is, we're offering services to the users that make them more secure, and we're offering them via Cloudflare.
So that's like an immediate benefit these users are getting. The indirect benefit these users are getting is that we're developing the next generation of security and privacy technology, and Cloudflare is helping us do it.
And that will ultimately benefit every user, both Firefox users and every user of the Internet.
We're really excited to work with an organization like Mozilla that is aligned with the user's interests, and in taking the Internet and moving it in a direction that is more private, more secure, and is aligned with what we think the Internet should be.