Cloudflare TV

🔒 Security Week Product Discussion: Packet Captures at the Edge

Presented by Nadin El-Yabroudi, Annika Garbers

Transcript (Beta)

Hello, everyone.

Welcome to Cloudflare TV and welcome to Cloudflare Security Week. My name is Annika and I'm a product manager here at Cloudflare, which means that I'm super excited all week because we are announcing lots of new products and features that are going to help you as a customer keep everything that you have connected to the Internet more secure.

I am super excited to be here with my colleague Nadin, who's an engineer here at Cloudflare.

Nadin, do you want to introduce yourself and talk a little bit about what you work on here?

Sure.

Hi. I'm an engineer on the Magic Firewall team at Cloudflare, and I've been working with Annika on the packet capture feature that we launched, and we're going to talk more about it today.

Yeah, totally.

So among the long list of announcements coming out this week, and today specifically, we published a blog post earlier today about on-demand packet captures. We're going to dig into what that's all about, how they're going to help you as an engineer understand more about your network, and how they solve some of the problems of current network architecture.

And maybe dig into a little bit about what we're excited to develop next.

But let's take it all the way back to start with.

Nadin, can you set the stage a little bit for us?

What is a packet capture even in the first place?

What are we talking about when we say you can get packet captures on demand from Cloudflare's edge?

Sure.

So a packet capture is a file that has a bunch of packets that are seen by a network box.

And you can think of these network boxes as maybe a firewall or a router, and it's basically a file that has a .pcap extension, and you can read it using tcpdump, which is a pretty common way to do it, or also Wireshark.

And I want to show you what that looks like.

Let me just share my screen.

So yeah, this is what a packet capture file looks like on Wireshark when you open it.

And so, as I said, it has a bunch of packets in here. Each of these rows on the top part of the grid is a packet. You can see the time that the packet came into the network box, you can see the IP source, the ports, the protocol, a bunch of information, and you can also dig into all of the headers for a packet.

So we're selecting this packet here that's in dark blue.

And so in the bottom part of the screen, you can see how you can look at the Ethernet header if you want to or the IP header.

And you can also look at the data so you even have the payload data in there.

So yeah, they're just really helpful to really understand what packets are coming into your network.
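To make the file format concrete, here is a minimal Python sketch of reading a capture file by hand. It assumes the classic libpcap format (not pcapng) with microsecond timestamps. The function name `read_pcap` and the sample bytes are illustrative and do not come from the talk; in practice tcpdump or Wireshark does this parsing for you.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap with microsecond timestamps


def read_pcap(data: bytes):
    """Return (link_type, [(ts_sec, ts_usec, packet_bytes), ...])."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    # The magic number tells us which byte order the writer used.
    if struct.unpack("<I", data[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    # Global header: magic, version major/minor, thiszone, sigfigs,
    # snaplen, link-layer type (24 bytes in total).
    fields = struct.unpack(endian + "IHHiIII", data[:24])
    link_type = fields[6]
    packets = []
    offset = 24
    # Each record: ts_sec, ts_usec, captured length, original length.
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16]
        )
        offset += 16
        packets.append((ts_sec, ts_usec, data[offset:offset + incl_len]))
        offset += incl_len
    return link_type, packets


# A tiny in-memory capture holding one 4-byte dummy "packet".
header = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
record = struct.pack("<IIII", 1, 500, 4, 4) + b"\xde\xad\xbe\xef"
link_type, packets = read_pcap(header + record)
```

Link type 1 means Ethernet, which is why a viewer like Wireshark can start decoding each record at the Ethernet header and work up through IP to the payload.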

Nice.

Glad that we were able to show a real-life example of one too. And any network or security engineers who are in our audience here have probably seen a bunch of examples of these before.

So, go ahead.

Yeah.

And so, how can network engineers actually use these packet captures?

Sure.

So this functionality that we're launching today is something we've gotten lots of requests for over a long time, because engineers love packet captures. As you were showing in that example just now, there's tons of useful information in here that can help engineers debug problems, and also understand more about traffic patterns so they can implement more security policies for their network.

So for the first use case, we can talk about a network engineer troubleshooting some network problems using a packet capture.

So this could be related to maybe someone sends in a support ticket and says, Hey, it looks like I have a connectivity problem.

I can't get my packets all the way to your network.

Or maybe they're getting there really intermittently.

And can you figure out what's going on?

So a network engineer might use a packet capture in combination with other tools like traceroute, for example, to understand how a packet is traversing the network path between its source and where it's ending up at the customer network.

And then if the packets are actually arriving all the way there, if they're getting to the customer router or firewall and we're able to run tcpdump, for example, and get a packet capture to look at, then you can go into the packets and understand what aspects of this might actually be causing problems.

Maybe there's something that's filtering them out, or you're only receiving some of the packets because of a problem in an application or an upstream provider.

So that's one example.

So for troubleshooting, one of the most straightforward things you can do is look at a packet capture to determine if network traffic is even reaching you in the first place.

And so if someone says, Hey, I'm trying to connect to you, but I can't get to you, if you look at a packet capture and you see the packets coming in, then it might be, for example, a problem with the return path and you're just not getting packets back.

So that's one kind of easy troubleshooting step that maybe even folks on our support team take all the time to support network engineers in figuring out what's going on.

And then another use case here is on the security side, understanding, maybe not when there's a problem with connectivity or reliability, but actually digging into packet patterns to understand potential attacks or malicious traffic and then be able to take action to actually block those packets.

And so an example of something that you could do here, and our DDoS engineering team takes steps like this pretty frequently, is this: if you have a packet capture of an attack, or of traffic that you suspect is associated with an attack on your network, you can open it up and look for what's called an attack signature.

That's some set of characteristics of the packets that you can match on, so you can add a rule to say, we're going to block this, or we're going to somehow mitigate this traffic, maybe by applying a rate limit or something like that.

So network engineers use PCAPs to troubleshoot problems.

Security engineers can use them to understand attack patterns and then take actions based on the information that they get from those to further lock down their network, make it more secure and keep the bad traffic out.
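As a rough illustration of looking for an attack signature, the Python sketch below counts which combination of header fields shows up most often in a set of already-parsed packets. The field names and the sample packets are made up for this example, and real signature analysis looks at many more characteristics than these three.

```python
from collections import Counter


def top_signature(packets, fields=("protocol", "dst_port", "length")):
    """Return the most common combination of field values and its share."""
    counts = Counter(tuple(p[f] for f in fields) for p in packets)
    values, hits = counts.most_common(1)[0]
    return dict(zip(fields, values)), hits / len(packets)


# Hypothetical parsed packets: three look alike, one does not.
sample = [
    {"protocol": "udp", "dst_port": 53, "length": 512},
    {"protocol": "udp", "dst_port": 53, "length": 512},
    {"protocol": "tcp", "dst_port": 443, "length": 60},
    {"protocol": "udp", "dst_port": 53, "length": 512},
]
signature, share = top_signature(sample)
```

A combination that dominates the capture is a candidate to match on in a block or rate-limit rule, after checking that legitimate traffic does not share it.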

Yeah, it makes sense.

So how did engineers capture packets like this in a traditional network architecture?

Yeah, as you mentioned, the packets generally run through physical boxes.

And so what a network or security engineer would do is log into a router or a firewall appliance and run tcpdump or something like that on that one appliance.

And in a traditional network architecture, which is sort of the castle-and-moat security model or the perimeter security model, a lot of the traffic or the vast majority or maybe all of the packets that are destined for a given network or maybe flowing through that network and destined out to the Internet would flow through that single box.

And so from that one point, you could get a really solid understanding through a packet capture of everything that was happening with that traffic.

It's kind of like if you have a castle with a moat and a drawbridge: for all of the people coming in through that drawbridge, you can have one guy clicking a clicker or looking at every single person, or whatever the analogy is there.

So that's kind of how it used to work, and it made a lot of sense in the traditional network architecture world.

Right.

But we're Cloudflare, and so we don't believe in the castle-and-moat model.

So how are we changing this over time?

Yeah. I mean, I think what we're seeing from a lot of customers that talk to us, really, no one has a strictly perimeter security model anymore.

And the reason that we've seen kind of a shift here initially was with applications moving out of data centers.

So in this castle-and-moat model, all of the applications that people used to need access to were in a data center, maybe a stack of servers that were in an office somewhere, and all of the users that were within that location could get to all those applications by being on the private network, on the local area network, or on the WAN, the wide area network, via some form of private connectivity.

But the cloud changed that: storage and compute moved outside of the data center.

And so now users needed to be able to access applications that were anywhere out on the Internet in the cloud, not just within the data center.

So that's one thing that changed.

And then the second one was that users actually also left the data center.

So now you have people everywhere in the world, potentially, that need to get to those applications.

And so you used to have everything in one place and now everything can be anywhere.

And what that's created is a really, really fragmented world in these complex networks where people are trying to use this kind of patchwork of solutions to protect their infrastructure and even just route traffic to different places.

And so network and security engineers used to have one place where they were able to get a packet capture and understand what was going on with their network.

They had one drawbridge. Now they have drawbridges all over the world.

They have drawbridges in places they might not even know about. And so that can make this kind of troubleshooting really, really complex.

We think about these network architectures in generations. Generation one is the traditional castle-and-moat. Generation two is this sort of virtualized function or patchwork style of network architecture.

And where we want to help customers get to (you mentioned we're Cloudflare and we want to do things differently, and help customers secure their networks in a new way that's going to set them up better for the future) is this third generation of network architecture, which is kind of what Gartner refers to as SASE, secure access service edge.

And the idea is to take those security functions that used to be in a box in a building somewhere, or maybe in a virtualized box, which is sort of just the same idea (it's the same software, but running on somebody else's box in a cloud), and actually move those functions to the actual cloud, a cloud edge, and build them from the ground up in software that we run on commodity hardware.

And so what that means is that for users that are connecting to applications, their traffic is going to land at the location that's closest to them.

That's where we're going to apply those security policies and then we'll send the clean traffic back to customer locations or to its ultimate destination.

Maybe that's somewhere on the Internet, regardless of where it needs to go in the world.

And so this is a really fundamental kind of different way of thinking about network architecture.

But how this helps kind of bringing it back to our announcement today is that instead of having to take packet captures in a whole bunch of different places around the world, Cloudflare can act as a unified control plane for all of your traffic.

So with all your traffic from all of those different places in the world routing through our network, we can give you one single button and one single place to say, I'm going to get a view of all of my traffic, and present it to you in kind of the same way that you used to get access to it on one box in your data center, except now we're delivering it to you from our entire global network.

So we sort of went from hub and spoke, out to this really fragmented thing, to now what looks like hub and spoke again.

But the hub is actually our global network that's everywhere across the world.

Long explanation, but I think we got there: gen one, two, three. We want to help customers get to gen three.

Yeah.

So that's what we're doing with packet captures, helping customers get visibility across the network.

It's super exciting and I'm excited that you are here to tell us more about how it actually works.

So we had to write software in order to give this capability to our customers.

Can you tell me a little bit about how this whole system works and maybe highlight some of the aspects of it that are most exciting or most interesting?

Yeah.

So yeah, delivering packet captures is something we've been working on for a little bit on my team.

And there are, I guess, three main components to how we make this work.

The first is the API, the second is actually capturing the packets, and the third is sending the captures to customers.

So I'm going to talk about those three.

And so we have an API that customers can use to request the packet capture.

And it's pretty simple.

You just say, I want a packet capture on my account and you can provide a filter for the kinds of packets that you want to receive, right?

Because if you don't have a filter, which you can do, you can just request the packet capture without a filter, but you'll just get whatever packets are there.

And usually what you want is to debug something or look at particular traffic that you think might be involved in an attack.

So we give you a filter.

You can filter by things like IP source or destination addresses.

You can filter by ports and you can also filter by protocol.

And so that's all customers have to do.

Just request a packet capture with a filter and then we take it from there.
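The filter described above can be sketched in Python as a set of optional fields, where a field left unset matches everything. The class name `CaptureFilter`, its field names, and the packet dictionary keys are assumptions made for this illustration; they are not the actual request schema of the packet capture API.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class CaptureFilter:
    # Hypothetical field names; None means "do not filter on this".
    source_address: Optional[str] = None       # single IP or CIDR
    destination_address: Optional[str] = None  # single IP or CIDR
    source_port: Optional[int] = None
    destination_port: Optional[int] = None
    protocol: Optional[str] = None

    def matches(self, pkt: dict) -> bool:
        """Return True if the packet satisfies every field that is set."""
        if self.source_address and ip_address(pkt["src"]) not in ip_network(self.source_address):
            return False
        if self.destination_address and ip_address(pkt["dst"]) not in ip_network(self.destination_address):
            return False
        if self.source_port is not None and pkt["sport"] != self.source_port:
            return False
        if self.destination_port is not None and pkt["dport"] != self.destination_port:
            return False
        if self.protocol is not None and pkt["proto"] != self.protocol:
            return False
        return True


https_from_subnet = CaptureFilter(
    source_address="192.0.2.0/24", destination_port=443, protocol="tcp"
)
pkt = {"src": "192.0.2.7", "dst": "203.0.113.5",
       "sport": 50000, "dport": 443, "proto": "tcp"}
```

An empty `CaptureFilter()` matches every packet, which mirrors the point that a capture without a filter simply returns whatever packets are there.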

And this is the second part, which is how we actually create these packet captures.

So as Annika was referencing, what we want to do is capture these packets across our entire global network so that you can see all of your network traffic and have it in this packet capture.

And so what we have to do is create these PCAP files at our edge in our distributed global network.

And the first issue that we have is these packets are all coming in and they're going through the kernel in kernel space and we have to filter them and start to log them.

So we use something called nftables to do that, which is a configurable firewall in the Linux kernel.

And so you can think of like firewalls, what do they do?

They like filter packets and they either allow them to go through or they block them.

In this case, we don't want to allow or block packets. We just want to log them so that we can see them in this packet capture.

So we use nftables to do that filtering, and then we ask it to log the packets that go through that filter, and the log mechanism is called NFLOG.

And what it basically does is it sets up a socket that goes from kernel space to userspace.

And in userspace we have to have some program that actually reads these packets and puts them into a packet capture.

And what we use is tcpdump, which a lot of customers are probably familiar with because it's a pretty popular tool to get packet captures.

Usually, though, tcpdump is used on a network interface, and so you just tell tcpdump, you know, listen on this interface and create a packet capture.

But in our case, it's actually reading from that socket where the packets are being logged.
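To give a feel for how nftables logging and tcpdump fit together on a generic Linux machine, the Python sketch below only builds the two command lines and does not run them. The table name, chain name, and NFLOG group number are placeholders, the table and chain would need to exist already, and none of this is Cloudflare's actual configuration.

```python
from typing import List, Optional


def nft_log_rule(table: str, chain: str, group: int,
                 src: Optional[str] = None,
                 protocol: Optional[str] = None,
                 dst_port: Optional[int] = None) -> List[str]:
    """Build an `nft add rule` command that logs matching packets to an NFLOG group."""
    cmd = ["nft", "add", "rule", "inet", table, chain]
    if src:
        cmd += ["ip", "saddr", src]
    if protocol and dst_port is not None:
        cmd += [protocol, "dport", str(dst_port)]
    elif protocol:
        cmd += ["meta", "l4proto", protocol]
    # No accept or drop verdict: the rule only logs, so traffic is unaffected.
    cmd += ["log", "group", str(group)]
    return cmd


def tcpdump_nflog(group: int, out_path: str) -> List[str]:
    """Build a tcpdump command that reads from an NFLOG group instead of a NIC."""
    return ["tcpdump", "-i", f"nflog:{group}", "-w", out_path]


rule = nft_log_rule("capture", "forward", 5,
                    src="192.0.2.0/24", protocol="tcp", dst_port=443)
dump = tcpdump_nflog(5, "out.pcap")
```

The group number is the link between the two halves: the kernel rule logs to a group, and the userspace reader subscribes to that same group.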

And you might wonder, why do we do this? Why don't we just use tcpdump directly on an interface, which is what most people do to create packet captures?

Well, as it turns out, we have more flexibility if we use nftables. So you can think of like as packets are coming into Cloudflare, we do a bunch of stuff with those packets.

We can apply a firewall and block certain things.

We also apply DDoS mitigation.

So there's a couple of steps in this packet's life during its time at Cloudflare, and with nftables we can actually decide where we want to capture packets.

We can capture them before the firewall, or between the firewall and the DDoS mitigation, or after all of these mitigations.

In our case, for now, what we're doing is we're capturing them after all of our mitigations.

So what that means is that when you request a packet capture with Cloudflare, what you're going to see are the packets that are going to end up at customers' origins.

And so that's a really helpful tool for customers to debug what's going on at Cloudflare, because they can see what packets Cloudflare saw, compare them with the packets that they're seeing at their origin, and start to understand where they might be having any issues.

So yeah, that's why we use nftables. That was kind of an aside, but yeah, we use nftables, we pair it with tcpdump, and basically tcpdump will give us this packet capture file.
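The choice of capture point can be pictured with a toy pipeline in Python. This is a simplified model written for illustration, not how the edge is implemented: the stage names, the drop rules, and the sample packets are all invented.

```python
def run_pipeline(packets, stages, tap_after="ingress"):
    """Pass packets through the stages in order.

    Returns (delivered, captured), where `captured` is a copy of the
    traffic as it looked right after the stage named by `tap_after`.
    """
    current = list(packets)
    captured = list(current) if tap_after == "ingress" else None
    for name, keep in stages:
        current = [p for p in current if keep(p)]
        if name == tap_after:
            captured = list(current)
    if captured is None:
        raise ValueError(f"unknown tap point: {tap_after}")
    return current, captured


# Hypothetical stages: a firewall that drops telnet, then a DDoS
# mitigation step that drops packets already flagged as flood traffic.
stages = [
    ("firewall", lambda p: p["dport"] != 23),
    ("ddos", lambda p: not p["flood"]),
]
traffic = [
    {"dport": 443, "flood": False},
    {"dport": 23, "flood": False},
    {"dport": 443, "flood": True},
]
delivered, after_all = run_pipeline(traffic, stages, tap_after="ddos")
_, at_ingress = run_pipeline(traffic, stages, tap_after="ingress")
_, after_firewall = run_pipeline(traffic, stages, tap_after="firewall")
```

Tapping after the last stage gives exactly the packets that would be delivered onward, which is the property that makes the capture comparable with what the origin sees.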

And the third part of the story is we have to get that packet capture file to customers.

Right. And there's kind of two important things about packet captures that make this a little bit of a difficult problem.

The first is that packet captures can be kind of sensitive because as I mentioned at the beginning, they can include packet payloads, right?

And so even though there's encryption, which is HTTPS and other stuff like that, those payloads can still be pretty sensitive.

So we want to make sure that we're not storing them somewhere in Cloudflare because that can get kind of tricky.

And the other thing is that those packet captures can also be pretty large, depending on how much you're filtering and how many packets you're grabbing.

They could be maybe a couple of gigabytes in size.

And so it would be difficult to have customers download a file of a couple of gigabytes from our API, because where do you store that?

And it just sounds a little bit complicated, right?

So what we've decided to do is to send our packet capture files directly from our edge to customers' cloud storage services.

So when you request a packet capture, you also tell Cloudflare where you would like the packet capture to be sent.

So it could be an AWS S3 bucket or maybe a Google Cloud Storage bucket that you have.

And once our edge captures that packet capture, it will send it directly to the bucket that the customer configured.

And that's really great because it means that customers can manage these really large files and they are already allocating some space to store them.

And for Cloudflare, it means we don't have to store that sensitive data.

And once we've pushed it to customers' buckets, we just get rid of that file and don't have to worry about the sensitive data.
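The push-then-delete pattern described here can be sketched in a few lines of Python. The `upload` callable is a stand-in for whatever client writes to the customer's bucket, and `fake_upload` exists only so the example runs without any cloud credentials; neither is a real Cloudflare or cloud provider interface.

```python
import os
import tempfile
from typing import Callable


def deliver_capture(path: str, upload: Callable[[str], None]) -> None:
    """Upload the capture, then remove the local copy.

    If `upload` raises, the file is left in place so delivery can be retried.
    """
    upload(path)
    os.remove(path)


uploaded = []


def fake_upload(path: str) -> None:
    # Stand-in for a real bucket client: just remember the bytes.
    with open(path, "rb") as f:
        uploaded.append(f.read())


fd, capture_path = tempfile.mkstemp(suffix=".pcap")
with os.fdopen(fd, "wb") as f:
    f.write(b"example capture bytes")
deliver_capture(capture_path, fake_upload)
```

Deleting only after a successful upload means the sensitive file is never kept longer than it takes to deliver it.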

Nice.

That's so cool. Okay, so I heard some really interesting aspects of each of the kind of three buckets of things that you mentioned.

I think one thing that we take for granted sometimes at Cloudflare, but that is really different from how a lot of our competitors have approached developing their security solutions, is just the bet that we made a long time ago, really.

When we founded the company and started building software, we said we're going to build all of our software on the Linux kernel, on commodity hardware.

And a lot of people approached this problem differently.

They said, we're going to build custom stuff.

We're going to build things on custom software or custom hardware that's kind of purpose-built for these specific things like DDoS mitigation.

And at the time that we made that bet, that was sort of gutsy. But ten plus years of innovation, a lot of which has actually come from people that work at Cloudflare contributing back to the community, have made that bet bear out. And now our software is able to be not only really, really fast and really flexible, but also adaptive, and it gives us lots of interesting pieces and tools that we can pull from to develop systems like this.

You mentioned we were able to kind of pull from nftables and we actually had a couple of different options with how to do this, which is really cool.

And I know that there's other folks on your team, or on peer teams, who are leveraging interesting new mechanisms of the Linux kernel as well.

Things like BPF and io_uring for new use cases, and then again contributing back to the kernel community when we have use cases that are worth upstreaming or interesting to more than just us.

So that's really cool.

And then I think also the idea of a coordinated packet capture. I missed this in my explanation of how network and security engineers would use these things.

It's often not useful to have just a packet capture from one point in a network stack, right?

Because let's say that someone writes in and they say, Hey, my packets aren't getting to you, and you take the packet capture and you're like, You're right, your packets aren't getting to me.

Then what do you do?

And so a lot of the time, the troubleshooting step is kind of comparing what you can see from your network infrastructure with maybe what we're able to provide you from Cloudflare's vantage point, and then being able to troubleshoot from there and isolate the source of the problem.
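Comparing two vantage points can be sketched in Python as a set difference over a per-packet key. The key used here (addresses, ports, protocol, and the IP identification field) is a simplifying assumption: IP IDs can repeat, and NAT or tunneling between the two capture points can rewrite headers, so a real comparison needs more care. The packet dictionaries are invented sample data.

```python
def packet_key(pkt):
    """Fields used to decide that two captured packets are the same packet."""
    return (pkt["src"], pkt["dst"], pkt["proto"],
            pkt["sport"], pkt["dport"], pkt["ip_id"])


def missing_downstream(edge_packets, origin_packets):
    """Packets seen at the edge capture but absent from the origin capture."""
    seen = {packet_key(p) for p in origin_packets}
    return [p for p in edge_packets if packet_key(p) not in seen]


base = {"src": "192.0.2.7", "dst": "203.0.113.5", "proto": "tcp",
        "sport": 50000, "dport": 443}
edge = [{**base, "ip_id": 1}, {**base, "ip_id": 2}, {**base, "ip_id": 3}]
origin = [{**base, "ip_id": 1}, {**base, "ip_id": 3}]
lost = missing_downstream(edge, origin)
```

Packets that appear in the upstream capture but not the downstream one narrow the search to the path between the two capture points.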

So that's really neat.

And then the last thing that is kind of cool that I pulled out from what you mentioned is the filtering aspect, right?

Like, a packet capture is literally what it sounds like.

It's a copy of all of the packets that are going through a system, and that amount of data can get really big, really fast.

And especially if you're only interested in data with a specific characteristic, like you maybe want to understand what's going on with one user that's having a problem, being able to filter for that at the time that you capture it is really helpful.

So there's tools that will do that kind of filtering for you afterwards, but being able to filter at the beginning, before you even start taking the capture, and say, get the traffic from this source IP, is really helpful, because then the size of the file that you need to deal with is going to be much, much smaller and more manageable, versus, give me everything from all my traffic across the world, which if you have a big network could be a lot, and then trying to process all of that at once.
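A back-of-the-envelope estimate in Python shows why filtering at capture time matters so much for file size. The traffic rate, average packet size, duration, and match fraction below are hypothetical numbers picked for the arithmetic, not measurements; the 24-byte and 16-byte overheads are the classic pcap global header and per-packet record header.

```python
def estimated_capture_bytes(packets_per_second, avg_packet_bytes, seconds,
                            match_fraction=1.0):
    """Rough size of a classic pcap file for the given traffic."""
    global_header = 24   # one per file
    record_header = 16   # one per captured packet
    packet_count = round(packets_per_second * seconds * match_fraction)
    return global_header + packet_count * (avg_packet_bytes + record_header)


# Hypothetical: 100,000 packets/s averaging 500 bytes, captured for 60 s.
unfiltered = estimated_capture_bytes(100_000, 500, 60)
# Same traffic, but the filter matches only 1 packet in 1,000.
filtered = estimated_capture_bytes(100_000, 500, 60, match_fraction=0.001)
```

With these made-up inputs the unfiltered capture comes to about 3.1 GB while the filtered one is about 3.1 MB, a thousandfold difference for the same one-minute window.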

So those are some really cool aspects.

Yeah.

No, without the filtering, it's almost like a needle in a haystack problem.

You're going to get all these packets and you might care about like one or two of them, but if you can filter, then you're more likely to get the packets that you actually want.

So yeah, that's a really important part of it.

So this has been like an exciting project that you've worked on with a couple other engineers.

Just from your perspective, if someone's watching this and they're interested in becoming an engineer at Cloudflare, what are aspects of this project that have been particularly interesting for you, or fun, or maybe challenging? Or maybe lessons that you learned going through this process that you would apply to other projects that we do in the future?

Switching gears to just kind of your perspective as an engineer here.

Yeah, that's a great question.

I think one thing that's been interesting is we had a couple of different approaches, as you mentioned, to how to do this.

And so we had to deal with all of the different unknowns in those different approaches and slowly like try to figure out what's the best approach.

And this was actually a big collaboration between a bunch of different teams at Cloudflare, because there's folks who think about how much data we can send from our edge to our more centralized core.

And you've got to talk to them and figure out what makes sense for them, and how much we can store.

We have to think about whether we can store the sensitive information or not.

Like, what are Cloudflare's privacy policies around that?

So there's been a lot of working with other folks, which I really like because I get to know new people and talk to other people that I might not get to every day.

So that was kind of an interesting part about it.

But yeah, I think like narrowing it down to one solution was hard and having to think about what makes sense for customers.

How do we deliver like a file that they can actually process?

Because as we talked about, like if we just tried to capture absolutely everything, it's just like impossible.

Like who could look at like a terabyte file and make sense of it, right?

It's probably too much.

So trying to think about what makes sense for them and what is something that somebody can digest, because at the end of the day, with packet captures, it's going to be somebody sitting there and actually looking at them and analyzing them, right?

So it has to be human readable and a good size.

Totally.

Yeah.

Yeah. I think this project has been really interesting for me as a product manager and I really enjoyed working with you and learning a lot from you through this process because there's lots of features where kind of like what goes in the requirements list is pretty straightforward and customers can just tell you the answers, right?

Like if you're adding a new button to do something or maybe one of the other things that we've added for Magic Firewall recently has been like customer IP lists.

So a customer says, Hey, I want to be able to have my own list of IPs and be able to use those in any Magic Firewall rule.

And that's like relatively straightforward from a requirements perspective, right?

It's like, all right, if we achieve that and as a customer you can do XYZ, then you're good.

But here I couldn't really go to a customer and ask, Hey, how do you want us to process this amount of packet capture data?

Or, how big of a packet capture do you think you're going to need to be able to process, or what are the upper limits?

Because if you ask those kinds of questions, everybody will say, well, all of it, right?

Like, I want all of it. But in reality, that's impossible for them to actually use.

And so it's been this interesting exploration, kind of back and forth, to figure out what the right sweet spot is of what we're able to process and what levers we can give customers to help them also understand, here's the amount of data that I'm going to be talking about when I start recording a packet capture across Cloudflare's entire edge, because again, it could be more data than they're maybe expecting or thinking about.

So that's been kind of a cool process.

Yeah, for sure.

Yeah.

So now that we've released this, how do customers actually get access to it?

Yeah, so I'm super excited to share that the packet capture API that you mentioned earlier is in general availability for customers.

So anyone that's purchased the Advanced Magic Firewall bundle has access to this.

If you haven't purchased that and you want to, you can contact your account team.

So this is for enterprise customers right now, but we're also considering and looking into options for what it would look like to extend something like this to our self serve customers as well.

And then we're releasing new features soon.

So some of this additional functionality, things like log streaming and stuff like that, is going to be available to that same group of customers, so anyone with access to Advanced Magic Firewall, as soon as it's ready.

And if you're interested in being in the early access or beta kind of testing groups for any of that stuff, reach out to your account team, let them know and they'll make sure that you can get in touch with us so that you can get access to those as soon as they're ready.

Awesome.

I'm so excited to have people use this. It's going to be great.

It's going to be so cool.

We have just a couple of minutes left. I think one of the things we like to do at Cloudflare, our innovation weeks are a mix of new feature announcements and things that customers can use today that are ready in their hands.

Things like this packet capture API that's available to customers now.

But also we talk about new ideas and try to get customer feedback on things that we might build in the future.

What are some of the things that you are excited about or think that we might want to add on top of this packet capture on demand capability that we have today?

Like what would you be really excited to develop next?

Yeah, great question.

So yeah, we have a couple of things that we are thinking about building.

For example, these PCAPs are something that customers have to request, and they will get them sort of when they know that they need them.

But what if customers could automatically get these PCAP samples when we think it makes sense for them to get them, right?

So we could see ahead of time that it looks like there's something going on in their network, like maybe it's a DDoS attack or just something that looks irregular, and say, let's take a packet capture for them and deliver it to them.

And that way they don't have to guess when is the right time to get these packet captures.

But we have all the knowledge about what's going on in their network and we're seeing all of this traffic.

So it would make sense for us to proactively tell them, Hey, by the way, here's this thing you might want to look at.
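The talk presents automatic captures as a future idea and does not say how they would be triggered. Purely as an illustration of the concept, the Python sketch below starts a capture when the current packet rate exceeds a multiple of the recent average; the function name, the threshold factor, and the sample rates are all assumptions.

```python
def should_capture(recent_pps, current_pps, factor=3.0):
    """Return True when the current rate is well above the recent baseline."""
    if not recent_pps:
        return False  # no baseline yet, so do not trigger
    baseline = sum(recent_pps) / len(recent_pps)
    return current_pps > factor * baseline


# Hypothetical per-second packet rates observed just before "now".
history = [100, 120, 80]
spike = should_capture(history, 1000)
normal = should_capture(history, 150)
```

A real detector would likely use richer signals than a single rate, but the shape is the same: the system that already sees the traffic decides when a sample capture is worth taking.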

Nice.

That's a great example. Yeah, customers have access.

For example, if you're getting DDoS protection from Cloudflare, you can go and see analytics and things from the traffic that's coming in and all kinds of metadata about it.

But sometimes customers really want to know, like if we block an attack for them, show me a packet capture of that attack.

I want to know what the people that are targeting my network are doing at a super granular level.

And so being able to automatically record an example packet capture from something like that would be super cool.

Or maybe again, other kinds of events, like if there's a disruption in connectivity and we can't get traffic back to your network, we get a packet capture then that you could use in further troubleshooting.

That would be great.

I think also we mentioned pay as you go or self serve use cases for this.

So if you're interested in this and you're a self serve Cloudflare user, you can let us know on Twitter and Discord that this is something that'd be cool for you to have access to.

And then I think also just building this kind of stuff more natively into our analytics and visibility functions that we have throughout the Cloudflare dashboard would be really cool.

So you could envision going to the dashboard as a user looking at all those analytics and details and observability that you have access to and then being able to kind of zoom in on those and say, hey, show me a packet capture of that event, maybe right there in the dashboard and open it up.

And then you just have kind of one tab that you need to be able to look at everything that's going on within your network, without having to switch between tools.

I think visibility and observability in general is a space that we consistently hear from our customers is super important to them.

You want to know exactly what's going on with all your traffic, and we have a great opportunity with this sort of new network architecture that we're giving customers to make this a lot easier versus the really fragmented kind of visibility that they're dealing with today.

So super excited about all of these areas.

I think that's about time.

Thank you so much, Nadin, for making the time to chat about this.

So excited about this functionality and all the new things coming out. And for the audience, make sure you catch the rest of Security Week on the blog and here on TV. Have a great rest of your day.

All right.

Thanks, Annika.
