Debugging WARP issues with AI
Presented by: Chris Draper, Koko Uko
Originally aired on August 29 @ 12:30 PM - 1:00 PM EDT
Welcome to Cloudflare AI Week 2025!
There's barely a company or a startup not focused on AI right now. Companies' entire strategies are shifting because of this incredible technology.
From August 25 to 29, Cloudflare is hosting AI Week, dedicated to empowering every organization to innovate with AI without compromising security.
Tune in all week for more news, announcements, and thought-provoking discussions!
Visit the AI Week Hub for every announcement and CFTV episode — check back all week for more!
Transcript (Beta)
Hey everyone, my name is Koko Uko. I am the product manager for Warp, our Cloudflare Zero Trust client.
I'm based out of Austin, Texas. I've been working on Warp for a matter of years now, building some really cool features, but we're really excited to talk about the latest that has to do with AI Week.
Chris? Yeah, hey everyone.
My name is Chris Draper. I'm a product manager at Cloudflare as well. For better or for worse, Koko and I work together all the time.
And specifically, I work on a product called Digital Experience Monitoring, or DEX for short.
And that product is designed to monitor Warp connectivity and performance.
And we're really excited to tell you about some of the new features that we built for AI Week.
And I think Koko is going to jump into telling you a little bit more about AI Week and the purpose of it.
Yeah, super excited. So like Chris said, it's AI Week this week.
Warp and DEX have been working on some things that we're really, really pumped to show you guys.
So we're going to show you those cool new features. They have to do with troubleshooting network connectivity and performance problems.
I'd love to say Warp never has issues, but sometimes it does.
And when it does, Warp and DEX work hand in hand to try and solve those things.
You can read more about it in a blog that we wrote.
You can find it on blog.cloudflare.com. For anyone that isn't very familiar with Cloudflare One, Cloudflare One is a product that we offer to all of our customers.
And the goal of Cloudflare One is to make it easier for enterprise customers to secure their corporate networks.
So typically, if you're an enterprise customer, you're going to have a lot of employees that are working from home, working from anywhere, or working from the office, all trying to use different resources across your corporate network.
And Cloudflare One is a suite of different Cloudflare products that you can use all together to make it easier to secure your corporate network.
Oftentimes, one of the buzzwords that we use in the industry is SASE, which stands for Secure Access Service Edge.
And that defines the architecture that customers can use to implement Zero Trust security policies across their network.
Yep. And Warp plays into part of that.
So we're kind of the on-ramp that you would download onto a device to begin your journey in Cloudflare One and Zero Trust.
So Warp is a next-generation VPN replacement.
A lot of companies, even people, use VPNs to try to manage their devices, make sure that things are being accessed securely.
VPNs have been changing.
Those solutions have been changing over the years with the changes in the workforce.
We are way more of a remote society now, and we need that kind of flexibility.
Cloudflare saw that as a problem for ourselves as a company, and we chose to work on improving that.
So enter Cloudflare Warp. And with Warp, you have less latency than you would with a VPN because of our Anycast network: we're using our huge global array of servers to serve people all the time.
You have the ability to have insight into devices being used in your environment, so like posture check or just the general health of the device.
We integrate with a lot of different companies to try to make sure that you have that kind of visibility.
And then also just as a general solution for a VPN, there's way more flexibility in the things that you can do with Warp to understand where people are connecting from, who your users actually are, and make sure that policies for access and gateway are applying as you would actually want them to.
So that's Warp in a nutshell, next-gen VPN.
Yeah, I know Warp is a fantastic solution, and I know I have talked with a bunch of customers that said, hey, we moved from a legacy VPN onto Warp, all of a sudden our performance was significantly better.
We get way less tickets about performance issues, reliability issues, that sort of thing.
So if you haven't tried out Warp, whether you're a free customer, a pay-as-you-go customer, or an enterprise customer, you actually have access to it.
And definitely recommend people go put it on your device, put it on your laptop, and give it a shot.
All Cloudflare employees use it, and I know we really love it too.
Yeah, I use it on my personal devices all across.
Dex, you want to talk a little bit about Dex as well? Yeah, absolutely.
And then I'd love to hear a little bit more about the Warp Diag and some of the things that you've been working on.
So just for reference, digital experience monitoring is a product that allows customers to measure their performance and connectivity across Warp.
So even though we love Warp, and we know that it's really performant, sometimes people have connectivity problems or performance problems.
Maybe you're having issues with your Wi-Fi at home, and you're having trouble connecting to a resource at work.
Maybe you're having issues with an intermediary ISP.
You're using AT&T or Charter, and they're having a bad day on the Internet.
Dex kind of does a really good job of running different HTTP tests from a particular device to different endpoints on the Internet, and showing customers what their performance looks like.
It's a really important part of understanding Warp performance and troubleshooting that performance.
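As a rough illustration of the kind of HTTP test described here, the following is a minimal Python sketch that times a single request from the local device. It is not how Dex itself is implemented; the `run_http_test` function, the `HttpTestResult` type, and the `opener` parameter are hypothetical names introduced only for this example.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class HttpTestResult:
    url: str
    status: int
    total_ms: float


def run_http_test(
    url: str,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 10.0,
) -> HttpTestResult:
    """Time one HTTP GET from this device and record the status code.

    `opener` is injectable so the function can be exercised without a network.
    Note that urllib raises HTTPError for 4xx/5xx responses instead of
    returning them, so a real monitor would also catch that exception.
    """
    start = time.perf_counter()
    with opener(url, timeout=timeout) as response:
        response.read()  # include body download in the measured time
        status = response.status
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return HttpTestResult(url=url, status=status, total_ms=elapsed_ms)
```

Calling `run_http_test("https://example.com")` on a schedule and storing the results would give a crude per-device latency history, which is the general shape of data the speakers go on to discuss.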
Yeah, Dex and Warp work super hand-in-hand on stuff, including what we're doing here with the Warp Diag Analyzer.
Really excited to show you. We've been working on a way to give you guys way more insights about what's going on. So if you have a client issue pop up, you might be able to actually see what it is.
Maybe you can do some low-level solutioning and solve the issue yourselves.
We already have this thing called a Warp Diag log.
It's something you can collect from your device, or, if you're an administrator, you can collect it from the dashboard, and it pulls it from the device once it connects.
Previously, people would send those Diag logs over to us in feedback forms or in customer escalations, and we would go and investigate, figure out what was going on, and let you know, hey, this is actually maybe a small tweak that could be done on your side, or a complication with some other applications that you have going on your device.
That takes time. There's way more back and forth there where you get the information, then you send it to us, and then we send you back some sort of analysis.
So we're hoping with this Diag Analyzer, you are the first step and the last step.
You get the information, the analyzer tells you, hey, I think this might be going on with your computer or your mobile device.
I've seen it happen a couple of times.
And then you get some recommended troubleshooting steps, and you get to just follow that and quickly resolve your problem.
So I would love to show you just some screenshots of what this looks like.
You can find it in the dashboard.
Yeah, and I'm really excited about this. I know a lot of people kind of feel like, hey, like, I have to be a Cloudflare expert to be able to, you know, read a Warp Diag and understand it.
It's really, really cool that now you don't necessarily have to be a Cloudflare expert to understand, you know, what's going on with Warp and, you know, how to troubleshoot it effectively.
We have like an AI summary now that will kind of give you an overview of what's going on with Warp behind the scenes.
And I just think that's so cool. Yeah, and it's only going to get better.
Okay, so here are some screenshots of what this looks like in your dashboard.
Yours will look totally different. But in the dashboard, using Dex Remote Captures, which is another great tool from Dex, you can pull the Warp Diags directly into your dashboard from the device that you have Warp on.
You don't have to have your device in front of you to actually get those Diags anymore.
Use the Remote Captures.
When you do, you would go over to these three little buttons around the side and click on them.
And below the Download Warp Diag is a new button here.
It says View Your Warp Diag. And boom, we have a cloudy summary. This is beta, of course, but it will give you an overview of the key things that we've noted as being events to take a closer look at.
It'll give you a summary of the findings.
And then below that summary is the count of the events that we've seen. So it'll vary for you, but we're adding new detection types all the time.
Some of the detection types you might see are things like, for a macOS device, is the Warp DNS server not being set?
Or are the Zoom split tunnel configurations failing in some way?
And then next to that, you would see the severity of what we've seen. It could be a critical issue, we think, that would really affect your connectivity.
Could just be a warning, hey, your Zoom split tunnels aren't exactly perfect.
Please take another look. And then you'll see some other detections where we're simply saying, we think that these are generally things to look for, but we're not seeing any problems here.
Below that section is a device and Warp details area where you can get more information about your device to help you with your troubleshooting journey.
So things like what operating system you're using, what service mode you are using, or what Warp client version you have downloaded on this device, what the profile ID is, things like that.
So say you click on one of the detection types here, Zoom split tunnel configuration, you can have a flyout come out with even more detail of that event.
And then it gives you recommendations based off of our community documents saying, hey, I think you could solve this issue with maybe just a quick upgrade to the latest Warp release.
Or you should go look at our troubleshooting documentation because we talk about this issue, we've seen this issue a few times, and we have some recommendations in there that you might want to take a look at.
And below that, you can even see the occurrences.
How many times has this happened on this device within the time span of this Warp Diag?
Previously we were on the overview tab.
If you click one tab over, you get the JSON file tab. And that has a summary of the content for you.
So you could copy all this information, and it's the same as what the AI was showing you before.
It goes over the occurrences, the severity, the description of what the event might be.
You can copy that and save it for later if you want to take a look at this analysis again in the future.
You can also download your Warp Diag straight from here for further analysis offline.
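For readers who save that JSON for later, a short Python sketch of how it might be triaged offline is shown below. The field names (`detections`, `name`, `severity`, `occurrences`, `description`) and the sample values are assumptions made up for illustration; the actual structure of the JSON tab may differ.

```python
import json

# Hypothetical sample; the real export's field names and values may differ.
SAMPLE_REPORT = """
{
  "detections": [
    {"name": "Zoom split tunnel configuration", "severity": "warning",
     "occurrences": 3, "description": "Split tunnel entries for Zoom look incomplete."},
    {"name": "macOS Warp DNS server not set", "severity": "critical",
     "occurrences": 1, "description": "The Warp DNS server is not configured."},
    {"name": "Example healthy check", "severity": "info",
     "occurrences": 0, "description": "No problem seen."}
  ]
}
"""

# Lower number means more urgent; unknown severities sort last.
SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}


def rank_detections(report_json: str) -> list:
    """Return detections ordered by severity, then by occurrence count (descending)."""
    report = json.loads(report_json)
    return sorted(
        report["detections"],
        key=lambda d: (
            SEVERITY_ORDER.get(d["severity"], len(SEVERITY_ORDER)),
            -d["occurrences"],
        ),
    )


for detection in rank_detections(SAMPLE_REPORT):
    print(f'{detection["severity"]:>8}  x{detection["occurrences"]}  {detection["name"]}')
```

Sorting by severity first mirrors the way the overview tab is described, with critical items before warnings and before checks that found no problem.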
Awesome. I think that's super cool. I really like the examples that you used too.
I know it's really common for people to have DNS issues, and I feel like if you're an IT admin, a security engineer, or a network engineer, you definitely had to troubleshoot one DNS issue or another at some point.
And the Zoom split tunnel example was really great too.
I know customers are always complaining, hey, I have XYZ video performance problems.
How do I get better insight into them?
How do I solve that? And I know having a good split tunnel configuration is a big part of that.
So it's really cool to see that it can identify some of the most common use cases that I see pop up in customer tickets all the time.
Yeah, yeah. Hopefully now when you get those random tickets from your users or just someone who's flying and they only have a moment to throw in a ticket and they say, hey, I'm having issues with Zoom, this can be a bit of a hint.
You can go and say, ah, don't worry, I've pulled your Diag actually, and I see the issue and I'm going to go through those troubleshooting steps and solve it without having to wait for us.
Yeah, no, absolutely. I'm sure that's going to save customers a ton of time.
And I'm really excited that this feature is going to be generally available.
Yep. Just one of the fun ways we interact with Dex.
But Chris, I'm sure you have some other things here with Dex, especially for AI week.
Yeah, absolutely. And I think it's cool that we got to do a blog post together because I feel like a Warp Diag has a lot to do with troubleshooting network connectivity.
Am I getting to Cloudflare's network in the first place?
Are there any basic things going on that are preventing my connection?
I feel like on the other hand, digital experience monitoring is really focused on performance.
Once I'm connected to Cloudflare's network, what does my performance look like?
Are my video meetings having any issues? Am I getting to that internal wiki in time?
When I'm messaging people on the Slack channel, am I getting my messages?
Why is an image taking so long to load? And I feel like Dex does a great job of answering a lot of those types of questions for customers.
I think one of the things that I really love about Dex is that in Dex, every single data point that that product is collecting about an employee's network performance, it's all available via Cloudflare's public APIs and via our logs.
And so there are lots of enterprise customers that will ingest this API data or this log data.
They'll build custom dashboards and do all these really cool things.
And it works out really well.
And that's kind of how you get the best digital experience monitoring experience when you're using Cloudflare One.
I think for other enterprise customers, it's actually really hard to be able to set aside the engineering time, the budget, the resources.
Maybe you just have a project time constraint, even. And it's really hard to actually set aside the time to build all of those data pipelines, build all those charts and graphs and all those internal tools.
And this AI tool that we're announcing for Dex today is actually going to make it a lot easier to be able to get all the valuable data out of the Dex API and out of Dex logs without having to build custom dashboards and spend all of that internal development time to be able to see the value of these metrics that we're collecting.
And that's why I'm really excited to announce the Dex MCP server.
And for those that aren't familiar, MCP stands for Model Context Protocol.
And the whole idea of an MCP server is that it is a way for you to use an LLM, like ChatGPT or Gemini or Claude, to ask questions of the Dex API in plain language.
And then the Dex API will actually answer those questions for you and show you all the data that it has collected.
So if you're an IT admin or network engineer and let's say I get a ticket from like Koko and she's complaining about her network performance and she says, hey, like, you know, my video call is super slow.
Why is that happening?
I can actually go into the Dex MCP server and say, hey, like Koko submitted a ticket.
Can you tell me what's going on? She's complaining about her performance.
And then the Dex MCP server, which is our new AI tool, will actually go and fetch all the data from the Dex API for Koko's device.
And then it'll give you a readout and a summary of everything that's going on with Koko's device.
And it'll actually analyze the metrics for you and kind of highlight some of the things that seem like they could be a red flag or could be causing any performance issues.
And, you know, I'm really excited about this.
I think performance troubleshooting in particular is just a really tricky thing to do.
It's super time consuming. And I think this is going to be, you know, a great feature that makes it significantly easier for customers to get to the bottom of different performance issues like sooner rather than later.
Yeah, sweet. And I love that example of the video performance issues.
I feel like we have very similar approaches to this. We've got the Diag Analyzer saying, hey, your split tunnels might be wrong.
And then you can go to Dex and look at this end to end completely and really find out where the problems may lie.
Yeah, absolutely. And I've got a little demo that I put together.
So let me share my screen and I can kind of give everyone an example of, you know, what it looks like to use this new Dex MCP server to troubleshoot like a performance problem with Warp.
Let me see here. So I've got my screen share.
It's all come up. So configuring the Dex MCP server is actually really easy. It's going to be a little bit different depending on which LLM you use.
ChatGPT, Gemini, and Claude all have the ability to integrate with an MCP server.
And all you really have to do is copy and paste just like a little bit of configuration.
So if you go into developer and then you click edit config, all you have to do is copy and paste a little bit of JSON here where you define the npx command and then the arguments, which include the MCP server that you're connecting to, which is dex.mcp.cloudflare.com.
And that's really all it takes. Once you do that, there's going to be a little short authentication process and then you're going to be able to jump in and start asking the MCP server questions.
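The configuration described here can be sketched roughly as follows. This Python snippet only builds and prints a JSON fragment of the kind that desktop MCP clients typically read. The host name comes from the talk; the `mcp-remote` helper package, the `/sse` path, the `mcpServers` key, and the `cloudflare-dex` entry name are assumptions not stated in the talk, so check the current Cloudflare documentation before copying it.

```python
import json

# Host taken from the talk; the "/sse" path is an assumption.
DEX_MCP_URL = "https://dex.mcp.cloudflare.com/sse"


def build_mcp_config(server_name: str = "cloudflare-dex", url: str = DEX_MCP_URL) -> dict:
    """Build a client config entry that launches a remote MCP bridge via npx.

    "mcp-remote" is assumed here as the npx-launched bridge package; the talk
    only says the JSON defines an npx command plus arguments naming the server.
    """
    return {
        "mcpServers": {
            server_name: {
                "command": "npx",
                "args": ["mcp-remote", url],
            }
        }
    }


print(json.dumps(build_mcp_config(), indent=2))
```

The printed JSON is what would be pasted into the client's config file; the authentication step mentioned by the speaker happens afterwards, when the client first connects.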
So ahead of this, I put together a little demo and let me make sure I can pull up the correct conversation here where I, you know, went into our lab account and I actually asked Claude a couple of different questions about the Dex MCP server and got it to answer some questions about like a typical performance, you know, use case that you might see.
And cool. So it just loaded up and let me let me scroll to the top of this conversation.
So the first thing that I asked, I said, hey, are you connected to the Dex MCP server?
And it says, you know, yep, I'm all connected. I'm all good. Here's a list of all the things that you can ask me about, whether it's account management, test analysis, fleet monitoring, or network diagnostics.
You can even ask it questions about warp diagnostics, which is really cool.
And you can actually ask it to use the AI summary of warp diagnostics, which is the, you know, feature that Koko already released.
So I love that our features are kind of tying in together here.
And so I asked it a couple of basic questions like, can you show me a list of all my Cloudflare accounts?
And then, you know, I asked it to like, kind of, you know, show me the list of all the devices in the Dex engineering production accounts to get it warmed up a little bit and show off some of the features.
And so here it says, hey, like, you've got, you know, one device that's connected, which is our like production lab account.
Then you've got these other devices that we've seen previously, but are disconnected right now.
So I started asking it more questions about this Fedora VM device.
And from here, I kind of pretended that I was like a typical, you know, tier one IT admin, support engineer, network engineer, whatever you may call it.
And I said, hey, you know, I received a ticket that complained about the performance of the Fedora VM device.
Could you look into any performance metrics for this device and let me know what you find?
And so then the MCP server went out and started hitting all of these different Dex API endpoints to learn more about the Fedora VM device and its performance over the last 24 hours.
And so it's checking different Dex endpoints like our live fleet status, our fleet status over time.
It's looking at the list of Dex tests.
It's looking at the specific Dex tests that are applying to that particular device.
And it's kind of giving me a breakdown of all the data that it's pulling.
And then it starts giving me a performance analysis and it says, hey, so, you know, your overall device health is good.
It looks like you're connected. You're running the correct mode of warp. You're running an updated client version of warp.
So everything is good on that end.
On this particular device, we're running two different tests, one to google.com and then one to dec3.com, which is like a parked domain that we're using for testing.
And it says, hey, like, you know, google.com seems like everything is going well.
Here's, you know, your starting latency was 317 milliseconds and it dropped to 211 milliseconds by the evening, probably when there was like less traffic on the network.
And then it says, hey, you know, if this user is complaining about performance problems, they're probably complaining about dec3.com.
It seems like, you know, the resolution time for this website is particularly high and you can see it ranges from, you know, anywhere from like one second to about two and a half seconds.
And it says that the DNS resolution is particularly slow. And it even kind of gives you some good benchmarks for DNS resolution, which I think is really cool.
So if you scroll down here, it says, hey, like, you know, the DNS response time for, you know, web, what was it, for dec3.com, you know, that it's 616 milliseconds on average, which is significantly high.
Typically, you should see a DNS query resolve in about 50 milliseconds.
And, you know, you're seeing almost 12x, like, increase in latency because of this DNS issue.
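The arithmetic behind that comparison is simple enough to sketch in Python. The 616 ms average and the roughly 50 ms benchmark are the figures quoted above; the function names and the 2x alert threshold are hypothetical choices for this example.

```python
def latency_ratio(observed_ms: float, baseline_ms: float) -> float:
    """Return how many times slower the observed latency is than the baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return observed_ms / baseline_ms


def is_slow(observed_ms: float, baseline_ms: float, factor: float = 2.0) -> bool:
    """Flag a measurement exceeding `factor` times the baseline (threshold is arbitrary)."""
    return latency_ratio(observed_ms, baseline_ms) > factor


ratio = latency_ratio(616, 50)
print(f"{ratio:.1f}x the baseline")  # 12.3x the baseline
print(is_slow(616, 50))              # True
```

So the "almost 12x" figure quoted in the demo corresponds to 616 / 50, which is about 12.3.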
And so then you can, you know, ask it for some recommendations and it'll kind of tell you like, hey, like, if you were a network engineer, here's what I would do to solve this.
First, I would probably start with this, you know, DNS resolution performance bottleneck.
It also highlights, too, that maybe there are some end users, you know, outside of, you know, Coco or Josh or whoever it is that's reporting a problem.
Maybe they're seeing some increase in, like, HTTP 400 errors as well because of these DNS problems.
And so then I said, hey, you know, what are some of the things that I can do to improve DNS resolution time for this device?
And, you know, what do you suggest for next steps?
And so the first thing that it says is, hey, like, you should check your warp DNS configuration, you know, make sure that you have a performant and efficient DNS resolver, you know, make sure that you're using the correct, you know, Cloudflare policies, make sure that you're looking at any custom DNS filtering rules that might be causing any delays.
And then it goes into a couple of other solutions that tells me to check local DNS caching, which I think is also some great advice.
You know, if I have DNS caching implemented either, like, on Cloudflare's network or somewhere within my own local network, that's definitely the next thing that I want to see, particularly if I'm seeing, like, slow DNS resolution.
There's a lot of, you know, employees that are going to DEC3.com.
Maybe there are some, like, cache configurations or some cache settings that I can update to be able to give, you know, a better user experience.
And then it has a couple of other recommendations here as well.
The thing that I thought was really cool, and this is kind of why I mentioned, you know, pulling data from a DEX API and, you know, being able to generate, like, custom charts and graphs off of that data, is an MCP server will give an LLM, like, Claude or ChatGPT, the ability to generate a chart or graph on DNS response times.
So I can actually scroll down and see, okay, cool, you know, I think I said, thanks, you know, can you show me the graph of your average DNS response times for the Fedora VM device?
And then it'll actually create this chart and graph and show you, hey, you know, here is what your DNS response time looked like over the course of the last 24 hours.
Here's how that compared to your other DEX tests, like Google.com.
And then it'll even give you, like, some top level metrics, like your average DNS response time, your peak DNS response time, your best DNS response time, and your availability.
And then it'll highlight a couple of key performance insights.
Again, it's flagging a critical DNS performance problem, 10 to 12x slower than expected, with a lot of variability, and then, you know, suggesting that there are some other things that we can do to address that.
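As a rough idea of how top-level metrics like these could be derived from raw test results, here is a small Python sketch. The sample values are invented, and defining availability as the share of test runs that returned a result is an assumption; the talk does not say how the tool computes it.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_dns(samples_ms: Sequence[Optional[float]]) -> dict:
    """Summarize DNS response times; None marks a test run that failed.

    Availability here is the percentage of runs that produced a measurement,
    which is an assumed definition for illustration.
    """
    if not samples_ms:
        raise ValueError("no samples provided")
    succeeded = [s for s in samples_ms if s is not None]
    if not succeeded:
        raise ValueError("no successful samples to summarize")
    return {
        "average_ms": mean(succeeded),
        "peak_ms": max(succeeded),
        "best_ms": min(succeeded),
        "availability_pct": 100.0 * len(succeeded) / len(samples_ms),
    }


# Invented sample data: four successful runs and one failure.
summary = summarize_dns([480.0, 620.0, None, 750.0, 550.0])
print(summary)
```

With this sample the average is 600 ms, the peak is 750 ms, the best is 480 ms, and availability is 80 percent.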
That's awesome.
Those insights are really cool. Like, specifically, even just right there, peak problems, evenings, when you're having these issues, like, these are the type of things you just wouldn't be able to see.
That's awesome. Yeah, absolutely.
And I'm super excited about this. You know, I think all the time people are saying, like, hey, like, you know, we want to make, you know, our IT admins, our engineers as efficient as possible.
They already have so many different things to do.
They can never get all of their work done at the end of the day.
If you have any tools that can make their lives easier, help them do their jobs faster, it would be great to see stuff like that.
And I think this definitely falls into that category of, you know, making it easier for IT admins to, you know, solve people's problems at the end of the day and, you know, go home sooner rather than later and, you know, spend more time with their families and have more free time at the end of the day.
And, you know, I'm really excited about this feature.
Yeah, yeah, for sure. Okay, so just thinking back about all the things we talked about so far today, we talked about WarpDiag AI Analyzer, right, using that for troubleshooting your client itself to understand the network and connectivity issues that you might be having.
Then we talked about the DEX MCP server, using that to gather more detail, more insights into your network and issues you're having there, being able to interact with the server to drill in deeper and understand what's really going on.
I think we covered a lot of cool troubleshooting tools today, actually. Yeah, absolutely.
And I think the coolest thing about these features is anyone can actually try out these features for free.
So they're available to all of our customers, whether you're free or pay-as-you-go or enterprise.
All you have to do is go into the dashboard and create a Zero Trust organization in the Cloudflare dashboard, and you can start testing out this stuff right away.
And, you know, I'm really proud that Cloudflare does a good job of offering really quality free features to our customers.
And I definitely encourage, you know, anyone that's around to go and test it out today, because I think these features will save everyone a ton of time.
Yeah.
And for the WarpDiag Analyzer, this is something net new that we're bringing into the dashboard.
People have been wanting to have these kinds of insights for sure.
So we welcome any feedback that you have while you're testing it out. Go in, like Chris was saying, create an account if you don't already have one, and see what we have available for detections.
Do know that more is coming down the pipeline. We're not just going to sit and forget this one.
So your feedback is really important right now while we build it out.
And then try out the DEX MCP server too. Chris, launch in.
Yeah. So the DEX MCP server, like I said, available to everyone and it's out now.
You can integrate it with ChatGPT, Claude, Gemini, or any other major LLM that has support for Model Context Protocol.
I think the other thing too is we're looking into building MCP servers for other products and making them available to our customers.
So this is just the first of many different AI features that we're going to be releasing in the near future.
I also know, Koko, we're working on doing a lot of other AI summaries for different security policies and other things like that too.
So I know there's a lot of really great work happening in the space and I can't wait to see all the other features that we're going to release in the near future.
Yeah. It's going to be a really cool set of new features, I'm sure.
Okay. Well, I think that might be it for us today, but it's been a cool conversation, Chris.
Yeah, absolutely. Thanks so much for your time, Koko.
I really appreciate it and I hope everyone that's watching has a great day.