Digital Experience Monitoring
Presented by: Abe Carryl, Kyle Krum, Michael Keane
Originally aired on December 17, 2023 @ 5:30 PM - 6:00 PM EST
Welcome to Cloudflare CIO Week 2023!
This CIO Week we’ll demonstrate how Cloudflare is helping CIOs keep data, devices and employees both safe and fast across hybrid and remote environments. We’ll show how Cloudflare accelerates digital transformation and modernizes networking and security towards a Zero Trust model.
In this episode, tune in for a conversation with Cloudflare's Abe Carryl, Kyle Krum, and Michael Keane.
Tune in all week for more news, announcements, and thought-provoking discussions!
For more, don't miss the Cloudflare CIO Week Hub
Transcript (Beta)
All right, everybody, welcome back to Cloudflare TV and I hope you're enjoying CIO Week so far.
This week, we're announcing a bunch of new products and capabilities and partnerships that help CIOs do their job as they're modernizing their IT stack towards the future or just surviving day to day and keeping the lights on for their organizations.
I'm Michael Keane. I'm on our Zero Trust team here at Cloudflare, and I'm joined by two of our product managers, also on the Zero Trust team.
Throughout this segment, as a quick reminder, feel free to write in any questions to live studio at Cloudflare TV as we talk about this new product we're introducing called Digital Experience Monitoring.
And Digital Experience Monitoring, just as a little preview, is part of our full Zero Trust platform, which has lots of different composable components to it.
As organizations are modernizing their security, that could commonly be a variety of use cases. They may be securing access in our new hybrid work world, whether they're tackling their VPN or locking down contractor access or developer access, whatever they're doing with their Zero Trust network access. They may be defending against threats on the open Internet, helping to prevent phishing and ransomware and isolating a remote browser. Or they may be securing their SaaS apps like Google or Microsoft. There are so many different use cases to tackle.
And sometimes the hardest part is figuring out where to start and what order to do them in.
But I think regardless of whether you're looking at threat defense or access or SaaS or whatever, visibility is a huge component throughout: how do you maintain visibility and control as you navigate this modernization, as you navigate these various use cases and move to the future?
So, visibility is probably the single word I would use to describe this new granular troubleshooting-esque product and capability that we're trying to introduce to the platform and boost.
So, Kyle, I know you have worked on this quite a bit with Zero Trust connectivity here at Cloudflare.
So how about you introduce your role at Cloudflare and talk a little bit more about what digital experience monitoring really is?
Thanks, Michael.
So my name is Kyle Krum. I am the group product manager at Cloudflare responsible for the Warp agent and for connecting all devices, whether it's through the Warp agent or Cloudflare Tunnel or third party services, to our service that powers the rest of the Zero Trust scenario. So when you want to deploy a Gateway DLP policy or Browser Isolation or anything like that, the first step is onboarding your device traffic or onboarding your network traffic.
And that's what our team is responsible for.
However, once you onboard that traffic, that's usually not the last thing.
The next thing that you need to worry about is whether users are able to have successful connections with that traffic to the sites and services you rely on, and whether you can debug it when they can't.
And that's what digital experience monitoring is about. Digital experience monitoring from a Cloudflare perspective is going to be focused on four things.
The first one is synthetic application monitoring.
So this is the ability to define tests or monitor endpoints that you and your users care about, whether that's third party services from Microsoft or Zoom or CrowdStrike or new sites that they use that they go to or anything like that.
The next is real time monitoring of where users are going.
So how are their active Internet connections?
How is the Tunnel connection?
Is it smooth?
Are there disconnections, or are they being rerouted in strange ways?
And then the last part of all of this is being able to visualize and actually dig in to charts and graphs and data with your networking team, with our networking team, with a third party networking team, and actually figure out where the problem is.
The best way I like to describe the scenario is back in the olden days when somebody from your company called up the IT department and said, Hey, I can't connect to a meeting right now, or my network connection is really slow.
You'd probably send somebody down to some closet somewhere.
You'd start doing low level debugging on routers.
You'd have control of all of the network traffic from the employee's device to what they were connecting to.
And you could go figure out it's a switch that went wrong. It's a firmware update that went bad, whatever it happens to be.
In modern times, especially with employees working from home or remotely, you have no idea right now where that problem is.
Is it their device that's in their office?
Is it their internal home network?
Is it between their network and Cloudflare?
Is it in Cloudflare?
Is it from Cloudflare to the third party service that you're relying on?
Or is it our connection to some other network infrastructure? Debugging and figuring out those problems, especially with a remote workforce, is really, really hard.
Our digital experience monitoring product is meant to solve those challenges.
So I'm going to toss it over to Abe Carryl.
Abe, if you can introduce yourself and talk about exactly what we're announcing today and delivering.
Yeah, sure. Hi, everyone.
My name is Abe Carryl and I'm the product manager for Digital Experience Monitoring.
So super happy to be making this announcement today and to talk through some of the features that we're going to launch.
So I love the way that Kyle kind of introduced the space and kind of the three different areas that you can focus on around real user monitoring.
So real world scenarios: what does the user's experience look like on a day-to-day basis?
Synthetic application monitoring, where we can test things like, theoretically, what would performance look like at different intervals throughout the day. And then moving to things like network path visualization and UCaaS monitoring in the future.
So today, on that note, we launched or we announced the three features that we're going to be moving into closed beta this quarter.
The first is called Warp Connectivity Status.
So the way that you can really think about that is, deploying client software is hard, and having visibility into who's using that software, how far it's deployed, and what state it's in is really critical to understanding the overall connection and security of your device fleet in general.
So one of the things that we're going to do there is give users the ability to understand how many users in their fleet are running Warp today, whether they've disabled or paused it, whether they're connecting or reconnecting, what kind of state they live in, as well as where they're connecting to Cloudflare through. That can already give you really cool intelligence.
Let's say that you're an organization that only wants your users connecting in a given country.
You may be able to easily identify the users who are connecting from outside that region.
You may see unintuitive network routing paths, things like that. So we're really, really excited to be announcing that feature.
The second is called Synthetic Application Monitoring.
So with this you'll be able to set up traceroute requests and HTTP GETs to public and private endpoints, and run these tests at a given interval or cadence that you set.
So you'll be able to test things like your tenant's SaaS application or your own private wiki spaces, different tools like that, to understand what the given experience looks like on a day-to-day basis.
And then the feature that will kind of build on top of that is called Network Path Visualization.
And that's one that I'm really excited for because I think that, both from an enterprise perspective and from a personal standpoint, that's something that can be really cool for all users.
A lot of networking can feel abstract at times, and it's very hard to visualize and to learn how your traffic is getting routed.
So understanding not only what the end-to-end performance looks like, but also the individual hops that a request may take to get to that application will be something that's really cool.
So you'll be able to drill in, filter by user, by device, and then understand the path that a given request takes out to the Internet or to a private facing application.
So those are some of the announcements that we're making today that we'll be launching this quarter that we're really excited for.
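As a rough illustration of the fleet-status idea described above (counting devices by client state and spotting connections from outside an allowed country), here is a minimal Python sketch. The `DeviceStatus` record, its fields, and the state names are hypothetical and chosen for illustration; they are not Cloudflare's data model or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceStatus:
    # Hypothetical record shape, assumed for this sketch only.
    user: str
    state: str    # e.g. "connected", "paused", "disabled", "connecting"
    country: str  # country the device is connecting from


def summarize_fleet(devices, allowed_countries):
    """Count devices per state and list users connected from outside the allowed countries."""
    by_state = Counter(device.state for device in devices)
    outside = sorted(
        {
            device.user
            for device in devices
            if device.state == "connected" and device.country not in allowed_countries
        }
    )
    return by_state, outside


fleet = [
    DeviceStatus("ana", "connected", "US"),
    DeviceStatus("ben", "paused", "US"),
    DeviceStatus("chen", "connected", "DE"),
    DeviceStatus("dara", "disabled", "US"),
]
by_state, outside = summarize_fleet(fleet, allowed_countries={"US"})
print(dict(by_state))
print(outside)
```

A real deployment would read these records from whatever the dashboard or API exposes; the point here is only the shape of the question being asked of the fleet.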
That more comprehensive view is going to be really exciting.
Otherwise, from a troubleshooting perspective, it kind of feels like guess-and-check: are we looking at the device? Is it a problem with the ISP?
Is this SaaS app just down right now, in which case we just wasted a bunch of time troubleshooting individual devices?
Speaking of troubleshooting individual devices, so things like CPU and memory, how does that work from an implementation standpoint?
Is it the same agent everyone already has?
Do we have a new agent?
Yeah.
Great question. So this will all run on the same device agent that everybody knows and loves, Cloudflare Warp.
And so, no additional tooling required here, no additional deployments needed.
This will be built into the Cloudflare Warp agent. If you're already running that, you'll get this for free.
You don't have to add any extra wiring or tubing anywhere, which will be really cool.
And to your point around troubleshooting, I think anyone who has spent time troubleshooting IT at really any level will be able to empathize with those really difficult questions that you sometimes get in tickets, the ones like, "Why can't the CEO reach SharePoint while they're traveling abroad?"
There are so many questions, to Kyle's point, in going from the old world of, well, let's go down to the network closet and figure out which switches have blinking lights and which ones don't, to now having really limited visibility into what's happening on the Internet.
So what we want to do with digital experience monitoring is we really want to turn the lights on and we want to give you the visibility that you need to easily identify those problems to figure out.
To your point, is it something where there's a low Wi-Fi signal strength in the hotel?
Is there an uptick in RAM usage because, you know, you're streaming a movie at the exact same time that you're on a customer call?
Is it that the SaaS application is having an outage? How can we also give you tools and alert you proactively, so that you can get ahead of the flood of emails that are destined for your inbox when those things do occur?
So you can proactively send a notice out to the company and say, Hey, we're aware of an issue with XYZ application.
You know, bear with us while we work with that vendor to figure out when they're going to be back online.
Those are the things that are just quality of life improvements that hopefully not only make it easier to troubleshoot those issues, but also get us out of the game of finger pointing up and down stream at each other, filing tickets and then hoping that somebody stumbles on the smoking gun, so to say.
So yeah, those are the scenarios that we're trying to solve with the solution.
And it sounds like to avoid the finger pointing throughout the troubleshooting, the end goal, which always sounds great, is seeing that full end to end performance.
But I imagine folks tuning in are probably familiar with it feeling a little bit guess-and-check, or a little bit piecemeal, or maybe with using multiple tools to accomplish all the things that this new product is going to ultimately try and do.
So it all sounds great, but can you maybe speak to what is the agent actually doing under the hood?
What techniques is it using to monitor all this performance at first, middle, last mile?
Yeah.
Yeah. Good question. So, out of the gate, we'll be offering two different kinds of tests.
So the first will be a traceroute, which will help you visualize kind of that hop latency that you may experience on your way to a given destination, whether that's public or private.
The second will be HTTP GETs.
And, again, both of those will be running from the device agent. We'll run them at an interval that you define.
And we'll kind of automate that so that we can start to give those reports and start to show what the network path looks like.
What we expect is that should give you the visibility into understanding at the very least where the latency is occurring and tackle the very first part of the problem, which is identifying is the issue occurring in first, middle or last mile.
So yeah, those are some of the methods that we'll be releasing early on with this feature.
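To make the "where is the latency occurring" idea concrete, the sketch below takes the cumulative round-trip times that a traceroute reports per hop, finds the hop that adds the most latency, and labels it first, middle, or last mile. The function names and the labeling rule (first hop is first mile, final hop is last mile, everything between is middle mile) are simplifying assumptions for illustration, not how the Warp agent classifies paths.

```python
def largest_latency_jump(hop_rtts_ms):
    """Return (hop_number, added_ms) for the hop that adds the most latency.

    hop_rtts_ms holds the cumulative round-trip time to each hop, in path
    order, as a traceroute reports it. Hop numbers start at 1.
    """
    if not hop_rtts_ms:
        raise ValueError("need at least one hop")
    best_hop, best_delta = 1, hop_rtts_ms[0]
    previous = hop_rtts_ms[0]
    for hop_number, rtt in enumerate(hop_rtts_ms[1:], start=2):
        # Real traces can show a hop with a lower RTT than the one before it;
        # a negative delta simply never wins the comparison here.
        delta = rtt - previous
        if delta > best_delta:
            best_hop, best_delta = hop_number, delta
        previous = rtt
    return best_hop, best_delta


def classify_mile(hop_number, total_hops):
    """Label a hop using a deliberately simple, assumed rule."""
    if hop_number == 1:
        return "first mile"
    if hop_number == total_hops:
        return "last mile"
    return "middle mile"


rtts = [2.0, 5.0, 48.0, 51.0, 55.0]
hop, added = largest_latency_jump(rtts)
print(hop, added, classify_mile(hop, len(rtts)))
```

In practice a single trace is noisy, so running the test at an interval, as described above, and comparing many traces is what makes a pattern trustworthy.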
And it makes sense for individual devices and installing the agent.
But speaking of hybrid work and all types of larger organizations, how does this work for branch or remote offices?
Yeah, good question.
So this tool will work anywhere that you can run Warp.
So you can define the devices that you'd want this test run from. If you deploy Warp to a server in a branch office, then you can run the test from there as well.
You can target those individual machines that you want to run these tests.
So yeah, anywhere that you can run Warp, you can run these tests and simulate what the experience would look like from that given location, which would be really powerful.
And again, it helps.
We think it will help a lot with users who are not just working from offices, but working from new locations on a daily basis.
Even if we were able to test every individual employee's home network and Wi-Fi connection, and we knew for a fact that everybody had 400 up and 200 down or something like that, we still wouldn't know where they're moving on a day-to-day basis.
So the ability to kind of follow those users real time should be really impactful.
And thinking of the three capabilities you mentioned that we're announcing today, the Warp fleet status makes a ton of sense in terms of how that all runs from Warp. For the visibility features that we offer from synthetic application monitoring, where we're mimicking users to test, hey, is this SaaS app even online right now?
Should we be even wasting time on troubleshooting devices?
How does that tool work?
Is it hosted the same way?
Is it from devices?
Does it launch from a Cloudflare pop?
How does that tool work?
Yeah, good question.
So these tests will always run from the Warp agent itself.
So whatever device you have it deployed to, we'll be able to run that.
You'll configure these tests through the Zero Trust dashboard.
So we'll be building some new UI components there which will allow you to build those.
You can, of course, build them from the API and we'll have automation available as well if you want to do it more programmatically.
But yeah, it's the same single pane of glass that you use today to manage your remote access rules and your secure web gateway rules and your Browser Isolation policies. You'll set it up in the exact same way and have the granularity to drill into the individual tests and machines that you want to test against.
And we'll kind of roll things up in both an organizational and per user and per device level.
So you have kind of complete insight into what the overall health of an application looks like in your organization.
But then, when you do get those reports for individual users, the first thing that we often think is: how many users are being impacted by this issue? Is it just this one user?
And we generally do that as a proxy to understand, is this a global issue or is this a per-user issue?
Because that will probably tell us whether we should be checking on last or first mile.
So with the ability to see at an organizational level that this is last mile, or at a user level that this is first mile, we can start to immediately pinpoint where we want to triage these issues.
So, yeah, I hope that answers a little bit of your question.
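The triage heuristic described above (many users affected suggests the last mile or the application itself, a single user suggests the first mile) can be sketched in a few lines of Python. The function name, the input shape, and the 50% cutoff are assumptions made for this example; the speakers do not state a specific threshold.

```python
def triage_scope(results_by_user, org_wide_fraction=0.5):
    """Suggest where to look first, given each user's latest test outcome.

    results_by_user maps a user name to True when that user's most recent
    test against the application succeeded. org_wide_fraction is an assumed
    cutoff for calling a problem organization-wide.
    """
    if not results_by_user:
        return "no data"
    failing = [user for user, ok in results_by_user.items() if not ok]
    if not failing:
        return "healthy"
    share = len(failing) / len(results_by_user)
    if share >= org_wide_fraction:
        return "organization-wide: check last mile"
    return "per-user: check first mile"


print(triage_scope({"ana": False, "ben": True, "chen": True, "dara": True}))
print(triage_scope({"ana": False, "ben": False, "chen": False, "dara": True}))
```

The cutoff is the part most worth tuning: a small team might treat two failing users as organization-wide, while a large fleet would want a rate over a time window instead.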
I'll jump in a little bit here, too.
I think when we were going through the design of our digital experience monitoring product, one of the things we talked about are like, what are the core scenarios or problems that we want to solve?
And I'd say the two that bubbled up to the top are these. The first is somebody calls up and says, "Hey, I'm having a problem today with this SaaS app.
Can you help me debug it?
We want to give the tools to your IT department, to our department, to the IT department of the SaaS app that you're working with, to be able to figure out where that issue is so that we can resolve it.
It's not about finger pointing. It's about getting you back and healthy.
We all have the same customers.
We all kind of want to work together and solve the problem when somebody gets on the phone with you.
The next scenario that we want to solve is the alerting so that you don't have to get somebody calling you to tell you that there's a problem.
I think I mentioned this already.
We want to be able to alert you if we notice all of your users connecting to some SaaS app from a particular location starting to generate 500 errors on their HTTP GET requests, or if their traceroute starts showing that connectivity has significantly degraded.
We want to tell you that stuff ahead of time so that you can send a Slack message, or however you communicate with your users, saying, "Hey, yes, we know that there's a problem with X, we're working on solving it."
Those are kind of the two big scenarios that we're working to tackle.
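The alerting scenario described above, noticing that HTTP GET tests have started returning 500-class errors before anyone calls in, comes down to a rate check over recent results. Here is a minimal sketch; the function name, the 20% threshold, and the minimum sample size are hypothetical defaults chosen for illustration, not values from the product.

```python
def should_alert(status_codes, threshold=0.2, min_samples=5):
    """Return True when the share of 5xx responses reaches the threshold.

    status_codes is a list of HTTP status codes from recent synthetic GET
    tests against one application. min_samples avoids alerting on a
    handful of results.
    """
    if len(status_codes) < min_samples:
        return False
    server_errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return server_errors / len(status_codes) >= threshold


recent = [200, 200, 503, 200, 500, 200, 200, 200, 200, 200]
if should_alert(recent):
    print("Known issue: elevated server errors, notify users proactively")
```

Grouping the status codes by location before calling the check would cover the "from a particular location" case mentioned above.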
And speaking of the design process for the product, I'm sure there's some current users of Zero Trust tuning in familiar with using Access and Gateway and CASB and all these different parts of the platform.
Is this another dashboard that's in one spot within the UI?
Is this going to be located in multiple places in the product? Are there dependencies? We're talking about another new part of the platform, which continues to expand as we serve the new use cases we're discovering in everyone's modernization journey.
So how does this work?
Is it a new part of the platform? Is it everywhere?
Is it intertwined? How should we think about it?
Yeah, good question.
So this will be an integrated part of the existing dashboard.
So it will live in the Zero Trust dashboard, and we will have rollup views that you can go to in order to drill into just the digital experience monitoring analytics that you'll be looking for.
But what we really want it to be is an integrated experience throughout the entire platform so that hopefully you don't have to go to just one area of the dashboard.
Anywhere that you're looking within the dashboard, at your users or your devices or your deployment and networks, you want to be able to drill in directly from where you are, natively within the dashboard, and find the answers to the questions that you're looking for.
So we'll absolutely have kind of a dedicated section within the Zero Trust dashboard that points you to these aggregated metrics.
But yeah, we want to add deeper analytics and visibility to the entire dashboard experience.
I'm sure folks are thinking of just other performance monitoring tools that they're familiar with.
You might be thinking of Catchpoint or ThousandEyes. Why would you say it makes sense for Cloudflare to be getting into this space?
Yeah, good question.
And Kyle, you know, I'd love to hear your perspective as well.
But I think that what I loved about what Kyle mentioned earlier is kind of the shift to hybrid work models in general being kind of a driver for this.
You know, up until we started really adopting Zero Trust, there probably wasn't a major need for this in the current state that it lives in.
So I think that that's been kind of a big driver.
So we're really excited to meet customers where they're at on that front.
I would say that some of the unique things that Cloudflare will be able to bring to the table is just our network intelligence in general.
275 data centers across the globe, 22 million DNS requests every second, 3039 HTTP requests, billions of unique IP addresses connecting to our network.
So I think that network intelligence in general should give us a great view.
And then, of course, just being that close to end users should give us the telemetry data that you're looking for, to help not only show performance and experience but prove it out as well.
And I think something no other Zero Trust vendor can relate to is a cool thing about Cloudflare being in so many different spaces: serving Zero Trust and network services, but also having application security and application performance services.
Some of these various sites or SaaS apps that we're serving data on here, or protecting behind Zero Trust, are also running behind Cloudflare for their application services at Layer 7.
And so with such a unique perspective and position, it just makes so much sense that we would be in a good position to serve metrics on, hey, are those sites up right now, or serve latency metrics, or just work internally to figure out how we bring all this cool stuff we do together.
So I really like that. Yeah.
I'd also add that when we talk to IT administrators and CTOs, the last thing they want to do is deploy yet another agent or yet another service, especially when you're using our DLP solution or our Gateway stuff and all of the data is already there.
We are the closest to all of this information.
So that's why we also think that we're the best positioned to give you the real information about what's going on in a device and help you understand what the experience of your users is.
Awesome.
So maybe just one very last thing for folks that are listening and think it all sounds really cool.
Can they get their hands on it pretty soon? What are we talking timeline-wise?
Yeah, good question.
So, today we published a blog post. If you go to blog.cloudflare.com, in the most recent, featured blog post for today, you'll find a sign-up list to be on the wait list and to receive early access to the closed beta, which we'll be doing this quarter.
So as always, we're talking days and weeks here, not months and years.
So we're super excited to get this out in a closed beta state and to get users testing this and get early feedback.
The feedback will be obviously the most important thing.
So we're excited to not only offer this to enterprises in our closed beta, but as always to put this on our free plan as well so that any user can get started with it.
One of the things that I'm really excited about is the potential of this solution for our free users as well, who can test this out for up to 50 users and a certain amount of tests to get started with. So whether you're just learning how to do home lab networking or you're at enterprise scale, you should be able to use this feature easily.
So we're really excited to get this into users' hands within the next few weeks.
Yeah, it's really cool to see how far our Zero Trust platform has come recently and where it's going to continue to grow in some of these spaces.
So thanks again, Kyle, and everyone watching. There are lots more Cloudflare TV segments to tune in to this week, so I hope you enjoy CIO Week.