Cloudflare Troubleshooting: Live and Unscripted
Presented by: Chris Scharff
Originally aired on September 16, 2021 @ 2:30 PM - 3:00 PM EDT
Follow along as Chris troubleshoots questions from the Cloudflare community using exciting tools like whois, curl and Chrome's Developer Tools.
Transcript (Beta)
Hi everyone, welcome to Cloudflare TV.
Welcome to Cloudflare troubleshooting live and unscripted. My name is Chris Scharff.
I'm a solutions engineer here at Cloudflare. And today we're going to be doing some live troubleshooting for our friends in the community forums.
So for those who are new to Cloudflare, I want to let you know we have a number of support resources.
A great knowledge base. We have a service desk.
We also have a very vibrant community where you can log in, ask questions, and read responses from folks who use Cloudflare every day.
And you can learn quite a bit about how they're using it in their own environments.
And so today what we're going to do is just walk through some questions that are currently open in the community.
And I'm going to walk through how I would debug these out loud so that you all can kind of see what goes through my thought process as a solutions engineer at Cloudflare.
And how you can think about troubleshooting issues for your environment or for your customers.
So with that, let's switch over to the first question, which is why is my website performance too slow?
So we have a customer who's given us a URL, which is great.
So at least we know what we're trying to troubleshoot.
And they've identified an example of where they think the problem is being demonstrated.
So a sample URL is always very helpful for understanding what's going on.
And I'm going to walk through this because there's a lot to actually unpack on this question.
I went and took a look at it earlier today.
Pulled up a few screens to start with. Looked through some of the information.
Pulled up some knowledge base articles. Looked through a few things. Looked at my own account to see where we could put some settings.
I'm going to save a spoiler for the end, which is really the first thing I should have checked in this particular instance for troubleshooting purposes.
But let's take a look. So they've got this website.
Google has a page speed insights tool, and they've run this.
And they believe that Cloudflare may have something to do with the performance issues that they're seeing.
So first, let's go ahead and take a look at the link that they sent.
And you can see that this site's got a 25 score. I think the average site scores somewhere around 48, 49.
So in general, websites tend not to be terribly well optimized for performance.
There are certainly things we can do to improve on a 25. And there's one thing that I want to point out on this one initially: when I scroll down and look at this, we can see we're spending 1.44 seconds on multiple redirects.
And the reason for that is that the first request is for the http:// version of their website.
So if you're going to be testing page speed insights, probably the first thing that you want to do is make sure you're actually testing the URL that you're interested in.
So the way their site is working today is they're first calling this HTTP link.
Then, I guess because it's being detected as a mobile device, it's getting redirected to www.
Same site, but with /?n=1 appended. So that's the first redirect that's happening.
And then it's doing a second redirect here, which is not as clear, but you can see it when I hover over it.
The first redirect was to an HTTP link as well.
And then finally, it's being redirected to HTTPS.
So if we look at this again, we can see the original URL being called is http:// and we get a score of 25.
If I change this and I just do HTTPS, no other changes made to their site.
I'm running the exact same test, but I can see I get a 10-point gain by actually querying the right URL, or at least the right scheme, when trying to determine performance.
And then if I add the ?n=1 here, I can see that my page score goes up to 39.
So I've already gone from a 25 to a 39 just by asking the right question to begin with.
There are certainly things that you can do on Cloudflare's edge to take care of this redirection quicker than it appears to be happening for this site.
So you can see it appears that initial request comes through.
They're waiting 630 milliseconds for the redirect to www and then 810 milliseconds for the redirect to the HTTPS version.
So within Cloudflare, there are a couple of things that we can do on the SSL/TLS tab.
You can enable Always Use HTTPS. You could also potentially use mobile redirects to automatically redirect mobile devices to the appropriate page.
So that's a pretty easy win.
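The redirect chain described here can be reproduced offline with a small sketch. Everything in it is an assumption for illustration: `example.com` stands in for the poster's site, the status codes are guesses, and `fetch` is a hypothetical injected function so the logic can run without touching the network.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# fetch(url) -> (status code, Location header or None). Hypothetical and injected,
# so a real HTTP client could be swapped in later.
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(url: str, fetch: Fetch, max_hops: int = 10) -> List[str]:
    """Return every URL visited, starting with `url`, while following redirects."""
    chain = [url]
    for _ in range(max_hops):
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        chain.append(urljoin(chain[-1], location))
    return chain


# Simulated responses mirroring the chain in the talk (placeholder host, assumed codes).
FAKE_RESPONSES = {
    "http://example.com/": (302, "http://www.example.com/?n=1"),
    "http://www.example.com/?n=1": (301, "https://www.example.com/?n=1"),
    "https://www.example.com/?n=1": (200, None),
}


def fake_fetch(url: str) -> Tuple[int, Optional[str]]:
    return FAKE_RESPONSES[url]


chain = trace_redirects("http://example.com/", fake_fetch)
print(" -> ".join(chain))
print("redirect hops:", len(chain) - 1)
```

Testing the last URL in the chain directly is what removes the redirect time from the measurement.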
We can also look here. We'll kind of walk through this in the order that these are showing up.
So there's "Remove unused JavaScript." There are calls here for YouTube, FCDN, and a bunch of static resources that are loading JavaScript that apparently is never being called on this page.
That is really a problem with the origin server and the HTML and source code that's being created on the page.
Could you set up Cloudflare to block all of these resources or rewrite your DOM to remove them from the page?
Absolutely. But this is much more a case of: if you write inefficient code, you get inefficiencies.
Then we also have "Serve images in next-gen formats."
So you can see there are a lot of PNG files here and some suggestions to use formats like JPEG 2000, JPEG XR and WebP to reduce size.
Cloudflare can also help in this regard on our paid plans. We have an option called Polish, which can automatically convert images on the fly to WebP, potentially reducing image size.
So if the image can be reduced in size and the browser supports it, as Chrome and Firefox do, Polish will automatically make that conversion for you if you've enabled the feature.
And you can also enable lossy or lossless compression to further reduce image size.
So if the customer were on a paid plan, they could choose to implement that.
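The two conditions mentioned here (the browser supports WebP, and the converted image is actually smaller) can be sketched as a simple decision function. This is not how Polish is implemented; it is a minimal illustration that assumes browser support is advertised through the `Accept` request header, and the function name and byte counts are made up.

```python
def choose_variant(accept_header: str, original_bytes: int, webp_bytes: int) -> str:
    """Pick 'webp' only when the client advertises it AND it saves bytes."""
    # Strip quality parameters such as ';q=0.8' and normalise case.
    offered = {part.split(";")[0].strip().lower() for part in accept_header.split(",")}
    if "image/webp" in offered and webp_bytes < original_bytes:
        return "webp"
    return "original"


# A browser that lists image/webp and a conversion that shrinks the file.
print(choose_variant("image/avif,image/webp,*/*;q=0.8", 120_000, 80_000))
# A client that does not list image/webp keeps the original format.
print(choose_variant("image/png,*/*;q=0.8", 120_000, 80_000))
```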
We also have "Defer offscreen images": consider lazy loading offscreen and hidden images after critical resources have finished loading.
This appears to be a mobile site that we're testing here, since we're on the mobile tab as opposed to desktop.
Cloudflare has a solution called Mirage, which enables lazy loading for images on low-speed links, so customers on plans that support Mirage can implement that as well.
And that can also help with your images that aren't on the screen or are below the fold, if you want to think of it that way.
It loads those after the rest of the site has loaded and optimizes for the things that actually show up on that mobile device's screen.
We talked about the mobile redirect. There's "Remove unused CSS." Again, these are coming from another resource.
So one, these aren't being served through Cloudflare.
Two, they don't appear to be used on the site itself. So these can likely be removed from the underlying source code.
Then we have "Eliminate render-blocking resources."
So here we're pulling in some fonts, and these are render blocking.
So there are some things you can do to improve this. We've written a blog article on how you can use Cloudflare Workers to serve fonts as first-party resources.
You can also do more with things like Font Awesome to improve efficiency. And now we're starting to get down to some of the smaller things, like 0.15 seconds, 150 milliseconds.
Optimized images load faster.
So here you can see that they believe that, through some basic image optimization, you could reduce the size of these images and get potential improvements as well.
And then there are some things that again just come down to serving fonts as first-party resources, or other improvements, to make the pages render faster.
I'm also a little concerned that some of these fonts appear to be called but may not actually be used on the page.
There's not really any savings there in terms of performance, though.
"Reduce the impact of third-party code." So again, if you're using Cloudflare and you're calling resources from external sites,
you want to be somewhat judicious or thoughtful about the tools and resources that you include on your page, think about how those may impact performance, and look for ways to reduce their impact.
So you can see YouTube here is adding quite a bit of main-thread blocking time. Blogger's sending over a meg worth of content to this site.
So there's some opportunities for improvement there as well.
Passive listeners. This is really just a code change.
So I have no idea what the actual impact of that is, other than on your Lighthouse score.
"Serve static assets with an efficient cache policy."
So you can find those 38 resources here that either aren't being cached or could use a more efficient cache policy.
And none of these are actually the same URL as what the customer was utilizing.
So we've got a bunch of resources here that are being served from a third-party site.
So if these were being served through Cloudflare and being cached, potentially you could improve that.
But as long as you're calling third-party URLs, you are dependent on their caching policies to control how well things are cached.
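To make "efficient cache policy" concrete, here is a rough sketch that reads a `Cache-Control` header and flags short or missing lifetimes. The 30-day threshold is an assumption chosen for illustration, not a documented Lighthouse or Cloudflare value, and the parsing is deliberately simplified.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: Optional[str]) -> Optional[int]:
    """Return the max-age in seconds, 0 for no-store/no-cache, or None if unspecified."""
    if not cache_control:
        return None
    directives = [d.strip().lower() for d in cache_control.split(",")]
    # Simplification: treat both directives as "cannot be reused without a round trip".
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return int(match.group(1))
    return None


def has_efficient_policy(cache_control: Optional[str],
                         min_seconds: int = 30 * 24 * 3600) -> bool:
    """True when the response may be cached for at least `min_seconds` (assumed threshold)."""
    age = max_age_seconds(cache_control)
    return age is not None and age >= min_seconds


for header in ("public, max-age=31536000", "max-age=3600", "no-store", None):
    print(repr(header), "->", has_efficient_policy(header))
```

For third-party URLs, the header is set by the third party, which is why the page owner cannot fix those entries from their own configuration.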
Main-thread work. This is script execution.
For script evaluation, you can use things like Rocket Loader. There's a whole lot of unused JavaScript on these pages.
So pulling that out would probably do a lot of work here.
I mean, this is three full seconds of script evaluation. There are a lot of other opportunities there for improvement as well. JavaScript execution time: the site itself is taking up to two full seconds of CPU time, calling a bunch of embedded players and other plugins from other places.
Again, think about what you really need to load on your page: what you really need to load on the main page, and what could perhaps go in other places.
If you're trying to optimize for performance, super image-heavy, super asset-heavy sites are not going to be as efficient as sites that load fewer assets.
If you go to www.google.com it's an incredibly light page it loads incredibly quickly.
The more stuff you add, the longer it's going to take. And again, you'll notice something.
I'm not quite sure why the customer thought Cloudflare was the blocking thing here, but all of these resources appear to be from third-party sites.
And then we're going to look at this one here, which is "Avoid enormous network payloads."
We're downloading this to mobile devices, and the page itself is almost three megabytes.
I've got a lot of gray hair.
I remember when a three megabyte site would take five minutes to load.
It's a bit large. I bet there's some opportunities for improvement here.
Some judicious thought about which assets I really need to load on the page. And then we can see some chaining of requests here.
This again is an opportunity. I'm reasonably certain at this point
that I know what platform this user is using to publish their data.
When you use a third-party platform to publish your content,
there are often things that aren't as optimized for your site as they could be. So you could take the opportunity to improve your site's
performance by either hand coding or looking for a lighter-weight CMS to manage your content.
So there we've got our initial query.
Let me go ahead and close this, and let's go ahead and look at this one.
See if there's anything here that's different. This looks pretty similar. And then we've got the third page here.
Yep, same type of thing.
But still, look: unused JavaScript on this page.
There are a lot of opportunities here to improve performance. That being said, when I finally step back and actually try to do some basic troubleshooting,
I think this customer actually has a more fundamental problem, which is that they're not actually using Cloudflare. So without exposing their origin IP address,
I'm just going to point out that their name servers today don't point to Cloudflare, nor does the website itself.
So none of these resources are coming through Cloudflare. All the things I talked about as potential optimizations, including using our CDN and being as close to the end-user visitors as possible,
just aren't available here because the site isn't behind Cloudflare. So if this particular poster was interested in putting their domain on Cloudflare, they'd actually have to go through that process first.
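The "first thing I should have checked" can be sketched as a simple heuristic over the name server records you would get back from a tool like whois or dig. It assumes the common full-setup case where Cloudflare-assigned name servers end in `.ns.cloudflare.com`; partial (CNAME) setups and custom name servers would not be detected this way, and the host names below are made up.

```python
from typing import Iterable


def uses_cloudflare_nameservers(nameservers: Iterable[str]) -> bool:
    """Heuristic: every NS record ends in .ns.cloudflare.com (full-setup zones only)."""
    # Drop the trailing dot of fully qualified names and normalise case.
    names = [ns.rstrip(".").lower() for ns in nameservers]
    return bool(names) and all(name.endswith(".ns.cloudflare.com") for name in names)


# Hypothetical NS records for two zones.
print(uses_cloudflare_nameservers(["amy.ns.cloudflare.com.", "bob.ns.cloudflare.com."]))
print(uses_cloudflare_nameservers(["ns1.examplehost.net.", "ns2.examplehost.net."]))
```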
There are certainly some things that Cloudflare can help with.
But ultimately, if you have a horribly inefficient site (and I didn't do any code analysis of the pages,
I haven't actually looked at it, so this is more of a general statement), there's only so much that anyone can do to improve the performance of your site.
Sometimes Cloudflare seems like magic, but it is not.
It's not a silver bullet. There are always going to be opportunities for optimization in the source code and in your site itself.
So with that, I think we've exhaustively walked through that one, but it did cover a ton of areas where I think we often see users in the community having problems, and it showed how to walk through determining what the cause might be.
So the next one is an issue that's near and dear to my heart.
I'm going to go ahead and vote for it: they'd like a domain configuration profile template.
So they can sync settings between domains.
The great thing is you can actually do this with Cloudflare already, using Terraform.
We have a cf-terraforming library on GitHub that you can use to export your default configuration from a newly added zone.
So you can capture those factory default settings.
You can get the Cloudflare default settings and then create your own factory default, which would include whatever optimizations you have. Then you can apply that to every new zone as you add it to Cloudflare and make sure that you have a consistent and synchronized set of features and settings applied across all the zones in your environment.
The great thing about Terraform, which is made by my friends at HashiCorp, is that you can then manage your Terraform config file in a GitHub repo, or whatever version control system you happen to be using, and manage your infrastructure as code.
So if you make a change to those factory default settings and you want to go back and reapply that to all of your existing zones in Cloudflare, you can utilize Terraform to do that.
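The idea of a "factory default" that gets reapplied to every zone boils down to comparing a desired baseline with each zone's current settings. The sketch below shows only that comparison, in plain Python rather than Terraform; the setting names and values are illustrative assumptions, and in practice Terraform's plan step does this work for you.

```python
from typing import Any, Dict, Tuple


def settings_drift(baseline: Dict[str, Any],
                   zone_settings: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (current, desired)} for each baseline key the zone does not match."""
    return {
        key: (zone_settings.get(key), desired)
        for key, desired in baseline.items()
        if zone_settings.get(key) != desired
    }


# Illustrative baseline and two hypothetical zones.
FACTORY_DEFAULT = {"always_use_https": "on", "min_tls_version": "1.2"}
zones = {
    "example.com": {"always_use_https": "off", "min_tls_version": "1.2", "brotli": "on"},
    "example.org": {"always_use_https": "on", "min_tls_version": "1.2"},
}

for name, current in zones.items():
    print(name, settings_drift(FACTORY_DEFAULT, current))
```

Settings outside the baseline (here `brotli`) are left alone, which matches the idea of a template that pins only the options you care about.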
Great tool. I was talking to one of my friends at HashiCorp over the weekend.
He says they're definitely moving towards Terraform Cloud. I haven't played with that particular tool set yet, but I'm very much looking forward to it. From my perspective, Terraform is really the easiest way to manage your config settings at Cloudflare and apply them in the way that this particular person has asked.
So for those that are interested, Terraform, I think, is the answer.
And I've got the vote in here. By the way, I'm using a personal account here with no extra permissions for any of this.
So I'm looking at the same things anybody else would see when they log in.
No special tricks, knobs or dials. Cache what lease.
So I looked at this cache question earlier. Michael, who's one of the MVPs at Cloudflare,
a volunteer in the community being recognized for his efforts
(thank you, Michael), asked some great questions here.
In this case, he's trying to verify that the customer is actually seeing cache headers from Cloudflare.
So we can see the Age header here, which Cloudflare is adding, and we can see the CF-Cache-Status of HIT.
So that seems to indicate, yes, it is coming through.
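The header check Michael asked for can be expressed as a small helper. It is a sketch that looks only at the two headers mentioned here, `CF-Cache-Status` and `Age`; the wording of the returned messages and the sample header values are my own.

```python
from typing import Mapping


def describe_cache_result(headers: Mapping[str, str]) -> str:
    """Summarise what CF-Cache-Status and Age say about a response."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    status = lowered.get("cf-cache-status")
    if status is None:
        return "no CF-Cache-Status header: the response may not have come through Cloudflare's cache"
    if status.upper() == "HIT":
        age = lowered.get("age")
        return f"served from cache (age {age}s)" if age is not None else "served from cache"
    return f"not served from cache (status {status.upper()})"


print(describe_cache_result({"CF-Cache-Status": "HIT", "Age": "1234"}))
print(describe_cache_result({"cf-cache-status": "MISS"}))
print(describe_cache_result({"Content-Type": "text/html"}))
```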
Michael pointed out that, potentially, depending on how the content is being cached by your rules,
this could be different. And I think in this case Michael probably has this right: the user may be generating some type of custom cache key for these objects.
I think short term, the right answer might be to...
Well, there are a couple of possibilities here, because I don't know the domain that's being used.
It is possible that this content is being processed by a third party partner or provider.
So we have an SSL for SaaS option at Cloudflare. It's entirely possible that this customer is pointing to a third party that is caching their content.
And while they may have their domain on Cloudflare, that particular hostname could be controlled by that third party.
And so nothing they do is going to override the settings that they delegated to that third party.
So that's one option. The other option here is that they are using some type of custom cache key for their environment.
And so based on that, you'll have to make sure that you're identifying the correct object to be purged based on the cache key itself.
So for this, I guess the way to check that it's not a third party hosting the content, and that it is likely a custom cache key error, would be to do a global cache purge:
a purge everything within the zone, to ensure that the next request is a MISS.
If you are on a plan that supports purge by hostname, you could also do it that way and purge everything associated with that hostname, as opposed to doing a global cache purge.
But absent a Cloudflare status message that there's a caching issue, my guess here is that what they're trying to purge looks like what we've actually got cached, but there's a scheme difference here
that's taking effect. So those would be my next set of troubleshooting recommendations.
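The two purge options described here (everything in the zone, or everything for one hostname) correspond to different request bodies. The sketch below only builds the request and sends nothing. The endpoint path and field names reflect my understanding of the Cloudflare v4 API's purge endpoint and should be verified against the current API documentation; the zone ID and hostname are placeholders.

```python
import json
from typing import Iterable, Optional, Tuple


def build_purge_request(zone_id: str,
                        hosts: Optional[Iterable[str]] = None) -> Tuple[str, str]:
    """Return (url, JSON body) for a purge; purge everything unless hosts are given."""
    url = f"https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache"
    if hosts:
        # Purge by hostname; as noted in the talk, this depends on the plan.
        body = {"hosts": list(hosts)}
    else:
        # Global purge for the zone, so the next request should be a MISS.
        body = {"purge_everything": True}
    return url, json.dumps(body)


print(build_purge_request("ZONE_ID_PLACEHOLDER"))
print(build_purge_request("ZONE_ID_PLACEHOLDER", hosts=["www.example.com"]))
```

After a purge everything, a response that still reports a HIT right away would point towards the third-party scenario rather than a cache key mismatch.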
Next question: a user is asking about Cache Everything inside of a paid account.
But then they're talking about cache purge.
So it's a bit confusing here. You can purge everything, or you can purge by individual hostname, but a Cache Everything setting is actually controlled within your account under caching.
So, just so that we're on the same page here:
in the caching configuration, I can see I can purge my content by individual hostname, which is on an Enterprise plan, or by cache tag, also on an Enterprise plan.
But for Cache Everything, I can basically go to my page rules. So I can go here.
If what I want to do is cache everything on a particular path. So let's say I want to do the entire site.
So star. In this case, let's just say test/*.
So this will include everything on the test path. I can go to Cache Level.
And I can say Cache Everything. So if I want to enable a global cache, that's done via a page rule.
So I think that might be what this user was actually looking for.
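How a pattern like `test/*` selects URLs can be illustrated with a simplified matcher in which `*` stands for any run of characters. This is a sketch of the wildcard idea only; real Page Rules matching has more nuance (schemes, query strings, rule priority), and `example.com` is a placeholder.

```python
import re


def page_rule_matches(pattern: str, url: str) -> bool:
    """Simplified matcher: '*' in the pattern matches any sequence of characters."""
    # Escape the literal pieces, then join them with '.*' where the wildcards were.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, url) is not None


rule = "example.com/test/*"
for candidate in ("example.com/test/page.html",
                  "example.com/test/img/logo.png",
                  "example.com/other/page.html"):
    print(candidate, "->", page_rule_matches(rule, candidate))
```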
Sometimes I guess the challenge people run into is they're trying to solve the problem, or figure out where to solve the problem, without actually clearly stating what the problem itself is.
And so in this particular scenario.
I believe the question is: how do I set Cloudflare to cache everything?
And so it's a little confusing to be talking about cache purge.
But anyway, the short answer is page rules. So hopefully that helps out.
Let's move on to the next question. This is less of a troubleshooting question,
more of a "why did Cloudflare choose this option?" So the customer here is interested in Access, which is our Zero Trust security product.
And Access allows you to set up identity providers to authenticate your users when accessing resources in your domain.
So if I wanted to set up a rule for my WordPress admin console, for example, I can set up
an Access policy named, let's say, WP admin.
I'm going to set the path to wp-admin/. By the way, there are some exceptions that you probably want to put in here as exclusions.
I might have covered that here.
I think there's a knowledge base article on it. I'm going to create a default policy, Allow, and I'm going to include users by their email addresses. Right.
So what this now says is that anybody who comes to my domain (and I could put an asterisk here if I wanted it to apply to all my zones) and tries to go to the WP admin site is going to be prompted for credentials before Cloudflare will let them through to the origin website.
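The behaviour of that policy can be sketched as a decision function: requests outside the protected path pass straight through, unauthenticated requests to the path are prompted, and authenticated ones are allowed only if the email is on the include list. This is an illustration of the logic as described, not how Access is implemented, and the path and email address are placeholders.

```python
from typing import FrozenSet, Optional


def access_decision(path: str,
                    email: Optional[str],
                    protected_prefix: str = "/wp-admin/",
                    allowed_emails: FrozenSet[str] = frozenset({"admin@example.com"})) -> str:
    """Return 'pass', 'prompt', 'allow' or 'deny' for a request."""
    if not path.startswith(protected_prefix):
        return "pass"    # not covered by the policy, goes straight to the origin
    if email is None:
        return "prompt"  # no identity yet: ask for credentials first
    return "allow" if email.lower() in allowed_emails else "deny"


print(access_decision("/blog/hello-world", None))
print(access_decision("/wp-admin/index.php", None))
print(access_decision("/wp-admin/index.php", "admin@example.com"))
print(access_decision("/wp-admin/index.php", "visitor@example.net"))
```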
And what this user was asking is why Cloudflare provides a one-time PIN option, but no option to use your Cloudflare account for access, and instead we integrate with a number of identity providers.
And the answer is actually pretty straightforward: Cloudflare is not an identity provider.
We have not built an identity provider.
There are literally thousands of identity providers out there. If you go to an RSA Conference,
there are at least 100 identity providers with booths demoing their wares.
And so, given that there are so many folks that are great at providing identity out there,
we made it really easy to integrate with them.
So if they support OIDC or SAML or OAuth, you can configure this.
We've got some easy-to-use preset wizards for things like Okta and OneLogin, as well as Azure AD, that you can enable. You can even enable Facebook for this if you wanted, for visitors.
All of those are OAuth or SAML endpoints. Those applications were designed to provide authentication for people that don't have an identity provider or can't integrate with one because their users may come from multiple places.
We have a one-time PIN option, which will basically send the user an email and allow them to log in.
And I guess my answer to "why can't I use my Cloudflare account to log in to access resources that I put behind Cloudflare?"
is that the authentication scheme that we built to log in to Cloudflare is an application authentication scheme.
It's not an identity authentication scheme. We don't provide an OAuth or SAML endpoint at Cloudflare.com, and dash.cloudflare.com was not really meant to be an identity provider.
It's possible Cloudflare could build an identity provider at some point. I have no idea.
But even if we did, and even if we let you create an account in the identity provider and use that to log in to Cloudflare, the Cloudflare application was designed to provide access control and authentication for the application itself. So the short answer is: if you want to use an identity provider and not use one-time PINs,
there are plenty of them out there. Cloudflare hasn't built one. So I just wanted to mention that one.
Next question. I've got four minutes left, so I'm going to go through this one relatively quickly: somebody wanted to build a live video stream on Workers,
where they use it for tokens and serve the stream to the user.
And they linked to an article from Y Combinator where somebody was using Workers to store images in a third-party place and do an image rewrite to serve up their content.
And I just want to touch on this briefly. One, Cloudflare has paid video services: we have Cloudflare Stream and a streaming CDN. The question is not really whether or not the content should be cached on Cloudflare's edge.
It's whether or not it can be served through Cloudflare based on our terms of service.
I'm not part of Trust and Safety. But what I would say is our terms of service are pretty clear: if a disproportionate percentage of the traffic being served is non-website traffic,
it's in violation of the terms of service.
That being said, there's a big difference between "I'm building a seminar that's going to allow 25 people to log in for live streaming of this event" and "I want to stream an event to 20,000 or 200,000 people." One of those is likely going to come to the attention of our Trust and Safety team.
The other one probably won't. That being said, we have an enterprise sales team; you're welcome to reach out to them and talk about that.
That's not really something that can be answered specifically in the community.
And then the last one here is "Error: DNS points to prohibited IP," and you can see in this case the IP addresses that are here.
I've seen this more and more lately: customers are adding zones to Cloudflare where the zone was already on Cloudflare, it already had orange-clouded records, and they're adding the zone again to move it into a different account.
Cloudflare has a tool that scans for DNS records, returns those results, and pops them into the Cloudflare dashboard to try to make it easy to migrate from a different platform to Cloudflare. But we won't look behind the orange cloud to tell you the IP addresses of another site that is on Cloudflare. And so that's what's happened in this case: we scanned for the records, the records came back with Cloudflare IP addresses, and we plugged them in happily.
But those aren't the true origin IPs.
So part of bringing your zone onto Cloudflare is making sure that all the DNS records you need are there, because one, our scanning tool won't necessarily get everything, and two, you need to make sure they point to the right place.
And in this case, the right place would be the origin server, not Cloudflare's IP address.
So that's just a tip there.
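One way to catch this before it becomes an error is to check whether an imported record's IP address falls inside a Cloudflare range instead of pointing at your origin. The sketch below uses a small sample of IPv4 ranges that I believe are on Cloudflare's published list; treat it as an assumption and load the current published ranges in real use. The test addresses are arbitrary.

```python
import ipaddress

# Partial, assumed sample of Cloudflare IPv4 ranges; the authoritative list is
# published by Cloudflare and changes over time.
CLOUDFLARE_RANGES_SAMPLE = [
    ipaddress.ip_network(cidr)
    for cidr in ("104.16.0.0/13", "172.64.0.0/13", "173.245.48.0/20", "162.158.0.0/15")
]


def points_at_cloudflare(ip: str) -> bool:
    """True when the address falls inside one of the sample Cloudflare ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in CLOUDFLARE_RANGES_SAMPLE)


# A record that points back at the edge versus one that points at an origin
# (203.0.113.0/24 is a documentation range used here as a stand-in origin).
for record_ip in ("104.16.1.1", "172.67.10.10", "203.0.113.10"):
    print(record_ip, "->", points_at_cloudflare(record_ip))
```

Any record that comes back `True` after a zone import is a candidate for replacement with the real origin address.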
And then the last one was somebody that wanted to block IP addresses that scan for .env files.
In theory, one can do this with a Cloudflare Worker.
I think the other place to do this in a relatively straightforward fashion is to use fail2ban, which is a third-party tool.
There are some integrations that people have already written that automatically connect it to Cloudflare. Lots of options.
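The detection half of that idea, finding which clients are probing for `.env` files, can be sketched as a log scan. This is neither fail2ban's configuration nor a Worker; it is a standalone illustration that assumes access logs in the common log format, and the sample lines and addresses are invented. What you do with the resulting list (a firewall rule, a ban) is left out.

```python
import re
from collections import Counter
from typing import Iterable, List

# Common log format: ip ident user [timestamp] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*"')


def env_scanners(lines: Iterable[str], threshold: int = 1) -> List[str]:
    """Return sorted client IPs that requested a path ending in .env at least `threshold` times."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Ignore any query string when checking the requested path.
        if match and match.group("path").split("?")[0].endswith(".env"):
            hits[match.group("ip")] += 1
    return sorted(ip for ip, count in hits.items() if count >= threshold)


SAMPLE_LOG = [
    '203.0.113.5 - - [16/Sep/2021:14:30:00 +0000] "GET /.env HTTP/1.1" 404 153',
    '203.0.113.5 - - [16/Sep/2021:14:30:01 +0000] "GET /app/.env?x=1 HTTP/1.1" 404 153',
    '198.51.100.7 - - [16/Sep/2021:14:30:02 +0000] "GET /index.html HTTP/1.1" 200 5120',
]

print(env_scanners(SAMPLE_LOG))
```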
So with that, we're running short on time. I want to thank everybody who joined.
If you haven't checked out the Cloudflare community, if you're not hanging out here answering questions or asking questions,
please feel free to join us.
And with that, please stay tuned. Matthew has a special guest on shortly, and I, for one, am looking forward to it.
So thanks again for joining.