Cloudflare at Cloudflare
Come learn how we use Cloudflare technologies internally to solve problems (or as we say "dogfood our own products" internally).
Hello everyone. After a couple of weeks' hiatus we're back on Cloudflare at Cloudflare.
As you know this is a program where we talk about dogfooding and I'm your host Juan Rodriguez.
I'm the CIO of Cloudflare. For those of you who may not have watched the show before, dogfooding is a term that we use at Cloudflare to mean that we try to eat our own dog food.
Or, as people sometimes say, we drink our own champagne.
So we use Cloudflare technology to solve Cloudflare problems or things that we may have to do with our products or services or IP or things like that.
So I also bring a guest or two. So today I have Mr. John Fawcett with me. Hey John how are you?
I'm great, how are you, Juan? I'm all right, not bad for a Thursday.
So I'll let you introduce yourself. Why don't you tell us who you are, what you do, and how long you've been at Cloudflare?
Sure yeah so I'm John.
I'm the engineering manager for the marketing engineering team. So basically we control Cloudflare.com, www.Cloudflare.com.
We are actually one of Cloudflare's biggest internal customers but speaking of internal customers marketing is our biggest internal customer.
So we're always just you know weighing the needs of marketing with the needs of engineering.
I've been at Cloudflare for over four years now.
I went from product strategy over to Usman's department in engineering.
It's been a really fun time. Right, yeah, and we won't talk much in turn, or maybe we will in this show, about how demanding of a customer marketing is.
Well, I think to talk about why we have to dogfood in the first place, we actually do have to talk about the requests, not the demands, of marketing.
Yeah, so for many in our audience I don't know if the term marketing engineering will be familiar, so tell us a little bit about the marketing website platform. I know that you guys have been on a journey, from what it looked like before to the things that we're doing now using a lot of internal technology.
So maybe you can tell the people who may be watching a little bit about the platform.
Yeah I have a few slides.
All right, perfect. Let's see, let me... You almost sound like a marketing person, John.
I have a few slides. Can you see this or am I not? Absolutely.
Cool so let's talk about what marketing asks of us first.
So Cloudflare.com is this ever-evolving thing.
The needs and the wants and the requirements they change from year to year.
Today marketing asks a lot of us. They want instant updates and fast turnaround on new pages. They need a globalized experience, meaning internationalization of all of our pages. They need personalization, so if a customer comes in and they've already looked at Argo Tunnel before, then maybe we should tell them something different on the home page.
They need analytics and most recently they need easier content management.
So how do we balance all those needs, those requirements really, with the fact that Cloudflare.com gets a lot of traffic?
This is over 30 days but I mean you can just sort of see that the volume is just massive.
So you know for me personally I've got different needs for Cloudflare.com.
I'm always looking to make a better and faster and lighter more maintainable more secure beautiful and just all around more excellent website.
But looking back in time at what Cloudflare.com was, I know you're looking at this and thinking this is the pinnacle of 2010 web design.
That's right. Previously I did a talk where I spun up our old code base, and I got a lot of these old templates up and some of the old PHP code running. I perused what the design was like and what the requirements were at the time, and it was an extremely fascinating experience to look at this old code and try to put myself in the shoes of the developer and of the marketing department at the time, which I guess was just Michelle.
And just really think about the requirements. So the requirements have been changing this whole time to get to what I showed earlier, and the code base had to evolve with it. This is 10 years old at this point, right? And eventually we landed on this architecture just by the needs.
I'm not going to go into all this but you can kind of tell it is somewhat complicated.
It has a lot of layers.
It has a lot of layers, and you can tell right off the bat that we're going to have to use Cloudflare's load balancing product to load balance across these two Kubernetes clusters.
But when you're looking at this it's hard to get an understanding of all the parts. What I'm moving to today, and we are at least partially there right now, is an architecture diagram that looks like this.
No real moving parts it's just a hundred percent cached on the edge.
And we get that with a static site architecture. I'll briefly go into like the technologies we're using.
Cloudflare Workers is a huge one.
We build a static site from our CMS data and then we put those files on the edge with Workers KV.
And Wrangler and the Workers team have this great feature called Workers Sites that makes it super duper easy to host a static site.
Preliminary performance improvements are promising.
This is basically after our first POC: we went from what Pingdom considered a grade of D,
64 out of 100, to 85 out of 100 without much effort.
There's still a lot we can do to make this even better.
But I mean just looking at the experience side by side it's a little bit tough but you'll see on the left side is the legacy architecture and as you're clicking around the links it just takes time to load.
On the right hand side you click a link and it basically instantaneously navigates to the next section.
So on the front, from a templates perspective, many of those things may be very similar to what we have, but the plumbing underneath, the engine and the wheels and all those things, is totally different, right?
Correct yes so underneath everything is different.
The site itself is the same. We want to incrementally move things over to a new architecture so we don't get stuck in a giant feature branch or something.
And we can make that happen with Cloudflare Workers.
So we have a root level worker that tries to look up a file on KV and if it's not there it just falls back to our old origin.
So the Kubernetes clusters would be the origin here.
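The lookup-then-fallback pattern described here can be sketched roughly as follows. This is a minimal illustration, not the team's actual worker: `AssetStore`, `memoryStore`, `toAssetKey`, and `servePage` are hypothetical names, the store is a stand-in for a key-value namespace such as Workers KV, and the origin fetcher stands in for a request to the legacy Kubernetes origin.

```typescript
// Hypothetical stand-in for the subset of a key-value store this sketch needs.
interface AssetStore {
  get(key: string): Promise<string | null>;
}

// Hypothetical stand-in for a request to the legacy origin.
type OriginFetcher = (path: string) => Promise<string>;

interface ServedPage {
  body: string;
  source: "kv" | "origin";
}

// Map a request path to the key a static build might have written.
// Assumption: directory-style paths resolve to an index.html file.
function toAssetKey(path: string): string {
  const trimmed = path.replace(/^\/+/, "");
  if (trimmed === "" || trimmed.endsWith("/")) {
    return `${trimmed}index.html`;
  }
  return trimmed;
}

// Try the static file first; if it has not been migrated yet,
// fall back to the old origin so both systems can coexist.
async function servePage(
  path: string,
  store: AssetStore,
  fetchFromOrigin: OriginFetcher
): Promise<ServedPage> {
  const cached = await store.get(toAssetKey(path));
  if (cached !== null) {
    return { body: cached, source: "kv" };
  }
  return { body: await fetchFromOrigin(path), source: "origin" };
}

// In-memory store, useful only for trying the sketch locally.
function memoryStore(entries: Map<string, string>): AssetStore {
  return { get: async (key) => entries.get(key) ?? null };
}
```

Because unmigrated paths simply miss in the store and pass through, pages can move to the static build one at a time, which is the incremental migration described above.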
And the productivity gains from the developer standpoint are unquestionable.
It's much easier to get a hold and an understanding of what's happening in the new system.
But most importantly it utilizes Cloudflare Workers in a way that is globally distributed.
Our previous architecture actually did use Cloudflare Workers but not in a globally distributed way.
It would only hit one or two colos. Got it.
So yeah you can see the changing requirements over time of Cloudflare.com and every time we've had to adapt the system and meet the new requirements.
And here we are today.
This is actually a screenshot of our new homepage that is 100% served by Workers.
It's not live yet but, you know, one day. Since we're talking about dogfooding I wanted to give a shout out to my team, who noticed a small attack last night.
There was a small denial-of-service attack launched against Cloudflare.com.
It managed to evade bot detection which generally does not happen. But using firewall analytics we were able to quickly put in a challenge rule which effectively stopped the attack.
So yeah that's basically all I have to show slide wise. Let me go ahead and stop sharing.
Thank you. So that was very helpful, just to see in a few minutes the whole story of the website, basically from where we started.
I don't know if many people know that I was a Cloudflare customer since around 2013.
I don't remember the website with the ninjas, so that was probably before me.
So every time that I see that you know I kind of like get a chuckle about it because I think it's fantastic.
It's fabulous. And tell us also a little bit about this. In previous lives, when I have managed things like this, marketing departments and the content teams in those departments expand, and depending on how they're structured you may have international teams or different teams that want to serve different localized content, depending on the country or the products that may resonate.
How has that influenced some of the architectural choices to help them better self-serve or manage that content in a more agile fashion?
Well you know from a tech standpoint I want to make it as simple as possible and since we manage all of our content in the content management system we better choose a content management system that has localization support and we happen to use Contentful.
There's a pretty okay integration with our translation management service.
Unfortunately there's not a whole lot of dogfooding going on there, because it's internationalization and that's hopefully a solved problem elsewhere. But with the tools that we built to actually generate our pages, we may have situations where you want to redirect based on geo IP.
I personally dislike that but our localization team asks for it.
Cloudflare makes it super easy to do that because you've got, what, /cdn-cgi/trace. We know whereabouts in the world the request is coming from and we can redirect based on their supposed geo IP, but again I want to reiterate that's not the best practice.
What is better is if you provide some hints, say, in the UI, and we're actually working on that. If we notice the request is coming from, let's say, Germany, and they're looking at the English site, we can provide a modal that says, hey, we know you're potentially viewing this from Germany but you're looking at the English page, would you like to change your language?
That's the sort of thing that we can do because we're just using Cloudflare it's super duper easy.
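The hint-instead-of-redirect idea can be sketched as a small decision function. This is an illustrative assumption, not the team's code: the `languageHint` name, the country-to-locale table, and the locale strings are hypothetical. In a Worker, the country code would typically come from the request's `cf.country` property.

```typescript
// Hypothetical mapping from ISO country code to a site locale.
const COUNTRY_TO_LOCALE: Record<string, string> = {
  DE: "de-de",
  FR: "fr-fr",
  JP: "ja-jp",
};

interface LanguageHint {
  suggestedLocale: string;
}

// Decide whether to show a "change language?" modal.
// Returns null when no hint should be shown; it never forces a redirect.
function languageHint(
  country: string | undefined,
  pageLocale: string,
  userChoseLocale: boolean
): LanguageHint | null {
  // An explicit choice by the user always wins over geo IP guesses.
  if (userChoseLocale || !country) {
    return null;
  }
  const suggested = COUNTRY_TO_LOCALE[country.toUpperCase()];
  // No localized site for this country, or the user is already on it.
  if (!suggested || suggested === pageLocale) {
    return null;
  }
  return { suggestedLocale: suggested };
}
```

Since geo IP can be wrong, the function only produces a suggestion and leaves the decision to the user.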
And you were saying that the marketing team likes to do that even if you don't necessarily like it.
Do you think that this is because they don't yet understand some of the capabilities of the platform, or is it more that they look at it through a different lens, as a marketeer rather than from an engineering point of view?
Yeah, I think it's just looking through a different lens. I think it was really common to redirect based on geo IP in the past, but time and time again it's been shown that there are a lot of false positives there and it results in a not-great user experience.
We actually deployed a bug recently that was around redirecting on an IP address, and we removed the offending code and the redirection altogether, just because it can be kind of a hairy thing. You're like, okay, did they select a language? Where's the IP coming from? Did we redirect previously? What's the referrer?
There's a lot of logic there that's completely unnecessary let the user control.
Yeah, I remember in my previous company we almost created a system ourselves to manage all that redirect logic, and it can get really, really complex. It's one of those things where, once you start building on it, it gets even more difficult to ever get rid of, because so much business logic in many cases is built into it, and unpicking all that is super complicated. So I know exactly what you're talking about.
So, coming back to Workers and some of the things that you guys are using there: I know you're using Workers, KV, and Workers Sites, but are there any other capabilities that have been launched recently? Rita was here a couple of episodes ago showing us certain things around Workers Unbound. Are some of those interesting for you guys to incorporate, or have you been beta testing them as part of their build?
I can't say anything recently in workers but something I recently discovered was wrangler tail.
I was having some issues in production and I would have loved to have tailed my site, Cloudflare.com rather, but it turns out we have too much traffic for wrangler tail to work. Our staging environment does not have too much traffic, so I was able to use wrangler tail and chase down some logs for some problematic requests. Other than that I'm having a hard time thinking of anything recent other than KV that we've been using.
With wrangler tail you can basically take a look in real time at all the logs as they come in, so in production that must be a little crazy, obviously.
In production, definitely too crazy. There's a limit put in place that prevents you from using it, but in staging I was able to just run a bunch of requests and see the tail output immediately, so it was quite useful.
A WAF is a security system that uses a set of rules to filter and monitor HTTP traffic between web applications and the Internet.
Just as a tollbooth allows paying customers to drive across a toll road and prevents non-paying customers from accessing the roadway, network traffic must pass through a firewall before it is allowed to reach the server.
WAFs use adaptable policies to defend vulnerabilities in a web application, allowing for easy policy modification and faster responses to new attack vectors.
By quickly adjusting their policies to address new threats, WAFs protect against cyber attacks like cross-site forgery, file inclusion, cross-site scripting, and SQL injection.
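As a rough illustration of rule-based filtering, the sketch below evaluates a request against an ordered list of rules and returns the first matching action. It is a toy model under stated assumptions, not how Cloudflare's WAF works: the rule names, the deliberately naive patterns, and the `evaluate` function are all hypothetical, and the query string is assumed to be URL-decoded already.

```typescript
type Action = "allow" | "challenge" | "block";

// Hypothetical, simplified view of an incoming HTTP request.
interface HttpRequestInfo {
  method: string;
  path: string;
  query: string; // assumed to be URL-decoded already
  userAgent: string;
}

interface Rule {
  name: string;
  action: Action;
  matches: (req: HttpRequestInfo) => boolean;
}

// Deliberately naive example rules; real rule sets are far more thorough.
const exampleRules: Rule[] = [
  {
    name: "naive-sql-injection",
    action: "block",
    matches: (req) =>
      /(\bunion\b.+\bselect\b|'\s*or\s+1\s*=\s*1)/i.test(req.query),
  },
  {
    name: "naive-xss",
    action: "block",
    matches: (req) => /<script\b/i.test(req.query),
  },
  {
    name: "path-traversal",
    action: "block",
    matches: (req) => req.path.includes("../"),
  },
  {
    name: "empty-user-agent",
    action: "challenge",
    matches: (req) => req.userAgent.trim() === "",
  },
];

// Return the action of the first matching rule, or allow by default.
function evaluate(
  req: HttpRequestInfo,
  rules: Rule[] = exampleRules
): { action: Action; rule: string | null } {
  for (const rule of rules) {
    if (rule.matches(req)) {
      return { action: rule.action, rule: rule.name };
    }
  }
  return { action: "allow", rule: null };
}
```

Keeping the policy as a list of data makes it easy to modify: responding to a new attack vector amounts to adding or adjusting a rule, in the same spirit as the challenge rule mentioned earlier in the episode.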