Cloudflare at Cloudflare
Presented by: Juan Rodriguez, John Fawcett
Originally aired on August 27, 2020 @ 1:30 PM - 2:00 PM EDT
Come learn how we use Cloudflare technologies internally to solve problems (or as we say "dogfood our own products" internally).
Transcript (Beta)
Hello everyone. After a couple of weeks' hiatus, we're back on Cloudflare at Cloudflare.
As you know, this is a program where we talk about dogfooding, and I'm your host, Juan Rodriguez.
I'm the CIO of Cloudflare. For those of you who may not have watched the show before, dogfooding is a term that we use at Cloudflare to mean that we try to eat our own dog food.
Or we drink our own champagne, as people sometimes say.
So we use Cloudflare technology to solve Cloudflare problems, or things that we may have to do with our products or services or IP or things like that.
I also bring a guest or two, so today I have Mr. John Fawcett with me. Hey John, how are you?
I'm great, how are you, Juan?
I'm all right, not bad for a Thursday.
So I'll let you introduce yourself. Why don't you tell us who you are, what you do, and how long you've been at Cloudflare?
Sure, yeah, so I'm John.
I'm the engineering manager for the marketing engineering team. So basically we control Cloudflare.com, www.Cloudflare.com.
We are actually one of Cloudflare's biggest internal customers, but speaking of internal customers, marketing is our biggest internal customer.
So we're always weighing the needs of marketing against the needs of engineering.
I've been at Cloudflare for over four years now.
I went from product strategy over to Usman's department in engineering.
It's been a really fun time.
Right, yeah, and we won't talk much, or maybe we will in this show, about how demanding of a customer marketing is.
Well, I think to talk about why we have to dogfood in the first place, we actually do have to talk about the requests, not the demands, of marketing.
Yeah, so tell us a little bit about that. For many in our audience, I don't know if the term marketing engineering will be familiar, but tell us a little bit about the marketing website platform. I know that you guys have been on a journey from what it looked like to the things that we're doing now, using a lot of internal technology.
So maybe you can tell the people watching a little bit about the platform.
Yeah, I have a few slides.
All right, perfect.
Let's see, let me...
You almost sound like a marketing person, John: "I have a few slides."
Can you see this, or am I not?
Absolutely.
Cool, so let's talk about what marketing asks of us first.
So Cloudflare.com is this ever-evolving thing.
The needs and the wants and the requirements change from year to year.
Today marketing asks a lot of us. They want instant updates, fast turnaround on new pages, a globalized experience with internationalization of all of our pages, and personalization, so if a customer comes in and they've already looked at Argo Tunnel before, then maybe we should tell them something different on the home page.
They need analytics, and most recently they need easier content management.
So how do we balance all those needs, those requirements really, with the fact that Cloudflare.com gets a lot of traffic?
This is over 30 days, but you can see that the volume is just massive.
So for me personally, I've got different needs for Cloudflare.com.
I'm always looking to make a better, faster, lighter, more maintainable, more secure, more beautiful, and just all-around more excellent website.
More better.
But looking back in time at what Cloudflare.com was, I know you're looking at this and thinking this is the pinnacle of 2010 web design.
That's right.
Previously I did a talk where I spun up our old code base and got a lot of these old templates and some of the old PHP code running. I perused what the design was like and what the requirements were at the time. It was an extremely fascinating experience to look at this old code and try to put myself in the shoes of the developer and of the marketing department at the time, which I guess was just Michelle.
And just really think about the requirements. So the requirements have been changing this whole time to get to what I showed earlier, and the code base had to evolve with it. This is 10 years old at this point, right? And eventually we landed on this architecture just by the needs.
I'm not going to go into all this, but you can kind of tell it is somewhat complicated.
It has a lot of layers.
It has a lot of layers, and you can tell right off the bat that we're going to have to use Cloudflare's load balancing product to load balance across these two Kubernetes clusters.
But when you're looking at this, it's hard to get an understanding of all the parts. What I'm moving to today, and we are partially there right now, is an architecture diagram that looks like this.
No real moving parts; it's just 100 percent cached on the edge.
And we get that with a static site architecture. I'll briefly go into the technologies we're using.
Cloudflare Workers is a huge one.
We build a static site from our CMS data and then we put those files on the edge with Workers KV.
And Wrangler and the Workers team have this great feature called Workers Sites that makes it super duper easy to host a static site.
Preliminary performance improvements are promising.
This is basically after our first POC: we went from what Pingdom considered a grade of D, 64 of 100, to 85 of 100 without much effort.
There's still a lot we can do to make this even better.
But just looking at the experience side by side, it's a little bit tough to see, but on the left side is the legacy architecture, and as you're clicking around the links it just takes time to load.
On the right-hand side, you click a link and it basically instantaneously navigates to the next section.
So many of the things on the front, from a templates perspective, may be very similar to what we have, but basically the plumbing underneath, the engine and the wheels and all those things, is totally different, right?
Correct, yes, so underneath everything is different.
The site itself is the same. We want to incrementally move things over to a new architecture so we don't get stuck in a giant feature branch or something.
And we can make that happen with Cloudflare Workers.
So we have a root-level Worker that tries to look up a file on KV, and if it's not there it just falls back to our old origin.
So the Kubernetes clusters would be the origin here.
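The fallback described here could be sketched roughly as follows. This is only an illustration under stated assumptions, not the team's actual Worker: `AssetStore`, `assetKey`, `serve`, and the key-naming convention are hypothetical, and the store interface mimics just the text `get` call of a Workers KV binding so the logic can run anywhere.

```typescript
// Hypothetical stand-in for a Workers KV binding: only the text read is modeled.
interface AssetStore {
  get(key: string): Promise<string | null>;
}

// What was served and where it came from.
interface Served {
  source: "kv" | "origin";
  body: string;
}

// Assumed convention: "/" and directory-style paths map to an index.html key.
function assetKey(pathname: string): string {
  const trimmed = pathname.replace(/^\/+/, "");
  if (trimmed === "" || trimmed.endsWith("/")) {
    return trimmed + "index.html";
  }
  return trimmed;
}

// Try the static build in KV first; on a miss, fall back to the legacy origin.
async function serve(
  pathname: string,
  store: AssetStore,
  fetchOrigin: (pathname: string) => Promise<string>,
): Promise<Served> {
  const cached = await store.get(assetKey(pathname));
  if (cached !== null) {
    return { source: "kv", body: cached };
  }
  return { source: "origin", body: await fetchOrigin(pathname) };
}
```

Because pages that have not been migrated simply miss in KV, they keep being served by the old origin, which is what allows the migration to proceed one page at a time.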
And the productivity gains from the developer standpoint are unquestionable.
It's much easier to get a hold and an understanding of what's happening in the new system.
But most importantly it utilizes Cloudflare Workers in a way that is globally distributed.
Our previous architecture actually did use Cloudflare Workers, but not in a globally distributed way.
It would only hit one or two colos.
Got it.
So yeah, you can see the changing requirements of Cloudflare.com over time, and every time we've had to adapt the system to meet the new requirements.
And here we are today.
This is actually a screenshot of our new homepage that is 100% served by Workers.
It's not live yet, but one day. Since we're talking about dogfooding, I wanted to give a shout-out to my team, who noticed a small attack last night.
There was a small denial-of-service attack launched against Cloudflare.com.
It managed to evade bot detection, which generally does not happen. But using Firewall Analytics we were able to quickly put in a challenge rule, which effectively stopped the attack.
So yeah, that's basically all I have to show slide-wise. Let me go ahead and stop sharing.
Thank you. So that was very helpful, just to see in a few minutes the whole story of the website from where we started.
Many people know that I have been a Cloudflare customer since around 2013.
I don't remember the website with the ninjas, so that was probably before me.
So every time that I see that I get a chuckle out of it, because I think it's fantastic.
It's fabulous. And tell us also a little bit about this: in previous lives when I have managed things like this, as marketing departments and the content teams in them expand, depending on how they're structured, you may have international teams or different teams that want to serve different localized content depending on the country, or the products that may resonate there.
How has that influenced some of the architectural choices to help them better self-serve or manage that content in a more agile fashion?
Well, from a tech standpoint I want to make it as simple as possible, and since we manage all of our content in the content management system, we'd better choose a content management system that has localization support, and we happen to use Contentful.
There's a pretty okay integration with our translation management service.
Unfortunately there's not a whole lot of dogfooding going on there, because it's internationalization, and that's hopefully a solved problem elsewhere. But with the tools that we built to actually generate our pages, we may have situations where you want to redirect based on geo IP.
I personally dislike that, but our localization team asks for it.
Cloudflare makes it super easy to do that, because you've got cdn-cgi/trace, so we know whereabouts in the world the request is coming from and we can redirect based on their supposed geo IP. But again, I want to reiterate that's not the best practice.
What is better is if you provide some hints, say, in the UI, and we're actually working on that: if we notice the request is coming from, let's say, Germany, and they're looking at the English site, we can provide a modal that says, hey, we know you're potentially viewing this from Germany but you're looking at the English page, would you like to change your language?
That's the sort of thing that we can do because we're just using Cloudflare; it's super duper easy.
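A minimal sketch of that hint-instead-of-redirect decision might look like the following. Everything named here is an assumption for illustration: the `SUGGESTED_LOCALE` table, the locale strings, and the `languageHint` function are hypothetical. In a Worker, the country code would typically come from the `CF-IPCountry` request header or `request.cf.country`.

```typescript
// Hypothetical mapping from ISO country code to the locale we might suggest.
const SUGGESTED_LOCALE: Record<string, string> = {
  DE: "de-de",
  FR: "fr-fr",
  JP: "ja-jp",
};

interface LanguageHint {
  suggestedLocale: string;
}

// Decide whether to show a "would you like to change your language?" modal.
// Returns null when no hint should be shown; it never redirects.
function languageHint(
  country: string | null,
  pageLocale: string,
  userChoseLocale: boolean,
): LanguageHint | null {
  // An explicit choice by the user always wins over geo IP guesses.
  if (userChoseLocale || country === null) {
    return null;
  }
  const suggested: string | undefined = SUGGESTED_LOCALE[country.toUpperCase()];
  if (suggested === undefined || suggested === pageLocale) {
    return null;
  }
  return { suggestedLocale: suggested };
}
```

The point of the design is that a wrong geo IP guess costs the visitor one dismissible prompt rather than an unwanted redirect.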
And you were saying that the marketing team likes to do that even if you don't necessarily like it.
Do you think that this is because they don't yet understand some of the capabilities of the platform, or is it more a matter of looking at it through a different lens, as a marketer versus from an engineering perspective?
Yeah, I think it's just looking through a different lens. I think it was really common to redirect based on geo IP in the past, but time and time again it's been shown that there are a lot of false positives there and it results in a not-great user experience.
We actually deployed a bug recently that was around redirecting on an IP address, and we removed the offending code and the redirection altogether, just because it can be kind of a hairy thing. You're like, okay, well, did they select a language? Where's the IP coming from? Did we redirect previously? What's the referrer?
There's a lot of logic there that's completely unnecessary. Let the user control it.
Yeah, I remember in my previous company we almost created a system ourselves to maintain all that redirect logic, and it can get really, really complex. It's one of those things where the more you build on it, the more difficult it gets to ever get rid of it, because there's so much business logic built into it in many cases, and unpicking all that is super complicated. So I know exactly what you're talking about.
So coming back to Workers and some of the things that you guys are using there: I know you're using Workers, and Workers Sites, but are there any other capabilities that have been launched recently? Rita was here a couple of episodes ago showing us certain things around Workers Unbound. Are some of those things interesting for you guys to incorporate, or have you been beta testing them as part of their build?
I can't say anything recent in Workers, but something I recently discovered was wrangler tail.
I was having some issues in production and I would have loved to have tailed my site, Cloudflare.com rather, but it turns out we have too much traffic for wrangler tail to work. Our staging environment does not have too much traffic, so I was able to use wrangler tail and chase down some logs for some problematic requests. Other than that, I'm having a hard time thinking of anything recent other than KV that we've been using.
With wrangler tail you can basically take a look in real time at the log tail as it comes in, so in production that must be a little crazy, obviously.
In production, definitely too crazy. There's a limit in place that prevents you from using it, but in staging I was able to just run a bunch of requests and see the tail output immediately, so it was quite useful.
Yeah, cool. And from some of the other capabilities in Workers: if somebody was thinking about building a model similar to what we have, because you would think, well, Cloudflare is building the whole site as a static site and using Workers extensively, and you were speaking with somebody who was thinking about building something similar in architecture, are there any particular features that you would highlight that you have found useful for your team? Are there any, if you want to call them, gotchas, where you said, well, I wish we had known about that before, because that was quite a detour or a rabbit hole that we went into?
Yeah, well, that question made me realize there's a newer Workers feature that we do employ with great success. I will get to that in a second. But as for gotchas: start with TypeScript. Don't worry about writing JavaScript; always start with TypeScript. You will hate yourself the first time you deploy a runtime error to production that could have been caught by the type checker.
But the HTMLRewriter is extremely useful for the sort of incremental migration that we're doing. So we've got our origin server and we've got Workers KV static sites, and if we need to apply some logic to both responses, to transform a response in some way, the HTMLRewriter is immensely valuable for that. Traditionally this would be something that we would have handled in our origin server, but now that we've got two versions of the site, having this common entry point with Cloudflare Workers and using HTMLRewriter, we're able to accomplish in one place something that would have been in two.
The feature that we implemented with HTMLRewriter is sort of interesting, and probably common for marketing sites, actually. You may have heard of GDPR and consent management.
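As a rough illustration of a consent-gated transform applied at one common entry point, here is a sketch under stated assumptions. The cookie name `consent_performance`, its `=1` value, and the helper names are hypothetical; real consent tools define their own cookie formats. `HeadElement` declares only the `prepend` method that HTMLRewriter's element object provides, so the handler can be exercised outside the Workers runtime.

```typescript
// Minimal shape of the element passed to an HTMLRewriter element handler.
// Only the one method used below is declared.
interface HeadElement {
  prepend(content: string, options?: { html?: boolean }): void;
}

// Hypothetical cookie name; an assumption for this sketch only.
const CONSENT_COOKIE = "consent_performance";

// Parse the Cookie header at the edge and report whether consent was given.
function hasPerformanceConsent(cookieHeader: string | null): boolean {
  if (cookieHeader === null) {
    return false;
  }
  return cookieHeader
    .split(";")
    .map((pair) => pair.trim())
    .some((pair) => pair === CONSENT_COOKIE + "=1");
}

// Build a handler that puts a script tag at the very top of <head>.
function prependScript(src: string) {
  return {
    element(head: HeadElement): void {
      head.prepend('<script src="' + src + '"></script>', { html: true });
    },
  };
}
```

In a Worker, this might be wired up as `new HTMLRewriter().on("head", prependScript(url)).transform(response)`, applied only when `hasPerformanceConsent(request.headers.get("Cookie"))` is true, so the same transform covers responses from both the legacy origin and the static site.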
There are some places where you have to get consent before you use cookies, and you have to know about that consent.
Even depending on the type of cookie.
Yeah, exactly. So we have been using Google Optimize for a while to run A/B tests, and oftentimes an A/B test will redirect you, based on your cohort, to a different page. When we implemented consent management, that decision to load Google Optimize was then gated by first checking a cookie. That can result in a lot of latency between loading the Google Optimize script, checking your cohort, fetching the actual experiment code, and so on. So we moved consent management to the edge, or at least the parsing of it. If a user has already consented, there's a cookie available at that time, so then we know that Google Optimize can load immediately. So we transform any response using HTMLRewriter if the user has consented to having performance cookies, so that the Google Optimize script gets immediately prepended to the top of the head. That removed any sort of experiment flicker at that point, if you had already consented or if you're in an area that is not as strict.
That's cool.
Yeah, my personal belief is we should apply that across the board rather than just in EU countries, but...
Is there a particular reason why we don't do that?
Yes. Well, it is the same problem as before: a different perspective, looking through the lens differently. I tend to think that we should adhere to the strictest privacy laws across the board, even where the local laws are less strict, but that opinion is not widely held.
It requires socializing, right?
Yes, and I'm fighting for that, but it's an uphill battle.
Yeah, I think that with Cloudflare, I mean, our brand promise has to do with privacy first and everything, and we're a great advocate for all those things. So maybe you and I need to chat about how I can also help you fight that battle, because I agree. As a consumer, sometimes I think these sort of things become annoying, and I think it is appreciated when you go to a site or a brand and you know that they're going above and beyond even what is required in some of these areas. So, all right, we'll talk about that offline.
Yeah. So beyond Workers, though, for Cloudflare.com or marketing engineering it's probably more useful to talk about the products that we don't use than the products that we do. But Cloudflare Access we get immense use out of, because we have to QA a lot of stuff, and having an easy-to-use system to only allow Cloudflare employees to look at preview links is great. Then Workers again: every time a branch is submitted to our primary repo, we have a build process that creates a new Worker and attaches it at a particular route, which is protected by Access. We talked about load balancing before; that literally makes our site available and faster in the EU and that side of the world and in the United States, because we've got two Kubernetes clusters. WAF and firewall and bot management, these are indispensable at this point.
But yeah, you're using pretty much the whole stack, in a way. There are probably very few things in there that you're not using.
Correct.
Yeah, that's very cool. That's pretty awesome. Well, we have another last five minutes. Apart from TypeScript or anything like that, are there any other last-minute call-outs, anything that you want to tell the audience if they're considering building something like this, or why they should, or what they should do, and all that kind of stuff?
Well, if you're building...
There's nothing really special about our marketing site that wouldn't be in others, right? I mean, there's nothing specific that says this is why we must use the Cloudflare architecture. It's a marketing site just like any other, right?
Yeah. Well, all I can say is that if you're like us and you're migrating from a service to essentially originless, it's great. It's amazing, because you get to this point where you aren't maintaining services anymore. You don't have to have the expertise of maintaining a service, standing it up, and keeping it online. Logging and monitoring and alerting, all those things essentially get offloaded to Cloudflare, and it's been really, really great. Like I said before, start off with TypeScript; do yourself a favor there. And that's really about it. You can achieve this sort of incremental migration really simply by scripting the edge, so I highly recommend it.
Yeah, I didn't realize that that's one of the almost magic tricks that you guys were using. I see many projects around CMS migrations, replatforming websites and things like that, that take almost a big bang approach, with all the good and bad that comes with that, in many cases more bad than good. So I think this trick of maintaining a lot of the visual identity of the site but redirecting, depending on what the page is, to the new architecture, if you're not changing the information architecture, is a fantastic way to do it. So I'm happy that we can share that with the audience. All right, so we're going to wrap up with three minutes to go. Thank you, John. I know that this was your first Cloudflare TV, so hopefully it wasn't too painful.
It was amazing. Thank you, Juan. You're an amazing host. I really appreciate it.
Thank you. We'll be back next week with somebody else to talk about dogfooding. If anybody has any questions now or later, just send us a message and we'll be more than happy to get back to you. Thank you everyone. Thank you, John. Have a great rest of the week.
Bye, see you.
What is a WAF?
A WAF is a security system that uses a set of rules to filter and monitor HTTP traffic between web applications and the Internet.
Just as a tollbooth allows paying customers to drive across a toll road and prevents non-paying customers from accessing the roadway, network traffic must pass through a firewall before it is allowed to reach the server.
WAFs use adaptable policies to defend vulnerabilities in a web application, allowing for easy policy modification and faster responses to new attack vectors.
By quickly adjusting their policies to address new threats, WAFs protect against cyber attacks like cross-site forgery, file inclusion, cross-site scripting, and SQL injection.