Cloudflare at Cloudflare
Presented by: Juan Rodriguez, John Fawcett
Originally aired on March 9, 2021 @ 10:00 AM - 10:30 AM CST
Come learn how we use Cloudflare technologies internally to solve problems (or as we say "dogfood our own products" internally).
Transcript (Beta)
Hello everyone. After a couple of weeks hiatus we're back on Cloudflare at Cloudflare.
As you know, this is a program where we talk about dogfooding, and I'm your host, Juan Rodriguez.
I'm the CIO of Cloudflare. For those of you who may not have watched the show before, dogfooding is a term we use at Cloudflare to mean that we try to eat our own dog food.
Or, as some people say, we drink our own champagne.
So we use Cloudflare technology to solve Cloudflare problems or things that we may have to do with our products or services or IP or things like that.
So I also bring a guest or two. So today I have Mr. John Fawcett with me. Hey John how are you?
I'm great how are you Juan? I'm all right not bad for a Thursday.
So I'll let you introduce yourself. Why don't you tell us maybe like you know who you are what you do how long you've been at Cloudflare.
Sure yeah so I'm John.
I'm the engineering manager for the marketing engineering team. So basically we control cloudflare.com, www.cloudflare.com.
We are actually one of Cloudflare's biggest internal customers but speaking of internal customers marketing is our biggest internal customer.
So we're always just you know weighing the needs of marketing with the needs of engineering.
I've been at Cloudflare for over four years now.
I went from product strategy over to Usman's department in engineering.
It's been a really fun time. Right, yeah, and we won't talk much, or maybe we will in this show, about how demanding of a customer marketing is.
Well, I think to talk about why we have to dogfood in the first place, we actually do have to talk about the requests, not the demands, of marketing.
Yeah so tell us a little bit about you know for many of our audience I don't know if the term you know marketing engineering and all that stuff will be familiar but tell us a little bit about the marketing website platform and I know that you guys have been on a journey on you know what it looked like and you know the things that we're doing now using a lot of internal technology.
So maybe you can you can tell the people that may be watching about you know a little bit about the platform and yeah I have a few slides.
All right perfect. Let's see let me you almost sound like a marketing person John.
I have a few slides. Can you see this or am I not? Absolutely.
Cool. So let's talk about what marketing asks of us first.
So cloudflare.com is this ever-evolving thing.
The needs and the wants and the requirements they change from year to year.
Today, you know, marketing asks a lot of us. They want instant updates, fast turnaround on new pages, a globalized experience with internationalization of all of our pages, and personalization, so if a customer comes in and they've already looked at Argo Tunnel before, then maybe we should tell them something different on the home page.
They need analytics and most recently they need easier content management.
So, you know, how do we balance all those needs, those requirements really, with the fact that cloudflare.com gets a lot of traffic?
This is over 30 days but I mean you can just sort of see that the volume is just massive.
So, you know, for me personally, I've got different needs for cloudflare.com.
I'm always looking to make a better and faster and lighter, more maintainable, more secure, beautiful and just all-around more excellent website.
More better.
More better. But, you know, looking back in time at what cloudflare.com was, I know you're looking at this and thinking this is the pinnacle of 2010 web design.
That's right. But previously I did a talk where I spun up our old code base, got a lot of these old templates up and some of the old PHP code running, and just perused through what the design was like.
What the requirements were at the time. It was an extremely fascinating experience to be able to look at this old code and try to put myself in the shoes of the developer and of the marketing department at the time, which I guess was just Michelle.
And just really think about the requirements. So the requirements have been changing this whole time to get to what I showed earlier and the code base had to evolve with it.
And this is 10 years old at this point right and eventually we landed on this architecture just by the needs right.
I'm not going to go into all this but you can kind of tell it is somewhat complicated.
It has a lot of layers.
It has a lot of layers and you can tell right off the bat that you know we're going to have to use Cloudflare's load balancing product to load balance across these two Kubernetes clusters.
But you know when you're looking at this it's hard to get an understanding of all the parts and what I'm moving to today, we are partially at least there right now, is an architecture diagram that looks like this.
No real moving parts.
It's just 100% cached on the edge and we get that with a static site architecture.
I'll briefly go into like the technologies we're using.
Cloudflare Workers is a huge one.
We build a static site from our CMS data and then we put those files on the edge with Workers KV.
And Wrangler and the Workers team have this great feature called Workers Sites that makes it super duper easy to host a static site.
Preliminary performance improvements are promising.
This is basically after our first POC: we went from what Pingdom considered a grade of D, 64 out of 100, to 85 out of 100 without much effort.
There's still a lot we can do to make this even better. But I mean just looking at the experience side by side it's a little bit tough but you'll see on the left side is the legacy architecture and as you're clicking around the links it just takes time to load.
On the right hand side you click a link and it basically instantaneously navigates to the next section.
So many of the things like on the front you know from a templates perspective all those things may be very similar to what we have but you know a lot of the basically plumbing underneath the engine and the wheels and all those things are totally different right?
Correct yes so underneath everything is different. The site itself is the same.
We want to incrementally move things over to a new architecture so we don't get stuck in a giant feature branch or something.
And we can make that happen with Cloudflare Workers.
So we have a root level worker that tries to look up a file on KV and if it's not there it just falls back to our old origin.
So the Kubernetes clusters would be the origin here.
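The fallback routing described here can be sketched roughly as follows. This is a minimal TypeScript illustration, not the team's actual Worker: the `AssetStore` interface stands in for a Workers KV namespace, `fetchOrigin` stands in for a request to the legacy Kubernetes origin, and the path-to-key convention in `assetKey` is an assumption. Workers Sites projects would more typically use the `getAssetFromKV` helper from `@cloudflare/kv-asset-handler`, which is left out so the sketch stays self-contained.

```typescript
// Hypothetical minimal view of a KV namespace: only the lookup we need.
interface AssetStore {
  get(key: string): Promise<string | null>;
}

// What was served, and from where (useful for logging during a migration).
interface Served {
  source: "kv" | "origin";
  body: string;
}

// Map a URL path to the key a static build might have written.
// The "index.html" convention is an assumption for this sketch.
function assetKey(pathname: string): string {
  const trimmed = pathname.replace(/^\/+/, "");
  if (trimmed === "") return "index.html";
  return /\/$/.test(trimmed) ? trimmed + "index.html" : trimmed;
}

// Try the static asset first; if it has not been migrated yet,
// fall back to the legacy origin.
async function serve(
  pathname: string,
  assets: AssetStore,
  fetchOrigin: (pathname: string) => Promise<string>
): Promise<Served> {
  const body = await assets.get(assetKey(pathname));
  if (body !== null) return { source: "kv", body };
  return { source: "origin", body: await fetchOrigin(pathname) };
}
```

Because every path that has not been migrated yet misses in KV and falls through to the origin, pages can move to the static build one at a time.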
And the productivity gains from the developer standpoint are unquestionable.
It's much easier to get a hold and an understanding of what's happening in the new system.
But most importantly it utilizes Cloudflare Workers in a way that is globally distributed.
Our previous architecture actually did use Cloudflare workers but not in a globally distributed way.
It would only hit one or two colos. Got it.
So yeah you can see the changing requirements over time of Cloudflare.com and every time we've had to adapt the system and meet the new requirements.
And here we are today.
This is actually a screenshot of our new homepage that is 100% served by Workers.
It's not live yet but you know one day. Since we're talking about dogfooding I wanted to give a shout out to my team who had noticed a small attack last night.
There was a small denial-of-service attack launched against cloudflare.com.
It managed to evade bot detection which generally does not happen. But using firewall analytics we were able to quickly put in a challenge rule which effectively stopped the attack.
So yeah that's basically all I have to show slide wise. Let me go ahead and stop sharing.
Thank you. So that was very helpful. And you know just to see you know a little bit of like in a few minutes the whole story basically from the website you know we need to start.
I don't remember you know many people know that I was a Cloudflare customer since around 2013 or 2013.
I don't remember the website with the ninjas, so that was probably before me.
So every time that I see that you know I kind of like get a chuckle about it because I think it's fantastic.
It's fabulous. And tell us also a little bit you know in previous lives when I have managed things like this as marketing departments basically and the content teams on those marketing departments expand and you know depending on how their structure you know you may have like international teams or different teams that you know maybe we want to serve different localized content you know depending on the country or products that they may resonate.
How has that influenced you know some of the maybe architectural choices to help them you know maybe better self-serve or manage that content in a more agile fashion?
Well you know from a tech standpoint I want to make it as simple as possible and since we manage all of our content in the content management system we better choose a content management system that has localization support and we happen to use Contentful.
There's a pretty okay integration with our translation management service.
Unfortunately there's not a whole lot of dogfooding going on there because it's internationalization and yeah that's hopefully a solved problem elsewhere but from the tools that we built to actually generate our pages you know we may have situations where you want to redirect based on geo IP.
I personally dislike that but our localization team asks for it.
Cloudflare makes it super easy to do that because you've got, what, /cdn-cgi/trace.
We know whereabouts in the world the request is coming from and we can redirect based on their supposed geo IP, but again, I want to reiterate that's not the best practice.
What is better is if you provide some hints, say, in the UI, and we're actually working on that: if we notice the request is coming from, let's say, Germany, and they're looking at the English site, we can provide a modal that says, hey, we know you're potentially viewing this from Germany but you're looking at the English page, would you like to change your language?
That's the sort of thing that we can do because we're just using Cloudflare.
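The "hint, don't redirect" idea can be illustrated with a small decision function. This is a sketch under stated assumptions: the country-to-locale table, the locale strings, and the `HintInput` shape are all hypothetical, and the country code is assumed to come from something like `request.cf.country` or the `CF-IPCountry` header in a Worker.

```typescript
// Hypothetical map from ISO 3166-1 alpha-2 country codes to site locales.
const COUNTRY_LOCALE: Record<string, string> = {
  DE: "de-de",
  FR: "fr-fr",
  JP: "ja-jp",
};

interface HintInput {
  country: string | null;      // e.g. from request.cf.country or CF-IPCountry
  pageLocale: string;          // locale of the page being served
  chosenLocale: string | null; // explicit user choice, e.g. stored in a cookie
}

// Returns a locale to suggest in a modal, or null when no hint is needed.
// It never redirects: the visitor stays on the page they asked for.
function suggestLocale(input: HintInput): string | null {
  if (input.chosenLocale !== null) return null; // an explicit choice always wins
  if (input.country === null) return null;      // no geo information available
  const guess = COUNTRY_LOCALE[input.country.toUpperCase()];
  if (guess === undefined || guess === input.pageLocale) return null;
  return guess;
}
```

Keeping the decision to one pure function avoids the pile of redirect state (previous redirects, referrer checks) that makes geo IP redirection hard to maintain.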
It's super duper easy. Mm hmm. And do you think that you know you were saying that the marketing team likes to do that even if necessarily you won't like it.
Do you think that this is because they don't yet understand some of the capabilities of the platform or it is more of a kind of like looking at it from a little bit of a different lens as a marketeer and certain things and you know more from an engineering point of view perspective.
Yeah, I think it's just looking through a different lens. I think it was really common to redirect based on geo IP, but time and time again it's been shown that there's a lot of false positives there and it results in a not great user experience.
We actually deployed a bug recently that was around redirecting on IP address, and we removed the offending code and the redirection altogether, just because it can be kind of a hairy thing. Like, okay, well, did they select a language?
Okay. Where's the IP coming from? Did we redirect previously?
What's the referrer? There is a lot of logic there that's completely unnecessary.
Let the user control it. Yeah, I remember in my previous company we almost created a system ourselves to manage all that redirect logic, and it can get really, really complex. It's one of those things where, once you start building on it,
it gets even more difficult to ever get rid of it.
Right. Because there's so much business logic in many cases built into there, and to unpick all that,
it's super complicated.
So I know exactly what you're talking about. So, coming back to Workers and some of those things that you guys are using in there.
I know you're using Workers KV and Workers Sites. But are there, for instance, any other capabilities that have been launched recently?
Rita was here a couple of episodes ago showing us certain things around Workers Unbound. Are some of those things interesting for you guys to incorporate, or have you been beta testing them as part of their build?
I can't say anything recently in workers.
But something I recently discovered was wrangler tail. I was having some issues in production and I would have loved to have tailed my site, cloudflare.com rather.
But it turns out we have too much traffic for wrangler tail to work.
But our staging environment does not have too much traffic. So I was able to use wrangler tail and chase down some logs for some problematic requests.
But other than that I'm having a hard time thinking of anything recent other than KV that we've been.
Yeah. Yeah. And wrangler tail, just so I understand, it's almost like you can take a look in real time at the log tail as it comes.
Yeah. So in production that must be a little crazy. Production is definitely too crazy.
There's a there's a limit put in place that prevents you from using it.
But staging I was able to you know just run a bunch of requests and see it see the tail output immediately.
So it was it was quite useful.
Yeah. Cool. And from some of the other capabilities in Workers, anything that, you know, if somebody was thinking about building a model similar to what we have, because you would think, well, if Cloudflare is building all of the site as a static site, they're using Workers ostensibly and things like that, and you were speaking with somebody that was thinking about building something similar in architecture to this.
Are there any particular features that you would highlight that you have found useful for your team?
Is there any particular, if you want to call it, gotcha, where you said, well, I wish that we had known about that before, because that was quite a detour or rabbit hole that we went into?
Well, oh yeah, a few. That question made me realize there's a newer Workers feature that we do employ with great success.
I will get to that in a second, but as for gotchas: start with TypeScript.
Just don't worry about writing JavaScript.
Always start with TypeScript.
You will hate yourself the first time you deploy a runtime error to production that could have been caught by the type checker.
But the HTMLRewriter is extremely useful for the sort of incremental migration that we're doing.
So we've got our origin server and we've got Workers KV static sites.
And if we need to apply some logic to both responses, transform some response in a way, the HTMLRewriter is immensely valuable for that.
You know, traditionally this would be something that we would have handled in our origin server.
But now that we've got, you know, two versions of the site, having this common entry point with Cloudflare Workers and using HTMLRewriter, we're able to accomplish something that would have been in two places.
Now it's in one. But the feature that we implemented with HTMLRewriter is sort of interesting and probably common for marketing sites, actually.
So you may have heard of GDPR and consent management. There are some places where you have to get consent before you use cookies, and you have to know about that consent, even depending on the type of cookie.
Yeah exactly.
So, you know, we have been using Google Optimize for a while to run A/B tests, and oftentimes an A/B test will redirect you based on your cohort to a different page.
When we implemented consent management, that decision to load Google Optimize was then gated by first checking a cookie. That can result in a lot of latency between loading the Google Optimize script, checking your cohort, fetching the actual experiment code and so on.
So I moved consent management to the edge, or we moved consent management to the edge, or at least the parsing of it.
So if a user has already consented, there's a cookie available at that time.
So then we know that Google Optimize can load immediately. So we transform any response using HTMLRewriter if the user has consented to having performance cookies, so that the Google Optimize script gets immediately prepended to the top of the head.
So that removes any sort of experiment flicker at that point if you had already consented or if you're in an area that is not as strict.
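A rough sketch of the edge-side consent check might look like the following. The cookie name `consent`, its comma-separated value format, and the script URL passed in are all hypothetical; the real consent cookie depends on the consent management vendor. The HTMLRewriter wiring is shown only as a comment, since that API exists in the Workers runtime rather than in plain TypeScript.

```typescript
// Parse a Cookie header into name/value pairs.
function parseCookies(header: string | null): Record<string, string> {
  const cookies: Record<string, string> = {};
  if (!header) return cookies;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip malformed pairs
    cookies[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return cookies;
}

// Assumed format: consent=necessary,performance (hypothetical cookie).
function hasPerformanceConsent(cookieHeader: string | null): boolean {
  const value = parseCookies(cookieHeader)["consent"];
  if (!value) return false;
  return value.split(",").indexOf("performance") !== -1;
}

// Returns the markup to prepend to <head>, or null when consent is missing.
function headInjection(cookieHeader: string | null, scriptUrl: string): string | null {
  return hasPerformanceConsent(cookieHeader)
    ? `<script src="${scriptUrl}"></script>`
    : null;
}

// In a Worker, a non-null tag could then be prepended with HTMLRewriter:
//   new HTMLRewriter()
//     .on("head", { element(el) { el.prepend(tag, { html: true }); } })
//     .transform(response);
```

Because the same Worker sits in front of both the static site and the legacy origin, the transformation applies to responses from either one.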
That's cool.
Yeah, my personal belief is we should apply that across the board rather than just EU countries.
But why is there a particular reason why we don't do that?
Yes, so, well, it is the same problem as before: a different perspective, looking through the lens differently.
I tend to think that we should adhere to the strictest privacy laws across the board, regardless of whether they're less strict elsewhere.
But that opinion is not widely held.
It requires socializing right. Yes and I'm fighting for that but it's a it's an uphill battle.
Yeah, I think that with Cloudflare, I mean, our brand promise has to do with privacy first and everything, and we're a great advocate for all those things.
So you know maybe you and I need to chat about like you know how I can also help you fight that battle right.
Because I agree as a consumer sometimes I think that you know these sort of things you know become annoying and I think that it is appreciated.
It is appreciated, by the way, when you get a site or a brand and you know that they're going above and beyond even what is required in some of these areas.
So all right we'll talk about that offline.
Yeah, so beyond Workers though, I mean, for cloudflare.com or marketing engineering it's probably more useful to talk about the products that we don't use than the products that we do.
But I mean, Cloudflare Access we get immense use out of, because we have to QA a lot of stuff, and it's an easy-to-use system to only allow Cloudflare employees to look at preview links.
Right. Then workers again right so every time a branch is submitted to our primary repo we have a build process that creates a new worker and attaches it at a particular route which is protected by access.
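One way a build step could derive a per-branch preview route is sketched below. The naming scheme is an assumption, not the team's actual convention: it turns the branch name into a DNS-safe label (at most 63 characters) under a hypothetical preview zone, producing a route pattern that an Access policy could then cover.

```typescript
// Turn a branch name into a DNS-safe label (lowercase, hyphens, max 63 chars).
function previewLabel(branch: string): string {
  const label = branch
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything unsafe into one hyphen
    .replace(/^-+|-+$/g, "")     // trim leading and trailing hyphens
    .slice(0, 63)                // DNS labels are limited to 63 characters
    .replace(/-+$/, "");         // truncation may leave a trailing hyphen
  return label === "" ? "preview" : label;
}

// Build a route pattern for the branch under a hypothetical preview zone.
function previewRoute(branch: string, zone: string): string {
  return previewLabel(branch) + "." + zone + "/*";
}
```

For example, `previewRoute("feature/New_Homepage", "preview.example.com")` yields `feature-new-homepage.preview.example.com/*`.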
We talked about load balancing before. I mean, that literally makes our site available and faster in the EU and that side of the world and in the United States, because we've got two Kubernetes clusters.
Yeah.
WAF and firewall and bot management I mean these are indispensable at this point.
But yeah. So you're also using basically pretty much the whole stack almost in a way.
I mean there's probably very few things in there that you're not using.
Correct. Yeah that's very cool. That's pretty awesome. Well we have another last five minutes.
Is there apart from the TypeScript or anything like that any other last minute call outs or anything that you want to tell the audience.
As I said you know if they're considering building something like this or why they should or why they shouldn't and all that kind of stuff.
Well if you're building.
There's nothing really special about our marketing site that wouldn't be in others, right?
I mean, there's nothing specific, like, this is why we must use the Cloudflare architecture, right?
I mean, it's a marketing site just like any other, right?
Yeah. Well all I can say is that if you're like us and you're migrating from a service to essentially originless it's great.
It's amazing because you get to this point where you aren't maintaining services anymore.
You don't have to have the expertise of maintaining a service standing it up and keeping it online.
That's logging and monitoring and alerting all those things essentially get offloaded to Cloudflare and it's been really really great.
Like I said before start off with TypeScript do yourself a favor there.
And yeah, that's really about it. And you can achieve this sort of incremental migration really simply by scripting the edge.
So highly recommend it. Yeah, I didn't realize that that's one of the almost magic tricks that you guys were using. I see many, many projects around CMS migrations,
replatforms of websites and things like that, and they're almost always a big bang approach.
And, you know, with all the good and bad that comes with that, in many cases more bad than good.
So I think this trick of maintaining a lot of the visual identity of the site, but redirecting, depending on what the page is, to the new architecture or the old architecture,
if you're not changing the information architecture of it, I think that it is a fantastic way to do it.
So I'm happy that you know we can share that with the audience. All right so we're gonna wrap up with three minutes to go.
Thank you John and I know that this was your first Cloudflare TV so hopefully it wasn't too painful.
It was amazing thank you Juan.
You're an amazing host. I really appreciate it. Thank you, and we'll be back next week with somebody else to talk about some more dogfooding. If anybody has any questions now or later, just send us a message and we'll be more than happy to get back to you.
Thank you everyone.
Thank you John. Have a great rest of the week. Bye. See you.
What is a WAF?
A WAF is a security system that uses a set of rules to filter and monitor HTTP traffic between web applications and the Internet.
Just as a toll booth allows paying customers to drive across a toll road and prevents non-paying customers from accessing the roadway, network traffic must pass through a firewall before it is allowed to reach the server.
WAFs use adaptable policies to defend vulnerabilities in a web application, allowing for easy policy modification and faster responses to new attack vectors.
By quickly adjusting their policies to address new threats, WAFs protect against cyber attacks like cross-site forgery, file inclusion, cross-site scripting, and SQL injection.