Getting started with Transform Rules: A Beginners Guide
Presented by: Sam Marsh
Originally aired on August 26, 2021 @ 6:00 AM - 6:30 AM EDT
Transform Rules elevates traffic modification to the Cloudflare edge. This new product enables users to modify HTTP requests without writing a single line of code. Come to the session to get started and create your first Transform Rule.
Transcript (Beta)
Hello everybody, good morning, good afternoon, welcome to another day of Cloudflare TV.
My name is Sam Marsh. I am the product manager for Transform Rules, amongst many other things, and today I will be taking you through how you can use Transform Rules: what it offers you and how it can help you in your environment.
Firstly, a summary of what Transform Rules are. This is a product that we launched in two parts: mainly in April this year, with URL rewrites and normalization, and then with the subsequent addition of request header modification in June of this year, so both are quite recent.
In a nutshell, what these products allow you to do is modify traffic as it flows through your Cloudflare zone.
URL rewrites allow you to rewrite the URL path and query of the HTTP requests that your zone receives, and the second one allows you to modify the HTTP request headers which flow through your zone as well, so removing or adding headers.
The key thing, and it's worth stressing this before we go further, is that in order for these rules to take effect, the traffic going through your zone must be proxied.
What you see on the screen here is that traffic comes into Cloudflare and goes off to your origin, whether it's another cloud provider or on-premise or a bit of both. But unless we are proxying that traffic, which is that lovely orange-clouded box you see on the DNS screen, these rules just won't take effect. This is the same for firewall rules, transform rules and page rules: it has to be proxied or it won't really do anything.
So if you're seeing issues, or you're seeing that it's not working, then step one is to make sure that box is ticked, so the traffic comes into us and then goes back out from us.
The next part is going to be walking you through the first of these transform rules which is URL rewrites.
The key thing again before we dive much deeper into this is looking at what a rewrite versus a redirect is.
People talk about these terms interchangeably and they're really not.
With a rewrite, the request comes into Cloudflare, into the reverse proxy, and we effectively proxy that request to the web server.
This doesn't change what the user sees in the browser bar. With a redirect, on the other hand, the request comes into Cloudflare and we effectively tell the browser to go to another URL or another location, and this does change what the eyeball, the client, sees in the browser. We're talking about the former today.
We're talking about rewriting or proxying that change.
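The difference between the two can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Response` class, the `ORIGIN_FILES` table, the `/hidden` prefix and the handler names are all hypothetical and are not part of any Cloudflare API.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


# Hypothetical origin: only /hidden/page.html exists.
ORIGIN_FILES = {"/hidden/page.html": "hello"}


def origin_fetch(path: str) -> Response:
    """Stand-in for the web server behind the proxy."""
    if path in ORIGIN_FILES:
        return Response(200, body=ORIGIN_FILES[path])
    return Response(404)


def handle_with_rewrite(path: str) -> Response:
    """Rewrite: the proxy changes the path itself and fetches from the origin.
    The client receives the content and its address bar is unchanged."""
    return origin_fetch("/hidden" + path)


def handle_with_redirect(path: str) -> Response:
    """Redirect: the proxy tells the browser to request a different URL.
    The client has to make a second request and sees the new URL."""
    return Response(302, headers={"Location": "/hidden" + path})


rewritten = handle_with_rewrite("/page.html")
redirected = handle_with_redirect("/page.html")
print(rewritten.status, rewritten.body)
print(redirected.status, redirected.headers["Location"])
```

In the rewrite case the client never learns that the path changed; in the redirect case the new location is sent back to the client in the `Location` header.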
Common implementations which you may be familiar with are Apache mod_rewrite and Nginx's own module for doing this.
Hopefully you can make it out but there's two different colors on this screen here.
The first part is purple and this is basically what we are looking to effectively match on.
If the HTTP request path matches this then do this for example.
We're specifying, either using regex or just prescriptively, that we're looking for this path or this traffic, and then we want to rewrite it, or proxy rewrite it, to something else.
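The match-then-rewrite pattern described here can be sketched with Python's `re` module. The `/old/` and `/new/` paths and the `apply_rewrite` helper are hypothetical, chosen only to show the two parts of a rule: what to match on, and what to rewrite it to.

```python
import re

# Hypothetical rule: the "match" part looks for paths under /old/,
# and the "action" part rewrites that prefix to /new/.
MATCH = re.compile(r"^/old/")
REPLACEMENT = "/new/"


def apply_rewrite(path: str) -> str:
    """Return the rewritten path if the match part applies, else the path unchanged."""
    if MATCH.search(path):
        return MATCH.sub(REPLACEMENT, path, count=1)
    return path


print(apply_rewrite("/old/index.html"))
print(apply_rewrite("/about.html"))
```

Requests that do not match the first part pass through untouched, which mirrors how a rewrite rule only acts on the traffic its expression selects.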
What we've done is we've built this functionality into Cloudflare within transform rules so you no longer have to do this kind of action on premise.
You could do this all within the dashboard or within a purpose-built product and this lives at the top of this new transform rules product here.
When you click into creating a new rewrite action or creating a new rewrite rule what you will see is the ability to modify both the path, the URI path and the URI query and you've got two options with each of these.
The first is preserve and preserve basically means do nothing.
If you had a rule that preserved both the path and the query, then you'd just be passing that traffic through unmodified. What you can choose to do instead is say, for path or query, I want to rewrite these values.
I want to rewrite the path, or I want to rewrite the query, and then a drop-down box will appear. In that drop-down box you will have static, and static means that if I type in ABCD, then on the server or on the origin where this HTTP request is received, the query will equal ABCD. So it's very prescriptive.
Dynamic is much more powerful. Dynamic allows you to enter almost like a variable or a parameter, or what we call a field, and this field will be populated on a per-request basis with its counterpart value. So if you chose to rewrite the query using the dynamic option and you chose the country code, for example, then the query of every HTTP request will be populated with the country code of the visitor. So dynamic is where you enter variables; static is where you enter text to be treated literally.
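A minimal Python sketch of the static versus dynamic distinction follows. The `rewrite_query` function and the `request_fields` dictionary are hypothetical stand-ins for the per-request evaluation that happens at the edge; only the field name `ip.geoip.country` comes from the talk.

```python
def rewrite_query(mode: str, value: str, request_fields: dict) -> str:
    """Return the new query string for one request.

    static:  the typed value is used literally.
    dynamic: the typed value is treated as a field name and looked up
             in the fields populated for this particular request.
    """
    if mode == "static":
        return value
    if mode == "dynamic":
        return str(request_fields[value])
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical fields for a single request from a visitor in Great Britain.
request_fields = {"ip.geoip.country": "GB"}

print(rewrite_query("static", "ABCD", request_fields))
print(rewrite_query("static", "ip.geoip.country", request_fields))
print(rewrite_query("dynamic", "ip.geoip.country", request_fields))
```

The second call shows the common mistake: entering a field name in static mode produces the field name itself rather than the visitor's country code.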
As I said, the first key part of dynamic rewrites is the fields.
These are effectively variables, and there are some examples on the screen here. So ip.geoip.country will dynamically load the country code: for every request that your zone receives, we take the source IP, find the country it came from using our database, and then load that country code, so GB, US, etc.
The second one is our bot management bot score. If you are paying for bot management, or you have bot management on an Enterprise account, you will have a bot score assigned to every single HTTP request that comes in, and using this field here we can dynamically populate the bot score within the query, within the path, within whatever it may be. There's a number of these fields available using the hyperlink on the bottom.
The other part which I haven't spoken about yet is functions, and this is where you can use a function, again within dynamic rewrites, to modify the values. Some examples are on screen. to_string allows you to take something which isn't a string data type, so an IP address, a bot score, which is an integer, or SSL, which is a boolean, and convert it into a string which can then be used within the variable. That's quite a powerful one: it allows you to chuck whatever you want inside there, within reason, and it will convert it to a value that can be used as a header value.
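The idea behind this conversion can be illustrated with a small Python analogue. This `to_string` is a hypothetical local function, not the Cloudflare implementation, and the lowercase "true"/"false" rendering of booleans is an assumption made for the sketch.

```python
import ipaddress


def to_string(value) -> str:
    """Convert an integer, boolean, or IP address into a string."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"  # assumed textual form
    if isinstance(value, (int, ipaddress.IPv4Address, ipaddress.IPv6Address)):
        return str(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_string(99))                                  # e.g. a bot score
print(to_string(True))                                # e.g. whether the request used SSL
print(to_string(ipaddress.ip_address("192.0.2.1")))   # e.g. a source IP
```

Once every value is a string, it can be placed into a path, a query, or a header value without worrying about its original type.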
The second one which is worth calling out is regex_replace, and this is available on the Business and Enterprise plans. This is where you can use regular expressions to dynamically modify the path, generally, but also the query or the header value. In this example here we're using regex_replace to replace the path: we're doing some matching and replacing what we're matching on with a slash, effectively. And again, there's a number of functions here. There's one for starts_with, which is used in the expression, so you can say: if my request starts with this, or ends with this, then modify the rewrite. So there's a ton of these conditional options and operators you can use.
If you remember those colors earlier, where we had the purple and the green, the purple being what we're matching on and the green being what we're doing to the things which we match on, I've put these brackets around an example rewrite rule to show you them in action. On the left-hand side of the screen you have the expression, and here what we're saying is we're looking for any requests which start with /secret, for example. Then we're going to do a dynamic rewrite, and we're going to use the regex_replace function to replace anything starting with /secret to start with /hidden instead, in the URI path specifically.
Then, when we look on our Nginx server in this example, with a custom log format, the first output shows you what happens with this rule disabled, or without this rule in place: any requests sent in to /secret.html show up on the origin as such, and they get a 404 because that file does not exist. We then enable this rewrite rule. Requests come into Cloudflare, we rewrite the URI path part of the request dynamically, and that then shows up on the origin as /hidden/secret.html, and we get a status of 200, which means the asset was correctly
retrieved for the visitor.
In terms of where rewrites run, which is always a key part when you're configuring the Cloudflare edge and any product like this: it runs effectively immediately after we receive the request. So for any requests that come in to /images/123.jpeg, we dynamically rewrite those to /cdn/static/123.jpeg, and then for any product subsequent to that, so firewall rules, page rules, etc., if they're looking for /images in this example then that will fail, because that is no longer the value of the path; we've already rewritten it. If they're looking for /cdn, for example, then it will match, because that's the value which you've rewritten to.
Some cool examples here, and we had some tremendous feedback when we launched this a few months ago. I've had customers tell me that they've managed to replace tens, you know, 50 or 100 on-premise rules, like Apache .htaccess rules for example, with a single transform rule. That's because you can go from having a big list of "if it's this country, add this path; if it's this country, add this path", and instead we can just say concat. Again, it's another function, and it means we'll join or glue various bits together into a URI path. Here what we can do is concat the country code in lowercase in between the slash at the start and the URI path, so it'd be /gb/index/whatever or /us/index/whatever. So it's a really cool way to have one single transform rule take all your visitors and dynamically modify where their traffic goes to.
One final note here, and again this is a common tripping point for customers: URL rewrites can only be used to modify the path and the query. So if you're doing a dynamic modification of the path or a static modification of the path, you shouldn't really be entering https or www or any of
the authority in there, because that means your path is going to start with example.com/https, for example. We're just going to put the value you give us into the path. So this can't be used to modify the hostname, and it can't be used to modify the scheme; it's just used for the path and query parts, as you can see on the screen here.
The second transform rule is HTTP request header modification. A quick recap of what headers are: they are a core part of the Internet, part of HTTP, and effectively they give information to the servers and to the browser about how to handle the interaction. In this example here we have a browser saying "give me index.html", and that will come with a series of request headers, and then the origin server or the cloud provider will give that file back along with some response headers, like CORS for example.
If you go through the Internet there's a ton of these, and there are some really strange, esoteric uses for headers. This example here always amuses me: there's a number of enterprises out there who are using response headers to advertise jobs. Here you can see PayPal has got X-Recruiting as a response header, and they're gambling on the fact that if you're geeky enough to look through the response headers, you may well be the kind of person they're looking to attract to work for them.
In terms of how they're modified, again this is where the colors come in. Here you can see we've got Nginx again, we've got Apache again, and we've also got Cloudflare Workers, which was until recently the only real way to modify headers within Cloudflare. Each of these solutions follows a very similar pattern. In blue you specify the header name, so X-Test is the header name, and then in the orange color is where you specify what the value of the request header is going to be. So again, X-Test is the header, and "value goes here"
is the actual value of that header.
In terms of where you do this, again it's within Transform Rules; it's the second accordion down. Within the rule the look and feel is extremely similar: you specify what you want to filter on, you give it a friendly name, and then you choose either set static or set dynamic. If you choose set static, whatever you type in the value box will be used as the text within the request header. So if I chose set static in this example here, the value of the header would always be the literal text cf.bot_management.score, which is obviously not what you want. So be careful, because static will treat whatever you type in there literally, whereas dynamic will try to populate the header value based upon the field you've put in there.
In this example here, in the blue outlined box, we've created a new header called X-Bot-Score. What we want to do is populate that header with the bot score from the bot management solution, so we've typed in the field cf.bot_management.score. What will happen is that every time the zone receives an HTTP request, we'll evaluate it; if this rule matches, which it will because it has true in the expression, then we will add a new request header called X-Bot-Score and put the actual bot score for that individual request dynamically in that value.
What this looks like on the origin: the top output shows you what happens without this rule enabled, so the bot management score is basically null; it doesn't exist as a request header. Then when we enable this, you can see that the bot management score has been populated, and it's given my request 99 as a bot score, which is pretty good actually.
In terms of where this runs, in comparison to the diagram I showed you about two minutes ago, it runs "after", in quotes, most of the other products. A request comes into Cloudflare, we then rewrite that
request, it then goes through firewall rules, page rules, WAF and Access, and we then modify the header; here you can see we're changing the value to "new val" from 123. Then it goes on to Cloudflare Workers and on to origin. So it runs right towards the end, which means that if you want to modify a header and then use that header in, say, firewall rules, that won't work, unfortunately.
Some cool examples. We've already shown you the bot score one, being able to set the bot score dynamically. You can use ip.geoip.country again, so you can have a request header that says visitor country, and then every single request has a visitor country code populated there; that's quite cool from a log perspective. You could choose to remove visitor IPs: if you're sending this traffic to a third party and you're worried about PII or GDPR or one of the others, you can choose to remove the visitor IPs so your visitor IP isn't actually sent to these parties at all. And if you are using host header modification, or host header override, which some people are for things like S3, then you can choose to create a brand new header, which is immutable, based upon the original host value. That's quite useful to keep the original host as a field even though you're doing host header override and changing the host field.
Now, there's a third transform rule which I didn't really outline at the start. Again, this was launched in April of this year, and that's called URL normalization, with a z. The problem normalization tries to solve is that requests can come in in a variety of forms, and as you can see here, all of these URLs are the same URL; they just look extremely different. It says on the slide it's thanks to magic. It's not really magic; there is, unfortunately, a very valid reason, which is ASCII. URLs can only be sent over the Internet using this character set, and what that means is if you have a non-ASCII
character, like a pound sign in this example here, or any characters that have a special meaning, then URL encoding will convert these into a format that can be transmitted. What this looks like on the bottom is we have an unencoded request, and then what's happening here is we've chosen to encode the a. Both of these are perfectly valid URLs and both will resolve to the same location; it's just that to the eyeball it looks different. It can be, and it has been, a bit confusing sometimes: when you had a firewall rule that would look for /login and you received a request that had the l encoded to %6c, then that firewall rule wouldn't work, or the rate limiting rule wouldn't work, or the transform rule wouldn't work.
So what normalization does is it takes every single HTTP request received, if it's enabled on the zone, which it is by default for pretty much all users now, and we take all these random versions of the same request and knock them into a consistent format. That allows you as the user to write your rules against a guaranteed format, so you know that no matter how someone tries to send this request in, whether they percent-encode this, or they put two slashes here, or they percent-encode the a, or whatever it may be, it'll all be knocked into the same format. So if you write a rule that says "if someone's coming from outside my country and they're going to /login", then no matter what they do with that URI path, they'll still hit that same firewall rule. That's what normalization solves.
In terms of where it runs, it is pretty much the first thing, and I mean this now, pretty much the first thing that happens. Requests come in, we normalize those, we then rewrite those values, we then go through firewall rules, page rules, etc., then we modify the request headers, and then it goes on to Workers. So the first
thing we do, before any product which evaluates the URL, or the URL path particularly, is normalize it. So any product or any rule you have which looks at the path as a filter will be seeing a normalized version, if you have it turned on.
What this looks like in real life is we've got two scenarios here. In the first one, someone's sending a request to /login, but they've chosen to percent-encode the l because they're trying to be malicious, or they're trying to bypass the firewall rule. What we have done is we've normalized that to /login, and we have a firewall rule that says challenge when the path equals /login. What's happened is they've done a challenge and it's gone through to the origin, but the origin will still see the %6c-encoded /login, for example. So that allows you to have guaranteed controls, in that your rules will definitely run, but it doesn't change what the origin sees, so it doesn't risk breaking API offerings, for example, which rely on encoding.
This is because there are two controls for normalization. There's normalization at the edge, which is: do we normalize stuff that comes into Cloudflare? And then there's normalization to origin, which is: do we normalize the request that goes from Cloudflare to your server, your origin, your next hop as it were? The default behavior we have for all zones is we normalize what Cloudflare sees, but we don't normalize what you send to origin. The strong recommendation is that you do normalize what your origin sees, but there are still a small number of applications out there which will have some problems if they don't receive the traffic in the exact format they expect it to be received in, with the percent-encoding for example, the URL encoding. So that's why we chose to do this. Your server will still see the raw value that came in, the encoded l in this example, but Cloudflare will see
l as in the literal l, and we will run rules against the literal l.
On the right-hand side we've got a similar example: a firewall rule that looks for /login, and when someone tries to go to /login we block it. Again it's coming in encoded, we've normalized it, we've run the firewall rule against it, it's a match, and we've blocked that traffic. So that's showing you normalization in action.
Now, in terms of how you get to normalization: within the products, so within Transform Rules, Page Rules and Firewall Rules, there's a nudge sitting just below your quota indicator, and this nudge will tell you in a nutshell whether you have normalization enabled and at what level. Here you can see we have it normalized at the edge and to origin. There's also a hyperlink there which will take you to the controls. When you click on that hyperlink you'll come to a page within the Rules tab, and this page has both controls, allowing you to turn normalization to origin off, in this example, and to turn normalization of incoming URLs off as well. They all link back to the same page, so you can see where this functionality is enabled.
The other part of this is that the raw fields are still available to you. So if you are concerned about breakage, for example, and you wanted to turn it off until you confirm things, you can actually modify your firewall rules instead. If you have a firewall rule or transform rule that looks for http.request.uri.path, you can enable normalization and then modify the expression to look for raw.http.request.uri.path. What that means is that even though normalization is turned on, you can still evaluate that raw field. So if you want to make sure that someone was not typing in %6c, for example, you can still evaluate against the raw-prefixed path, or the URI, with the
raw field, and that means you'll see exactly what came in to your zone.
So, in terms of challenges that are solved by Transform Rules, the key one, and the key driving factor for a lot of this, is the evolution of customer setups, the evolution of customer businesses, frankly. You're going from this model where the Internet is a way to get to your equipment, and once it gets to your equipment, your on-premise kit, you then have software that runs there that does your e-commerce, for example, or your stock keeping, or whatever it may be. Instead, what you're seeing is you have Shopify out there, you've got Cloudflare sitting in front of Shopify, and all of a sudden you don't have this ability to simply fire up an Nginx box on-premise and write some rewrite rules or some header modification rules, because the traffic doesn't get seen by the Nginx box, and if it does, it has to come all the way to your site and all the way back out again, which has latency. So we're solving the problem, basically, because it's not your server anymore; the Internet isn't your server anymore.
What Cloudflare gives you is that control back. Now you can come into Cloudflare and say: for traffic to my Shopify site, or traffic going to Marketo or HubSpot or wherever else it may be, I want to rewrite this URL, I want to add this header. I want to bring all those origin or on-premise actions and do them all in Cloudflare. So they're still there; it's just not having to run them on-premise anymore. I can keep control, but at a different level.
The other angle on this is it reduces your Workers bill. As I said previously, lots and lots of customers have been using Workers to do some quite trivial things, like modifying request headers and rewriting URLs. These common patterns for using Workers are the exact kind of thing we're looking to productize, because Workers are incredibly
powerful, and they're tremendous at solving problems that can't be solved elsewhere. But if you have 100 users doing the exact same thing, then that should be something we're productizing, to make it easier for you as customers but also to get that use case out there and get other people using Cloudflare to achieve it. So we're looking to migrate more and more of these common use cases into our various rules products.
The other interesting angle on Transform Rules, particularly request header modification, is using it to enrich your analytics. If you want, and most customers have a SIEM, you can choose to add request headers for the bot score, for the country code, the locale, or the edge IP on which the request came in, if you're a BYOIP customer. All of these request headers will be received by your origin, and they can then be ingested into your log solution or your event management solution. So you can have a dashboard showing you, say, we've got a /23 of our own IPs on Cloudflare: where are all these requests coming in? Are they coming into IPs that we aren't aware of, or that we shouldn't really be advertising? So you can get that insight there. You can see what the spread, or the gamut, is of scores that your zone receives.
You can even choose to take that Cloudflare bot score and evaluate it in your application. You can add it as a request header in Cloudflare, and then within your application, or within your stack on-premise once it reaches your origin, you can say: if the Cloudflare bot score is less than x and it comes from this range of countries, then I want to send it off to a different pod, for example; I want to treat that traffic differently. So this enrichment of data, adding more information to the requests than you maybe were getting, allows you to handle that traffic once it reaches your applications much more
dynamically and much more effectively.
The last point on here, which is again an interesting use of Transform Rules, is using it to increase your security. Obviously we have things like Authenticated Origin Pulls, and there's a slew of other authentication techniques to authenticate between Cloudflare and origin, but one of the most trivial ones, and a very popular one we're seeing, is the use of request headers to set a pre-shared key. Here we've created a static key called my pre-shared key 123; we've typed in some long, computer-generated number. Then any time the origin or the application receives a request, if it doesn't contain that header with that value, which is the pre-shared key, the origin or the application can just discard it. This will stop requests coming directly to the origin or the application, and it acts as another layer of authentication, to say: are you spoofing some properties, or are you actually coming from where I think you're coming from, and have you gone through the security controls which you should have gone through? So again, quite a cool use case.
Thank you for listening. If you have any questions, I've put my Twitter handle and my email address on there. What I would say is the best thing to do is just try to create some transform rules: add some headers, try to rewrite some URLs, see what happens, and just play around. They are incredibly powerful and incredibly flexible, and there's a number of fields and functions you can use to set up your environment to maximize it, reduce your Workers bill, hopefully increase your security, and increase your productivity. So thank you for listening.
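The pre-shared key check mentioned in the security use case could look roughly like this on the origin side. It is a sketch under assumptions: the header name `X-Pre-Shared-Key`, the placeholder key value and the `is_from_proxy` helper are all hypothetical, and a real application would load the key from its own secret storage and handle header-name casing according to its web framework.

```python
import hmac

# Hypothetical header name and placeholder value; the real ones are whatever
# was configured in the static header rule.
PSK_HEADER = "X-Pre-Shared-Key"
EXPECTED_KEY = "example-long-random-value"


def is_from_proxy(headers: dict) -> bool:
    """Return True only if the request carries the expected pre-shared key."""
    supplied = headers.get(PSK_HEADER, "")
    # compare_digest compares in constant time, which avoids leaking how much
    # of the key matched through response timing.
    return hmac.compare_digest(supplied.encode(), EXPECTED_KEY.encode())


print(is_from_proxy({PSK_HEADER: "example-long-random-value"}))
print(is_from_proxy({PSK_HEADER: "wrong"}))
print(is_from_proxy({}))
```

Requests for which the check returns False can be discarded by the application, which is what stops traffic that reaches the origin without going through the proxy.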