Cloudflare for HR Tech
Presented by: Boris Yanovsky
Originally aired on September 24, 2021 @ 4:30 PM - 5:00 PM EDT
Cloudflare has an interactive Hiring Dashboard on the People Team that sits behind Cloudflare Access. This segment will demonstrate how we integrated Access login credentials deep within our custom data app, enabling us to render dynamic dashboards tailored to the user.
Transcript (Beta)
Hi everyone, my name is Boris Yanovsky and I work in people analytics on Cloudflare's People team. Welcome to another installment of Cloudflare for HR Tech.
This will be a follow-up: we're going to build on some concepts I shared in the last segment, which covered how to build secure dashboards and put them behind Cloudflare Access, a Cloudflare product that we use internally on the People team.
Last time I showcased some best practices and use cases for how we secured our data dashboards behind Access on the People team. This time around I want to dig a little deeper and talk about some more advanced use cases: how you can leverage Access to share dashboards that are custom and tailored to specific users, based on their credentials and their logins through Access.
So we'll talk a little bit about that. This is going to be more of a case study than a how-to.
If you're interested in how to actually get this done, I created a landing page where you can find some resources. It's at hrtech.cf.
I'll have some links to the site in a little bit, but I want to talk at more of a high level here and give you a conceptual review of how we went about this and some of the problems we ended up solving. That work enabled us to evolve and elevate our analytics on the People team and to share data broadly but also securely, so that the decision makers who need important context from the data have access to it, and people who don't need the data are restricted from it.
So here's a little overview of how plots and dashboards typically evolve. I know this meme, with the galaxy brain as the highest level of achievement, is sometimes used as satire, but I actually think this flowchart makes sense as a data evolution or dashboard evolution process.
So you first get your plot on your local machine and it looks great.
It's a static graph, a PNG or however you want to save it. You can send it in an email, share it with people, print it, screenshot it, do whatever you want with it. It's great, but it's static. I think that stuff is still very useful, and you can make it very pretty if you have an eye for design and are able to format all of these different visuals.
The next step, and we actually get this ask a lot from stakeholders, is dashboards. Dashboards can be defined in many different ways, but the way I think about it, a dashboard is just a collection of different plots and metrics that all sit on the same landing page, so it can have multiple PNGs, for example. So great, now you've made 20 PNGs and put them on this dashboard, and everybody can have access to an overwhelming amount of data. Sometimes the data can be a little too much and you can go overboard with these, but if that's what you choose to do, great: you still have data at somebody's fingertips.
The problem again is that it's static, so you can't really share it. You can make live dashboards and put them on a server, which is the part I talked about in the last segment: how to take these dashboards, host them on a server, and then share them through a domain where people can access them.
One of the risks here is security, because now you're exposing this data to the whole Internet. You want to go through a very deliberate process to not expose it and only show it internally. I talked about that in the last segment, so to all the fans who tuned in for the last one, thanks for tuning in. If you haven't, you can go back. I think some of them are playing on repeat, and I believe we'll have these available for on-demand streaming. I'm not sure what the timeline is, but you can keep checking Cloudflare.tv to see when we put them on there.
So once you get this dashboard live on a server, it can be very sleek and very powerful because it can be dynamic. People can go to your URL and filter data, and you can design it however you want. Again, if you have an eye for design and know a little bit of HTML or CSS, you can do a lot here and really make it powerful.
Now that is where Cloudflare Access comes in, because we can block off access to this URL, to this domain, so that only the people you choose can get in. You can be as granular as locking it down to specific email addresses, or as broad as restricting it to, say, IPs from certain countries or regions, or to email domains from certain companies, for example only google.com or gmail.com. Essentially this acts as a replacement for your VPN, and it's instant because it sits at the application level. You don't have to log into any VPN to get in; it's right in front of the application. All you need is an identity provider that you log in through, which can verify that that is indeed your email and indeed your identity, and then it lets you into the dashboard.
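To make the granular-versus-broad idea concrete, here is a minimal sketch of how such an allow rule could be evaluated. It is written in Python purely for illustration (the app in this talk is built in R and Shiny), and `AccessRule` and `is_allowed` are hypothetical names, not Cloudflare's API. Real Access policies are configured in the Cloudflare dashboard and are richer than this, with include, require, and exclude rules.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRule:
    """Hypothetical allow rule: any matching criterion grants access."""
    emails: set = field(default_factory=set)          # exact addresses
    email_domains: set = field(default_factory=set)   # e.g. "acme.com"
    countries: set = field(default_factory=set)       # ISO country codes


def is_allowed(rule: AccessRule, email: str, country: str) -> bool:
    """Return True if the verified identity matches any criterion in the rule."""
    email = email.strip().lower()
    domain = email.rpartition("@")[2]
    return (
        email in rule.emails
        or domain in rule.email_domains
        or country.upper() in rule.countries
    )


# Granular rule: a single address. Broad rule: a whole email domain.
granular = AccessRule(emails={"director@acme.com"})
broad = AccessRule(email_domains={"acme.com"})

print(is_allowed(granular, "joe@acme.com", "US"))   # False
print(is_allowed(broad, "joe@acme.com", "US"))      # True
```

The same identity is rejected by the granular rule and admitted by the broad one, which is the trade-off described above.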
What I'll talk about today, though, takes us a little bit deeper, to the next level: not only showing you live dashboards, but showing you custom and secure dashboards based on who the user is.
Because we will be using Access on these dashboards, one powerful thing is that we're able to consume a JSON Web Token that has a lot of useful information about the user who logs in. We can decode that token, find out who the user is, and then render a custom dashboard tailored to that user and to whatever security group you want them in. It could be a security group, or it could be a specific user you want to give special access to. Or vice versa: it could be a user you want to let into the dashboard but only show a few of the plots, so you're not restricting access to the entire domain, just to a couple of elements in there. And furthermore, you may want to filter the data down to specific people based on who logs in.
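As a rough illustration of what "decoding the token" means, the sketch below reads the claims out of a JWT payload. It is in Python for illustration only (the talk's app uses R and Shiny), the token is made up and unsigned, and the function names are hypothetical. Cloudflare documents that Access forwards the token to the origin in the `Cf-Access-Jwt-Assertion` request header. The sketch deliberately skips signature verification, which a real app must perform before trusting any claim.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip off."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def read_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature.

    Illustration only: verify the signature against the issuer's public
    keys before trusting any claim in a real application.
    """
    _header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(payload_b64))


def b64url_encode(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


# Build a made-up, unsigned token so the example is self-contained.
demo_token = ".".join([
    b64url_encode({"alg": "none", "typ": "JWT"}),
    b64url_encode({"email": "director@acme.com", "iat": 1632515400}),
    "signature-placeholder",
])

claims = read_claims(demo_token)
print(claims["email"])  # director@acme.com
```

Once the email is available as a plain value, the app can use it as a filter or routing variable like any other input.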
So the first two parts of the galaxy brain, the plot PNG and the dashboard, are usually done on your local machine. You can do them on your desktop, and you can use any application or any language that you want.
I will be talking in terms of the R language and using Shiny specifically for the dashboarding.
Shiny is a package in R that lets you build interactive dashboards.
I believe it's JavaScript based, so you can integrate it with JavaScript, with CSS, and also with HTML, so it's quite powerful. I'll show you how we're using Shiny to consume these web tokens and then rerun and refresh these dashboards within the Shiny server, rendering them back to the user depending on who the user is.
And then these live dashboards that I talked about in my previous segment are very similar to the previous dashboard, but they're something you can actually share with an audience.
They're typically built on a virtual machine and then delivered through either an internal domain or an IP address. I wouldn't recommend the IP address, but if you want to keep it internal you can potentially do that as well.
And then here is the ultimate galaxy brain, of course: a secured, customized dashboard. I'll show you at a high level what that looks like in a second.
So a quick review of the previous segment.
I think this is helpful for understanding the foundational work that's needed before you can get to this next level of the evolution into the galaxy brain.
You first have to have some kind of virtual machine, whether it's on GCP, Azure, or DigitalOcean. You need to install your environment on there, whether you use Docker or you're just installing R Shiny or RStudio or whatever it is. Then you can build your R Shiny app on that virtual machine.
You build it on there, you can test it, troubleshoot it, do whatever you need, and then you can share it with others internally.
When you share it with others, this is a step where you have to be careful, because this is where you can actually expose it to the entire Internet. Before sharing it, make sure that the domain you're sharing through is behind Cloudflare, first of all, and behind Cloudflare Access, but also that your origin server is locked down to only allow certain IP addresses, potentially only from within your domain, if you have secure data on there. In the case of the People team, or any kind of company data, or really any data that you work with, you should be careful and treat this with a Zero Trust security model. For any individual who logs in, no matter from what domain or organization, whether they're within your organization or external, you have to treat each user as a potential threat, because you really can't verify who the user is. Then, after you secure the app, you can share and profit. That's when the magic happens, that's when the payout happens: you share the app out with people, they can log in, use it, and potentially make decisions from the important data that you share. Now you've enabled a bunch of people to make some very powerful data-driven decisions.
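The origin lockdown mentioned above amounts to an IP allowlist. The sketch below shows the check itself, in Python for illustration only. The networks are documentation-only placeholder ranges and `origin_accepts` is a hypothetical helper. In practice this rule would normally live in the virtual machine's firewall rather than in application code.

```python
import ipaddress

# Hypothetical allowlist: documentation-only ranges standing in for the
# proxy or office networks that are permitted to reach the origin server.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def origin_accepts(client_ip: str) -> bool:
    """Return True only if the connecting address is inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


print(origin_accepts("203.0.113.7"))   # True
print(origin_accepts("198.51.100.9"))  # False
```

Anything that bypasses the protected domain and hits the server directly from an unlisted address is refused, which is what keeps the dashboard from being reachable around Access.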
If you want to review this, like I said, you can go to hrtech.cf. It will ask you to verify through Access, so you can see how that workflow functions too if you're interested. I have a bunch of resources on there, and a demo app you can check out.
Our use case in today's case study is that we have Access set up and we're restricting access to specific individuals. Let's say it's people from Cloudflare.com or yourorganization.org or acme.com, whatever company, and access is restricted just to those individuals. And let's say you also give access to partners outside of your organization.
The nice thing about Cloudflare is that we support different methods of identity management and single sign-on. You can authenticate through Google, you can authenticate through Okta, you can use LinkedIn, you can use Facebook, or you can even just use a code sent by email. Access checks the email address against the list of verified users that you can upload, and if it matches, it sends a login code that you can then use.
Once the user is in, all this information about the user gets encoded in a JSON Web Token. You get things like their email and the name they used to log in, and that can then feed into your custom web app. To me, that's the most powerful feature of Access, because this token can be taken into your custom app. In our case it's Shiny. A Shiny app or a Shiny server sits in its own environment: you can launch the Shiny server on your machine and access it as you're working in your development environment, in your sandbox, or on your desktop, wherever it is. But the cool thing is that if this Shiny server sits behind Cloudflare and people log in through Access, the Shiny server can then consume this JSON Web Token, and once you decode it you can get some really useful information. The most useful pieces of information in our case are name and email address. You can then use that info in your app to route an analysis process: you can route through if statements, you can filter by it, you can use it as a subsetting variable, you can use it to render dynamic graphs, images, content, whatever it is. It's really quite powerful.
So let me show you a very simple, very high-level example of what you could potentially do. Let's say that you have a dashboard for things like employee headcounts in each department. This dashboard is secured behind Access, of course, and all employees in your company are able to access it, because it's just headcount information broken out by department, with no specific details in there. So anybody can come in here, verify through Google or whatever authenticator you're using, and see this view, for example: headcount in engineering, headcount in HR, and headcount in sales.
Now let's say you wanted to make this dashboard dynamic, so that people can come in and drill down deeper into the headcount within these specific departments. Now you're getting into a little bit more private data. You might not necessarily want any employee in sales to go into the engineering department and drill down into headcount by specific managers or by specific teams, because potentially that information is private to the engineering team. Let's say that's the case in our scenario. But you also want everybody to have access to this high-level summary headcount. This is when the JSON Web Token becomes crucial.
Let's say that your director of engineering at acme.com logs into this dashboard and clicks on the engineering column. Like I said, we use R Shiny here, and we use Plotly for these graphs. I guess they're like D3.js graphs that are interactive. Plotly makes all the elements in the graph interactive, so if you click on a bar, then below it you can render subsets within that higher group, broken out by whatever subgroup you care about. In this case, let's say you wanted to look at the team breakout in engineering, and you're the director of engineering, so you should be able to do that. You go in here, you click on engineering, and it shows you the headcount, 120 people or whatever it is, and how they're distributed across the three different teams in engineering: team A, team B, and team C. So cool, that's very useful. The director of engineering can also go in and click on, maybe, engineering team A and see the headcount by manager or by location or tenure, whatever it is that you care about and can build out in your app.
Now let's say it's Joe at acme.com, somebody who's not in engineering. Let's say it's an IC-level employee who's not a manager, so maybe they shouldn't be seeing these detailed headcount reports. They go into the app and can see the overall company breakdown, so engineering, HR, and sales. But what happens when they click on engineering? We don't want them to have access, so we can have custom routing within our app that filters it to only the people we want to have access. Let's say the director of engineering is on that list: they get the view on the left. Let's say Joe is not on that list: it can error out, and we can put in custom error messages. When we built this very quickly, we just left the default error message, which is a red error that says "error, filter not available" or some kind of cryptic message that scares people. So we changed this to make it a little bit prettier. You can have a message like "Sorry, you do not have access to this data," and you can be a little more detailed. But the point is that the app is able to identify who the individual is that's running this session.
Really, where the magic happens is in this area, and I want to focus on it for a second. I'm going to go back to this overall view of how Access works and the overall flow: a user logs in, verifies their identity, and then goes into your web application based on whether they're allowed into the site or not. The portion that is crucial here is the passing of the JSON Web Token into your actual web application. Once it gets passed in and you decode it, it essentially floats as a value in the application. When I talk about this, I'm thinking in terms of R. R has objects that you can assign values to and store in the environment. So when the web token gets passed in, you can break it down, decode it, parse it, and figure out the email address that's in there. I'll show an example in one second, but it essentially comes in as a huge list that's quite messy to deal with in R, unless you build some custom functions to deal with it, which I suggest you do if you end up using this over and over again. The structure ends up being the same, so if you build a function, it quickly parses it. Once you parse it and pull the email address out, you can store that in the R environment and continue filtering based on that user.
The way that happens is you can read the headers, and you can also read the encoded token, which I'll show you in one second. It has a lot of information. I believe it has information on the URL that you came from, and it has information on time as well, which you may or may not care about; it could potentially be good for logging to see when people are accessing your app. But what we use is the email, either pulled from the header or pulled from the actual body of the token, which is the decoded part.
So let me show you quickly how this looks in practice. If you go to hrtech.cf and it's your first time there, it'll ask you to verify your identity, and there are two ways to do it: you can either use Gmail with the button above, or you can enter your email below and it'll send you a code. I believe I have it set so that everybody's allowed right now; you just have to verify your identity. If you can't get in, send a question or email me and let me know. I will check this after the talk to make sure that it works, because I might have restricted it as I was toying around with it in the sandbox.
There are a couple of resources here on how to get a virtual machine in DigitalOcean, how to build Shiny apps on virtual machines, how to share those apps, and how to set up Cloudflare Access. Then you can get to this sample app right here, which is under dash tv demo. This is a very basic, made-up data set that does use Plotly, I believe, for interactive dashboarding. This one doesn't have the detailed breakdown that I was telling you about. But let's say you wanted to click on the headcount breakdown on May 1st, 2019, of 133 people. This is where you can use the web token and Access: if you are in a certain position, say you're on the executive team of the company or on the HR team, you can click on this and render a more detailed report, another breakdown that could be custom built in your app. And if you're another employee outside of that department and you click on this, you can set the behavior. I know in the slide previously I used an example of sending a message like "Sorry, you don't have access to this data" or having it error out somehow, but you can also set it to not exhibit any behavior at all. When you click on this and you're not in this group, the app can just quietly ignore the request, ignore the click.
So I just got a five-minute warning. All right, let me show you the header. One of the more complicated things about this for me was thinking about how the information is stored in Cloudflare Access through the web token and how it can communicate with a Shiny web app. In my experience working with Shiny and with custom dashboards, which I guess I've only built in Shiny, I've never really had to have Shiny communicate with anything external, like a header or a token coming in not just from the domain but from Cloudflare Access, which is acting as a proxy for your identity management. That was a little bit confusing to me because it seemed very nebulous: there are these two things happening, they're not really talking to each other, and I didn't understand how to pipe or inject information from one side to the other.
Our wonderful solutions engineering team ended up helping me out with this by teaching me a little bit more about what Access is, what the web token is, how it gets consumed by Cloudflare Access, and then what you can do with it. With their guidance, I was able to find an R script on GitHub that is able to read headers, and in those headers is the JSON Web Token. So let me show you how that looks. I'll totally give credit to the awesome human being who made this repo. I'm not sure what they use it for, but in my use case I only had to use a small snippet of code from there, just to read this session info, where I can get a bunch of information. This is all decoded. Right here I was able to parse the email address and display it back to myself, just for testing. So I can take all the information that's in the token, assign a value to it, and then basically have free rein with it within the app.
At that point the only limitation is your ability to build within your custom web app, whatever application you're using, whether you're building a dashboard in Python, or, I'm not sure what other languages you could use, I'm assuming JavaScript because it does have very cool D3 graphics, or R, which really was designed more for data analysis. I think this is really interesting, because the R language wasn't designed for this kind of web development, or almost network engineering work. But Cloudflare Access makes it very easy to consume these things that are way beyond my level of expertise and knowledge, bring them into my world, where I'm comfortable, which is the R language, work with them there, and ensure that everything integrates smoothly, that all the data on there is safe and secure, and that it works just how you intend it to work.
And with that, I know I have less than a minute left now, so I'll say thanks for attending. I don't know if there will be a third session, but definitely come back to Cloudflare.tv and see if there are downloads of this if you're curious about how to get this done yourself, and feel free to get a hold of me if you have questions as well.
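As a recap of the drill-down routing from the headcount example, here is a minimal sketch of the decision the app makes when someone clicks a department bar. It is in Python for illustration only; the app described in this talk is built in R and Shiny with Plotly. The data, the allowlist, and the function names are all made up. It assumes the user's email has already been taken from a verified Access token.

```python
# Made-up data: department headcount (visible to everyone) and the team
# breakdown within engineering (visible only to authorized users).
HEADCOUNT = {"Engineering": 120, "HR": 15, "Sales": 40}
TEAM_BREAKDOWN = {"Engineering": {"Team A": 50, "Team B": 40, "Team C": 30}}

# Hypothetical mapping of departments to the users allowed to drill into them.
DRILLDOWN_ACCESS = {"Engineering": {"director@acme.com"}}


def summary_view() -> dict:
    """The high-level headcount that every logged-in employee can see."""
    return HEADCOUNT


def on_bar_click(email: str, department: str, silent: bool = False):
    """Decide what to render when a user clicks a department bar.

    Returns the team breakdown for authorized users. Otherwise returns a
    friendly message, or None when silent=True so the click is ignored.
    """
    allowed = DRILLDOWN_ACCESS.get(department, set())
    if email.lower() in allowed:
        return TEAM_BREAKDOWN.get(department, {})
    if silent:
        return None
    return "Sorry, you do not have access to this data."


print(on_bar_click("director@acme.com", "Engineering"))
# {'Team A': 50, 'Team B': 40, 'Team C': 30}
print(on_bar_click("joe@acme.com", "Engineering"))
# Sorry, you do not have access to this data.
print(on_bar_click("joe@acme.com", "Engineering", silent=True))
# None
```

The `silent` flag corresponds to the two behaviors discussed in the talk: showing a friendly message, or quietly ignoring the click.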