Cloudflare at Cloudflare
Presented by: Juan Rodriguez, Larry Archer, Sam Rhea
Originally aired on March 16, 2021 @ 1:00 AM - 1:30 AM EDT
Come learn how we use Cloudflare technologies internally to solve problems (or as we say "dogfood our own products" internally). Today we will talk about Access and Argo Tunnel in our environment.
Transcript (Beta)
Welcome everyone to the first Cloudflare at Cloudflare session. My name is Juan Rodriguez.
I'm Cloudflare's CIO. And in this session, we're going to talk about a term that we use inside of Cloudflare called dogfooding.
And we strive as an IT organization in general, the different teams to be Cloudflare's first customer.
We try to use our products and we normally think that you know if they're not good enough for us, they're probably not going to be good enough for our customers.
So I'm going to be hosting every week a session on how we're using some Cloudflare technology to solve internal problems.
And I'm going to have some guests to talk about those.
So this week I have with me Sam Rhea and Larry Archer. Sam, why don't you introduce yourself?
Yeah. Hey Juan. I think it's still good afternoon out there, but it's good to see all.
My name is Sam. I'm the Director of Product Management at Cloudflare and I'm based in our Lisbon office.
All right. Larry, how about you? Hi, I'm Larry Archer.
I'm an Engineering Manager for the DevTools team here at Cloudflare and I'm based in Austin, Texas.
All right. And I'm in the Atlanta branch of the Austin office for people that may want to know.
So today we want to talk about how do we use a couple of products to solve some internal problems that Larry has as the DevTools Manager.
And Larry will talk to us a little bit, you know, in a while about, you know, his environment and his internal customers and all those things.
But for people that are joining us and may not know what Access or Argo Tunnel is, I'm going to have Sam talk a little bit about the product, what it is, you know, do a little demo just for background.
So I'll pass it to you, Sam. Wonderful.
Thank you, Juan. Well, the first thing is, like you mentioned, making you and Larry happy and several other members of IT and Cloudflare security is my team's first goal.
And what's so fun about that is, like you alluded to, when we are able to listen to the problems that you have and build solutions that you and your teams get to be critical in a friendly way about, once we get them to be fantastic for your use case, it typically can apply to a lot of customer use cases.
And our Access product is, I think, one of my favorite stories about that workflow.
Cloudflare, several years ago, like a lot of organizations, was reliant on a VPN to connect to internal resources.
So to reach things that power our business, like our own JIRA or our own Grafana, the internal tools we use, you had to fire up a VPN client and connect through a VPN appliance back in San Francisco and deal with all the hassle that that produced.
And the Cloudflare team started thinking there has to be a better way, because this is a problem both for usability, it's a problem for the VPN clients and maintaining them, it's also a security headache.
I think it's fair to say, probably, Sam, that a VPN is one of those things that everybody hates.
At least I have never spoke with any other company that says, like, we love our VPN.
Yeah, everyone agrees they want to get rid of their VPN. The hard part is finding an easy way to do it.
The nice thing about Cloudflare, in terms of finding an easy way to do it, is we've long been in the business of deploying our network in front of sensitive things.
It's just that, for the most part, those were sensitive things that you wanted to protect from DDoS attacks or the kinds of threats that our WAF blocks.
But with something like your internal resources, what you really want to protect is identity.
And adding that into Cloudflare's network is where Access emerged.
So we asked ourselves, how do we give Cloudflare's network a bouncer who stands at the door and checks for identity?
And the nice thing about this bouncer is they can stand at each and every door of every application, check for ID before letting you in.
And for end users, it feels like a SaaS app. I'm going to do a quick two-minute demo.
Juan, is that all right with you? Absolutely. Thank you.
We always love demos. This is a very real live demo, so all the caveats about that.
But I'm going to show an example of what was one of the very first services that Cloudflare put behind Access, Grafana.
Which, if you're not familiar, is an internal charting tool, reporting tool.
People use this to say, did that PagerDuty alert I just received really mean something is having a hard time?
So it's a great tool that our teams use, but it's also something that you want to get to really quickly in the event of an emergency.
Is that good? Absolutely. The first way to think about how Access works, and this is the Access dashboard, part of Cloudflare for Teams, is when we deploy this bouncer, we want to make sure this bouncer knows which door to guard.
And the way we do that is we start by telling it about an application.
So here we'll call this Grafana demo. And we give this application a subdomain, so it just feels like any other SaaS app on the Internet.
So we can say grafana.demo.widgetcorp.tech, just a little test domain I have here in my personal test account.
And the way that we inform the bouncer, like, hey, these are the individuals that are allowed into this establishment.
These are the individuals who are not, is by giving it a rule and saying, in this case, only Cloudflare.
And we're only going to allow people who have an @cloudflare.com email address.
And what's nice about Access is that if we were working with partners or contractors who were using Grafana, we could have our team members log in with our own identity flows while they maybe log in with LinkedIn or GitHub.
We could add rules like this to this flow, but we're going to keep it simple for right now.
And if we want, we can use any number of identity providers for those contractors and team members and partners to use.
We're going to rely on Google, but we will leave these here.
So I'm going to go ahead and add that application. So you'll see here grafanademo.widgetcorp.tech.
We've now built an Access rule. We've added an identity check into Cloudflare's network for this subdomain.
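As a rough illustration of the rule just described, the check the "bouncer" performs can be sketched in Python. This is a minimal sketch under stated assumptions, not how Access is implemented; the function name `is_allowed` and the sample addresses are hypothetical.

```python
def is_allowed(email: str, allowed_domain: str = "cloudflare.com") -> bool:
    """Return True only if the email's domain matches the allowed domain exactly."""
    # Split on the last "@" so a crafted local part cannot spoof the domain.
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local:
        # No "@" at all, or nothing before it: not a usable address.
        return False
    # Domains are case-insensitive, so compare in lowercase.
    return domain.lower() == allowed_domain.lower()


if __name__ == "__main__":
    # Hypothetical addresses, used only to exercise the check.
    print(is_allowed("sam@cloudflare.com"))
    print(is_allowed("sam@cloudflare.com.example.org"))
```

The comparison is an exact match on the domain, so an address that merely contains the allowed domain as a prefix is rejected.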
And what I need to do next is I've got Grafana, you can see here, running locally.
This is something that, again, would have been on the VPN for us to reach this.
I would have had to fire up a VPN client if I could even figure it out on a mobile device if I was in a real emergency.
So I need to connect this to Cloudflare's network.
And I want to do that without opening up any firewall holes, opening up any ports to the Internet.
I want to find a way to securely connect Grafana to Cloudflare.
And we have a product that handles that for us, something called Argo Tunnel.
And Argo Tunnel consists of a really lightweight daemon that powers a connection between your application, your server.
It can also be applications you connect to over SSH or RDP or other protocols from that application to Cloudflare in an outbound only connection.
So what I'm going to do here is run cloudflared tunnel.
I'm going to give it that host name that we set up earlier.
I'm going to tell this little daemon where to send traffic to. Localhost 3000.
You'll see here that's the address of Grafana that's running here locally. We're going to go ahead and run that.
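The command being typed in the demo can be sketched as a small Python helper that assembles the arguments. This is a sketch, assuming the `--hostname` and `--url` flags that `cloudflared tunnel` accepted for Argo Tunnel around the time of this session; current releases may favor a different workflow. The helper name `tunnel_command` and the example hostname are hypothetical.

```python
import shlex


def tunnel_command(hostname: str, origin_url: str) -> list[str]:
    """Build the argument list for a legacy-style `cloudflared tunnel` run."""
    # --hostname: the public subdomain configured in Access (assumed flag).
    # --url: where the daemon forwards traffic locally (assumed flag).
    return ["cloudflared", "tunnel", "--hostname", hostname, "--url", origin_url]


if __name__ == "__main__":
    # Placeholder hostname; substitute the subdomain set up in the dashboard.
    argv = tunnel_command("grafana.example.com", "http://localhost:3000")
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run` would start the daemon; the sketch only prints the command so it can be inspected first.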
And what's happening is I'm here, like I mentioned earlier, our Lisbon office.
This is connecting to our Lisbon data center and our Amsterdam data center in a secure outbound only connection.
And now as an end user, if you remember this application that we've now built a rule for and created a tunnel for, when I go to this as an end user, I'm presented with this login page.
And I know that happened really fast. I'll slow down. You see the URL here.
And connecting to that Cloudflare's network, that bouncer is saying, hey, wait a second.
Only @cloudflare.com is allowed to reach this. Can you prove who you are? And I'm given options to do so.
I'm going to pick Google for now. I'm going to pick my Cloudflare email address.
And what's happening is as quickly as you just saw at the edge of our network, I logged in with Gmail and I was redirected to Grafana.
And so now I'm actually connected over the Internet, but through this secure tunnel from the origin to Cloudflare and with that identity check at the edge of Cloudflare's network, and Grafana feels like a SaaS app for me now.
That is so cool.
And obviously, you know, if you were, for instance, in where I'm from originally in Spain, in Madrid, that authentication will be happening basically in our PoP in Madrid, right?
Access runs on Workers, which means that the authentication is distributed at our edge and also really, really, really fast.
Yeah.
That is awesome and so very cool. Thank you for that demo, Sam. And now we're going to go to Larry.
Great. So Larry, you know, has as most engineers, a whole bunch of extremely demanding, demanding engineers that, you know, want things to go fast, they want to access the tools as quickly as possible.
They hate things like VPNs that add friction to access like repositories or the wiki or like, you know, Git or anything like that.
So, Larry, tell us a little bit about the story of how this whole started and, you know, what we're trying to do and how we started to use Access and Argo Tunnel to solve some of those problems.
Absolutely. So, yeah, before I was the engineering manager for the DevTools team, I was an engineer on the team, or what would later.
So you know about the demands, right? Yeah.
So, yeah, a lot of this came from just wanting to try out our own products and to help the rest of the company have easier access to our products.
So I started a little over three years ago and this push to get away from the VPN was there from the beginning for me.
I remember in my orientation class, we were all sitting there in the classroom in the basement of our San Francisco office in the club level and we're all trying to connect to the VPN so we could get on the wiki and see what's in there.
And we were running into the limit of the seat licenses that like user accounts for the VPN.
So they were telling people, okay, if you're not using it right now, go log off so that the new hires can get in and get to stuff.
So, yeah, we felt the pain right away. Aside from that, like I think Sam mentioned, not being able to, for example, use the wiki or Jira if you're on mobile, like on your way home or something, like check on a ticket or a wiki page was kind of painful for a lot of people.
Yeah. So I think it was about a year after I started, there was this push to like, we got to get everything behind Access, including, we started with the wiki and Jira for the stuff that our team was responsible for.
And it was interesting.
It was like, okay, we got to figure out how to orange cloud the domains, the host names that wiki and Jira are on.
Larry, for people that may not know what orange cloud means, what does that mean?
It means we put it behind our orange cloud, behind Cloudflare.
So rather than serving the traffic directly from our data center through the VPN, we put it behind Cloudflare's edge network so that it gets served that way.
So we had to work with the SRE teams, the people who had access to the dashboard account, the Cloudflare dashboard account that had these host names in it so we could control that, had to figure out how to get it on the, get the servers and everything configured.
And it was an interesting process and it, the Access part of it, I think went pretty smoothly.
It was like figuring out how to work with our config management system and get things on there.
But once it was done, the response from users within the company, not just engineers, but people that use the wiki and Jira for anything else within the company was really neat.
It was, they really appreciated not having to use the VPN.
So like, it's possible to install the VPN client on your phone and log in and go through the two factor dance on the phone to get in there.
But it's clunky. I don't know of anybody else besides me that's gone to that trouble.
So like being able to like comment on Jira on your commute home or something was...
And I also assume, Larry, that, you know, for instance, even in onboarding that we used to do locally in San Francisco, you know, the VPN concentrator is very close to us, but, you know, most of your customers, you know, all those like development managers and engineers are all over the world.
So that connecting, you know, from Lisbon or Singapore or whatever, all the way to San Francisco for VPN was less than ideal, right?
Yeah. Yeah. And there are VPN servers in other locations to make it a little easier, but still it does slow things down.
And we like troubleshooted VPN issues.
And usually the response is, why don't you just use the edge instead? It's much faster, which worked out great for a lot of the tools that we use.
We did run into a few, and I think this is the great part about dogfooding.
We found some rough spots and were able to work with the Access team to get those improved.
And some of them were things they had already identified as things that were in progress, so we got to test them out beforehand.
So, for example, we have services inside our data center that need to talk to JIRA and Wiki and Confluence through those REST APIs.
And at first, there was no easy way to allow those services to talk to the API without hitting Access and being blocked by Access doing its job, basically.
So, we set up bypass addresses, we call them. Separate host names, so that like, okay, here's an address you connect to if you're going to use the API from within a certain range of IP addresses.
And then after that happened, since then, they've added a bypass policy feature within Access.
So, you can say, if it's coming from these IP addresses, bypass Access.
You don't run into that.
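The bypass idea, skipping the identity check for requests from known IP ranges, can be sketched as follows. This is a minimal sketch, not the Access policy engine; the ranges are private-network examples chosen for illustration, and `should_bypass` is a hypothetical name.

```python
import ipaddress

# Hypothetical internal ranges (private-address examples), not real Cloudflare ranges.
BYPASS_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.10.0/24"),
]


def should_bypass(source_ip: str) -> bool:
    """Return True if the source address falls inside any bypass range."""
    addr = ipaddress.ip_address(source_ip)
    # Membership is False when the address and network IP versions differ.
    return any(addr in network for network in BYPASS_RANGES)


if __name__ == "__main__":
    print(should_bypass("10.1.2.3"))
    print(should_bypass("203.0.113.7"))
```

Requests that do not match a range would continue to the normal identity check.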
And then there's also service tokens that have been added since then as well.
So, you can give your service account its own token and it can connect through Access if you're able to rewrite the code or adapt the code to connect with a special header or something.
So, that's one example. I guess the other one was when we first put JIRA and Confluence behind Access, you would log in through Access through your identity provider, and then you'd get presented with the Atlassian login page as well.
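Here is a sketch of what "connect with a special header" can look like for a service account. The header names `CF-Access-Client-Id` and `CF-Access-Client-Secret` are the ones Access service tokens use as far as I understand the public documentation; they are not named in this session. The URL, the credentials, and the helper name `service_token_headers` are placeholders.

```python
import urllib.request


def service_token_headers(client_id: str, client_secret: str) -> dict[str, str]:
    """Return the headers a service would attach to pass through Access."""
    if not client_id or not client_secret:
        raise ValueError("both the client ID and the client secret are required")
    return {
        "CF-Access-Client-Id": client_id,
        "CF-Access-Client-Secret": client_secret,
    }


if __name__ == "__main__":
    # Placeholder values; real tokens come from the Access dashboard.
    headers = service_token_headers("example-id.access", "example-secret")
    # Build the request without sending it, so the sketch makes no network calls.
    request = urllib.request.Request("https://jira.example.com/rest/api/2/myself", headers=headers)
    print(request.full_url)
```

In practice the secret would be loaded from a secret store or environment variable rather than written in code.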
That's no fun. I have to log in twice just to get to the wiki.
So, we worked with a developer to build a plugin for the Atlassian tool. So, that's JIRA and Confluence and Bitbucket so that once you're logged in through Access, this plugin that runs within the tool reads the JWT from Access and says, okay, you've passed through Access and this says that you're so and so.
This is your name, this is your email address, and then it automatically logs you into the Atlassian tool as well.
And I assume that this plugin that we developed for our particular use case is something that also is now available for customers as part of Access?
Exactly. It's available on GitHub. It's an open source plugin that we've released.
So, if anybody out there wants to put their JIRA, Confluence, Bitbucket behind Access, this plugin is a great help.
And again, the response from our users internally was like, oh, thank you so much.
I don't have to log in twice now.
I remember one day when we turned that off to just test something. Exactly. I knew that it was going, as a product and as a plugin, I knew this was going to work out when we turned that off and suddenly chat exploded.
People said, wait a second, I have to log in twice?
What did you do? They had to log in twice. Once you've made something so much easier, it's hard to… It's very difficult to take that back.
There's a similar story with the, I think we call it instant auth.
So, if you only have one identity provider configured, it doesn't present you that login page that you saw as part of the demo.
It just says, oh, you're going to log in through Google, so we'll just send you there.
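The instant auth behavior described here amounts to a small decision, sketched below. This is an illustration only; the function name `login_step`, the `instant_auth` flag, and the return strings are hypothetical and do not mirror the Access implementation.

```python
def login_step(identity_providers: list[str], instant_auth: bool = True) -> str:
    """Decide whether to show the provider picker or redirect straight away."""
    if not identity_providers:
        raise ValueError("at least one identity provider must be configured")
    # With exactly one provider and instant auth enabled, skip the picker page.
    if instant_auth and len(identity_providers) == 1:
        return f"redirect:{identity_providers[0]}"
    return "show_login_page"


if __name__ == "__main__":
    print(login_step(["google"]))
    print(login_step(["google"], instant_auth=False))
    print(login_step(["google", "github"]))
```

Turning the flag off reproduces the extra click that users noticed when the setting was disabled by accident.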
And we turned that off accidentally one time and everybody's like, such a pain.
I had to click the button. I had to click another button.
It's like, yeah, I get that. It's faster. It's more efficient. It's great. So, that was Jira and Confluence.
We also have a whole bunch of tools that our team is responsible for and many, many internal tools.
It's kind of the same story. Every time one gets behind Access, it's a little bit easier.
It's one less reason to use the VPN.
For our team, I think the next big one that we did was Bitbucket, which is another Atlassian tool where we host all our source repositories, our Git repos.
And that was a little more challenging because it's not just a web interface.
It was Git operations over an SSH connection. So, again, we got to work with the Access team and the Argo Tunnel team to build this thing that allowed us to put the Git and SSH connection behind Access as well.
And it uses the tool that Sam was showing earlier, cloudflared and Argo Tunnel.
So, that runs on the server, but it also runs on an engineer's laptop with some special SSH config.
Once you try to do a Git operation that's going to our Bitbucket server behind Access, it pops up in a web browser window and says, oh, you need to log in first.
It's that bouncer saying, you can't do a Git operation until you log in.
So, you log in, it saves a token, and then everything else is just you can use your command line tools without being on the VPN.
And it's just that much easier. It's pretty nice. Yeah, so through all this, it's been great to be able to try these things out as our customers are using them or even before our customers are using them.
And then be able to say, hey, this is a little hard to use.
Can we make this easier? A lot of the times the teams are like, yeah, we thought of that.
We're working on it. You can test it out as soon as we have something ready for you.
And I think that that is one of the things that, you know, why, you know, Cloudflare has, you know, many things from a culture perspective that are fantastic.
And one of them is this basically, you know, what Larry just walked us through is basically dogfooding at its best.
Where, you know, even the beginning of a product is something, you know, for a problem that we're trying to solve internally.
In this case, it was like, you know, all the pain and friction that we had, you know, with a VPN.
And then along the way, you know, when we deployed internally, it's basically we continue to find, you know, specific issues.
Some of them may be, you know, not everybody or not every customer is going to have, you know, a professional software engineering team or as large as what we have internally with all these tools.
But, you know, for many customers, basically what happens is a lot of that flexibility that we built.
And, you know, some of these use cases, you know, like for instance, this Atlassian plugin and stuff like that is something that then, you know, they can leverage for something like this or other particular use cases that they may have.
So, and yeah, so everybody knows.
So one of the things that we've been working over the, you know, probably for a while, as Larry said, we've been trying to basically get rid of the VPN inside of Cloudflare as much as possible and put everything behind Access.
And that has actually made us learn tremendously about many different use cases.
Right. And we're not completely there yet. But I think of basically being able to totally turn off the VPN, but we're getting very, very close.
And hopefully in another Cloudflare TV episode at some point in time, I'll be able to talk and share the news that we've been able to basically get rid of the VPN completely.
So, one of the tests for that for our team is, so we do a DevTools bootcamp or orientation session for every new hire class.
Yes. And there's a list of things that you need to install and try out and I think we're at the point now where we can say you don't need to bother with the VPN client, all these things now should work over Access, you should be able to hit them over Access.
So it's like our CI server, Grafana, like Sam showed, our internal app repository, our Docker registry, all have Access in front of them that engineers can use to get their work done.
In fact, I mean, as part of the laptop setup for everybody, we actually don't install VPN by default anymore.
It's more like from an exception basis if a particular role requires a very specific access or something, that's something that we still don't have behind Access.
But as I said, it's more like the exception and not the norm.
So that's pretty exciting. It's been such a valuable feedback loop for us on the product and engineering team as well because one thing that has occurred, and this just happened this week, was there will be a certain group within Cloudflare that kind of dogfoods before the dogfood.
So, rewinding a few months, kubectl.
So the ability to control a Kubernetes cluster was one of the last holdouts.
Yeah, because of the technical challenge of sending that through and ensuring that was authenticated through Access, sending that through a network.
And what was really kind of, it feels like how it should feel because we all sat down in a room with leaders in DevOps, leaders in engineering, leaders in SRE.
And they said, this is what we want. And the definition of done was really high.
It was until from a group that really cares about this tool for good reason, until we're excited about using it in this flow, we won't consider it done.
And we said, great.
And what was also wonderful was they were willing to help us kind of navigate some of the nuances of it.
And we were able to release this support for kubectl.
And then just this week, there was a chat from one of the leaders in the engineering organization, who wasn't part of that original dogfooding, who used it for the first time.
And said, wow, this is really slick. And what's so special about that is because it's, it's slick as a combination of the expectations and the encouragement, the feedback from experts at Cloudflare, as well as the hard work from the engineering team to kind of solve that problem in a way that everyone was happy with it.
Now, there's other people at Cloudflare experiencing it for the first time, and understanding what the customer experience feels like.
And that's really special for us. It also gives us, when we talk to customers, we want to make sure that the problems we're working to solve, the problems they have, are things that we're speaking from either experience on or some level of credibility.
And Larry, you may or may not remember this, I sure do.
Probably at least a year and a half ago, there was a customer who had a, an Atlassian deployment very similar to ours.
And it was when we were very new with Access, with the SSH flow, the plugin, and Larry was kind enough to hop on the customer call.
And I remember the room on the fourth floor of the office, we just moved in the fourth floor.
And Larry was able to say, you know, hey, I'm responsible for this at Cloudflare, I understand the challenges you're having.
This is what we did to solve it and how we use Access.
And I think really demonstrates something within Cloudflare about empathy for the problem that we're solving.
You know, we're not just trying to throw a solution at a problem we suppose the customer has, we're able to bring in someone who felt that same problem in their own way at Cloudflare and talk about why the things we built work the way that they do.
And that's been, as a member of the product team, it is something I'm eternally grateful for.
Just that level of feedback, those feedback cycles, and the credibility we can have when we talk about these problems, as well as the humility we can have when we say, you know, we ran into that too, and we didn't know what to do.
And this is where we arrived. We hope it's helpful. Yeah. Yeah, as I can tell you, having, you know, been for many years, you know, problem solver internally, right?
I mean, as a responsible for IT and providing services to internal, you know, internal customers.
And being able, you know, instead of just reading like a white paper or documentation and things like that, then, you know, speaking with somebody that actually has had a problem similar to yours, you know, exactly what you said, someone with a stack similar to yours, on how they're using it, how they thought about the solution, how it is integrating, how it is operating.
There's nothing more valuable from my perspective than that.
Yeah, I think that's fine.
You know, this is one of the things that I always say, you know, with Access, that is an absolute game changer from my perspective is that, you know, you're not only solving like, you know, a complex problem, right, from an access perspective, identity management, and things like that, but you get to have your cake and eat it too, because you provide an amazing experience to your internal customers.
I mean, you see, I tell everybody that I speak with, every time that we move a solution behind Access, I mean, you get like a ton of kudos from your customers.
And, you know, as an IT provider, I mean, that is, you know, not that common, right, that we're like, you can say, wow, you know, we're combining all this stuff from a security perspective, solid discovery first.
And then on top of that, you know, we're getting basically gift baskets, you know, from the users, you know, how amazing, how much better we've done their lives.
Well, this was a great session. Thank you so much.
I'm not sure if we have any questions. Let me see. We do. So, if we can.
Well, thank you so much for your time, Larry and Sam.
Maybe we'll have another session at some other point in time, you know, for another Cloudflare at Cloudflare, and have a great rest of the weekend.
Thank you everybody that joined us for this live broadcast.
Thank you, Juan. Keep the feature requests and the feedback coming.
Yes, sir. Thank you so much. All right. Thank you all.
Thanks.