1️⃣ Kubectl with Cloudflare Zero Trust
Presented by: Terin Stock, Tom Lianza
Originally aired on October 5, 2022 @ 5:00 PM - 5:30 PM EDT
Join our product and engineering teams as they discuss what products have shipped today during Cloudflare One Week!
Read the blog posts:
Visit the Cloudflare One Week Hub for every announcement and CFTV episode — check back all week for more!
Transcript (Beta)
Hi. My name is Tom Lianza. Welcome to Cloudflare TV.
We're going to talk about Kubectl with Cloudflare Zero Trust with our special guest, long-time Cloudflare engineer Terin Stock.
- Hey!
- So am I pronouncing Kubectl correctly?
- You're pronouncing it as I pronounce it.
Yes.
So we're either both wrong or we're both right.
So let's start...
maybe tell folks a little bit about what team you work on and how it fits into all of Cloudflare.
Yeah, so I'm on the Kubernetes team at Cloudflare, which runs multiple Kubernetes clusters for our internal application teams, running the background services that power things like the Cloudflare API and other resources that make the Cloudflare network work.
Yeah.
So I'm generally responsible for the control plane, and the team's responsible for the API.
But the Kubernetes team's reach is considerably vaster than all of that. It seems like no matter what team you talk to, they're using Kubernetes in some fashion: internal tooling, APIs, observability tools, testing. Usage in that space is super prevalent.
So the number of internal customers that your team has is almost everybody in engineering and neighboring teams.
Yeah.
So that means, I think, when you all make the world better for those people, you're impacting Cloudflare at large really profoundly.
Maybe we could start with the early days of Kubernetes here at Cloudflare.
How did people use it, access it, onboard to it?
What was the experience like for somebody who wanted to run something?
So, in the early days, we had a VPN product that we used internally, and engineers would come into work in the morning, turn on the VPN, and be able to access the Kubernetes clusters that way. That effectively took their laptop and dropped it onto the private LAN that the Kubernetes API server was listening on.
And so I'm told you might be a little quiet, so you can dial it up or speak more loudly.
Okay, so people would come to work, flip on their VPN, and then connect to Kubernetes as if they were sitting in the data center, basically?
Effectively, yeah.
And then eventually Cloudflare started releasing its Zero Trust suite of capabilities that let us stop using VPNs, and we started shifting away from the VPN.
What was the world like then?
Yeah, so first we migrated to using cloudflared and Cloudflare Tunnels.
So an engineer would set up a tunnel on their laptop, and that would connect to the Kubernetes API server over a Cloudflare Tunnel, using Cloudflare Access to authenticate that tunnel.
And that worked fairly well, but it required teams to set up a tunnel for each Kubernetes cluster and configure kubectl to use those tunnels through SOCKS proxy settings.
So that made it very difficult for those teams to get their work done. We got a lot...
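To make the per-cluster burden concrete, here is a minimal sketch of the kind of kubeconfig the tunnel era implied: every cluster needs its own local SOCKS5 endpoint, set through the kubeconfig `proxy-url` field, and each of those endpoints has to be backed by a separately started local tunnel (for example one `cloudflared access tcp` process per cluster). The cluster names, API server addresses, ports, and user name are all hypothetical; this is an illustration, not the configuration Cloudflare actually used.

```python
import json

# Hypothetical clusters: names, API server addresses, and local SOCKS ports
# are illustrative only, not real Cloudflare values.
CLUSTERS = {
    "cluster-a": ("https://10.0.0.10:6443", 1080),
    "cluster-b": ("https://10.1.0.10:6443", 1081),
}


def tunnel_era_kubeconfig(user: str) -> dict:
    """Build a kubeconfig in which every cluster goes through its own
    local SOCKS5 proxy (one local tunnel per cluster) via proxy-url."""
    clusters, contexts = [], []
    for name, (server, socks_port) in CLUSTERS.items():
        clusters.append({
            "name": name,
            "cluster": {
                "server": server,
                # Each cluster points at a different local tunnel endpoint,
                # so each one needs its own tunnel process running.
                "proxy-url": f"socks5://127.0.0.1:{socks_port}",
            },
        })
        contexts.append({"name": name, "context": {"cluster": name, "user": user}})
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": clusters,
        "contexts": contexts,
        "current-context": next(iter(CLUSTERS)),
        "users": [{"name": user, "user": {}}],
    }


if __name__ == "__main__":
    # kubectl parses kubeconfig files as YAML, and JSON is valid YAML.
    print(json.dumps(tunnel_era_kubeconfig("alice"), indent=2))
```

With WARP routing traffic at the IP layer, the `proxy-url` entries (and the local tunnels behind them) are no longer needed; the same `server` addresses are reached directly.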
So from what I remember, we had some teams that had a forked or patched version of kubectl, other teams...
There's a bunch of...
There was another population group that used scripts to automate switching between tunnels, depending on which cluster you'd selected.
And then...
And that's just for the CLI, the native kubectl. If you were trying to access the Kubernetes API with other libraries or scripts, you'd be in a whole other world of proxies.
One of the things we run in Kubernetes is a VM solution called KubeVirt, and they have a CLI tool that did not like running over Cloudflare tunnels, and so it made it a lot more difficult for those teams to access the VMs that they were using for their DEV workflows.
Yeah, so it seemed like we traded off a lot of convenience for an improved security posture.
Because in this world, people weren't being plopped onto a LAN, right?
They were no longer given that broad access. And then came the Zero Trust Client.
Right.
So we've been rolling out the Zero Trust Client, first for 1.1.1.1 DNS, making sure that all of that is secure, and then rolling out its other security features, such as Zero Trust networking.
And so we've switched kubectl to using Zero Trust so that now the device can authenticate to Cloudflare and have direct connections to the API server at the IP layer, so tools like kubectl or KubeVirt work without any patching or shell scripting.
So this is basically the world that we are in now.
Yes.
And so me, as a person on the left with a laptop, I just get to click a button to turn on the WARP client, or Zero Trust Client, whatever it's called.
And then you and your team, behind the scenes, set up this other leg, the Cloudflare Tunnel leg. Is that right?
How does that get stood up?
So we run cloudflared as a deployment in Kubernetes.
That allows us to use the Kubernetes replica sets for dealing with nodes going down or maintenance or upgrades of the cloudflared pod.
And that connects to the Cloudflare global network, setting up the Cloudflare Tunnels that WARP and Zero Trust can use to send that traffic to the Kubernetes API server.
Got it, okay I think I got it.
So what we're showing here is you have something that configures the tunnel, that tells Cloudflare, please create this tunnel for me between your edge and our API server?
Exactly, yeah.
And then does...
That also configures the Zero Trust Client to take that traffic and send it to the Cloudflare network to be sent over to the API server.
Does anything need to run on the API server at that point or once it's configured, it's just networking?
Nothing needs to run on the API server any different than the normal Kubernetes cluster.
You certainly can run the cloudflared instances on those servers, but we decided to run them as pods in the cluster.
So then those pods, are they long-lived? Do they need to always be running for this thing to work?
Or is it...
We have replicas of those pods. They're not a DaemonSet, they're a Deployment.
We run three replicas and each of those pods connects to three Cloudflare edge locations.
So there's a lot of redundancy there.
So if one node or one pod becomes unhealthy, traffic continues without any interruption.
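Since cloudflared runs as an ordinary Deployment, the redundancy described here comes from standard Kubernetes machinery. The sketch below builds such a Deployment as a Python dict (printable as JSON, which `kubectl apply -f` accepts), with three replicas and a rolling-update strategy so version bumps replace pods one at a time. The namespace, ConfigMap and Secret names, mount paths, and image tag are assumptions for illustration; the `tunnel --config ... run` arguments follow cloudflared's documented CLI, but this is not Cloudflare's internal manifest.

```python
import json


def cloudflared_deployment(replicas: int = 3) -> dict:
    """Build a Deployment that runs cloudflared as regular pods.

    Names, namespace, and paths below are hypothetical placeholders.
    """
    labels = {"app": "cloudflared"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "cloudflared", "namespace": "cloudflared"},
        "spec": {
            # Several replicas, so losing one node or pod doesn't drop the tunnel.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Replace pods one at a time when a new cloudflared version is rolled out.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 1, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [{
                        "name": "cloudflared",
                        # Pin a specific release in practice; "latest" is a placeholder.
                        "image": "cloudflare/cloudflared:latest",
                        "args": [
                            "tunnel", "--no-autoupdate",
                            "--config", "/etc/cloudflared/config/config.yaml",
                            "run",
                        ],
                        "volumeMounts": [
                            {"name": "config", "mountPath": "/etc/cloudflared/config", "readOnly": True},
                            {"name": "creds", "mountPath": "/etc/cloudflared/creds", "readOnly": True},
                        ],
                    }],
                    "volumes": [
                        # Tunnel config file and credentials come from a ConfigMap and a Secret.
                        {"name": "config", "configMap": {"name": "cloudflared"}},
                        {"name": "creds", "secret": {"secretName": "tunnel-credentials"}},
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(cloudflared_deployment(), indent=2))
```

Spreading the replicas across nodes (for example with pod anti-affinity) would be a natural addition so that a single node failure cannot take out more than one replica.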
Very cool. So in the end, for me as somebody on a laptop: in the before times, I would have a client that might have some other company's name on it, and I'd turn it on and I could access things.
And in the modern era, I have a client with Cloudflare's name on it and I turn it on and I can access things.
What got better?
Because as a user, I'm having a pretty similar experience.
From the user experience side, we wanted something that felt very similar and easy to use for developers using Kubernetes, but behind the scenes it's entirely different.
Instead of being kind of popped onto the LAN, you have just a single connection to the API server at the IP layer and it's transparent for you.
You don't need to remember to connect to the VPN and you can integrate it with other policies, such as confirming that this is a managed device, that you have the right user roles and things like that.
And I get to go back to my native off-the-shelf clients and libraries.
Exactly.
Yeah. It's cool.
That's great.
So for most people, then, most customers, aside from that brief time when we were dealing with tunnels, it's back to the good old days.
But you absorbed the work of getting this thing configured and stood up.
Right? What went in? What are you walking people through in this blog post in terms of what it takes
for other people to do what we did with Cloudflare?
Right.
So, this is not some secret internal ability that we have that customers can't do.
Obviously, customers could do this too, either through the Zero Trust dashboard or by using the configuration management tool.
So in the blog post, we talked about setting up the tunnels with the tooling we use to manage Kubernetes, and setting up the routes that tell the Cloudflare global network which IP addresses we want to send down those tunnels.
And then we set up rules for Cloudflare Zero Trust to allow the network traffic to those locations.
And because we use mutual TLS between the clients and the API server, we turn off HTTPS inspection for that traffic in our gateway product, Cloudflare Gateway.
Very cool.
So these code snippets we're showing here are ways a customer could configure Cloudflare via a config management tool.
Exactly.
We on the Kubernetes team do a lot of GitOps, which is managing configuration through a Git repository.
And so having a tool like this for our configuration management, where we can have everything in a text format and code-reviewed like any other code change, makes it super simple to get this out, get it deployed, and have people review it.
Yeah, makes sense.
Sounds great, and then this must be...
The configuration for cloudflared that's running as the pod, which is these two segments you have on the screen now, and then there should be an example of the actual deployment that we use to deploy the cloudflared pod.
This is what we showed earlier: in the end, it's running cloudflared in Kubernetes,
which connects to the global network and then sends the traffic to the API server.
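For the cloudflared side, a minimal configuration needs little more than the tunnel identity, its credentials, and WARP routing turned on so that WARP clients can reach private IPs routed to the tunnel. The sketch renders that as JSON (cloudflared reads its config file as YAML, and JSON is valid YAML). `tunnel`, `credentials-file`, `metrics`, `no-autoupdate`, and `warp-routing.enabled` are documented cloudflared settings; the tunnel ID is a placeholder and the credentials path is an assumption about where a Secret would be mounted.

```python
import json

# Placeholder tunnel UUID; a real one comes from creating the tunnel.
TUNNEL_ID = "00000000-0000-0000-0000-000000000000"


def cloudflared_config(tunnel_id: str) -> dict:
    """Minimal cloudflared config for routing private-network traffic."""
    return {
        "tunnel": tunnel_id,
        # Assumed mount path for the tunnel credentials Secret.
        "credentials-file": "/etc/cloudflared/creds/credentials.json",
        # Metrics/readiness endpoint, useful for Kubernetes probes.
        "metrics": "0.0.0.0:2000",
        # Let Kubernetes roll out new versions instead of self-updating.
        "no-autoupdate": True,
        # Allow WARP clients to reach private IPs routed to this tunnel.
        "warp-routing": {"enabled": True},
    }


if __name__ == "__main__":
    print(json.dumps(cloudflared_config(TUNNEL_ID), indent=2))
```

Turning off auto-update fits the workflow described later in the conversation, where new cloudflared versions are rolled out by updating the Deployment.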
Very cool.
And this is, this is what I see.
This is familiar.
Has this required a lot of care and feeding?
Has this... are you basically leveraging Kubernetes to keep the uptime high?
How has this been operationally?
Yeah, the cloudflared pod has been fairly hands off since we've deployed it, other than updating the pod as there's new versions of cloudflared.
But we just use the Kubernetes Deployment lifecycle to roll those out: we update the Deployment, and GitOps applies it to the cluster.
And then Kubernetes will deploy the new pods.
So it's been fairly simple for us to operate it in all the clusters that we're operating it in.
And then I saw that after you announced this and got overwhelmingly positive feedback from engineers, it started a stampede of them asking for this behavior on every other single thing they access at Cloudflare, it seemed like.
Are there more things that we want to do with this or is it back to a great developer experience?
We don't currently allow ingress traffic, meaning traffic through the Kubernetes ingress controller, over Zero Trust yet.
I think that will be the next big project in further deploying this out to engineers.
So, yeah, that's right.
And is that what would allow an engineer to more easily test something running in Kubernetes from their laptop,
something like that?
Or what does that unlock?
Yeah, it allows them to connect to those ingresses.
So we have internal tools running in Kubernetes that expose an ingress through the ingress controller, which are difficult for engineers to connect to.
Sometimes they're just like...
All right then.
So ingress is likely still to come.
And after that, the days of people creating things will be friction free.
That is the goal.
Yeah. All right, that's, that's excellent.
Is there any feedback we want to give to our colleagues in Zero Trust about this?
This...
It's self-service with the public docs now.
How did you find it?
Yeah.
This was pretty much deployed based on public documentation.
We had to do some updates to the configuration management tool to use new API endpoints that it didn't support yet.
But after we vetted that, it's been pretty self service and fairly hands off from needing to work with that team directly.
That's great.
So in doing this, we made our own config management provider better. We have a question from a viewer.
How is the DNS name of the API server affected, or not, by the Zero Trust Tunnel?
And secondarily, does this obviate the need for an API server load balancer?
The way we've deployed our API server with cloudflared and Zero Trust, we still need a load balancer in front of the API server.
That's because we're using a, quote, "public IP": although it's not routable on the internet, it is from a public CIDR.
That is to aid in backup configurations so that if you are not able to use Zero Trust for some reason, you would still be able to have the same config file locally, the same kubeconfig and still be able to access it if you're going through kind of an out-of-band channel.
However, this would also work for routing private IPs, so you would not necessarily need a load balancer in front of the API server.
So if you're just running this as a smaller cluster, that'd be fine.
So when you mention not being able to use Zero Trust, are you talking about having another path in the event of an operational incident or something?
Right?
Especially since part of the control plane for Zero Trust is running in the clusters that you would normally access through Zero Trust, there needs to be a backup method for connecting to them.
But instead of having multiple configuration files and having the operational overhead of switching kubeconfig files in that scenario, it's easier to just use the same hostname and IPs.
Very cool.
I can't believe we got a viewer message.
I've never seen that before.
I don't know if there are any other questions.
Is there anything else?
Is there anything else we should cover on this product?
All right.
I think we've gone through part of what we discussed in the blog post.
And this is running in all of our clusters right now.
And teams seemed super excited when we announced it internally.
Yeah.
I thought it was going to set records on likes. It may have; I haven't checked recently. But well, thank you, Terin.
This was such a success at the company, I know.
And it's just a really good story around Cloudflare using Cloudflare for the benefit of Cloudflare, but it's also a story that any of our customers could use for their own companies, which is a great addition to the Zero Trust Week.
All right, thank you, everybody, for the time and the questions and we'll leave it at that.
Thanks, Terin. Ciao.
Everybody should have access to a credit history that they can use to improve their situation.
Hi, guys.
I am Tiffany Fong. I'm head of Growth Marketing here at Kiva.
Hi, I'm Anthony Voutas, and I am a senior engineer on the Kiva Protocol team.
Great.
Tiffany, what is Kiva and how does it work and how does it help people who are unbanked?
Micro-lending was developed to give unbanked people across the world access to capital to help better their lives.
They have very limited or no access to traditional financial banking services, and this is particularly the case in developing countries.
Kiva.org is a crowdfunding platform that allows people like you and me to lend as little as $25 to these entrepreneurs and small businesses around the world.
So anyone can lend money to people who are unbanked.
How many people is that?
So there are 1.7 billion people considered unbanked by the financial system.
Anthony, what is Kiva Protocol and how does it work?
Kiva Protocol is a mechanism for providing credit history to people who are unbanked or underbanked in the developing world.
What Kiva Protocol does is it enables a consistent identifier within a financial system so that the credit bureau can develop and produce complete credit reports for the citizens of that country.
That sounds pretty cutting edge.
You're creating, you're allowing individuals who never before had the ability to access credit to develop a credit history.
Yes, a lot of our security models in the West are reliant on this idea that everybody has their own personal device.
That doesn't work in developing countries.
In these environments, even if you're at a bank, you might not have a reliable Internet connection.
The devices in the bank are typically shared by multiple people.
They're probably even used for personal use.
And also on top of that, the devices themselves are probably on the cheaper side.
So all of this put together means that we're working with the bare minimum of resources in terms of technology, in terms of a reliable Internet.
What is Kiva's solution to these challenges?
We want to intervene at every possible network hop that we can to make sure that the performance and reliability of our application is as in control as it possibly can be.
Now, it's not going to be in total control because we have that last hop on the network.
But with Cloudflare, we're able to really optimize the network hops between our services and the local ISPs in the countries that we're serving.
What do you hope to achieve with Kiva?
Ultimately, I think our collective goal is to allow anyone in the world to have access to the capital they need to improve their lives and to achieve their dreams.
If people are in poverty and we give them a way to improve their communities, the lives of the people around them, to become more mobile and contribute to making their world a better place, I think that's definitely a good thing.
My name is Justin Hennessy.
I'm the VP of Engineering at Neto.
Okay, so I understand Neto is an e-commerce platform based in Australia.
Tell us a little bit more about it.
Neto is an omnichannel sales platform for retailers and wholesalers.
So essentially what it allows us to do is enable the retailers and wholesalers to sell their products in multitudes of sales channels.
Tell us about the importance of automation in your business.
I came onboard as the lead automation engineer, so I think automation is key to anything in this day and age.
Like if you're not looking at ways to automate the low-value work and then put your people in the high-value areas or high-leverage areas, I think you're just going to get left behind.
So as a technology company, obviously, it's critical for us to make sure that automation is at the core of what we do.
When did Neto begin working with Cloudflare?
So in the beginning, when Neto was looking to migrate from an old cloud provider, we also wanted to improve what we call our go-live flow, or our onboarding flow, for merchants.
And a big part of that was obviously provisioning a website, a custom domain name, and a custom SSL certificate.
Requesting and being granted that certificate took two domain experts full time.
It was a very lengthy and technical process, which could sometimes take up to 2 to 3 weeks.
So you can imagine, you know, a customer who's itching to get online, that kind of barrier presents a pretty big problem.
So what Cloudflare enabled us to do was to literally automate that onboarding or go-live process to almost a one click process, and it also allowed us to diversify the people that could actually do that process.
So now anybody in the business can set a customer live with a very simple process, and it's very rapid.
So that's where we started.
What are some of the security challenges you face in your business and how are you managing them?
Any online service has to take security very seriously and it needs to know that security is job zero, so we always bake in thinking and process and tooling around security.
So what Cloudflare does for us is literally gives us a really good protective layer on the very edge of our platform.
So things like DDoS mitigation, Web Application Firewall Protection, all of that obviously is then translated into a really solid base of security for all of our merchants as well.
Security is obviously front of mind for Neto as a business, and online e-commerce presents a lot of security challenges.
So: denial of service attacks, cross-site scripting.
We have automated attacks that are trying to find exploits in our forms and our platform generally.
So prior to having Cloudflare, obviously we had measures in place, but what we've gained from Cloudflare is a consolidation of that strategy.
So we are able to look through a single lens at all of the aspects of our security for the platform.
And I think it's probably safe to say that now more than ever, a good online strategy is crucial to success.