Latest from Product and Engineering
Presented by: Jen Taylor, Juan Rodriguez, Angie Kim, Constantin Britcov, Nick Comer
Originally aired on June 23, 2022 @ 1:00 AM - 1:30 AM EDT
Join Cloudflare's Head of Product, Jen Taylor, and Chief Information Officer, Juan Rodriguez, for a quick recap of everything that shipped in the last week. It covers both new features and enhancements on Cloudflare products and the technology under the hood.
Transcript (Beta)
Hi, I'm Jen Taylor, Chief Product Officer at Cloudflare and I'm joined today by my illustrious co-host Juan.
Hi, I'm Juan Rodriguez. I'm the CIO at Cloudflare. How are you?
So we're back with another fun-filled installment of Latest from Product and Engineering.
I'm super excited to be doing this one with Juan. You know, one of the joys for me at Cloudflare is I get the opportunity to partner with lots of different parts of the engineering organization, lots of the different types of engineers we have here.
We're actually spending time today with one of my, honestly, my favorite teams and that's our billing team.
Can you guys just go ahead and introduce yourselves?
Sure, I'll start. My name is Constantin. I lead the billing team. I'm the billing manager.
Hi, I'm Nick. I'm the billing team's technical lead.
And I'm Angie. I'm the billing product manager.
Cool. All right. So stepping back, Cloudflare ships all sorts of craziness to all parts of the world.
We have a globally distributed network in over 200 cities servicing over 25 million sites and applications, blah, blah, blah, blah, blah.
How does billing fit in? What does it mean to do billing at Cloudflare?
Well, billing is a pretty central team.
So what we do is we enable customers to order products through our dashboards.
We make sure that all the service is properly enabled and everybody remains happy.
And we certainly try our best not to overcharge anybody for the services.
So that is what we do on a day-to-day basis. But beyond that, of course, there are a lot of technical engineering challenges that we'll talk about today.
It's a pretty central team that touches a lot of areas of Cloudflare and a lot of the other products we offer.
Angie, I know we've got our drop your credit card down business.
We've got our contract customers.
How does billing fit in? And how does billing work with those different solutions?
Sure. So in terms of our credit card customers, the pay-as-you-go ones who go to our dashboard, sign up for services, and enter their credit card, like Constantin alluded to, we have a whole slew of workflows that we support through our dashboard.
So customer goes in.
They're going through the different products. They go ahead and select it if they want to order it.
Then there's the actual charging of the customers.
So then once that has been successful, then we'll go ahead and actually enable service.
And then there's the actual invoicing. And then after we send out those invoices to customers, then there's revenue recognition, all the fun stuff that accounting does.
So that's pretty locked down at scale. When it comes to our enterprise customers, it's a little bit different.
There's a lot more hand-holding and a lot more customization involved, because there are contracts, negotiations, and different SLAs.
In the case of enterprise, it's much more manual, which means a lot more hands are involved throughout the process.
And that's definitely an area where we can improve to help our sales organization.
Let's go. One of the things I wanted to mention is that the team has been talking purely about billing, the part about charging and things like that.
But the term billing is actually a bit of a misnomer for this team, because they also handle everything that has to do with entitlements,
one of the most fun parts of the platform. So Nick, maybe you can talk a little bit about what entitlements are, and how we enable customers to actually access the product features they have bought.
Absolutely. Yeah. And billing, there's two parts. You pay for something and you get it.
We own both of those parts. We take the money and then we also give you the service you pay for.
So when it comes to enabling service, that is what I would consider the other half of our platform internally at Cloudflare.
So how it works usually is that product teams build their own products and they build usually two parts.
They build the configuration plane and they build the data plane.
The configuration plane lives in one of our core data centers, where customers come through our dashboard to configure their products, and that information is then pushed out to the edge.
So when it comes to the actual configuration plane, and customers are trying to add load balancers, that service needs to know how many load balancers a particular customer is entitled to as a result of billing, whether they have signed a contract or swiped a credit card; it's all the same.
That's the way we abstract the configuration plane services away from the matters of billing: they can simply ask our part of the platform how many load balancers the customer can use.
If you've purchased load balancing and you've added a bunch of load balancers, you'll be able to add up to a certain amount.
That is an internal call from the configuration plane of that particular service into our part of the platform.
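The entitlement call Nick describes can be sketched in Go, the language the team says it uses. This is a minimal illustration under stated assumptions: the `Entitlements` interface, its `Limit` method, the `load_balancers` feature key, and the in-memory map are hypothetical stand-ins, not Cloudflare's internal API. What it shows is the shape of the call: the load balancing service asks for a number and never inspects contracts or payment methods.

```go
package main

import (
	"errors"
	"fmt"
)

// Entitlements is a hypothetical interface for the central entitlements
// "hub": product services ask it what an account may use and never look
// at contracts, invoices, or credit cards themselves.
type Entitlements interface {
	// Limit returns how many units of a feature an account may use.
	Limit(accountID, feature string) (int, error)
}

// staticEntitlements is an in-memory stand-in for the real service.
type staticEntitlements map[string]map[string]int

func (s staticEntitlements) Limit(accountID, feature string) (int, error) {
	// Unknown accounts or features read as a zero limit (no entitlement).
	return s[accountID][feature], nil
}

// ErrLimitReached is returned when an account has used up its entitlement.
var ErrLimitReached = errors.New("entitlement limit reached")

// AddLoadBalancer shows the configuration-plane side: before creating
// another load balancer, ask the hub for the limit instead of billing.
func AddLoadBalancer(ent Entitlements, accountID string, current int) error {
	limit, err := ent.Limit(accountID, "load_balancers")
	if err != nil {
		return err
	}
	if current >= limit {
		return ErrLimitReached
	}
	return nil // the caller may now create the load balancer
}

func main() {
	ent := staticEntitlements{"acct-1": {"load_balancers": 2}}
	fmt.Println(AddLoadBalancer(ent, "acct-1", 1)) // <nil>
	fmt.Println(AddLoadBalancer(ent, "acct-1", 2)) // entitlement limit reached
	fmt.Println(AddLoadBalancer(ent, "acct-2", 0)) // entitlement limit reached
}
```

Because every product service calls this one interface, the hub-and-spoke shape described next follows naturally. It also explains why the hub's latency and uptime matter so much.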
So architecturally, across the whole platform, this ends up as a pretty big hub-and-spoke model where everything calls into this one hub, as it were, to determine what a customer is entitled to.
So that service needs to deliver very low latency and very high throughput; it currently handles on the order of 1,500 requests per second.
And so downtime is unacceptable, and high latency is unacceptable.
So it turns into a really fun challenge. Because at that level, you end up needing to consider everything at every single layer of an application in order to deliver those guarantees.
So it's been a huge source of technical challenges, which I think are some of the most fun parts of my job.
In the comments you just made, Nick, you touched a little on the sheer scale that this team works with.
And when I step back and think about, as Nick was just talking about, the things that really make working on billing fun,
it's really the power and the scale of what we do.
Nick talked a little bit about entitlements.
But how do you guys think about the challenges of scale? And I think of scale in a couple different dimensions, right?
I mean, it's like the scale of the volume of customers, the scale in the portfolio of products we have.
I mean, how do you guys, how do you tackle that? How do you tackle those different dimensions?
I'll say something about what I like about Cloudflare, and specifically about billing at Cloudflare, when it comes to scale: the scale of having to manage so many products and stay on top of how things function, how we're supposed to bill for them, and how to provision service.
I would say billing is a team where we rely on a lot of communication.
So proactive communication, a lot of planning that at times we hit and at times we miss and we have opportunities for growth and improvement in those areas.
So I think one of the most fun parts of all of our jobs is growing and staying aligned as we put out more features and move fast.
One of the great things about Cloudflare is we're a public company, but we still very much believe in transparency, moving fast and failing fast and learning from it.
So that's another experience you can get really close and personal with when you work in a team like billing.
But when it comes to the scale of our APIs, the number of requests per second, and high availability, like Nick alluded to, we are responsible for other APIs too, and they bring a lot of engineering challenges and accountability.
So I think, Nick, you can talk a bit more about, I guess, scale when it comes to other services and how we manage to stay available and stay reliable in terms of maybe subscriptions, API and other things that we heavily rely on as well.
Great. So back to the mental model of what our platform looks like, you have the billing end and you have the enablement end.
What I was just referring to earlier was more of the enablement end, but the other half of that equation is the billing end for which we have this subscriptions API that is accessible through the dashboard as well as the public API.
This is a complex domain.
So when thinking about scale and solving for scale, you can say things like, oh, this will never happen because it's one in a million, so you don't need to plan for that.
But on Cloudflare scale and at the scale we operate, one in a million happens all the time.
So you have to be good at the what if game and you have to be really good at bringing the simple solution because complexity will find you.
And when it does, it will complicate what you have. So you want to bring as much simplicity to the system as possible.
And that's usually done with simple tools.
So at Cloudflare billing, we use Go, which Google designed from the start to be a simple language.
And it has proven to be exactly that.
When you look at our code base, even though the domain is complex, the language isn't.
It's not going to make any sort of fast moves on you.
It's not going to do anything unexpected. And so for that reason, we like Go.
And then Postgres. We use Postgres databases because Go and Postgres work pretty well together with the right driver.
And it forces you to create a simple system.
And between those two tools, we've been able to scale out very large things that are also very simple.
And like the entitlement service I referred to is pretty much just those two things with a little bit of light caching with Redis.
But you can do a lot with those things. And yeah, you want it to be simple because billing will make it complicated.
So be ready for anything. And don't bring complicated tools to the workshop, I guess, or else it's going to turn into a really sorry story.
Maybe also worth mentioning is what candidates often ask about when they think about joining Cloudflare billing.
And I speak with a lot of those folks.
They ask us, what is your technical stack? And to Nick's point, in my view it's a very simple, layered system: Postgres and Go, which are the backbone of it all.
Kafka too. I think event-driven architecture is something we definitely use a lot at Cloudflare, and in billing as a whole.
And Kubernetes, which obviously allows us to orchestrate a lot of containers and all of our services.
And the cool thing about it here at billing is that we don't necessarily rely heavily on an outside team to control our uptime or to control our system's health.
We can monitor everything ourselves. We can be accountable for everything ourselves on call and make changes that we need pretty fast.
So from that standpoint, it also allows us to be nimble and handle scale when it goes up, ideally planning for it over time rather than reacting.
Go ahead. Sorry. Go ahead. So in addition to the tech stack and all the infrastructure behind it, the team also did a lot of work so that other teams can use billing more as a service.
That work was intentional: making things more self-service, so that there isn't such a strong dependency on billing every single time a new feature or a new product needs to get rolled out.
So we've opened it up to the different teams.
They can go ahead and make their updates to our catalog.
Then, once they open a PR, we can approve it, and it's out there for use.
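Angie's self-service catalog flow, where teams edit the catalog and billing approves the PR, lends itself to an automated check on each pull request. The sketch below is hypothetical: the `CatalogEntry` fields and the validation rules are illustrative assumptions, not Cloudflare's actual catalog schema. It collects every problem rather than stopping at the first, so the PR author sees them all at once.

```go
package main

import "fmt"

// CatalogEntry is a hypothetical shape for one product in the billing
// catalog that product teams edit through pull requests.
type CatalogEntry struct {
	ID         string
	Name       string
	Period     string // "monthly" or "annual"
	PriceCents int64
}

// ValidateCatalog returns every problem it finds, so a CI check on the pull
// request can report them all before a billing engineer reviews the change.
func ValidateCatalog(entries []CatalogEntry) []string {
	var problems []string
	seen := map[string]bool{}
	for i, e := range entries {
		switch {
		case e.ID == "":
			problems = append(problems, fmt.Sprintf("entry %d: missing ID", i))
		case seen[e.ID]:
			problems = append(problems, fmt.Sprintf("entry %d: duplicate ID %q", i, e.ID))
		}
		seen[e.ID] = true
		if e.Name == "" {
			problems = append(problems, fmt.Sprintf("entry %d: missing name", i))
		}
		if e.Period != "monthly" && e.Period != "annual" {
			problems = append(problems, fmt.Sprintf("entry %d: unknown period %q", i, e.Period))
		}
		if e.PriceCents < 0 {
			problems = append(problems, fmt.Sprintf("entry %d: negative price", i))
		}
	}
	return problems
}

func main() {
	entries := []CatalogEntry{
		{ID: "addon-a", Name: "Add-on A", Period: "monthly", PriceCents: 1000},
		{ID: "addon-a", Name: "Add-on A (copy)", Period: "weekly", PriceCents: -1},
	}
	for _, p := range ValidateCatalog(entries) {
		fmt.Println(p)
	}
}
```

Mechanical checks like these leave the human review free to focus on what automation cannot judge, such as whether a plan actually makes business sense.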
Which I think is so cool. I mean, you guys are dealing with such massive scale and such intense uptime requirements, yet you can still think about how you innovate as a platform and so on.
The other thing that I think is really interesting, and something we talked about the last time you guys were on, is that, again, we can't take the billing system offline to swap out a component.
We can't take the billing system offline to do anything.
When the billing system's down, the service is down.
And so I know recently, Angie, the team just made a swap in the payment processor.
I'm kind of curious first just to understand a little bit about kind of how did we start that journey?
What was the driver for us to make that decision to take on?
Because we don't take on those types of changes lightly.
Yeah. So what we're always trying to do is look ahead into the future, right?
So we want to put ourselves in a position where we can anticipate the different types of changes and needs that are coming, so that we're prepared and ready to move forward, as opposed to hitting a last-minute requirement where everything has to stop while we figure things out.
So looking forward, today, we have customers all over the world.
We support PayPal payments as well as credit card billing.
But that's quite limited compared to the payment methods customers actually use.
And thinking ahead, again, we want to be able to support multiple currencies beyond the US dollar, and payment methods beyond just those two, because, again, we have customers everywhere.
So in order to be able to put ourselves in a situation where we could support that, we had to swap payment processors.
Obviously, doing that overnight is not easy. And so what we did was try to take the most tactical approach in terms of what is the least amount of risk.
So when we onboard a new payment processor, we start with new customers first, right?
So they don't know what the existing experience looks like.
This is completely new and they go through the flow. It also lets us kick the tires as well to make sure that our integrations are in place so that we're not disrupting our existing customers by the fact that we switch something mid-billing period, and then all of a sudden they can't pay and then they get downgraded in service.
So there's a phased rollout, balancing which customers you address first against the risk.
And then the next milestone that we have to solve which is the biggest and most difficult is migrating over existing customers, right?
So like I said before, we don't want to disrupt their service. We want to allow them to keep buying more products while they go through that migration process, and to time it around questions like: is there a potential outage?
What's going to be the impact to other launches that we have?
So it's like a dance. You got to coordinate all these different things, figure out the right steps to get to the end goal that we want.
The thing I wanted to add as well, and you mentioned this a little, Jen, is that billing is maybe a little different from some of the other engineering teams at Cloudflare in that we have a ton of stakeholders, right?
Normally a lot of the product engineering teams have one or two stakeholders they work with in parallel, maybe the Dashboard team or other products, but we have fingers in almost everybody's pies a little bit.
One team that is also very important, and that we collaborate with very closely, is the finance team, right?
For everyone we bill, we have all these interfaces to back-end systems for revenue recognition and all those sorts of things.
And to elaborate on what Angie mentioned about changing the payment processor, a lot of processes also tie in with billing: abuse, fraud, refunds, all those sorts of things.
So, to add even more complexity to the dance of all the pieces Angie was describing, there's also the coordination, training, and process changes we need with all those teams, who now need access to the tooling for this new payment processor and need to know how to use it.
And it's not simple, right? You have to bring a whole bunch of people along with you, versus just replacing a component in a technical stack.
So bringing a lot of teams together with us every time we make one of these changes is just a little bit different from all the other engineering teams we have at Cloudflare.
Yeah, and change is hard to coordinate with so many people, because they're so accustomed to things being done a certain way, and this disrupts how they do it.
And this, you know, billing is not everybody's job.
It's just a small aspect of it. And it, you know, it brings some level of discomfort to what they're accustomed to doing.
But in the long run, as long as we've got everybody on board, and we've got that buy-in that things will be better as a result of this, I think in the long run, everybody will be excited.
There's going to be discomfort up front. But again, trying to get everybody aligned, I think it's a good thing.
Yeah. So Nick and Constantin, again, this is like replacing the engine mid-flight, and I love that analogy for this team given how critical the work is.
Like when you guys sat down and you looked at, you know, replacing or adding the additional payment processor, what are some of the things that you guys had to factor in?
Well, we had to factor in the very first thing that Jen mentioned, which is that billing cannot be offline.
Billing cannot just stop working.
We have to continue taking orders. We have to do it well.
And the cutover should be seamless with no pain to the customers. The scale, you know, how many customers we have, being able to coordinate efforts with all the other teams to make sure that all the other products that we today put out are going to continue being available, orderable.
We have to work with a lot of other vendors or systems we depend on: the billing back end we have, our previous payment gateway, plus the new payment gateway, Stripe.
There's just a lot of coordination that takes place. And that goes back to a previous point: billing is not just about technical scale but, more importantly, I think, about context switching and running projects and initiatives in parallel.
A lot of that has to be tightly looked at and planned for.
I don't know if you want to add anything to that, Nick, from a deeper technical perspective, but those, I think, are some of the more considerable challenges that I see.
Yeah. Just back to the theme of scale, rule number one in these sorts of cutovers and migrations is: no big-bang cutovers.
Everything needs to be very incremental, one step at a time.
That way, if you take one tiny step and something goes wrong, you can take a step back, nobody gets harmed, and revenue streams don't get impacted.
So, from a technical perspective, we had to plan how to balance two payment processors at the same time: knowing which customers go to which, keeping that straight, and making sure money keeps getting collected and new customers still get onboarded.
All that is the first thing to plan for because without that, there's no world in which we would simply cut over everybody at once.
That's dangerous, risky, and is just a recipe for absolute failure.
We have absolutely failed if a customer gets to our dashboard, wants to buy something and is simply unable to.
We do not let that happen.
The problem we're being asked to solve is not just "make this new system work"; it's "create a system that can work with both during an interim window while we cut over." That is much more complex than one thing at a time, but it's worth it in the end if customers can still come to us.
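The interim state Nick describes, with two payment processors running side by side, comes down to a routing decision per customer. This minimal Go sketch assumes the rollout Angie outlined earlier. New customers start on the new processor. Existing customers stay on the legacy one until their cohort is migrated, and any cohort can be rolled back. `Router`, `MigrateBatch`, and `RollbackBatch` are hypothetical names. A real system would also persist these assignments in the database rather than in memory.

```go
package main

import "fmt"

// Processor names which payment processor bills a customer.
type Processor string

const (
	ProcessorLegacy Processor = "legacy"
	ProcessorNew    Processor = "new"
)

// Router decides, per customer, which processor to charge while both are
// live. Existing customers stay on the legacy processor until their cohort
// is explicitly migrated; anyone without a legacy record is a new customer
// and starts on the new processor.
type Router struct {
	onLegacy map[string]bool
}

func NewRouter() *Router { return &Router{onLegacy: map[string]bool{}} }

// AddExisting records a customer currently billed by the legacy processor.
func (r *Router) AddExisting(customerID string) { r.onLegacy[customerID] = true }

// ProcessorFor returns the processor to use for this customer's next charge.
func (r *Router) ProcessorFor(customerID string) Processor {
	if r.onLegacy[customerID] {
		return ProcessorLegacy
	}
	return ProcessorNew
}

// MigrateBatch moves one small cohort to the new processor.
func (r *Router) MigrateBatch(ids []string) {
	for _, id := range ids {
		delete(r.onLegacy, id)
	}
}

// RollbackBatch returns the same cohort to the legacy processor if the
// step goes wrong, so only that cohort is ever at risk.
func (r *Router) RollbackBatch(ids []string) {
	for _, id := range ids {
		r.onLegacy[id] = true
	}
}

func main() {
	r := NewRouter()
	r.AddExisting("cust-1")
	r.AddExisting("cust-2")
	fmt.Println(r.ProcessorFor("cust-new")) // new
	r.MigrateBatch([]string{"cust-1"})      // step one: a single cohort
	fmt.Println(r.ProcessorFor("cust-1"))   // new
	fmt.Println(r.ProcessorFor("cust-2"))   // legacy
}
```

Each `MigrateBatch` call is one of the "tiny steps" from the rule against big-bang cutovers, and `RollbackBatch` is the step back.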
As somebody we know likes to say, we probably have a new slogan for our billing t-shirts: cut many times, or rather, measure many times and cut once.
I almost put it the wrong way.
See, that's bad on me. Measure twice, cut once. I didn't measure enough before I said that.
Many times measure and cut once. There you go.
Yes, exactly. I failed there, but that's it. But hold on a second. Juan, the other thing you guys are focused on is helping teams measure faster.
A big portion of what this team is engaged in right now, as we scale, is making the whole engine that gets these customers live and onboarded run faster.
What are some of the things you're thinking about and looking at as you're thinking about automating and scaling that?
As Angie mentioned, a lot of our pipeline is what we call our pay-go part of the business.
The self-service part is pretty well automated.
We have a process, people sign up, they can buy things on the dashboard.
It all happens in the background fairly successfully. The enterprise side of things is where some of our challenges are in terms of automation.
As Angie said, many things are more handcrafted; in some cases, every contract is almost a little artisan piece of art.
That is also a part of the business that is growing very fast for us.
One of our big focuses this year is to bring a similar amount of automation to that business as we have in our pay-go pipeline,
so that we can do trials for customers at scale, whether they're existing customers or new customers.
An existing customer calls their customer success manager and says, hey, I would love to try these products and see how they work in my environment; or it's a new customer.
It should be easy for an account executive to take an opportunity, click a button, and have a trial provisioned for that customer.
Today, that provisioning is more of a manual process.
That's a significant goal for us this year, and a significant driver of efficiency and also of quality of service,
because, again, every time you have humans provisioning things manually, there's potential for errors.
The other thing we're trying to do concerns variable billing, usage-based billing, and things like that, which we haven't talked about much. On the enterprise side of the house, we have customers with caps on usage and things like that.
We're trying to provide a billing back end for the enterprise side of the house that automates the calculation of those rate plans and the usage. That way, all of it can be billed to customers automatically, without humans having to look at it, and connected to our finance system.
It's almost like bringing all the superpowers we have on the pay-go side of the house to the enterprise side of the house.
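Juan mentions enterprise customers with caps on usage, and automated calculation of their rate plans. The sketch below takes one possible reading of such a plan: an included allowance, a per-unit overage price, and a cap on the overage charge. The `RatePlan` fields and numbers are illustrative assumptions, not Cloudflare's pricing. Amounts are kept in integer cents to avoid floating-point rounding.

```go
package main

import "fmt"

// RatePlan is a hypothetical usage-based plan: some usage is included in
// the contract, anything above that is billed per unit, and the overage
// charge is capped (CapCents == 0 means no cap).
type RatePlan struct {
	IncludedUnits  int64
	UnitPriceCents int64
	CapCents       int64
}

// OverageCents computes the usage charge for one billing period.
func (p RatePlan) OverageCents(usedUnits int64) int64 {
	billable := usedUnits - p.IncludedUnits
	if billable <= 0 {
		return 0 // usage stayed within the included allowance
	}
	charge := billable * p.UnitPriceCents
	if p.CapCents > 0 && charge > p.CapCents {
		charge = p.CapCents // never bill more than the contractual cap
	}
	return charge
}

func main() {
	plan := RatePlan{IncludedUnits: 1000, UnitPriceCents: 2, CapCents: 1500}
	for _, used := range []int64{800, 1500, 5000} {
		fmt.Printf("used=%d charge=%d cents\n", used, plan.OverageCents(used))
	}
}
```

Because the calculation is a pure function of the plan and the metered usage, it can run unattended every period and feed the finance system, which is the automation Juan describes.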
As we scale, and you mentioned this, we're not thinking about what is going to happen tomorrow, but we're thinking about, okay, we have this amount of customers, we're doing this amount of transactions, what about when we're doing 10x that?
What are the things that we need to put in place so we can have operational leverage basically at that level of the scale?
Those are some of the big goals that we have in this year for the team.
That's exciting. We just have one or two minutes left, but Angie, I wanted to come back to you because I know recently the team has had a bit of a big win.
We spent a lot of this call talking about the scale, the infrastructure, and the complexities, but a big portion of what this team is doing is helping Cloudflare innovate on business models and try new things.
Do you want to give folks a hint about what might be coming their way shortly?
This sounds like a very simple request, but it has taken a long time. I've been with Cloudflare for almost five years, and I'm sure people were asking for it well before that: annual plans, being able to buy a plan for a full year.
It's been a long, long journey, but I'm happy to say that it's out there in early access. So people might be asking, why did it take so long to do this?
There are some little things along the way that needed to be figured out, so trying to commit for a month versus a full year is totally different.
One of the things that can happen is that mid-period, somebody may say, I don't want to use this product anymore. With a monthly plan, what used to happen was that we would cancel service: you pay up front for the full month, you cancel in the middle of the period, and you get downgraded immediately.
In the case of an annual plan, you can't do that, because you're prepaying for a full year. So not too long ago, we added support for delayed downgrades and cancellations, so that you continue to have access to what you paid for through the remainder of the period. We couldn't offer annual plans until that was solved.
We also have situations where, because we want plan differentiation, different features are embedded in the different plans. For Pro and for Business, we've got products like Image Resizing and Spectrum, so there are usage dependencies that we didn't use to have way back when. That adds further complexity, and we needed to figure out how to solve for it so that we don't generate a surprise bill for a full year of usage a year later. So there were some stepping stones before we could actually solve for that and offer annual billing to customers.
It's fantastic. I know it's been a long-standing ask, and to your point, it's something we've been thinking about internally for quite a while. It's interesting because, for me, it's classic product innovation: the simplest things on the outside often take significant investments internally. So we are just about out of time, but I did want to pause and acknowledge that today is a special day for one Mr. Nick Comer.
I was going to suggest that we sing, but I will spare him the embarrassment of all of us trying to sing concurrently on Zoom, because I think the noise cancelling would kill each other. But Nick, I just want to wish you a happy birthday.
It's a pleasure to get a chance to work with you, and you're older and wiser today, I guess, so happy birthday.
Happy birthday, Nick. Happy birthday, buddy.
You honor me.
And as always, time flies. I can't believe it's been half an hour.
I could keep going with you guys forever. Thank you so much. It's a total pleasure.
I really appreciate the work that we do together on this, and look forward to talking to you all again soon.
Thanks. Thank you. Have a great day.