Delivery Management at Cloudflare
Presented by: Alex Moraru, Omer Yoachimik
Originally aired on October 9, 2021 @ 2:30 AM - 3:00 AM EDT
Join Cloudflare Delivery Managers in conversation with Product and Engineering Managers about their experience in working together to build and ship products.
English
Product
Engineering
Transcript (Beta)
Hi everyone, welcome to Cloudflare TV. Good morning or good afternoon depending on where you're watching us from.
I'm Alex, the lead delivery manager for EMEA at Cloudflare. Today we're going to be talking about software delivery management, and I'm joined in this segment by my colleague Omer.
He currently leads product management for our DDoS team, and he comes to Cloudflare with over 10 years of experience building tech products, specifically security products.
Hi Omer, thank you so much for being with us today here.
My pleasure, Alex. Thanks for inviting me.
Yeah, you know, you're part of a series in which my colleague Abida and I are talking to our different stakeholders, the different people who interact with the growing function of delivery management here at Cloudflare.
So we've spoken in the past with other delivery management professionals, with engineering managers, engineering directors and other product managers, but I'm happy to talk to you today because I think the relationship we're building with the DDoS team is pretty interesting, and I'm quite excited to be part of that.
So why don't we start today Omer by talking a little bit about how do we work together in delivery management specifically for you, you know from the product management angle.
Yeah, sounds good.
So let's start by talking a bit about what the DDoS team specifically needs from a delivery manager at the moment.
Okay, so for those who are not familiar with the DDoS team, DDoS is an abbreviation for distributed denial of service.
The team is tasked with providing the tools and systems that protect Cloudflare's network, and our customers as well, against denial of service attacks.
Denial of service attacks are attacks that aim to take websites and other Internet properties offline.
So we're making sure that all of those services are available and performant and that cyber attacks don't take them offline.
So that's just a little background. From the delivery perspective, well, first of all, our team is distributed.
We have half of the team in Austin, Texas in the US and half of the team in London.
Even though we're all working remotely at this time, still we're distributed.
So that in itself introduces some challenges.
How do you manage such a team? In addition to that, we will also be onboarding a new engineering manager.
So we have to prepare for that as well. And because we provide a service that protects websites, other Internet properties, data centers and so on 24-7, our team is basically on call 24-7.
The engineers take shifts and they respond to pages if there are any issues that need help.
In addition to that, because in recent months we've seen a dramatic increase in ransom attacks, we've been onboarding very large customers.
What is a ransom attack? In the past few months, or even longer, there has been an increase in these types of attacks.
A ransom attack is carried out by a group of cyber attackers such as Fancy Bear, Cozy Bear or the Lazarus Group.
These are, by the way, code names given to those cyber threat actors by different law enforcement agencies and security firms once they discover their activity.
So what these groups do is launch a small attack on a website, on some service, on some Internet property. This is a kind of teaser attack, a demonstration of their abilities. Then they send a ransom email saying: if you do not pay us, usually 30 bitcoins, within six to seven days, we will take your website offline, we will attack you with the largest attack you've ever seen, and the ransom will increase day by day.
So these are basically criminals who are extorting businesses. And we've had a number of organizations that have been targeted by those ransom attacks come to us for help.
I guess it's very scary for them to receive this and, even more importantly, they definitely need support to stop this and act really, really quickly because, as you said, every day that passes, the ransom keeps growing and growing, right?
So you mentioned that in the last few months, I guess, we've seen more and more customers coming to us asking for help.
So indeed, with the increase in customers, we had to increase our level of support and the time we spend to configure things and respond properly to these requests, right?
So these customers, as far as I understand it, have unique traffic profiles that we need to adjust for as well so that we can respond to their needs a bit better.
And indeed, what I've noticed since I started working with the DDoS team is that we just need to spend more time figuring out how we can best support these customers.
And since we see these things coming up more and more, actually what we're working on and focusing on right now is how do we automate it?
How do we make this process a bit smoother to provide an ever-increasing level of customer support and customization, right?
So that's one of the things we're working on together.
Go ahead, Omer.
Yeah, exactly. And it's been, I think, a very beneficial partnership.
And I think the team really appreciates the added value of delivery management coming in and helping us with the processes and making sure that the team is healthy, that the engineers are happy, that we're delivering on our tasks, and that we're optimizing processes where needed.
So I think one of the challenges is that we've been onboarding very large customers that require tailored customizations and a kind of white glove onboarding process.
And just to explain to our viewers what that means: most of the customers we're onboarding are being onboarded to a service that we call Magic Transit.
With this type of service, we protect entire data centers, not just a website.
These are large enterprises or service providers that operate their own data centers, with racks of servers and different types of applications running there.
And so we will take their IP ranges and announce them with BGP Anycast from our entire network.
We will attract the traffic.
We will be the front line. So if you go to one of their services, you will actually be routed automatically to Cloudflare, where we filter for attacks.
And then we route the traffic back to the customer. And because of the nature of these large customers, we have to onboard them with a kind of white glove approach where we provide them tailored mitigation strategies that are right for them.
Yeah. I like this term, white glove, right? Because I think it also shows the fact that we're ever evolving our product and our customer base as well.
So now I think we're better and better equipped to address the needs of very different types of customers, because we definitely need to be able to respond to how the criminal side of the industry is working, right?
Because these ransom attacks are totally new.
So we have to really think on our feet to be able to mitigate them properly.
Exactly. And it really is a team effort. For our enterprise customers, every customer has a dedicated solution engineer, who covers the technical aspect of the onboarding.
And afterwards, we have the customer success manager, who is their go-to person for any issue after the onboarding.
And we have the account executive, who is in charge of the commercial agreements and things like that.
And after the customer is onboarded, meaning that we've configured everything and they're in a steady state, that is when the customer support engineers, the solution engineers, the site reliability engineers, the escalation engineers, the security operations center and all of those customer-facing teams interact with those customers and also act as a kind of mediator between the engineering team and the customer.
And so what happened is that the engineering team now has a lot of touch points with various teams, with various groups in the company.
And that can become a burden on the team because at the end of the day, we want them to develop products.
And so I think this is where the delivery management comes in.
Yeah, and you're right, actually, the way that we define delivery management, well, in general, but also applicable at Cloudflare, a big portion of that role is managing the dependencies and communication with other stakeholders that are not in your immediate team.
So currently, as a delivery manager, my biggest stakeholders will be the engineering manager with their team and the product manager, because my goal is the same as everybody's.
We want to ship products that respond to our customers' needs in order for us to help build a better Internet.
Ultimately, this is what I always come back to.
And the way that we do this in delivery management is by helping engineering teams organize themselves and have efficient processes and approaches, so that they are able to balance that product work that you were talking about.
We want them to ship products, but also to constantly address technical improvements, because there will always be something to improve, or a new technology to experiment with that complements our existing stack.
And then the third level of it, which I think is very important, and I see very different approaches in different teams, is the team happiness, the team morale.
So you mentioned earlier that we want the team to be happy, and that they feel satisfied when they ship things and when they do the tasks that they committed to and wanted to do.
And for a lot of teams, what I notice is that somehow this team happiness comes second, not on purpose.
And this is something that we're trying to mitigate and address, because ultimately, in the long run, if we don't have a happy team, then our product will probably not be of as high a quality.
And also, you'll start seeing burnout and you'll start seeing issues with attrition, rapid changes in the team.
So I think that addressing things openly and trying to understand, OK, what are the main pain points?
How can we help? Even having that conversation is very, very helpful.
And why I'm mentioning it as part of the delivery management is because I think having that conversation from, let's call it the third party, so not directly from the product manager, not directly from the engineering manager, can actually be much more constructive because then everybody can contribute to that feedback.
And I think we're starting to do this in the DDoS team as well, and in most of the teams that we work with, because we have to address the challenges of team happiness.
And what do the engineers want to do? What makes them satisfied as professionals?
How do you see this as a product manager? Because I guess you're always conflicted between shipping new things, but also addressing the rest of the team health.
Yeah. So I really agree with that approach because maybe depending on the interaction or the relationship in general between an engineering manager and an engineer, the engineer might not feel comfortable speaking about the things that bother them.
And so I like that approach of a third party or a neutral, a party that's just coming in to listen, to identify the opportunities for improvement, what can be done better to make sure that we're delivering, but we're also happy.
We want to make sure that, like you said, that the engineers are happy, that they are retained, that they stay.
And there is a trade-off, of course, because if our only goal were to ship as many products as we can, that would just create a nightmare for the people working in that team, because they would be burdened with endless work.
And that's not...
Yeah, they'd never catch up.
Exactly. We need to operate in a way that we can scale and maintain our velocity.
And it's okay if from time to time there is some kind of deadline or some important commitment that we work around the clock to get, but that can't be the constant reality of the team because it's not sustainable.
Like you said, burnout and anything like that.
And a lot of times, by the way, when needed, we've actually delayed the delivery of features.
If an engineer needed some personal time or needed to take some holidays to disconnect, that, in my opinion, comes first.
Yeah. I agree with you. And I'm glad that you're mentioning it because I know it might look like from the outside, this is how I see it, and people that I speak to, that's how they see it.
Oh my God, Cloudflare is just pushing and pushing and delivering this.
But you're actually right. Sometimes we push because we know we have an external commitment or this is something we really want to push out.
But I think that we also offer a lot of flexibility on when do we want to deliver something and how do we want to do this.
So for me, an important part of how I see my job is prioritization.
And even more important than that is constantly communicating what the priority is, even if it changes.
For some things, we can't really influence much because, as you know, we're sometimes working on initiatives that have a lot of external deadlines because we work either with a customer or with a partner.
And there's a very, very strict timeline for us to be successful.
So then that's what we will prioritize.
And then on the other side, we prioritize things that we know will have a good and high positive impact on our customers.
But having that element of really surfacing the priority for the team constantly, so it's not enough to just say it once, obviously, I think that's also an important part of how we can deliver better.
So, yeah, what are your thoughts on that idea of prioritization, particularly on changing priorities and being transparent about it with the teams we're working with?
Priorities change. You know, they say that a plan is only good for the moment it was written.
Things change, right? So the way that we work at Cloudflare, and you know this, of course, is that we have a long-term vision.
So we have the company vision, which is to help build a better Internet.
And then we have the security vision for the security products, which is to protect our customers and provide them with the visibility, configuration and tools that they need to stay protected, regardless of their size or sophistication level.
And then we have a step underneath that. We have the DDoS vision, which is derived from all of those, which is basically the same, but to make sure that their website is up and available and to make impacts and downtime from DDoS attacks a thing of the past.
And so that's kind of what drives us, you know, strategically.
And then, more tactically, we have our quarterly planning.
And of course, we have a direction for where we want to go over at least the next year or two, but we understand that that's something that can change.
But even with the quarterly plan, even after we get sign-off and commitment from other teams for the cross-team dependencies, something can happen that shifts priorities.
And this is something that we need to acknowledge and be okay with, because that's the nature of the landscape in which we operate.
And I think it's the right thing as well because, if I tie that to evolution, for instance, those who are able to adapt survive.
And we want to survive and prosper. And so we need to be able to change priorities.
But that also means that we need to be able not just to prioritize features, but to deprioritize features and work.
We can't do everything.
We can't do everything at once. And if something becomes a higher priority, then something else must become a lower priority.
Yeah. And on that, actually, this is something that I've been thinking about in the past few months, and I think I'm going to try to iterate on this idea more and more: it's really important to surface the cost of something, including the cost of not doing something else, right?
So adding this more holistic view of what it costs us, not just from the perspective of resources and the revenue it can generate for us, but going even deeper in really understanding the impact it has on our systems and on other things we're not doing.
So other features, which might be helpful for customers, right? There's always this dance in prioritization, trying to really understand what's more important.
And it definitely takes many brains to make this decision, because we need to make sure that we look at it from the perspective of our customers, our product vision, our engineering technology and many other elements, right?
So yeah, I think overall, indeed, priorities change. That's a good summary of the last five minutes of our chat.
That's just how it is.
You have to deal with it. You have to be ready for it. I mean, if it changes too frequently, too rapidly or too extremely, then that can indicate something wrong with the process or with the understanding of what is important and what is not.
But just to clarify, that's not the case. Usually we know what we want to do.
We may have, you know, maybe two top features that we want to deliver this quarter, and maybe their priority will change a little.
It's rare, I think, that something completely out of the blue will come up that we never thought about or never knew about, unless there's some kind of problem or issue, which then falls into the category of resilience and stability.
I see. Yeah, yeah.
And you're right. I think it's totally fine to switch things up, as long as we actually record and track why we made a decision.
So why did we change, not change our minds, but why did we decide to do something else instead?
Because I think it's really important to be able to look back and then understand, was this a good decision?
Was it not a good decision? What are we learning from this step in itself?
And then it also helps because then you are transparent towards people who are involved.
And then if they understand why a decision is made, then it's much easier to get everybody on board.
Exactly. And one of the things that we kind of try to avoid is doing that too frequently, like putting someone on a project and then taking them off a project or having them do multiple projects at the same time, because of context switching.
Yeah.
I recall that when you recently interviewed some of our engineers, one of the things that came up was that even if they spent, let's say, three hours on one project and then another two hours on another project on the same day, they still had about one or two hours that were lost due to context switching.
Just to kind of work out, what was I working on? What was I doing?
Okay. And ramping up and switching gears also takes a long time, or a significant percentage of their work time.
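To put rough numbers on that example, here is a minimal sketch. It assumes the one or two lost hours come on top of the five focused hours; the function name is hypothetical, and the figures are the ones quoted in the conversation, not measured data.

```python
def context_switch_overhead(focused_hours: float, lost_hours: float) -> float:
    """Return the share of the working day lost to context switching.

    Assumption: lost time is counted in addition to focused time.
    """
    total_hours = focused_hours + lost_hours
    return lost_hours / total_hours


# Three hours on one project plus two on another, as in the example above.
focused = 3 + 2

for lost in (1, 2):
    share = context_switch_overhead(focused, lost)
    print(f"{lost}h lost out of a {focused + lost}h day: {share:.0%}")
```

Under that assumption, somewhere between a sixth and more than a quarter of the working day goes to switching rather than to either project.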
Yeah. And I think actually we don't speak enough about this. This is a lesson that I learned a few years ago, in a different job before Cloudflare, talking to engineers about context switching.
A lot of people don't particularly mind it.
However, what I've learned is that sometimes they are really limited by the technologies they use.
So even if you can switch your brain from one thing you're working on to another, it actually takes time to change your development environment, to rebase, and to make sure that you're working on the correct code base.
So actually that time is really not accounted for anywhere.
And this is why actually we do encourage people to just work on one thing at a time.
And I think this is a good life lesson in general, because we can't really multitask, right?
Take things one at a time. Because I think that can make for a better developer experience in general, and it can make us better professionals because there will be less friction.
Exactly. And context switching is a big part of it, I think, but one of the things that we're also trying to do, and that is also one of the roles of delivery management, I think, is to shield the engineers from the various types of queries that they might get from customer support, from sales engineers, solution engineers, other engineers and so on.
And I remember that when we first brought you onto the team, one of the things that we saw was that if someone in the company had a question, they would ping specific engineers, right?
They would reach out to specific engineers.
And, you know, I know from my own experience that if I'm working on something, I try to ignore those instant messages and emails because it's such a distraction.
And I'm assuming it's the same for the engineers as well. So one of the things that I think you brought in is that we always have an on-call engineer, and that someone who has a question, an issue, a customer query or whatever reaches out only to the on-call engineer, and the engineers take shifts for that.
And I think that was very helpful as well.
Yeah, I agree. And I think actually it's in human nature: you will want to help your colleague, even if you know that they shouldn't be pinging you directly.
And most people don't want to have that, you know, difficult conversation.
You know what, please ping the whole group or the on-call person. Because it's typically small things, but as you said, even if you fix a small thing in five minutes, it's actually 30 minutes lost, because you need to switch your brain and then back again, and then maybe you will need a break.
So it just disrupts flow. And indeed, one of the ways we actually do this, and we do this with other teams as well.
So in Pyro as well, if there is something that's needed, that's why we also have on-call to respond to stuff that happens.
Sometimes it's incidents, sometimes it's, you know, questions, it's an escalation.
And that's what the engineer is doing for the period in which they are on-call, right?
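The routing rule described here, where every question goes to whoever is on call rather than to the engineer who happened to be pinged, can be sketched roughly as follows. This is an illustration only: the shift times, engineer names and the "team-channel" fallback are all hypothetical and do not describe Cloudflare's actual tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class Shift:
    engineer: str
    start: datetime  # inclusive
    end: datetime    # exclusive


def on_call_engineer(shifts: Sequence[Shift], now: datetime) -> Optional[str]:
    """Return the engineer whose shift covers `now`, or None if no shift does."""
    for shift in shifts:
        if shift.start <= now < shift.end:
            return shift.engineer
    return None


def route_query(shifts: Sequence[Shift], pinged: str, now: datetime) -> str:
    """Pick the recipient of a query.

    `pinged` is deliberately ignored: the point of the rule is that the
    query goes to the on-call engineer, whoever was originally asked.
    """
    current = on_call_engineer(shifts, now)
    # Hypothetical fallback: a shared channel when no shift covers `now`.
    return current if current is not None else "team-channel"


# Hypothetical two-shift day, expressed in UTC.
SHIFTS = [
    Shift("london-engineer",
          datetime(2021, 1, 4, 8, tzinfo=timezone.utc),
          datetime(2021, 1, 4, 16, tzinfo=timezone.utc)),
    Shift("austin-engineer",
          datetime(2021, 1, 4, 16, tzinfo=timezone.utc),
          datetime(2021, 1, 5, 0, tzinfo=timezone.utc)),
]

print(route_query(SHIFTS, "busy-engineer",
                  datetime(2021, 1, 4, 10, tzinfo=timezone.utc)))
```

The engineer who was pinged never appears in the result, which is the design choice: everyone not on call keeps their focus time.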
We have a few minutes left, Omer, and we've been speaking a lot about Cloudflare, about the DDoS team and your experience here, but I'm really curious to understand, over the next minute or so, whether you have worked with delivery management or similar functions in your previous jobs and environments.
Yeah, I have.
How was that?
Well, first of all, I really like it because it allows me to offload the project to some extent and focus more on the product aspect rather than project management.
And yeah, I've worked with it in the past, and I think that delivery management or project management, whatever it's called, is very helpful when they have a significant presence and are able to take ownership of the project.
Yeah. Yeah, yeah. So let's say you speak tomorrow with a peer of yours in product management who works with an engineering team that doesn't have a delivery manager.
Like, what would you tell them in order to get them curious so that they, you know, ping me and say, hey, what if I worked with a delivery manager?
Like, how would you explain the function, the benefits or the challenges if you felt that they might benefit from this support?
Okay. So I'd start with the challenges because it's easier for me.
The challenge is that, as I've seen in previous places, delivery management can fall into the category of just scheduling meetings, basically, and taking summaries.
And I think that's less beneficial. I think the handover to the delivery manager, whether it's from the engineering manager or the product manager, needs to be very well defined and done.
So that the delivery manager has enough data and understanding of the project to make decisions and not just to be kind of a secretary, if you will.
I think that's where the added value is. And it's a force multiplier.
So I think since you joined the team, I've been able to get more product work done: defining products, working with customers, doing research, thinking about the planning for 2021 and things like that, and not dealing with the day-to-day ops.
So it allows me to take a step back, which is very much needed.
Yeah. I mean, this is also how I see it. We have a lot of teams in which there is a good relationship between the product manager and the engineering manager, and they can cover everything that a delivery manager would do. It's just that it would take time away from actually doing their jobs: talking to customers and building products in your case, or making decisions and helping engineers grow their careers in an engineering manager's role, right?
Because for me, the way that I define this function is, okay, the product manager will decide on what we do and why.
The engineering manager will decide on how we do this technically.
And then from a delivery management perspective, you look at who will work on what and when, who else we need to communicate with, and what other blockers we need to fix.
So managing all of those smaller questions of how, why and who is actually a very, very valuable thing indeed.
We only have a couple of seconds left, so I wanted to thank you so much, Omer, and thanks to everybody who was watching Cloudflare TV with us today.
Thank you. Thanks, Alex. My pleasure.