💻 Reimagine What's Possible With Serverless: Stateful Edge Applications Become A Reality
Presented by: Greg McKeon, Evan Weaver
Originally aired on May 15, 2022 @ 10:00 PM - 10:30 PM EDT
Building stateful edge applications has traditionally been a significant problem when developing serverless applications. However, that will no longer be the case.
In this Cloudflare TV segment, Evan Weaver, CTO, Fauna, and Greg McKeon, Distributed Data PM, Cloudflare, will discuss the newest ways to build stateful applications at the edge and how it will reinvent what's possible to build with serverless computing.
Developer Week
JAMstack
Transcript (Beta)
Hey, so I think we're live now. So thanks to Nick and Allie for a great segment before.
My name is Greg, PM for Distributed Data on the Workers Team here at Cloudflare.
And my guest today with me is Evan, who's the CTO at Fauna and a co-founder.
Hi, Evan. Good to see you. Thanks for being here. Yeah, thanks for having me.
Thanks for coming to talk about the new partnership between Fauna and Workers.
So I want to kind of give a little background on what Fauna is to start things off.
And I know the idea for Fauna came to you while you were actually working over at Twitter.
I think you were employee number 15, came on as like Director of Infrastructure.
Is that right? Yeah, I worked at Twitter in 2008 through end of 2011.
So I was there for four years. And Fauna is a serverless data API, or like a sort of operational data fabric, you could say.
And it came out of our experience building the storage systems at Twitter.
Like you said, I ran what we called the software infrastructure team.
We built all the distributed storage for the core business objects.
That was tweets, timelines, user storage, image storage, the cache, the social graph, some other storage, I'm sure I forget.
But those are all point solutions.
We used off the shelf storage engines like NDB and Redis and Memcache and modified them and built sharding services in front of them.
Because we couldn't get anything off the shelf as a distributed data system overall that could really scale a product like Twitter.
Twitter didn't have any segmentation in the network.
It was a soft real time system. And there was nothing like it at the time.
But that meant these point solutions were very brittle. They weren't the kind of flexible thing that lets you iterate on your product at scale.
And the desire to build a data system, which would let basically small teams like we were when I started at Twitter, build products for SaaS, for consumers, for IT, and scale them from small to large without having to re-architect, go all the way from two people in a garage to a global company.
There's just no product that could offer that.
We had the RDBMS, we had some NoSQL stuff like Cassandra and Mongo coming on the scene, but nothing even attempted to solve this sort of generic utility computing operational data problem.
And we started Fauna a while after leaving Twitter.
We did consulting for a while first. And then when we started the project, me and some of my former team from Twitter, it was before serverless even had a name.
So we were always building a multi-tenant data API, but we kind of had to wait as we matured the software for the market to catch up to us.
So we're very grateful that partners like Cloudflare and others are here now building the serverless ecosystem because that's always what we wanted to happen.
Sure.
Yeah. And I know you mentioned MongoDB and Cassandra both kind of coming to popularity around that time.
Maybe contrast the technical decisions Fauna made from the start versus a Mongo or Cassandra in terms of availability, consistency, et cetera.
Yeah. I think the genesis of the NoSQL movement was this theory that you had to give up basically everything flexible about the RDBMS in order to get scale.
And that was true in practice, but not true in theory. It was a matter of implementation ultimately.
There's nothing that says, there's no rule or law of information science that says you can't have a system, which is effectively a hundred percent available that still offers you transactional consistency or that transactions can't span continents, for example.
But it was just too hard at the time.
So we ended up with a lot of systems which either basically took the RDBMS model with primary-secondary replication and improved on the API, like Mongo, so they were designed for developers, or we had systems which were designed for operators, like Cassandra, which said, who cares if the interface is flexible or even makes sense as long as you can get global scale; you have to use it because nothing else can offer that.
But those days are long behind us now.
And we wanted to solve these problems in a way which was general and didn't have the trade-offs that these, which were still ultimately point solutions at the time, had.
So in the course of developing Fauna, we adopted things like the Calvin transaction architecture that lets us deliver strongly consistent transactions at global scale and the multi-tenancy, the multi-tenant scheduler in the database kernel that lets you safely have this utility computing experience through an API and a web native security model, which means you can expose the database to untrusted clients without middleware in between.
And basically everything required to interact with operational data in a modern stack in a modern way and flexibly iterate on it.
That makes sense. Yeah. Kind of everything coming together into one sort of mission-critical multi-P database, not even a database, right?
I'm also curious about the name Fauna. What caused you to pick that name?
So, Fauna: first, we could get the .com. Second, we were always building a multi-tenant system.
Fauna was an API from the beginning. And we felt like the idea of an ecosystem represented both how multiple applications, multiple users, even multiple customers would interact with Fauna databases and also represented sort of the role we wanted to play within the data ecosystem and the development ecosystem overall, where you have a diversity of workloads, applications interacting, potentially sharing data, potentially isolating data, and sort of that diversity of the ecosystem gives you a lot of flexibility and power compared to what things that are effectively point solutions can deliver.
I guess it's kind of the antithesis of the polyglot persistence mindset where you're like, we need a single tool for each individual job and we have to spend all our time wiring them together.
That's not like an organic kind of architecture or a way to evolve a business application in an efficient way.
Like you have to plan up front and this is what you have and you spend all your time maintaining and integrating.
Got it. Yeah, I used to be at MongoDB and I prefer the name Fauna, I think.
It maybe has better roots.
I'm not totally sure the backstory there, but wanted to get it. Yeah.
So I guess today the partnership we're announcing is between Fauna and Workers and actually a way for Workers to directly connect out to Fauna.
So we have examples for that and we have a long blog post we published along with a customer use case or two.
And so for people who aren't aware of what Workers are, they're Cloudflare's serverless offering, which lets you run code directly at our edge.
So you can run JavaScript code or TypeScript code or any language that can be transpiled to JavaScript and run that in all of Cloudflare's points of presence.
So Cloudflare has over 200 points of presence around the world.
When a request comes in from a client application, we actually just instantiate that code right in that end data center.
And what's great is Workers are actually built on top of isolate technology.
So isolate technology is what powers V8, which is probably running in the browser you're using right now to watch this.
And what that means is we can offer incredibly distributed compute with almost zero millisecond start times, cold start times, and that immediately scales to millions of requests.
So having sort of a data API to communicate back with is a very natural fit, I think.
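Editor's sketch (not part of the broadcast): the Workers model described above, where a handler is instantiated in whichever data center receives the request, can be illustrated roughly as follows. This is a minimal sketch that assumes the module-style Worker shape with a `fetch` handler; the names `greet` and `worker` and the `/hello` route are made up for illustration.

```typescript
// Minimal sketch of a module-style Worker. `greet`, `worker`, and the
// "/hello" route are illustrative names, not taken from the talk.

// Pure helper so the routing logic can be exercised outside the Workers runtime.
function greet(pathname: string): { status: number; body: string } {
  if (pathname === "/hello") {
    return { status: 200, body: "Hello from the edge" };
  }
  return { status: 404, body: "Not found" };
}

// In a deployed Worker this object would be the module's default export
// (`export default worker`). The runtime calls `fetch` once per incoming
// request, in the data center that received it.
const worker = {
  async fetch(request: Request): Promise<Response> {
    const { pathname } = new URL(request.url);
    const { status, body } = greet(pathname);
    return new Response(body, { status });
  },
};
```

Keeping the logic in a plain function like `greet` is only a convenience for testing; the handler itself holds no state between requests.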
And when we were thinking about partners for building serverless applications, our mind sort of immediately went to Fauna as something that can handle that scale natively.
And so we haven't really directly mentioned this, but one thing that I think Fauna is doing a lot of is giving developers a new API for accessing legacy data that might have been trapped up in some relational system that just didn't scale and then gets replicated out into Fauna.
Another approach here is that some developers are considering using GraphQL and using that to an API interface to some existing relational system or just building on top of a GraphQL API to begin with.
I was wondering if you could give me a little bit more info about how that compares to using Fauna and how that compares to Fauna's query language, FQL, and sort of the trade-offs there.
Yeah, Fauna offers two query interfaces. It offers GraphQL, which is standards-based, but has some limitations I can talk about in a minute.
And it also offers FQL, which is a functional relational language that gives you the same kind of properties you'd expect from a SQL-based RDBMS in terms of the semantics, but also is type-safe, is secure, is more composable.
When you think about programming an API, accessing functions or stored procedures in the cloud, you need a more programmable interface or composable interface than something like SQL, where it's difficult to move beyond the single query or the single view.
GraphQL is interesting because GraphQL's original value proposition is integrating disparate data sources.
So there's sort of that data composition, we would have called it like an enterprise service bus or something in the past, or data virtualization.
At the same time, the other half of it, like the other side of the coin of its value proposition is decoupling sort of the application development lifecycle, the front-end code, from the way the data is presented from the backend, and particularly the backend microservices, so that you can have mobile and web developers iterate on the way data is consumed and presented without having to have various REST endpoints constantly modified by the backend team.
But what GraphQL doesn't do is give you really a mutation language, like it lets you specify the structure of your data, lets you specify the endpoints.
It doesn't let you coherently modify that data and enforce constraints, that kind of thing, because it's not a query language.
It handles the read side more than the write side.
It doesn't handle composability. So what we're doing with Fauna, like we're not trying to replace middleware in the GraphQL space that lets you compose multiple data sources together, but what we want to do is offer the native GraphQL interface.
So if all you need is what GraphQL provides, you can consume that directly, either from middleware or from the client, mobile or web.
And at the same time, when you exceed sort of the constraints of what GraphQL can do, then you get to use FQL instead, which is a full-featured relational query language, very composable, still accessible natively over the web, still secure, and so on.
And we see a lot of people succeeding, mixing and matching these strategies, either starting with GraphQL and then upgrading queries to FQL when they need more complex interaction patterns, or coding a lot of the business logic and stored procedures that they access directly from GraphQL endpoints.
So you have sort of a GraphQL interface in front of the FQL business logic, and that means you can use GraphQL standard clients and things like your React apps and so on.
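Editor's sketch (not part of the broadcast): consuming a GraphQL interface directly from a client or a Worker comes down to an HTTP POST carrying a JSON body with `query` and `variables`. The sketch below only builds that request. The endpoint URL, the bearer-token scheme, and the `findTodoByID` field are assumptions for illustration and would need to be replaced with values from the actual database's documentation and schema.

```typescript
// Shape of a plain HTTP request carrying a GraphQL operation.
interface GraphQLHttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical endpoint; substitute the one your database provider documents.
const GRAPHQL_ENDPOINT = "https://graphql.example.com/graphql";

// Build (but do not send) a GraphQL-over-HTTP request.
// The bearer-token header is an assumed authentication scheme.
function buildGraphQLRequest(
  secret: string,
  query: string,
  variables: Record<string, unknown> = {},
): GraphQLHttpRequest {
  return {
    url: GRAPHQL_ENDPOINT,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${secret}`,
    },
    body: JSON.stringify({ query, variables }),
  };
}

// A read: the client names exactly the fields it wants back.
// `findTodoByID` is a made-up field on a made-up schema.
const readRequest = buildGraphQLRequest(
  "client-token",
  "query ($id: ID!) { findTodoByID(id: $id) { title completed } }",
  { id: "1" },
);
```

The result could be sent with `fetch(readRequest.url, { method: readRequest.method, headers: readRequest.headers, body: readRequest.body })` from a browser or from a Worker.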
So we see like, you know, GraphQL has a lot of value in its own right.
It's a simple way to approach basic data access. It's very composable and flexible for the read side of the queries, but then you need something like FQL, which is more complex and more sophisticated for the write side or the mutation side.
And we're doing a lot of work right now to help, you know, not converge, but bring these interaction patterns closer together in terms of syntax, semantics, and that kind of thing.
So there's no impedance mismatch when you switch between them.
We saw, you know, similar evolution with SQL where, you know, there's the SQL basics were laid down early, very analytics focused, and then various extensions were proposed.
Some got taken up by the standards, some remained proprietary for specific domains.
What we're trying to avoid is basically the ORM phenomenon where the query language is so divorced and so archaic compared to the way modern applications are actually built that you have to have this completely, you know, independent translation layer that tries to like take the tabular form of SQL, for example, and make it object oriented.
And that worked, like, with things like Hibernate and ActiveRecord in Rails.
You know, that gap was relatively effectively closed, although it still continues to cause problems, but especially in a serverless world, like when the security model and the operational model and the consistency model and all that kind of thing are very important for delivering a serverless experience, you can't just, you know, slap some middleware on RDBMS and accomplish those goals.
Sure, sure. And I want to touch on one piece there that you mentioned, you know, about the actual operational model being different.
And you mentioned this a few times, which is that Fauna is safe to actually go out and access from a client.
Could you talk a little bit about that sort of changing the paradigm of applications and how apps are actually built on Fauna with those requests coming from the client?
Yes, if you think about Twitter, right, when we were building the Twitter API, you know, first, we weren't like racking a machine for each developer who wanted to query the Twitter API, you know, it was all multi-tenant over the API.
And second, you know, there was no secure perimeter, like people accessed the API over the web and had to be secured with, you know, what at the time were the, you know, the state of the art web native security models.
And we carried that mindset forward to Fauna thinking, you know, we want applications to be able to securely access their database outside a specific security zone.
And we also want to integrate with, you know, identity providers, Auth0 and others, you know, who conform to modern web standards and have that pervade, you know, not just the layout of the data and the access, but the business logic too, so that you can say, you know, you can create per user domains, you can have public data, you can have private data, you can let users sort of bootstrap themselves and create, you know, go through a signup flow from a completely untrusted client and have all that work.
So you get to the point where you can mix and match your access patterns and you can have things like, you know, secure server-side code that runs in something like Workers, in a serverless compute environment, that has more privileged access to the dataset, or not, because maybe you don't want to bother, you know, managing your own security and you want to authorize on a per-user basis, even from your server-side code.
And simultaneously you can access that same data with a more secure web native, you know, row level, user level security and identity model directly from the clients.
So you can do things like mix and match basically where your trusted business logic runs and where your untrusted business logic runs.
And this is a big change, you know, it's part of the serverless model overall, it's part of JAMstack, it's part of, you know, this move to what we call a client serverless architecture away from the three-tier model.
Like the three-tier model basically says like the web server is my security boundary, everything that happens behind that is like completely open.
It's up to the application developer to make sure that there are no holes and then we get, you know, SQL injections and all the usual problems with that model.
When you move to global distribution, when you move to clients, browser and mobile, which are, you know, worldwide and have no particular predictable access patterns, that starts to break down because there's no physical location to create a security perimeter that makes sense.
So you need direct access to the distributed data and that means you need a security model which will enable that.
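Editor's sketch (not part of the broadcast): the idea of public data, private per-user data, and more privileged server-side code can be modeled as an access check evaluated per document rather than at a network perimeter. The following is a conceptual illustration only. It is not Fauna's actual security API, and every type and function name here is hypothetical.

```typescript
// Conceptual model only; these types are hypothetical, not a real database API.
type Visibility = "public" | "private";

interface StoredDoc {
  id: string;
  ownerId: string;
  visibility: Visibility;
}

// Who is asking: an anonymous untrusted client, an authenticated user
// (for example via an identity provider), or trusted server-side code.
type Caller =
  | { kind: "anonymous" }
  | { kind: "user"; userId: string }
  | { kind: "server" };

// Reads: public documents are open; private ones only to their owner.
function canRead(caller: Caller, doc: StoredDoc): boolean {
  if (caller.kind === "server") return true;
  if (doc.visibility === "public") return true;
  return caller.kind === "user" && caller.userId === doc.ownerId;
}

// Writes: only the owner or trusted server-side code.
function canWrite(caller: Caller, doc: StoredDoc): boolean {
  if (caller.kind === "server") return true;
  return caller.kind === "user" && caller.userId === doc.ownerId;
}

// The same data set, filtered by who is asking.
function visibleTo(caller: Caller, docs: StoredDoc[]): StoredDoc[] {
  return docs.filter((doc) => canRead(caller, doc));
}
```

Because the check travels with the data rather than with a web server, the same rules apply whether the request comes from a browser, a mobile app, or a Worker.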
Right, makes a lot of sense and I think that's also where Cloudflare Workers fits in, right, of extending that perimeter out, you know, a bit further, right?
So running code in 200 data centers versus having it in, you know, one data center and pushing that a bit closer to the user but maybe not fully all the way up to the client and that's where being able to make a request, you know, to Fauna from a worker could really fit in.
Right, yeah, and Fauna is globally distributed.
You don't have to think about, you know, this is my primary region and like these regions are, you know, they have replication lag, they're read-only, that kind of thing.
Like we present a, you know, essentially a ubiquitous data API that you can query from anywhere with low latency.
Sure, sure, that makes a lot of sense and that's, again, a good reason for the partnership, a good reason why Workers works well with Fauna is if we send you a request from Newark, New Jersey, you'll see the same latency as if we send you one from South Africa, for example.
Yeah, I want to quickly run through Cloudflare's two storage offerings, Workers KV and Durable Objects, to kind of point out where they might differ and then we can chat a little bit about use cases.
So Cloudflare has two storage products that are accessible from a worker today.
Workers KV, which is sort of an eventually consistent globally distributed system, differs a bit from Fauna in that it's only key value access and it's eventually consistent.
And then separately, Durable Objects, which we just announced are going into an open beta.
Durable Objects give you strongly consistent storage with sort of a novel architecture, I think, but they don't come with any of the sort of operational excellence or querying that's built into Fauna by default, right?
So the security model is much different.
There's no query language to go across them. So it's more for the way you could layer Durable Objects into an application.
Maybe you go make some requests to Fauna and you know you're going to make that request.
The data infrequently changes and you want to cache that request to Cloudflare's edge.
Durable Objects could be a great way to do that. Essentially, a Durable Object lets you take your requests that are coming in to Workers distributed across Cloudflare's entire network and then funnel those requests down to one specific object that handles requests for that given ID.
So sort of layering that in as a caching layer, as a coordination point, over, you know, just not being able to do that in standard Workers.
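Editor's sketch (not part of the broadcast): the funnel-by-ID pattern described above can be simulated in memory to show why it works as a cache and coordination point. This is a conceptual model, not the Durable Objects API; `CachingObject`, `ObjectNamespace`, and `forId` are hypothetical names, and the upstream loader stands in for a request to a database such as Fauna.

```typescript
// In-memory simulation of "one object per ID". Not the Durable Objects API.

// A single object that caches one upstream value after the first load.
class CachingObject {
  private cached: string | undefined = undefined;
  private readonly loadFromUpstream: () => string;

  constructor(loadFromUpstream: () => string) {
    this.loadFromUpstream = loadFromUpstream;
  }

  get(): string {
    if (this.cached === undefined) {
      // Only the first request for this ID reaches the upstream store.
      this.cached = this.loadFromUpstream();
    }
    return this.cached;
  }
}

// Routes every request for a given ID to the same object instance.
class ObjectNamespace {
  private readonly instances = new Map<string, CachingObject>();
  private readonly loader: (id: string) => string;

  constructor(loader: (id: string) => string) {
    this.loader = loader;
  }

  forId(id: string): CachingObject {
    let instance = this.instances.get(id);
    if (instance === undefined) {
      instance = new CachingObject(() => this.loader(id));
      this.instances.set(id, instance);
    }
    return instance;
  }
}
```

In this model, two requests arriving at different locations but asking for the same ID end up at the same `CachingObject`, so the upstream store is queried once for that ID.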
Yes, I guess from that, maybe you could walk us through a few customer use cases, what you're seeing at Fauna, what you're seeing customers build.
Yeah, I think to your description of Durable Objects, you know, I think one of the common themes between the two systems is the idea that you can run your business logic local to the data.
You don't have to have this, you know, high latency back and forth over the Internet.
You can either, you know, in Durable Objects' case, you have sort of the per-object states, which are widely distributed but each only in one specific place.
In Fauna's case, you have the somewhat less widely distributed but universally accessible, you know, more relational layout of your data.
Yeah, exactly. I think Durable just makes the choice to kind of make that co-location really tight, almost.
It's like you get access to one or two rows, essentially, with your code, whereas Fauna gives you the whole database.
And also, yeah. Yeah. Cool.
Should we chat use cases? Yeah, I think, you know, we see, like I mentioned client serverless before, like I think the architectural pattern we're starting to see and some of it again with Jamstack too and, you know, some of these, you know, especially with the React ecosystem and some of these technologies that grew up adjacent to GraphQL, this idea of like empowering the front-end developer to build more sophisticated static applications.
Like we see this pattern where you have a lot of people with products which are built on three-tier architectures using either microservices or monoliths, you know, running in specific managed cloud sites.
They want to extend their infrastructure without making additional investment in those legacy systems, especially the legacy databases that become increasingly difficult to modify as they reach scale.
You know, if you're sort of dialing down your investment in this legacy architecture, well, you want something which is easy to compose.
And one of the easiest ways to compose new functionality in an application is to do it from the front-end.
So we start to see sort of this breadth where you can have kind of a pure serverless application where you have, you know, untrusted code only runs in the client, trusted code only runs in a serverless compute layer like Workers.
And then it talks to, you know, a serverless database like Fauna exclusively all the way through to an augmentation where you have an existing product that, you know, has been augmented, especially for things like per-user augmentation, which maybe, you know, is more difficult to do in a legacy RDBMS that only lives in one site.
You know, augmented by adding on these patterns to the existing infrastructure.
And I think that's pretty typical, you know, as industries evolve.
Like, you know, we're still at the early innings of the serverless era.
I think we'll see more pure applications in the future.
But, you know, one of the things Fauna can let you do is, you know, basically mix and match kind of your edge-focused, user-focused workloads with that ubiquitous general purpose shared data set that you would expect from a typical RDBMS.
And that's the kind of thing we see from, for example, in the blog post we published.
That makes a lot of sense. It sounds pretty similar, honestly, to what we see with customers and Workers, where because you can run a Worker in front of your host, your route, you can start off by just changing a header, right?
You can start off by just rewriting some HTML that's coming back to the user.
But then slowly over time, you know, you can start to do authentication, authorization, access databases, and work your way up to building your whole application.
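Editor's sketch (not part of the broadcast): the "start by just changing a header" step might look roughly like this. It assumes the module-style `fetch` handler and the standard `Request`/`Response` objects. The header names and values are placeholders, and `HeaderBag`, `addEdgeHeaders`, and `proxyWorker` are illustrative names.

```typescript
// Anything with a `set` method works here, including the standard `Headers`
// object, which keeps the helper testable outside the Workers runtime.
interface HeaderBag {
  set(name: string, value: string): void;
}

// Placeholder headers; a real deployment would choose its own.
function addEdgeHeaders(headers: HeaderBag): void {
  headers.set("X-Served-By", "edge-worker");
  headers.set("X-Frame-Options", "DENY");
}

// In a deployed Worker this object would be the module's default export.
const proxyWorker = {
  async fetch(request: Request): Promise<Response> {
    // Forward the request to the origin unchanged.
    const originResponse = await fetch(request);
    // Re-wrap the response so its headers can be modified.
    const response = new Response(originResponse.body, originResponse);
    addEdgeHeaders(response.headers);
    return response;
  },
};
```

From here the same handler could grow step by step: checking authentication before forwarding, rewriting the body, or querying a database, without changing the origin.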
So, we certainly see people building their whole application on top of Workers, but there's also sort of this layered in front and see where you go with it and kind of get the benefits over time.
I think, as you mentioned, really fits a lot of enterprise use cases.
Yeah, and I think, too, you see it with the vertically integrated APIs, too.
Like, it makes a lot more sense for your client to query, like, Stripe, for example, and then query Workers to run some business logic and then query Fauna to store some data.
If that is all secure within the client's own kind of user space, like, why do you need to ship all that work to some persistent server that lives in only one place to query the same APIs you would query directly?
You know, that's what the three-tier architecture is, you know, centralize everything to work around, you know, especially the security problems, but also the consistency problems of not having a serverless database available.
Yeah, yeah, that makes a lot of sense. I think there is a place to occasionally stick a cache in the middle there, but that makes sense to me of, you know, pushing that work out to the client for, you know, you're requesting something once.
Why would you want to cache that? Why wouldn't you just do that request from the client if it's secure?
Yeah, I mean, I think with, like, Durable Objects, right, one of the best use cases is as a coordination point across a small number of users.
So, if you have shared data, which, you know, has well-defined boundaries, you can put it there, but, you know, potentially, when you want to make that data durable for analytics purposes, you know, BI, auditing, enforcing transactional constraints across, like, a, you know, a very large group of data, then you need something like Fauna as well.
Yeah, exactly, exactly.
Get it out to a layer where you can query it and analyze it, yeah.
All right, well, Evan, it was great chatting with you. Thank you for coming on today.
Yeah, thanks a lot. Great to be here. Yeah, really excited about the partnership and where Fauna and Workers will go in the future and what all the developers watching will build.
So, yeah, thanks again. Yeah, and, yeah, I mean, Fauna is, to be clear, Fauna is free to try.
You don't need a credit card. You don't need anything at all.
So, you know, we have the tutorials for using Fauna with Workers published now, I believe.
So, yeah, sign up and let us know how it goes.
Yeah, yeah, definitely. And, yeah, I think that blog post, I don't know if it's easily linked from here, but it's definitely on the Cloudflare blog right now about our database partnership.
So, go take a look, read through that. All right, thanks, Evan.
Thank you.