🎂 What Launched Today - Wednesday, September 27
Presented by: Rita Kozlov, Celso Martinho
Originally aired on September 27, 2023 @ 10:00 PM - 10:30 PM EDT
Welcome to Cloudflare Birthday Week 2023!
2023 marks Cloudflare’s 13th birthday! Each day this week we will announce new products and host fascinating discussions with guests including product experts, customers, and industry peers.
Tune in all week for more news, announcements, and thought-provoking discussions!
Read the blog posts:
- What AI companies are building with Cloudflare
- Vectorize: a vector database for shipping AI-powered applications to production, fast
- Partnering with Hugging Face to make deploying AI easier and more affordable than ever 🤗
- Announcing AI Gateway: making AI applications more observable, reliable, and scalable
- Writing poems using LLama 2 on Workers AI
- Workers AI: serverless GPU-powered inference on Cloudflare’s global network
- You can now use WebGPU in Cloudflare Workers
- The best place on Region: Earth for inference
Visit the Birthday Week Hub for every announcement and CFTV episode — check back all week for more!
Transcript (Beta)
It's been an exciting day. We made many, many, many announcements. So I'm excited to dive into all of them with Celso.
I guess to get started, let's introduce ourselves.
So I'm Rita. I'm a Senior Director of Product here at Cloudflare, working on our developer platform and recently on many of our AI initiatives.
Celso, you want to tell us a little bit about yourself and what you work on?
Sure. My name is Celso.
I'm an engineering director based in Lisbon. I run a couple of teams, one of them being Workers AI, which has been very exciting for us.
And fun fact, both Celso and I are recording live from Lisbon today.
Exactly. You're just in the next booth.
That's right. Cool. So we have a bunch of announcements today; I'll give a quick recap.
So first of all, we announced Workers AI, our new inference product, which runs on Cloudflare's network, and actually runs on GPUs.
So if you've been following along, we announced Constellation, another Celso project, a few months ago during Developer Week.
And it was our first step into running AI models on Cloudflare's network.
And the response that we received was really, really incredible.
But with that, we also received a lot of feedback, specifically from developers wanting to run, obviously, generative AI on Cloudflare.
And so today, with Workers AI, we're making it really, really easy to do that.
Celso, you want to tell us a little bit more about it?
Sure. So as you said, Constellation for us was a big learning experience.
We were just learning about how to deploy machine learning models globally a few months ago.
And the goal of Constellation really was to put a product in the hands of our customers, listen to them, learn from them, and, in true Cloudflare fashion, iterate and make a better product later, which is what we're doing today.
So Workers AI, in a way, is the result of a couple of months of learning how to do proper edge machine learning for our customers.
And I would say the two most interesting features of Workers AI today are, number one, that we're finally running AI on GPUs globally in our network.
We have a fleet of GPU nodes, growing by the day, that we're adding to our global network.
And that obviously improves performance, throughput, and capabilities by a large margin.
The second, also very interesting, capability we have today is running much larger models than we could with Constellation.
So we are introducing a catalog of six pipelines that we support.
And those pipelines support models like text translation, text classification, text generation.
For those who are not familiar with these terms, we're talking about things similar to ChatGPT and Llama.
We're actually using the Llama model behind the scenes. We're also doing image classification and speech recognition.
And all of those models are managed by Cloudflare.
They run on top of very powerful GPUs. And so we're now able to scale these to customers that really need these kinds of models for their applications.
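The six pipelines mentioned here each map onto a specific model ID. The sketch below is a hypothetical lookup table; the IDs are the launch-catalog models as we understand them (Llama 2 for text generation, m2m100 for translation, and so on) and may have changed since, so check the Workers AI model documentation before relying on them.

```typescript
// Hypothetical task -> model ID table for the Workers AI launch catalog.
// IDs reflect the September 2023 catalog as understood here; verify against the docs.
const LAUNCH_CATALOG = {
  "text-generation": "@cf/meta/llama-2-7b-chat-int8",
  "translation": "@cf/meta/m2m100-1.2b",
  "text-classification": "@cf/huggingface/distilbert-sst-2-int8",
  "text-embeddings": "@cf/baai/bge-base-en-v1.5",
  "speech-recognition": "@cf/openai/whisper",
  "image-classification": "@cf/microsoft/resnet-50",
} as const;

type Task = keyof typeof LAUNCH_CATALOG;

// Resolve the model ID to pass to the inference call for a given task.
function modelFor(task: Task): string {
  return LAUNCH_CATALOG[task];
}
```

Keeping the table in one place means swapping a model for a newer one is a one-line change rather than a search through the codebase.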
The other interesting thing I think we're doing now with Workers AI is a much simpler API.
So when we launched Constellation, we were thinking about data scientists and people who were somewhat familiar with how interfacing with the machine learning model works.
And so we did an API that required things like tensors and stuff like that.
But again, from listening, we realized that what most people are looking for is to start using the models and just solve their problems and do that as quickly as possible.
So we still have the option to manually build tensors and interface with the machine learning models at a lower level.
But now we have a much more simplified API that in practice allows anyone to start using text generation with five lines of code.
And I'm not exaggerating. I'm actually talking about five lines of code.
So this has been a big team effort. I'm honestly very proud of the work that we've been able to do over the last few weeks.
And we're truly excited to see what customers will do with Workers AI.
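To make the "five lines of code" claim concrete, here is a minimal Worker sketch. It assumes a Workers AI binding named `AI` that exposes `run(model, inputs)`; at launch the docs wrapped the binding with the `Ai` class from `@cloudflare/ai`, so the `AiBinding` interface below is a local stand-in (an assumption) that lets the sketch type-check without Cloudflare's packages. The model ID is the Llama 2 chat model from the launch catalog.

```typescript
// Assumed shape of a Workers AI binding: run(model, inputs) resolves to the model output.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding; // hypothetical binding name configured for this Worker
}

// Text-generation model from the launch catalog (check the docs for current IDs).
const TEXT_MODEL = "@cf/meta/llama-2-7b-chat-int8";

// The core of the "five lines": pick a model, pass a prompt, await the result.
async function generate(env: Env, prompt: string): Promise<unknown> {
  return env.AI.run(TEXT_MODEL, { prompt });
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Read the prompt from the query string, with a default for quick testing.
    const prompt = new URL(request.url).searchParams.get("prompt") ?? "Write a sonnet about Lisbon";
    const answer = await generate(env, prompt);
    return new Response(JSON.stringify(answer), {
      headers: { "content-type": "application/json" },
    });
  },
};

export default worker;
```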
I completely agree.
It's been insane seeing this come together at every layer of the stack, right?
All the way from literally infrastructure people jet-setting around the globe to deploy GPUs that are available in seven cities today.
There's the infrastructure part and enabling larger models, but the other area where I think we learned a lot is the developer experience.
And that's something that I'm really excited about with the release of Workers AI today.
So as Celso said, it's as easy as five lines of code to get your own little AI application running.
And you can connect to Workers AI directly from a Worker, which I think is the best way to do it because that's how I like to build my applications, or, even better, from Pages and Pages Functions, right?
So you can actually have a single repo that powers a whole front end, whether you're building another ChatGPT-type app or something else.
For example, one of the blog posts that went out today was from our CTO, John Graham-Cumming, generating sonnets using our new model.
So you can create a little sonnet generator if you so desire.
And so I think that those integrations are really cool, but at the same time, we want to meet developers where they are.
And so if you are using an existing machine learning or AI API today, you can pretty easily swap it out for Workers AI using our REST API, which also works with many SDKs.
So the developer experience there has been really exciting, I think, and you can develop locally with it.
You can select all of the models.
We encourage everyone to check out the UI and the CLI experience. We've put a lot of thought into it, and I think it's really, really cool.
Absolutely. I think a good place to start would be the ai.cloudflare.com page.
And from there, you can find links to the developer documentation.
I think we also did a good job in the dashboard of explaining what you can do with Workers AI.
There are templates ready to use and deploy in seconds, literally.
And as for the REST API, if you go to api.cloudflare.com, there's an endpoint there that you can use to do inference.
It uses the same kinds of inputs and outputs as the Workers SDK. It's really easy to use; you can go there and see how to do it.
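For the REST route, an inference call is just an authenticated POST with the same JSON inputs a Worker would pass. The sketch below assumes the URL pattern `https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}` with a bearer API token, which matches the public docs as we understand them; `buildRunCall` and `runViaRest` are hypothetical helper names.

```typescript
// Plain description of the HTTP call, so it can be inspected before sending.
interface RestCall {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build a Workers AI REST call (URL pattern assumed from the public API docs).
function buildRunCall(
  accountId: string,
  apiToken: string,
  model: string,
  inputs: Record<string, unknown>,
): RestCall {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(inputs),
    },
  };
}

// Send the call with the standard fetch API and return the parsed JSON body.
async function runViaRest(call: RestCall): Promise<unknown> {
  const res = await fetch(call.url, call.init);
  if (!res.ok) throw new Error(`Inference failed with HTTP ${res.status}`);
  return res.json();
}
```

Cloudflare's v4 API usually wraps responses in an envelope with fields such as `success` and `result`, so a caller would typically read the model output from `result`.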
So I think that's a good jumping point into our next announcement today.
So say you are trying to build your own ChatGPT equivalent: you have an e-commerce store with a bunch of products, and you want to be able to answer questions like, I'm shopping for a loved one, they really love the color pink, and I would like to find this, this, and this item.
You need some way for AI to be aware of your product catalog, right?
And so that's where the vector database comes in.
Which we also announced today: Vectorize. You can deploy it in a way that integrates really seamlessly with Workers AI.
Or again, if you're using any other products out there, whether that's OpenAI's GPT APIs, right?
Or if that's Cohere or Anthropic, you can use Vectorize with any of those.
Yeah, that's true. And alongside the vector database, one of the related models we also support in Workers AI is text embeddings.
Text embeddings take any text as input and turn it into a vector that preserves the signal of that text without exposing the text content itself.
And you can use that signal with a number of other machine learning models on a pipeline.
So these two things are tightly related, and that is also one of the use cases that we wanted to enable with today's announcements.
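The retrieval flow described above (embed the question, find the nearest product embeddings, hand them to the model as context) can be sketched without any Cloudflare APIs. The vectors below are toy two-dimensional stand-ins for the output of an embedding model, and the brute-force search stands in for what a vector database such as Vectorize does with an index at scale; none of this uses Vectorize's actual API, and the product data is invented for illustration.

```typescript
interface Product {
  id: string;
  text: string;
  vector: number[]; // embedding of `text`; toy values in this sketch
}

// Cosine similarity: compares the direction of two embeddings, ignoring magnitude.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force nearest-neighbour search; a vector database replaces this with an index.
function topK(query: number[], items: Product[], k: number): Product[] {
  return items
    .map((item) => ({ item, score: cosine(query, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ item }) => item);
}

// Retrieval-augmented prompt: put the matching catalogue entries in front of the question.
function buildPrompt(question: string, context: Product[]): string {
  const lines = context.map((p) => `- ${p.text}`).join("\n");
  return `Answer using only these products:\n${lines}\n\nQuestion: ${question}`;
}
```

The resulting prompt would then go to any text-generation model, which is why the same pattern works whether the model runs on Workers AI or another provider.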
That's exactly right. And for both of these, actually, one thing that I find pretty interesting is the pricing.
So over the past few weeks, I've been doing a tour, talking to every customer possible about AI, which is not hard, because everyone is thinking about their AI strategy right now.
And so everyone's happy to talk about it. And one thing that kept coming up over and over was every single customer mentioned this, right?
Oh, AI is so expensive. Or, it's really hard to understand what my AI bills are even going towards, and to scale up in a way where what I'm spending on AI is actually aligned with the value customers get out of it.
And so we thought really hard about the pricing for both of these products. For vector databases specifically, we heard a lot about pain points and people being unhappy with incumbents.
And so we wanted Vectorize to be as affordable as possible.
The details about the pricing are in the blog. For Workers AI, coming up with the pricing was also a really fun and interesting challenge.
And so we have what I think is a really fun unit of measurement called neurons (because, obviously, artificial intelligence), which quantifies the machine learning operations that go into executing a model.
And we wanted to provide a model that really truly scaled down to zero.
So what we've seen with a lot of products out there, even ones that say that they're serverless, is that in one way or another, they still charge you for what is really infrastructure work.
So scaling things up and down. And our goal is to really charge only for what you use.
Exactly. I'm looking forward to us implementing neurons.
And I totally agree. I think what most people out there are doing is either charging for time or renting you a GPU, which is really not what Cloudflare is about.
Just look at the Workers platform and how we charge for it.
The approach we've settled on, though it's obviously still a work in progress, is to go really low level and charge for the actual compute happening inside the GPU:
the number of machine learning operations your model actually uses when you run an inference task.
So really excited about introducing that. I think that will make Cloudflare very competitive in terms of pricing.
Yes, I completely agree. And I think that it's one of those things where developer experience and price go hand in hand, right?
Not only do you not have to set up your own VMs with GPUs; I personally would not want to.
That is not my strength at all. I would much rather be writing code.
So you don't have to do any of that. But also, from what we've looked at, the utilization of the GPUs that get set up for these things is really, really low.
So you're actually paying for 24-7 access to something that really has ups and downs throughout the day as customers actually use it and then eventually go to bed.
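The scale-to-zero argument comes down to simple arithmetic. The sketch compares a usage-based bill with a reserved-GPU bill over one day of hourly traffic; both rates are made-up placeholders (not Cloudflare's or any vendor's prices), and "ops" is a generic stand-in for a compute-based unit such as neurons, not its actual definition.

```typescript
// Hypothetical rates for illustration only; these are NOT real prices.
const PRICE_PER_MILLION_OPS = 0.01; // usage-based: pay per operation executed
const GPU_HOURLY_RATE = 2.0; // reserved: pay for every hour the GPU is allocated

// Usage-based cost: only hours with traffic contribute anything.
function usageCost(requestsPerHour: number[], opsPerRequest: number): number {
  const totalOps = requestsPerHour.reduce((sum, r) => sum + r * opsPerRequest, 0);
  return (totalOps / 1_000_000) * PRICE_PER_MILLION_OPS;
}

// Reserved cost: every hour is billed, whether or not a request arrived.
function reservedCost(requestsPerHour: number[]): number {
  return requestsPerHour.length * GPU_HOURLY_RATE;
}
```

For a day with no traffic at all, `usageCost` is 0 while `reservedCost` is still 24 hours of GPU time; with traffic that drops off overnight, the idle hours are exactly the part a usage-based model stops charging for.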
Exactly. And we're in a position because of our network architecture and the way we design products on top of our network to take advantage of spare cycles and unused capacity.
And so I think that that can be a true advantage to customers in the end when we talk about pricing.
Which also goes along with our other product announcement today, our AI Gateway.
Again, we really heard the feedback from customers about how to make their costs, and visibility into where those costs are going, more accessible to more people.
And so with our AI Gateway today, we will be providing caching, rate limiting, and the observability part I think is kind of actually the most exciting part of it, even though there's not as direct a benefit to the users.
I think it provides really, really helpful dashboards that help you see: How much traffic are you sending to each of the providers?
How much are you spending?
What do these queries look like? You can get a log of everything that's being asked.
For example, we have our Workers documentation, right?
And so it's really interesting for me to know, okay, what are the things that people are asking Cursor?
What are the things that people are asking ChatGPT about Workers?
That way we can start including those in the docs and get way better responses.
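Caching, rate limiting, and request logging are the three gateway features mentioned here. A hypothetical in-memory toy (not AI Gateway's implementation) shows how the pieces fit in front of a provider: a fixed-window rate limiter, an exact-match cache keyed on the request body, and a log entry for every request. `ToyGateway` and `Upstream` are illustrative names.

```typescript
// Hypothetical upstream: anything that turns a request body into a response body.
type Upstream = (body: string) => Promise<string>;

interface LogEntry {
  at: number; // timestamp in ms
  body: string; // what was asked
  cached: boolean; // served from cache?
}

class ToyGateway {
  readonly logs: LogEntry[] = [];
  private readonly cache = new Map<string, string>();
  private windowStart = 0;
  private windowCount = 0;
  private readonly upstream: Upstream;
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(upstream: Upstream, limit: number, windowMs: number, now: () => number = Date.now) {
    this.upstream = upstream;
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  async handle(body: string): Promise<string> {
    const t = this.now();
    // Fixed-window rate limiting: reset the counter once the window has elapsed.
    if (t - this.windowStart >= this.windowMs) {
      this.windowStart = t;
      this.windowCount = 0;
    }
    if (this.windowCount >= this.limit) {
      throw new Error("429: rate limit exceeded");
    }
    this.windowCount++;

    // Observability: record every accepted request, noting whether it was a cache hit.
    const hit = this.cache.get(body);
    this.logs.push({ at: t, body, cached: hit !== undefined });
    if (hit !== undefined) return hit;

    // Cache miss: call the provider once and remember the answer.
    const response = await this.upstream(body);
    this.cache.set(body, response);
    return response;
  }
}
```

Counting cache hits against the rate limit is a design choice in this sketch; a real gateway might exempt them, since they cost the provider nothing.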
Yeah, I agree. With the advent of things like text generation, LLMs, and cloud providers of these kinds of services, what most developers have been talking about is how to cut costs and get more observability into what they, or their customers, are doing on top of applications that in turn use those endpoints and APIs.
And obviously, we've seen some open source projects showing up.
But now we're providing that completely integrated into the Cloudflare experience and setting it up is as easy as just a few clicks.
So I think a lot of people who are already using these providers and services, and want instant caching, instant analytics, and observability of how people are using those APIs, can set it up in minutes.
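Setting up the gateway for an existing provider mostly means swapping the base URL your code calls. The URL pattern below is the one we understand AI Gateway documented at launch (`https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway}/{provider}`); verify it against the current docs. The helper names are hypothetical.

```typescript
// Build an AI Gateway base URL (pattern assumed from the launch-era AI Gateway docs).
function gatewayBaseUrl(accountId: string, gatewayName: string, provider: string): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayName}/${provider}`;
}

// Route an OpenAI chat completions call through the gateway instead of api.openai.com.
// As we understand it, only the base URL changes; the path, headers, and body stay the same.
function openAiChatUrl(accountId: string, gatewayName: string): string {
  return `${gatewayBaseUrl(accountId, gatewayName, "openai")}/chat/completions`;
}
```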
And I think our customers are going to like that. I couldn't agree more.
I mean, every customer that I talked to about AI Gateway, I couldn't even finish my sentence.
They were like, Yes, sign me up. I want it right now.
And I think that at some point, we will also figure out ways to integrate the AI Gateway with Workers AI itself.
So there's some potential there as well.
Yeah, I completely agree. And I think there's actually a lot. So there's what it covers today, right.
But I think there's a lot of possibilities for what it could offer as that middle layer between eyeballs and your AI deployment.
So things like comparing models, right, whether it's on latency or accuracy, I think one of the things that again, from talking to customers, people are really struggling with is understanding what is the right model for me, especially at different price points, right?
Or maybe you want to use two different models, one in staging and one in production, because you don't want to use up all of your tokens just working on staging.
And so I think that there's a lot of potential there. And then, on the other side, there's DLP (data loss prevention).
It could help IT departments manage things like shadow AI or shadow IT deployments: you can write all of the wiki docs in the world about your company policy on using ChatGPT, but it's a helpful tool and people are going to go and use it.
And so understanding, okay, this is what someone was asking on there, or this is information that's maybe leaking out, is something I think a lot of people are worried about.
And so having mitigations in place for that, I could definitely see a gateway becoming a vehicle for that.
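A gateway sitting between employees and a model could apply DLP rules to prompts before they leave the company. The rules below are hypothetical and deliberately simple (an email regex and a 13 to 16 digit card-number pattern); real DLP uses far more robust detection, such as checksum validation and context, so this only illustrates where such a hook would sit.

```typescript
// Hypothetical DLP rules: pattern plus replacement label.
const RULES: Array<[RegExp, string]> = [
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]"],
  [/\b\d(?:[ -]?\d){12,15}\b/g, "[CARD]"], // 13 to 16 digits, optional spaces or dashes
];

// Replace sensitive-looking substrings and count how many were found.
function redact(prompt: string): { text: string; findings: number } {
  let findings = 0;
  let text = prompt;
  for (const [pattern, label] of RULES) {
    text = text.replace(pattern, () => {
      findings++;
      return label;
    });
  }
  return { text, findings };
}
```

The `findings` count is what a gateway could log or alert on, which is the visibility side of the same idea.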
And what's interesting about that is that we've been doing that for many years now for HTTP.
And we know what we're doing, we've actually built the AI gateway on top of those building blocks and that knowledge.
So it was impressive that we were able to build a gateway product in just, can I say a few weeks?
So yeah, I think it shows the power of Cloudflare now adjusted to the AI case.
 Yeah, when we started shaping up what it was, I remember everyone having an aha moment that was like, wait, so we're building Cloudflare for AI, right?
Like all of the things that we ended up building into Cloudflare at some point, just make it AI specific.
But the building blocks are there. I mean, we do know how to do cache.
We do know how to detect security issues or shadow IT. We do know how to store credentials securely.
We do know how to provide analytics. So it's all there.
Exactly.
Okay. One more announcement to cover: WebGPU. We released this in workerd today.
And this is one of the things where I just know that I'm going to check Twitter in like a few hours and someone will have built something.
And my mind is going to be completely blown because I never would have guessed that that was going to be something that it was possible to build on workers or using GPUs, right?
So WebGPU is pretty self-descriptive. It gives Workers access to the GPU on your local machine.
And I think that AI is the big use case for it. But talking to folks in the industry, I could also see it becoming a way to do things beyond AI, right?
Like rendering that happens somewhere else. Yeah, would love your thoughts and takes on it also.
Yeah. So first of all, WebGPU is a new standard.
It's just been deployed in browsers like Chrome and Firefox. I'm not sure if Safari has done that already.
I think so. And WebGPU, for those who don't know, is the sequel to WebGL, which was the first API you could use in a browser to get accelerated graphics using GPUs.
But WebGL had a number of problems: complexity, portability, and the fact that you could only do graphics with it.
But what happened over the last years is that the industry and developers as a whole discovered that you can actually use GPUs for more than graphics.
You can use GPUs to do general compute.
Because, in the end, GPUs are just doing multiplications and other mathematical operations.
And you can take advantage of that for other compute tasks, not just graphics.
So basically, the web platform fell behind in terms of giving you that level of control over a GPU.
And WebGPU solves that.
So WebGPU is a standard that is the result of companies like Apple, Microsoft, Google, and others working together to come up with the next generation APIs for GPU compute inside the browser.
So once we saw that, we said to ourselves, we need to support WebGPU in Workers, because now you can use GPUs to do compute.
And Workers is all about compute. So we've been working over the last few months to make that possible.
We're launching it today, for now in local development only.
So all you need to do is download Wrangler. It will also download the latest version of workerd, which is essentially the same runtime we use in the production environment of Cloudflare Workers.
And if you download the latest version, you'll get WebGPU APIs starting today.
And you can start coding WebGPU code.
So we put out a demo of that. As you were saying, Rita, it is a machine learning demonstration on how you can use your GPU to do inference.
But you can do other things with it.
You can do video encoding. You can do cryptography.
To be honest, I don't know what you can do, but I'm like you. I'm pretty sure that people will come up with ideas.
This is an emerging thing right now in the community.
We wanted to be there from the beginning. Cloudflare is known for supporting emerging standards, especially in Workers.
So it's there. People can start playing with WebGPU today.
We're very proud. I think the work over the next few weeks will be making sure that we can merge everything into the production environment.
And so you can start deploying your code in our network after that.
But for now, you can start building the applications locally. We won't change the APIs.
And you can use compute shaders to take advantage of the WebGPU APIs in your GPU.
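A compute shader is the core of what "GPUs for general compute" means here. Below is a WGSL shader that multiplies two arrays element by element, plus a CPU emulation of how a dispatch maps workgroups and invocation IDs onto array indices. On a real device the host side would use the standard WebGPU calls (`navigator.gpu.requestAdapter()`, `adapter.requestDevice()`, `device.createShaderModule({ code })`, `device.createComputePipeline(...)`, then `dispatchWorkgroups` on a compute pass); those are omitted so the sketch runs without a GPU. `MULTIPLY_WGSL`, `workgroupCount`, and `emulateDispatch` are hypothetical names.

```typescript
// WGSL compute shader: one invocation per array element, 64 invocations per workgroup.
// Only stored as a string here; a real host would pass it to device.createShaderModule.
const MULTIPLY_WGSL = `
@group(0) @binding(0) var<storage, read> a : array<f32>;
@group(0) @binding(1) var<storage, read> b : array<f32>;
@group(0) @binding(2) var<storage, read_write> c : array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id : vec3<u32>) {
  let i = id.x;
  if (i < arrayLength(&c)) {
    c[i] = a[i] * b[i];
  }
}
`;

const WORKGROUP_SIZE = 64; // must match @workgroup_size in the shader

// Number of workgroups to dispatch so every element gets an invocation.
function workgroupCount(n: number): number {
  return Math.ceil(n / WORKGROUP_SIZE);
}

// CPU emulation of the dispatch: same index math and bounds check as the shader.
function emulateDispatch(a: Float32Array, b: Float32Array): Float32Array {
  const c = new Float32Array(a.length);
  for (let wg = 0; wg < workgroupCount(a.length); wg++) {
    for (let local = 0; local < WORKGROUP_SIZE; local++) {
      const i = wg * WORKGROUP_SIZE + local; // equivalent of global_invocation_id.x
      if (i < c.length) c[i] = a[i] * b[i];
    }
  }
  return c;
}
```

The bounds check matters because the last workgroup usually has more invocations than remaining elements; without it, those extra invocations would index past the end of the buffers.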
I feel like... our co-founder Michelle Zatlyn always loves to say we're just getting started.
I feel like that's one way to wrap up blog posts. The only other way to wrap up blog posts when you're doing developer things is to sum them up with we can't wait to see what you build, which always feels so cheesy, but it's so true.
And I genuinely feel that way about a lot of today's announcements, but especially WebGPU: something about it feels like it's just going to surprise me, because it's very low level.
So we don't know what people are going to do with it, but we're excited to see what's going to happen.
Yeah, exactly. Exactly.
We only have a few minutes left. Alongside our own product announcements,
we also announced several partnerships. And so, you know, we can't do these things on our own.
And I feel very lucky to have gotten to partner with some of the top AI companies in the world.
So our GPUs are NVIDIA GPUs. And so it, you know, our hardware team is so good.
And we are able to easily secure GPUs to have them in over 100 cities by the end of this year, which completely blows my mind.
We talked about running Llama, which is an open source model developed by Meta.
So that's available through our model catalog, and you can check it out and start playing with it directly on ai.cloudflare.com, which is a really cool playground for it.
I know that I've had a lot of fun generating tweets and things like that with it.
We're partnering with Hugging Face.
So over the next few months, more models are going to become available: Hugging Face-optimized models will show up in our dashboard, and you'll be able to deploy them directly from Hugging Face's UI.
And we'll be powering inference endpoints for being able to bring your own models as well through Hugging Face.
So they've been a really fantastic team to work with.
And I can't wait to kick off the work with them. We mentioned Constellation earlier; from its very early days, we used ONNX for it, which is a super powerful technology that came from Microsoft.
So it's been really cool to collaborate with them on that.
And similarly, we're really excited to start working with Databricks on deploying MLflow in more places as well.
So I just rattled all of these off due to the time constraint, but I want to leave a minute.
So I'm curious what you're most excited about out of today's announcements.
It's a difficult question. I'll ask you a better question. As an engineer, what's the first thing you're going to go build the second that you have some time?
That's not just, you know, helping. Oh, I'm definitely going to build a few apps using the models that we support today.
Maybe I'll do some fun stuff with Arduinos and retro computers.
Why not? I'm really excited about WebGPU because it's so low level.
And I think people will surprise us with ideas that we haven't thought about.
And Hugging Face is amazing. I mean, Hugging Face is the GitHub of data scientists.
And just, you know, having a partnership with them is really exciting.
Definitely. Cool. Well, I hope everyone that tuned in goes and reads our blog posts as well, since there is so much information and details in there, as well as in our documentation.
So thank you, everyone. And happy birthday to Cloudflare! Today is our official birthday.
So hence the really, really big day. Thanks, everyone.
Show us your stuff. Yes, please show us what you build. We love to see it.