Introducing new models and tools for Workers AI and AI Gateway
Presented by: Taylor Smith, Michelle Chen
Originally aired on August 27 @ 12:00 PM - 12:30 PM EDT
Welcome to Cloudflare AI Week 2025!
There's barely a company or a startup not focused on AI right now. Companies' entire strategies are shifting because of this incredible technology.
From August 25 to 29, Cloudflare is hosting AI Week, dedicated to empowering every organization to innovate with AI without compromising security.
Tune in all week for more news, announcements, and thought-provoking discussions!
Read the blog posts:
- State-of-the-art image generation Leonardo models and text-to-speech Deepgram models now available in Workers AI
- AI Gateway now gives you access to your favorite AI models, dynamic routing and more — through just one endpoint
Visit the AI Week Hub for every announcement and CFTV episode — check back all week for more!
Transcript
Hello, everybody. Welcome to AI Week Wednesday. We've been hearing a lot from our SASE and Zero Trust teams this week, but I'm super excited to change gears today and bring in some of the latest developments from our developer platform.
My name is Taylor Smith. I'm a product manager from our media platform team.
And here with me is Michelle Chen. Michelle, you want to introduce yourself?
Sure. Hi, everyone. My name is Michelle Chen. I lead product here for our Workers AI product. I'm also involved with AI Gateway and a little bit of our different AI initiatives going around; I do a little bit of everything, and I've had some previous roles here at Cloudflare too.
So I'm really excited to be here today to talk a little bit more about AI Week and how we're helping developers.
Great.
Thanks for joining. So I've heard a lot about AI Gateway already this week, but for anybody who might be new, could you give a little high-level pitch?
What is AI Gateway? Yeah, for sure.
AI Gateway is a product that's near and dear to my heart, because it's actually what got me started working on a lot of these AI products.
We launched it almost two years ago at Birthday Week, and we originally thought it was going to be, you know, a proxy that sits between your application and different providers, letting you connect to OpenAI, Anthropic, Grok, etc.
It's very similar to what we do at Cloudflare generally, which is sit in the middle, and from there we can do things like caching, observability, and analytics.
That's a little bit more of how we got started. We've got a lot of great usage on AI Gateway.
It's really, really popular with developers that are building AI applications, because they can monitor all their costs and see their token usage, they can rate limit, they can save a lot of costs with caching, etc.
So it's been really popular so far, but this week we're excited to announce a lot of new improvements for AI Gateway. I felt like we could do a lot more, and now we are doing a lot more, and I'm excited to share with the world what we've been working on.
Start with your favorite. What's new today?
So when we first started AI Gateway, it was kind of lightweight and people brought their own keys.
The way it works is that when you send a request to AI Gateway, you swap out the base URL for an AI Gateway URL and send along, say, an OpenAI token; we proxy the request for you, and it ultimately ends up at OpenAI.
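As a rough sketch of that base-URL swap, here's what a chat request routed through AI Gateway might look like in Python. The account ID, gateway name, and key below are placeholders, and the exact URL shape is an assumption based on this description, so check the AI Gateway docs for the current format:

```python
import json
import urllib.request

# Placeholders: substitute your own Cloudflare account ID, gateway name, and key.
ACCOUNT_ID = "your-account-id"
GATEWAY_ID = "my-gateway"
OPENAI_API_KEY = "sk-your-openai-key"

# Instead of pointing at https://api.openai.com/v1/..., point the request at
# the gateway URL; AI Gateway proxies it on to OpenAI.
base_url = f"https://gateway.ai.cloudflare.com/v1/{ACCOUNT_ID}/{GATEWAY_ID}/openai"
url = f"{base_url}/chat/completions"

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
}
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {OPENAI_API_KEY}",  # your own provider key
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(request) would send it; omitted here because the
# credentials above are placeholders.
print(request.full_url)
```

The only change from a direct OpenAI call is the base URL, which is what lets the gateway add caching, analytics, and rate limiting in the middle.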
But when you're using these AI products and providers, a lot of times there's a lot of different models and a lot of different providers you want to try out.
And it's kind of hard to manage going onto each provider's dashboard and getting a token, an API key, for each one.
And, you know, you have different rate limits with different providers, you have your models, et cetera, et cetera.
One of the big improvements we're announcing today is that we have this sort of keyless product now where you don't actually have to go on to OpenAI or Anthropic and get your own key.
We can actually just deliver it for you via AI Gateway.
So on AI Gateway, what that means is you'll be able to load credits into your account on Cloudflare, and you can connect to any provider.
and we'll charge you as one sort of unified Cloudflare bill.
That way you don't actually have to go on and log on and manage your keys, etc.
So that's the first thing we're announcing today, and I think it's a really big quality-of-life, developer-experience upgrade for everyone: you can just log on and start testing different models from different providers without having to get your own keys.
In addition to that, we're also still supporting the bring-your-own-key method that we've supported from day one.
But we made it a little bit better.
And so in our dashboard now, you're able to paste your key in and it's stored securely via our secret store product.
That way you don't have to send it in the request headers or anything like that like you would normally do in a curl.
We kind of keep all your keys securely stored on the AI Gateway Cloudflare side for you and you don't have to manage that.
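As a sketch of the difference, with the provider key stored on the gateway the request no longer carries an OpenAI Authorization header at all; you authenticate to the gateway itself instead. The `cf-aig-authorization` header name and token below are assumptions based on AI Gateway's gateway-authentication feature, so verify the exact header against the docs:

```python
import json
import urllib.request

ACCOUNT_ID = "your-account-id"
GATEWAY_ID = "my-gateway"
CF_GATEWAY_TOKEN = "your-gateway-token"  # placeholder Cloudflare-side token

url = (
    f"https://gateway.ai.cloudflare.com/v1/{ACCOUNT_ID}/{GATEWAY_ID}"
    "/openai/chat/completions"
)
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
}
request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={
        # Assumed header: authenticates you to the gateway, which then
        # injects the provider key it has stored for you server-side.
        "cf-aig-authorization": f"Bearer {CF_GATEWAY_TOKEN}",
        "Content-Type": "application/json",
    },
)
# Note there is no provider "Authorization" header in this request.
print(request.full_url)
```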
It's a nice little quality-of-life update that improves your developer experience.
So those are the first two related to keys.
We also have a brand new feature that I'm super excited about.
It's called dynamic routing.
And so what we're seeing with different providers is that a lot of people want to be able to manage, say, whether a customer is a free customer versus an enterprise customer, and split up which models the free customers go to and which models the enterprise customers go to.
So we've actually built this whole flow that kind of looks like a flow chart.
It's a drag-and-drop, UX-driven sort of flow where you can say, start here, and then if the customer type is free the request might go to this model, and if the customer type is enterprise it might go to that model. I'm using free and enterprise as an example; you can actually use any kind of metadata for your customers in there. It's a really great way to be a lot more granular with your traffic and control which requests are going where and to what model. So we're calling that dynamic routes, and it comes with a really great UX builder flow, kind of an if-this-then-that sort of flow.
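Conceptually, the branching you draw in that flow-chart UI boils down to something like the illustrative Python below. The metadata field name and model IDs here are made up for the example; the real routes are configured in the dashboard, not in code:

```python
def route_model(metadata: dict) -> str:
    """Illustrative only: pick a model based on request metadata, the way a
    dynamic route branches on customer attributes."""
    if metadata.get("customer_type") == "enterprise":
        # Hypothetical "bigger" model for paying customers.
        return "@cf/meta/llama-3.1-70b-instruct"
    # Hypothetical default model for free-tier customers.
    return "@cf/meta/llama-3.1-8b-instruct"

print(route_model({"customer_type": "free"}))
print(route_model({"customer_type": "enterprise"}))
```

The point of doing this in the gateway rather than in your application code is that you can change the branching without redeploying anything.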
So that's the rundown: I introduced unified billing, bring your own keys with Secret Store, and dynamic routes as well.
Yeah, I've been really excited about that interface builder, because it makes it really clear what's going on and what the possibilities are.
And it's super neat that you can pull context from your requests, or from your own customers, into that decision-making.
So cool, big week for AI, for AI Gateway.
Let's change gears over to Workers AI.
I hear that you've got some exciting new models and partnerships. Yeah.
So on Workers AI, you might know that we started off doing a lot of open-source models.
We have all the best state-of-the-art open models available; we host them on our own infrastructure and allow our customers to access those models through the Workers AI API.
And what we've seen is that sometimes you want models that are more capable and more powerful, or that just aren't open.
And so we're really excited to introduce this concept of partner models in our catalog. These are models from AI labs that have trained their own models, and we're still hosting them on our own infrastructure, so you still get that edge-network latency bonus that you get from being hosted on Workers AI. We've partnered with these really great partners, Leonardo, who does image generation, and Deepgram, who does voice and audio models, and we're bringing those models to our platform. Just like with any Workers AI open model, there's going to be a model ID, like @cf/leonardo/phoenix-1.0, and you can call that to access these state-of-the-art, typically closed-source models, now available on Workers AI. I've played around with these models before, and Leonardo's models are so great at creating really realistic, beautiful images. And then we have really great text-to-speech and speech-to-text models with Deepgram as well. These audio models are going to enable some new use cases that we think are really well suited to Workers AI's infrastructure model, like low-latency voice applications. I'm going to spoil it a little bit, but we have an upcoming announcement later in the week talking more about real-time models and what we're doing in the voice space, and the Deepgram models are a great precursor to help enable all of that. So we're really excited to bring all these partner models onto Workers AI.
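As a sketch, calling one of those partner models over the Workers AI REST API might look like this. The account ID and token are placeholders, and the input payload is a guess at the kind of fields an image model takes, so check the model's page in the docs for its actual schema:

```python
import json
import urllib.request

ACCOUNT_ID = "your-account-id"   # placeholder
API_TOKEN = "your-api-token"     # placeholder
MODEL = "@cf/leonardo/phoenix-1.0"  # partner model ID from the announcement

# Workers AI REST endpoint: POST /accounts/{account_id}/ai/run/{model}
url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"
payload = {"prompt": "a lighthouse at sunset, photorealistic"}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(request) would run inference; skipped here because
# the credentials above are placeholders.
print(request.full_url)
```

Inside a Worker, the same call would typically go through the `env.AI` binding instead of the REST API.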
It's super awesome. And I'm really excited about the Deepgram stuff especially, coming from the media platform side.
I think there's some really neat things that that'll help us do as well.
So really excited about that. Does that change our thinking around like, might there be more proprietary models or other partnerships in the future that we could expect to see as part of this?
Is this like a template? We partnered with, you know, people and teams that we've been working with for a while, but we're super excited to expand this as well.
This is really just the beginning, and we're trying to figure out, you know, what models are suited to our infrastructure and our patterns, and what models customers want to see.
And so we're really excited to mark this as the beginning, but it'll continue to evolve and hopefully we'll get more partner models as well.
So if I were a Workers AI customer, is there anything that I need to do, or can I just jump straight to building with those model IDs that are in the dev docs?
So it's super easy.
It's not gated behind anything.
It's just: go to the dev docs, check out which model you want to use, copy the model ID over, and take a look at the schema, because it might be a little bit different.
It'll usually say things like the size of the image you want, the number of steps you want, the prompt, and things like that.
And then you should be good to go.
You'll start generating really cool images or creating really cool voice snippets or doing a lot of transcription.
So very, very easy to get started there.
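To make the schema differences concrete, here are two illustrative request payloads, one image-generation-shaped and one text-to-speech-shaped. The field names below are guesses at the kinds of parameters involved, not the documented schemas, which is exactly why checking each model's page in the dev docs matters:

```python
# Illustrative payloads only; the exact field names vary per model, so check
# each model's schema in the dev docs before relying on these.
image_request = {
    "prompt": "a watercolor fox",
    "width": 1024,       # size of the image you want
    "height": 1024,
    "num_steps": 25,     # number of diffusion steps
}

tts_request = {
    "text": "Hello from Workers AI!",  # the text to synthesize into speech
}

for name, payload in [("image", image_request), ("tts", tts_request)]:
    print(name, sorted(payload))
```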
Oh, there's one more thing, though.
This is also a spoiler for a later announcement, but we're introducing WebSocket support for the audio models.
Being able to connect and hold these bidirectional connections to the inference server allows you to do actual real-time, low-latency audio inference.
So we're really excited to be able to support that as well.
That might be a change because that's a new thing we're introducing to our platform, but would love people to, you know, check it out and try it and see how it works for them.
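A rough sketch of what holding such a bidirectional session could look like from Python. The URL shape, model ID, and message framing here are entirely hypothetical, since the real endpoint details arrive with the later announcement; the `websockets` package is a third-party dependency:

```python
# Hypothetical sketch: the endpoint URL and message framing are assumptions,
# not the documented Workers AI WebSocket API.
import asyncio
import json

def build_ws_url(account_id: str, model: str) -> str:
    """Hypothetical URL shape for a realtime inference endpoint."""
    return f"wss://api.cloudflare.com/accounts/{account_id}/ai/run/{model}"

async def stream_audio(ws_url: str, pcm_chunks):
    """Send audio chunks up and read partial results back on one connection.

    Requires the third-party `websockets` package (pip install websockets);
    not invoked here because the endpoint above is a placeholder.
    """
    import websockets
    async with websockets.connect(ws_url) as ws:
        for chunk in pcm_chunks:
            await ws.send(chunk)        # push audio up as it is captured
            reply = await ws.recv()     # pull partial transcripts/audio back
            print(json.loads(reply))

print(build_ws_url("your-account-id", "@cf/deepgram/some-model"))
```

The design point is the single long-lived connection: no per-chunk HTTP handshake, which is what keeps the round-trip latency low enough for voice.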
Yeah, I'm super excited to see what folks build with that, and also what we'll be releasing later this week built with that.
Well, thank you so much.
And the other two things I do want to point out that we have coming out today: we've got two blog posts that describe key parts of our Workers AI ecosystem, Infire and Omni. These are internal services, but the posts tell the story of how we do inference and large model catalog management at the edge, which is the magic that makes Workers AI work the way it does and provide that low-latency inference value to developers.
Thank you so much, Michelle, for your work on those posts too. I'm very excited for all the things that are coming out today.
Very excited too. Thank you for having me on, Taylor.
Sounds good. Thanks so much, and thank you, everybody, for joining. Happy AI Week!