AI Week
Transcript
Hi, everyone. Welcome to Cloudflare TV. My name is Radwar Adwan. I am a product manager under the application security umbrella.
And today I'm super excited to be sharing with you an announcement in our AI week.
It's a brand new capability that we are adding to Firewall for AI, and it's all about unsafe content moderation.
Before I walk you through this capability, let me set the scene first. Over the past couple of years, we have seen AI applications everywhere, from chatbots to AI assistants to agents. Adoption is exploding, and businesses are trying to embed LLMs into every part of the customer experience.
But of course, with this innovation comes a new security challenge: AI has introduced a brand new attack surface. LLMs are powerful, but they are also highly vulnerable, and a single malicious prompt can do real damage.
It can exfiltrate data.
It can be used to poison the model and even degrade its future outputs.
It can inject toxic or harmful content directly into your customer-facing interactions.
So, of course, this is scary. You have spent years securing your apps, your APIs, and your users, and now AI is a whole new front that you need to defend.
So here's the kicker: without proper guardrails, even a well-trained model can be turned against your business.
That's why at Cloudflare, here today as part of AI Week, we are excited to announce that unsafe content moderation is now available in Cloudflare Firewall for AI.
So what does that mean?
It means that you can detect and block harmful prompts in real time before they even reach your model.
So with just a few clicks, you get unified detections across all prompts, across all models.
Whether you have third-party models or something you have built in-house, all of that is going to be protected behind Cloudflare. You will also have analytics on the categories and kinds of detections we have found, and of course the ability to enforce policies to block or stop any kind of abuse you are facing on your LLMs.
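As a rough illustration, a blocking policy could be expressed as a WAF custom rule that acts on the Firewall for AI detection result. The field name below is an assumption for illustration only, not a confirmed API; check the Cloudflare Firewall for AI documentation for the exact fields available on your zone.

```
# Hypothetical custom rule (field name is illustrative, not confirmed):
# block any request whose LLM prompt was flagged as unsafe content.
Expression: (cf.llm.prompt.unsafe_topic_detected)
Action:     Block
```

In practice you would pick the action (block, log, or challenge) per category, based on the analytics you see in the dashboard.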
So why does that matter? It matters because moderation isn't just about blocking bad words; it's about protecting your users, meeting compliance requirements, and preserving trust in your brand. Unsafe prompts can create multiple risks: spreading misinformation, amplifying bias, or producing offensive content targeted at specific groups of people. And let's not forget that this can also cause model poisoning, where repeated malicious prompts degrade your model over time. That's one of the OWASP Top 10 risks for LLM applications, which we are trying to secure LLMs against.
So with this change, Firewall for AI can detect three of these big LLM risks. We started with PII detection and data leakage, and now we have unsafe topic detection.
So what's coming up next? In the coming days, Firewall for AI will expand to include detection of prompt injection and jailbreak attempts, in addition to more abuse controls like token-based rate limiting. So here's the takeaway: if you are already using Firewall for AI, unsafe content moderation is available starting today, and you can start using it right away without taking any action. If you are not yet onboarded, you can reach out to your account team or the sales team, and they will help you get onboarded.
Finally, thank you so much for tuning in, and stay tuned for more updates throughout AI Week.