Workers AI -- what it is, why it matters, first call

Step 0: Navigate to the lesson directory

Open your terminal and navigate to the lesson-08-workers-ai directory:

cd ..\lesson-08-workers-ai

This directory contains a fresh copy of the app pre-configured for the Workers AI lesson. The STATE variable is already set to state-4-workers-ai and the AI binding is already configured in wrangler.jsonc.

What is Workers AI?

Workers AI is Cloudflare’s serverless GPU inference platform. It runs models directly on Cloudflare’s global network — no external API keys, no third-party endpoints, no data leaving the network. Your Worker accesses it through a binding (env.AI), just like D1 and Durable Objects.

Workers AI provides access to over 30 models across multiple categories:

Large Language Models — text generation, analysis, summarization
Embedding models — for semantic search and similarity
Image models — generation, classification, captioning
Speech models — transcription, text-to-speech

For this workshop, we’re using @cf/meta/llama-3.3-70b-instruct-fp8-fast — a quantized 70B-parameter LLM optimized for fast inference. It’s capable enough for structured security analysis and fast enough for interactive use.

Why Workers AI for security automation?

Security incident data is some of the most sensitive data in an organization: endpoint telemetry, user credentials, network traffic patterns, threat indicators. Sending that to an external LLM provider creates a data governance problem.

Workers AI solves this because:

Data stays on Cloudflare’s network. Your incident telemetry goes from your Worker to the GPU without crossing to an external LLM provider, so you don’t need a data processing agreement with a separate LLM vendor.
Low latency. The model runs on GPUs inside Cloudflare’s network, so a request avoids the round trip to an external provider. Short completions can return in hundreds of milliseconds; a full incident analysis from a 70B model typically takes a few seconds.
No cold starts for the binding. The env.AI binding is always ready. Cloudflare manages the model infrastructure, so you don’t provision GPUs or manage model deployments.
Usage-based pricing. You pay for the inference you use. No reserved capacity, no minimum spend.

The AI binding pattern

The core API is one function call:

const result = await env.AI.run(
  "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
  {
    messages: [
      { role: "system", content: "You are a security analyst..." },
      { role: "user", content: "Analyze this incident: ..." },
    ],
  }
);

That’s it. The binding handles:

Authentication — no API keys to manage or rotate
Routing — the request goes to the nearest available GPU
Model lifecycle — Cloudflare manages model loading, scaling, and updates

The response contains the model’s generated text, which your Worker can parse, store, or stream to the client.
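As a rough sketch of reading that response: for text-generation models, a non-streaming call typically resolves to an object with a response string. The AiLike interface, the extractText function, and the analyze helper below are hypothetical names used for illustration, not part of the workshop code. In a real Worker, the binding type would come from Cloudflare's generated types rather than a hand-written interface.

```typescript
// Minimal stand-in for the binding's shape (an assumption for this sketch;
// a real project would use the types generated for the Worker).
interface AiLike {
  run(
    model: string,
    inputs: { messages: { role: string; content: string }[] }
  ): Promise<unknown>;
}

const MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast";

// Pull the generated text out of a non-streaming result.
// Assumes the { response: string } shape; returns null for anything else.
function extractText(result: unknown): string | null {
  if (typeof result === "object" && result !== null && "response" in result) {
    const value = (result as { response: unknown }).response;
    return typeof value === "string" ? value : null;
  }
  return null;
}

// Hypothetical helper: one model call that returns plain text or throws.
async function analyze(ai: AiLike, incidentSummary: string): Promise<string> {
  const result = await ai.run(MODEL, {
    messages: [
      { role: "system", content: "You are a security analyst." },
      { role: "user", content: `Analyze this incident: ${incidentSummary}` },
    ],
  });
  const text = extractText(result);
  if (text === null) {
    throw new Error("Unexpected response shape from the model");
  }
  return text;
}
```

Inside a Worker, this would be called as analyze(env.AI, summary). Checking the shape before using it keeps a malformed or empty result from being stored or rendered as if it were an analysis.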

Step 1: Verify the AI binding

Open lesson-08-workers-ai\wrangler.jsonc and verify the AI binding block is present:

"ai": {
  "binding": "AI"
}

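This makes env.AI available in your Worker code. The name AI is a convention, not a requirement: whatever value you set for "binding" becomes the property name on env. The "ai" block itself is what tells the runtime to route calls made through that property to Cloudflare’s inference infrastructure.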
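If the binding block is missing or misnamed, env.AI is simply undefined at runtime. One way to fail fast is a small guard like the sketch below. The hasAiBinding function is a hypothetical helper, not something the workshop app or Wrangler provides, and it assumes the binding exposes a run method as shown earlier.

```typescript
// Hypothetical guard: confirms the env object exposes a usable AI binding.
// bindingName defaults to "AI", matching the "binding" value in wrangler.jsonc.
function hasAiBinding(
  env: Record<string, unknown>,
  bindingName: string = "AI"
): boolean {
  const candidate = env[bindingName];
  const isObjectLike =
    (typeof candidate === "object" && candidate !== null) ||
    typeof candidate === "function";
  // The only thing this sketch relies on is a callable run() member.
  return isObjectLike && typeof (candidate as { run?: unknown }).run === "function";
}
```

A request handler could call hasAiBinding(env) first and return a 500 response with a clear message when it is false, instead of surfacing a less obvious "cannot read properties of undefined" error.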

The STATE variable should already be set to state-4-workers-ai, which activates the investigate handler in src/server.ts. When STATE is state-4-workers-ai, the app’s “Investigate” button makes a POST request to /api/investigate/:incidentId, which loads the incident from D1, constructs a prompt with the incident’s telemetry data, and calls env.AI.run() to generate an analysis.
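The prompt-construction step in that flow can be sketched roughly as follows. The Incident shape, the field names, and the system prompt wording are all assumptions made for illustration; the actual handler in src/server.ts and the workshop's D1 schema may differ.

```typescript
// Hypothetical incident shape; the workshop's actual D1 schema may differ.
interface Incident {
  id: string;
  title: string;
  severity: string;
  telemetry: Record<string, unknown>;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Turn one incident record into the messages array passed to env.AI.run().
function buildMessages(incident: Incident): ChatMessage[] {
  const userContent = [
    `Incident ${incident.id}: ${incident.title}`,
    `Reported severity: ${incident.severity}`,
    "Telemetry (JSON):",
    JSON.stringify(incident.telemetry, null, 2),
  ].join("\n");

  return [
    {
      role: "system",
      content:
        "You are a security analyst. Summarize what happened, " +
        "rate the severity, and list what to investigate next.",
    },
    { role: "user", content: userContent },
  ];
}
```

Keeping prompt construction in a pure function like this makes it easy to inspect exactly what the model receives without deploying or calling the model.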

Step 2: Redeploy

Deploy the Worker from this lesson directory so the AI binding takes effect:

npm run deploy

You should see the deployment succeed with no errors. Wrangler recognizes the new AI binding and configures it automatically.

Step 3: Verify Workers AI is responding

Open your app in the browser at your *.workers.dev URL.
Click on any incident in the queue — for example, the Malware + C2 incident.
Click the “Investigate” button in the incident detail view.
Within a few seconds, you should see a model-generated analysis appear in the triage panel.

The analysis is coming from Workers AI running on Cloudflare’s GPUs. The model receives the incident’s telemetry data as context and returns a structured assessment — what happened, how severe it is, and what to investigate next.
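If you would rather check the endpoint without the UI, a small script along these lines could send the same POST request the "Investigate" button makes. This is only a sketch: investigateUrl and checkInvestigate are hypothetical helpers, the response is read as raw text because its exact format is not specified here, and it assumes the endpoint needs no extra authentication headers.

```typescript
// Builds the investigate URL for a deployed Worker.
function investigateUrl(baseUrl: string, incidentId: string): string {
  const path = `/api/investigate/${encodeURIComponent(incidentId)}`;
  return new URL(path, baseUrl).toString();
}

// Narrow fetch-like type so the function can be exercised with a stub.
type FetchLike = (
  url: string,
  init: { method: string }
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Sends the POST and returns the raw response body, or throws on a non-2xx status.
async function checkInvestigate(
  baseUrl: string,
  incidentId: string,
  fetchImpl: FetchLike
): Promise<string> {
  const res = await fetchImpl(investigateUrl(baseUrl, incidentId), {
    method: "POST",
  });
  if (!res.ok) {
    throw new Error(`Investigate request failed with status ${res.status}`);
  }
  return res.text();
}
```

To run it against a deployment, pass your own *.workers.dev URL, a real incident ID from the queue, and the global fetch as the third argument.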

What comes next

Right now, you’re making a single model call that tries to analyze every dimension of the incident at once. In lesson 09, you’ll look at exactly how that call is structured — the prompt template, the incident data formatting, and how the response renders in the UI. Then in lessons 10–12, you’ll decompose this single call into multiple specialized sub-agents that analyze different dimensions in parallel.

Key takeaways

Workers AI is inference through a binding

Security data stays on Cloudflare's platform

The model call is small, but the boundary is important