SIRT Triage Agent Workshop
~10 min

Workers AI -- what it is, why it matters, first call

Learn what Workers AI provides, why it matters for security automation, and make your first AI.run() call.

Steps
  1. Uncomment the AI binding in wrangler.jsonc

    Find the `ai` block and uncomment it so the Worker has access to the AI binding.

  2. Change STATE to state-4-workers-ai

    Update the STATE var from `state-3-do` to `state-4-workers-ai`.

  3. Redeploy the app

    Run `npx wrangler deploy` and confirm it completes with no errors.

  4. Verify Workers AI is responding

    Open an incident in your app and click 'Investigate'. You should see a model-generated analysis appear.

What is Workers AI?

Workers AI is Cloudflare’s serverless GPU inference platform. It runs models directly on Cloudflare’s global network — no external API keys, no third-party endpoints, no data leaving the network. Your Worker accesses it through a binding (env.AI), just like D1 and Durable Objects.

Workers AI provides access to over 30 models across multiple categories:

  • Large Language Models — text generation, analysis, summarization
  • Embedding models — for semantic search and similarity
  • Image models — generation, classification, captioning
  • Speech models — transcription, text-to-speech

For this workshop, we’re using @cf/meta/llama-3.3-70b-instruct-fp8-fast — a quantized 70B-parameter LLM optimized for fast inference. It’s capable enough for structured security analysis and fast enough for interactive use.

Why Workers AI for security automation?

Security incident data is some of the most sensitive data in an organization: endpoint telemetry, user credentials, network traffic patterns, threat indicators. Sending that to an external LLM provider creates a data governance problem.

Workers AI solves this because:

  • Data stays on Cloudflare’s network. Your incident telemetry goes from your Worker to the GPU — it never leaves the network boundary. No third-party data processing agreements needed.
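  • Low-latency inference. The model runs on GPUs inside Cloudflare’s network, so there is no round trip to an external provider. A full analysis from a 70B model still takes a few seconds to generate, which is fast enough for interactive triage.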
  • No cold starts for the binding. The env.AI binding is always ready. The model infrastructure is managed by Cloudflare — you don’t provision GPUs or manage model deployments.
  • Usage-based pricing. You pay for the inference you actually run. No reserved capacity, no minimum spend.

The AI binding pattern

The core API is one function call:

const result = await env.AI.run(
  "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
  {
    messages: [
      { role: "system", content: "You are a security analyst..." },
      { role: "user", content: "Analyze this incident: ..." },
    ],
  }
);

That’s it. The binding handles:

  • Authentication — no API keys to manage or rotate
  • Routing — the request goes to the nearest available GPU
  • Model lifecycle — Cloudflare manages model loading, scaling, and updates

The response contains the model’s generated text, which your Worker can parse, store, or stream to the client.
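
As a minimal sketch of reading that response: for text-generation models, a non-streaming call typically resolves to an object with a `response` string. The `AiBinding` type, the `extractAnalysis` and `analyze` helpers, and the fallback message below are illustrative assumptions, not part of the workshop app.

```typescript
// Minimal structural type for the part of the binding this sketch uses.
// In a real Worker the type comes from your generated Env / workers-types.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface AiBinding {
  run(model: string, inputs: { messages: ChatMessage[] }): Promise<unknown>;
}

const MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast";

// Hypothetical helper: pull the generated text out of a non-streaming result,
// falling back to a placeholder if the shape is not what we expect.
function extractAnalysis(result: unknown): string {
  if (typeof result === "object" && result !== null && "response" in result) {
    const text = (result as { response: unknown }).response;
    if (typeof text === "string" && text.trim() !== "") {
      return text.trim();
    }
  }
  return "No analysis returned.";
}

// Hypothetical wrapper: one model call, one string back.
async function analyze(ai: AiBinding, incidentSummary: string): Promise<string> {
  const result = await ai.run(MODEL, {
    messages: [
      { role: "system", content: "You are a security analyst." },
      { role: "user", content: `Analyze this incident: ${incidentSummary}` },
    ],
  });
  return extractAnalysis(result);
}
```

Inside a Worker you would call `analyze(env.AI, summary)`. Keeping the extraction in its own function makes it easy to test with a stubbed binding.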

Step 1: Uncomment the AI binding

Open sirt-workshop-app/wrangler.jsonc and find the commented-out AI binding block:

// Uncomment for lesson 08:
// "ai": {
//   "binding": "AI"
// }

Uncomment the block so it reads:

"ai": {
  "binding": "AI"
}

This makes env.AI available in your Worker code. The name AI is a convention: the "binding" value sets the property name on env, and the "ai" key is what tells the runtime to route calls made through it to Cloudflare’s inference infrastructure.
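
If the block is still commented out, env.AI is simply undefined at runtime. A small guard can turn that into a readable error. This is a sketch only: the `Env` shape and the `requireAi` helper are assumptions for illustration and do not come from the workshop code.

```typescript
// Assumed minimal shape of the binding and the Worker environment.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI?: AiBinding; // present only once the "ai" block is uncommented and deployed
  STATE?: string;
}

// Hypothetical guard: fail early with an actionable message instead of
// "Cannot read properties of undefined (reading 'run')".
function requireAi(env: Env): AiBinding {
  if (!env.AI) {
    throw new Error(
      'AI binding missing: uncomment the "ai" block in wrangler.jsonc and redeploy'
    );
  }
  return env.AI;
}
```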

Step 2: Change STATE to state-4-workers-ai

In the same wrangler.jsonc, update the STATE variable:

"vars": {
  "STATE": "state-4-workers-ai"
}

This activates the investigate handler in src/server.ts. When STATE is state-4-workers-ai, the app’s “Investigate” button makes a POST request to /api/investigate/:incidentId, which loads the incident from D1, constructs a prompt with the incident’s telemetry data, and calls env.AI.run() to generate an analysis.
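
The flow described above might look roughly like the following sketch. It is not the actual code in src/server.ts: the `DB` binding name, the `incidents` table, the prompt wording, the STATE comparison, and the JSON response shape are all assumptions made for illustration.

```typescript
// Minimal structural types so the sketch is self-contained.
interface D1Like {
  prepare(sql: string): {
    bind(...values: unknown[]): {
      first(): Promise<Record<string, unknown> | null>;
    };
  };
}
interface AiLike {
  run(model: string, inputs: unknown): Promise<unknown>;
}
interface Env {
  STATE: string;
  DB: D1Like; // assumed D1 binding name
  AI: AiLike;
}

const MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast";
const ROUTE = /^\/api\/investigate\/([^/]+)$/;

// Build the chat messages from the incident row (prompt text is hypothetical).
function buildMessages(incident: Record<string, unknown>) {
  return [
    {
      role: "system",
      content: "You are a security analyst. Assess what happened, how severe it is, and what to investigate next.",
    },
    {
      role: "user",
      content: `Analyze this incident:\n${JSON.stringify(incident, null, 2)}`,
    },
  ];
}

async function handleInvestigate(method: string, pathname: string, env: Env): Promise<Response> {
  const match = ROUTE.exec(pathname);
  if (method !== "POST" || !match) {
    return new Response("Not found", { status: 404 });
  }
  // Simplified gate; how the real app compares STATE is not shown in this lesson.
  if (env.STATE !== "state-4-workers-ai") {
    return new Response("Investigate is not enabled", { status: 404 });
  }

  // Load the incident from D1 (table and column names are assumed).
  const incident = await env.DB.prepare("SELECT * FROM incidents WHERE id = ?")
    .bind(match[1])
    .first();
  if (!incident) {
    return new Response("Unknown incident", { status: 404 });
  }

  // Single model call with the incident data as context.
  const analysis = await env.AI.run(MODEL, { messages: buildMessages(incident) });
  return new Response(JSON.stringify({ incidentId: match[1], analysis }), {
    headers: { "content-type": "application/json" },
  });
}
```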

Step 3: Redeploy

Deploy the updated configuration:

npx wrangler deploy

You should see the deployment succeed with no errors. Wrangler recognizes the new AI binding and configures it automatically.

Step 4: Verify Workers AI is responding

  1. Open your app in the browser at your *.workers.dev URL.
  2. Click on any incident in the queue — for example, the Malware + C2 incident.
  3. Click the “Investigate” button in the incident detail view.
  4. Within a few seconds, you should see a model-generated analysis appear in the triage panel.

The analysis is coming from Workers AI running on Cloudflare’s GPUs. The model receives the incident’s telemetry data as context and returns a structured assessment — what happened, how severe it is, and what to investigate next.
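
If you would rather verify from a script than from the browser, a sketch like the one below posts to the investigate endpoint and checks that something came back. The response format is not specified in this lesson, so the check only asserts a successful status and a non-empty body. `investigateUrl` and `checkInvestigate` are hypothetical helpers, and you would need to supply your own workers.dev URL and a real incident ID.

```typescript
// Narrow fetch-like type so the check can run against a stub as well as the
// global fetch.
type FetchLike = (
  url: string,
  init: { method: string }
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Build the endpoint URL, tolerating a trailing slash on the base URL.
function investigateUrl(baseUrl: string, incidentId: string): string {
  const base = baseUrl.replace(/\/+$/, "");
  return `${base}/api/investigate/${encodeURIComponent(incidentId)}`;
}

// POST to the endpoint and return the raw body, or throw with a short reason.
async function checkInvestigate(
  fetchFn: FetchLike,
  baseUrl: string,
  incidentId: string
): Promise<string> {
  const res = await fetchFn(investigateUrl(baseUrl, incidentId), { method: "POST" });
  const body = await res.text();
  if (!res.ok) {
    throw new Error(`Investigate failed with HTTP ${res.status}: ${body.slice(0, 200)}`);
  }
  if (body.trim() === "") {
    throw new Error("Investigate returned an empty body");
  }
  return body;
}
```

Passing the global `fetch` as `fetchFn` runs the check against your deployed app. An HTTP error here usually points back to steps 1–3 (binding still commented out, STATE not updated, or a deploy that did not go through).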

What comes next

Right now, you’re making a single model call that tries to analyze every dimension of the incident at once. In lesson 09, you’ll look at exactly how that call is structured — the prompt template, the incident data formatting, and how the response renders in the UI. Then in lessons 10–12, you’ll decompose this single call into multiple specialized sub-agents that analyze different dimensions in parallel.
