TOON Format · AI Agents · Context Engineering

Half Your Tokens Are Wasted on JSON. TOON Fixes It.

Cut the serialization tax from your AI agent pipelines.

Feb 25, 2026 · 11 min read

01

The Problem

Take any JSON response from a REST API and count the curly braces, the brackets, the quoted key names, the colons, and the commas. That structural scaffolding accounts for roughly 40-50% of the tokens an LLM processes — and TOON eliminates it. Your model cannot skip those tokens in JSON. Every one occupies space in the context window, consumes attention, and costs money while carrying zero semantic information.

This was never a problem when the consumers of your APIs were browsers and microservices. A JsonSerializer.Deserialize() call costs microseconds. But the fastest-growing class of API consumer today is an LLM-backed agent, and LLMs pay for structure in a currency that matters: context window space.

02

Enter TOON

TOON — Token-Oriented Object Notation — encodes the same JSON data model but strips the structural noise. Hierarchy is expressed through indentation. Arrays of uniform objects collapse into compact tables with a field header ([count]{fields}:) followed by CSV-style rows. No braces, no brackets, no quoted keys, no commas. The result tokenizes roughly 50% more efficiently while remaining trivially readable.
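The tabular encoding is easiest to see in code. Here is a minimal sketch, in Python, of how a uniform array of objects collapses into a schema header plus rows — the function name is ours, and a real encoder (see toonformat.dev) additionally handles nesting, quoting, and escaping:

```python
# Minimal sketch of TOON's tabular encoding for arrays of
# same-shaped dicts. Illustrative only — not a full encoder.

def encode_uniform_array(name, rows):
    """Emit a [count]{fields}: header, then one CSV-style row per object."""
    fields = list(rows[0].keys()) if rows else []
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [header]
    for row in rows:
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)

variants = [
    {"sku": "MWC-BLK-S", "color": "black", "size": "S", "stock_count": 24},
    {"sku": "MWC-BLK-M", "color": "black", "size": "M", "stock_count": 18},
]
print(encode_uniform_array("variants", variants))
# variants[2]{sku,color,size,stock_count}:
#   MWC-BLK-S,black,S,24
#   MWC-BLK-M,black,M,18
```

The field names appear exactly once, in the header; every subsequent line is pure values.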

The key insight

You do not need to rip out JSON. A single middleware layer lets AI agents opt into TOON via content negotiation while every other client continues receiving JSON. Additive, not disruptive.

03

Counting What Matters

Let us start with a concrete example. Here is a product record — the kind of thing a typical e-commerce API returns a thousand times a day:

product.json
{
  "product": {
    "id": "prd_8kx92m",
    "name": "Merino Wool Crew",
    "brand": "Outlier",
    "category": "apparel",
    "price": 89.00,
    "currency": "USD",
    "in_stock": true,
    "rating": 4.7,
    "variants": [
      {
        "sku": "MWC-BLK-S",
        "color": "black",
        "size": "S",
        "stock_count": 24
      },
      {
        "sku": "MWC-BLK-M",
        "color": "black",
        "size": "M",
        "stock_count": 18
      },
      {
        "sku": "MWC-NAV-L",
        "color": "navy",
        "size": "L",
        "stock_count": 7
      }
    ]
  }
}
product.toon
product:
  id: prd_8kx92m
  name: Merino Wool Crew
  brand: Outlier
  category: apparel
  price: 89
  currency: USD
  in_stock: true
  rating: 4.7
  variants[3]{sku,color,size,stock_count}:
    MWC-BLK-S,black,S,24
    MWC-BLK-M,black,M,18
    MWC-NAV-L,navy,L,7

The difference is visible at a glance: 589 bytes versus 263. But byte count is not the metric that matters for LLMs — tokens are. Running both through a modern tokenizer:

~147 JSON tokens vs ~65 TOON tokens (56% fewer)

A 56% reduction on a single record. Real payloads are larger — a paginated listing returns 25-50 items, an inventory sync might return hundreds — and the savings compound across every request in an agent's reasoning chain. Where do those savings come from? Two mechanisms:

Structural overhead in JSON vs TOON
1. Punctuation overhead
   JSON requires braces, brackets, quotes on every key,
   colons, and commas. For an object with 4 fields:

   JSON:  { "sku": "...", "color": "...", "size": "...", "stock": 24 }
   TOON:  ...,...,...,24   (values only, as a row under the schema header)

   Each punctuation character is either a token itself or
   forces a token boundary, inflating the count.

2. Schema-aware array compression
   JSON repeats every key for every object in an array.
   TOON declares the schema once and streams rows:

   JSON (3 variants):
     {"sku":"MWC-BLK-S","color":"black","size":"S","stock":24},
     {"sku":"MWC-BLK-M","color":"black","size":"M","stock":18},
     {"sku":"MWC-NAV-L","color":"navy","size":"L","stock":7}

   TOON (3 variants):
     variants[3]{sku,color,size,stock}:
       MWC-BLK-S,black,S,24
       MWC-BLK-M,black,M,18
       MWC-NAV-L,navy,L,7

   The 12 repeated key tokens in JSON become 1 schema line.
   The larger the array, the bigger the savings.

The savings are most extreme in flat structures with repeated keys — which is exactly what REST APIs return. Product listings, user tables, transaction logs, search results. These are the payloads agents consume at volume, and they are precisely where the overhead hurts most.
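The second mechanism reduces to simple arithmetic. A sketch (counting key occurrences rather than tokens, since tokenizers vary):

```python
# Back-of-envelope for schema-aware compression: JSON spells out every
# key on every object; TOON declares each key once in the header.

def key_occurrences(n_objects, n_fields):
    json_keys = n_objects * n_fields  # repeated on every object
    toon_keys = n_fields              # declared once in the schema header
    return json_keys, toon_keys

print(key_occurrences(3, 4))    # the 3-variant example above: (12, 4)
print(key_occurrences(100, 4))  # a 100-row payload: (400, 4)
```

JSON's key overhead grows linearly with array length; TOON's stays constant, which is why the savings compound on large listings.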

On token counts

Token counts vary by tokenizer. The estimates in this article are representative of modern tokenizers (cl100k_base, o200k_base) used by current frontier models. The exact numbers will differ by a few percent between model families, but the relative savings — roughly half the tokens — are consistent across tokenizers because TOON eliminates structural characters that all tokenizers must encode.

04

Context Rot

Fewer tokens do not just mean lower cost. They mean better reasoning.

An LLM has a fixed context window — a hard ceiling on how much information it can hold at once. Everything in that window competes for attention: the system prompt, conversation history, retrieved documents, tool outputs, and the API data itself. Structural tokens from JSON crowd out the information that actually matters.

This is context rot: the gradual degradation of model performance as the window fills with low-signal content. It does not fail catastrophically. The agent does not throw an error. It just gets slightly worse — slightly less accurate, slightly more prone to hallucination. The failure mode is invisible, which makes it dangerous.

Context engineering is the discipline of managing this window deliberately. For every token we put in, what is the marginal value? System prompts: high value. Retrieved documents: high value. Thirty-seven closing braces from a JSON payload: zero value.

TOON is a context engineering tool. By eliminating structural overhead, it shifts the ratio in the context window toward signal. Consider an agent that needs to reason over 200 products to find the best match for a customer query:

Context window budget
Available context:        128,000 tokens

System prompt:              2,000 tokens
Conversation history:       8,000 tokens
RAG documents:             12,000 tokens
─────────────────────────────────────────
Remaining for API data:   106,000 tokens

With JSON (~75 tokens/product):
  → 1,413 products fit  → 200 products = 15,000 tokens

With TOON (~40 tokens/product):
  → 2,650 products fit  → 200 products = 8,000 tokens
  → 7,000 tokens freed for additional context

Those 7,000 reclaimed tokens can hold:
  → ~175 more products, OR
  → ~2 additional RAG documents, OR
  → Richer system prompt + few-shot examples

Those reclaimed tokens are working memory. More room for chain-of-thought reasoning, few-shot examples, and guardrails that prevent hallucination. Every token you save on serialization overhead is a token you can spend on intelligence.
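The budget above is plain arithmetic, reproduced here as a sanity check (the tokens-per-product figures are this article's estimates, not measurements):

```python
# Reproducing the context-budget arithmetic from the table above.

CONTEXT = 128_000
OVERHEAD = 2_000 + 8_000 + 12_000     # system prompt + history + RAG docs
remaining = CONTEXT - OVERHEAD        # tokens left for API data

json_per_product, toon_per_product = 75, 40  # estimated tokens/product

print(remaining)                          # 106000 tokens for API data
print(remaining // json_per_product)      # 1413 products fit as JSON
print(remaining // toon_per_product)      # 2650 products fit as TOON

freed = 200 * json_per_product - 200 * toon_per_product
print(freed)                              # 7000 tokens reclaimed at 200 products
```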

Why this matters now

As agent systems move from single-shot API calls to multi-step reasoning chains, the context window becomes a shared resource across many tool calls. A ~50% reduction per call means the difference between an agent that can complete a 12-step research task and one that runs out of context at step 8.

05

The Middleware

The implementation is embarrassingly simple, which is the point. One middleware, deployed once, gives every endpoint TOON support through standard HTTP content negotiation. No per-endpoint changes, no API rewrites.

The pattern: intercept the response, check the Accept header, serialize to TOON if requested, pass through unchanged otherwise. Here is the ASP.NET Core implementation:

ToonMiddleware.cs
using Microsoft.AspNetCore.Http;
using System.Text.Json;
using ToonFormat; // See toonformat.dev/ecosystem/implementations

public class ToonMiddleware
{
    private readonly RequestDelegate _next;

    public ToonMiddleware(RequestDelegate next)
    {
        _next = next;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var acceptHeader = context.Request.Headers.Accept.ToString();

        if (!acceptHeader.Contains("text/toon"))
        {
            await _next(context);
            return;
        }

        // Buffer the downstream response so it can be re-encoded
        var originalBody = context.Response.Body;
        using var buffer = new MemoryStream();
        context.Response.Body = buffer;

        try
        {
            await _next(context);
            buffer.Seek(0, SeekOrigin.Begin);

            // Only transcode JSON responses; pass everything else through
            var contentType = context.Response.ContentType ?? "";
            if (!contentType.Contains("application/json"))
            {
                context.Response.Body = originalBody;
                await buffer.CopyToAsync(originalBody);
                return;
            }

            // Parse the buffered JSON and serialize it to TOON
            using var json = await JsonDocument.ParseAsync(buffer);
            var toon = ToonEncoder.Encode(json.RootElement);

            // Write the TOON response; clear Content-Length, which
            // still reflects the buffered JSON body
            context.Response.Body = originalBody;
            context.Response.ContentLength = null;
            context.Response.ContentType = "text/toon; charset=utf-8";
            context.Response.Headers["X-Original-Content-Type"]
                = "application/json";
            await context.Response.WriteAsync(toon);
        }
        finally
        {
            context.Response.Body = originalBody;
        }
    }
}

Registration is one line in your pipeline:

Program.cs
var app = builder.Build();

app.UseMiddleware<ToonMiddleware>();

app.MapGet("/api/products", async (AppDbContext db) =>
{
    var products = await db.Products
        .Include(p => p.Variants)
        .Take(50)
        .ToListAsync();

    return Results.Ok(new { products });
});

// That's it. If the client sends "Accept: text/toon",
// they get TOON. Otherwise, they get JSON.
// Every endpoint behind this middleware works the same way.

The critical design decision is that TOON is opt-in per request, not per endpoint. Your OpenAPI spec stays the same. Your tests stay the same. Your human consumers never see the difference. Only the agents that know to ask for text/toon get the optimized format.
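The same opt-in check works in any framework. A sketch of the negotiation step in Python — the helper name is ours, and a production version should honor q-values per RFC 9110 rather than doing a plain membership test:

```python
# Sketch of the per-request opt-in: serve TOON only when the client's
# Accept header lists text/toon. Simplified — ignores q-value ranking.

def wants_toon(accept_header: str) -> bool:
    media_types = [part.split(";")[0].strip().lower()
                   for part in accept_header.split(",")]
    return "text/toon" in media_types

print(wants_toon("text/toon, application/json;q=0.9"))  # True
print(wants_toon("application/json"))                   # False
```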

Other frameworks

TOON has official and community implementations for TypeScript, Python, Go, Rust, and more. The middleware pattern is identical in every framework. See toonformat.dev/ecosystem/implementations for the full list.

Production consideration

In production, you may want to add TOON support at the API gateway level (YARP, Envoy, Kong, or a custom edge function) rather than per-application. This gives you organization-wide coverage, centralized caching, and the ability to toggle it with a feature flag — no application code changes needed.

06

Full Response Comparison

Beyond a single product, here is what a real paginated API response looks like — the kind of payload an agent receives from a product search endpoint:

GET /api/products?q=wool
{
  "meta": {
    "total": 142,
    "page": 1,
    "per_page": 3,
    "query": "wool"
  },
  "products": [
    {
      "id": "prd_8kx92m",
      "name": "Merino Wool Crew",
      "price": 89.00,
      "in_stock": true,
      "variants": [
        {
          "sku": "MWC-BLK-S",
          "color": "black",
          "size": "S",
          "stock": 24
        },
        {
          "sku": "MWC-NAV-M",
          "color": "navy",
          "size": "M",
          "stock": 12
        }
      ]
    },
    {
      "id": "prd_3jn71q",
      "name": "Wool Zip Hoodie",
      "price": 148.00,
      "in_stock": true,
      "variants": [
        {
          "sku": "WZH-GRY-L",
          "color": "grey",
          "size": "L",
          "stock": 6
        }
      ]
    },
    {
      "id": "prd_9vm45r",
      "name": "Lambswool Scarf",
      "price": 55.00,
      "in_stock": false,
      "variants": []
    }
  ]
}
Accept: text/toon
meta:
  total: 142
  page: 1
  per_page: 3
  query: wool
products[3]:
  - id: prd_8kx92m
    name: Merino Wool Crew
    price: 89
    in_stock: true
    variants[2]{sku,color,size,stock}:
      MWC-BLK-S,black,S,24
      MWC-NAV-M,navy,M,12
  - id: prd_3jn71q
    name: Wool Zip Hoodie
    price: 148
    in_stock: true
    variants[1]{sku,color,size,stock}:
      WZH-GRY-L,grey,L,6
  - id: prd_9vm45r
    name: Lambswool Scarf
    price: 55
    in_stock: false
    variants[0]:

913 bytes versus 480 — but the token breakdown is what matters:

Token comparison — paginated response: ~228 JSON tokens vs ~120 TOON tokens (47% fewer)

The schema-aware compression is doing the heavy lifting. In JSON, the four variant field names repeat for every variant object — 12 key occurrences across the three products. In TOON, they appear once per non-empty array as a schema header: variants[2]{sku,color,size,stock}:. After that, each row is pure values. The gap widens as payloads grow:
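Decoding goes the other way: read the schema header once, then zip each row against it. A sketch assuming the row grammar shown above (no quoted or escaped values — a real decoder from toonformat.dev handles those):

```python
import re

# Sketch of decoding a TOON tabular array back into dicts.
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")

def decode_table(text):
    lines = text.strip().splitlines()
    name, count, fields = HEADER.match(lines[0].strip()).groups()
    fields = fields.split(",")
    rows = [dict(zip(fields, line.strip().split(","))) for line in lines[1:]]
    assert len(rows) == int(count), "row count must match the [n] declaration"
    return name, rows

toon = """variants[2]{sku,color,size,stock}:
  MWC-BLK-S,black,S,24
  MWC-NAV-M,navy,M,12"""
name, rows = decode_table(toon)
print(rows[0])
# {'sku': 'MWC-BLK-S', 'color': 'black', 'size': 'S', 'stock': '24'}
```

Note that the declared [2] count doubles as a built-in integrity check: a truncated payload fails loudly instead of silently losing rows.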

Scaling analysis
Products     JSON tokens    TOON tokens    Reduction
─────────    ───────────    ───────────    ─────────
1            ~147           ~65            ~56%
3            ~228           ~120           ~47%
10           ~750           ~390           ~48%
25           ~1,870         ~960           ~49%
50           ~3,740         ~1,920         ~49%
100          ~7,480         ~3,840         ~49%

Schema-aware arrays become more efficient as they grow —
field names declared once, not repeated for every object.
At 100 products, TOON cuts token usage roughly in half.

A note on cost

The direct dollar savings depend on which model you use and how you are billed — pricing changes frequently and varies by provider. What does not change is the ratio: TOON consistently uses ~50% fewer tokens than JSON for typical API payloads. That means ~50% lower input token cost for any model, at any price point. Whether your bill is $100/month or $100,000/month, halving the tokens spent on serialization overhead is worth capturing.
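The ratio argument can be stated as code: the dollar amount depends on a per-token price we invent here purely for illustration, but the fraction saved does not.

```python
# Savings scale with the token reduction ratio at any price point.
# The per-million prices below are hypothetical, not any provider's.

def savings(tokens, price_per_million_usd, reduction=0.5):
    baseline_cost = tokens / 1_000_000 * price_per_million_usd
    return baseline_cost * reduction

# Same 50% fraction saved regardless of price:
print(savings(10_000_000, 4.00))  # 20.0  (of a $40 baseline)
print(savings(10_000_000, 0.25))  # 1.25  (of a $2.50 baseline)
```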

Where This Goes

TOON is not replacing JSON. Browsers, mobile apps, legacy integrations — they keep speaking JSON. The transformation happens at the edge, transparently, for the consumers that benefit from it.

The pattern is the same one we followed with gzip. Nobody rewrote their APIs to produce compressed output. A middleware layer checked for Accept-Encoding: gzip and handled it transparently. TOON applies the same idea to a different bottleneck: where gzip optimized for bandwidth, TOON optimizes for context windows.

The middleware is trivial. The content negotiation is standard HTTP. The hard part is recognizing that the consumers of your API have changed — and that optimizing for them is a competitive advantage, not an academic exercise.

Deploy the middleware. Ship the header. Let your agents breathe.

Need help implementing TOON?

Let's engineer the right context for your AI systems.

I work with engineering teams to implement formats like TOON and architect context-aware agent systems.