ContextPrune

Garbage collection for LLM context windows.

Before every LLM call, ContextPrune strips dead context from your messages array. Deterministic. Zero extra LLM calls. Under 10ms. The dashboard shows what's eating your context budget — and ten pattern detectors tell you exactly what's misconfigured before sessions overflow.

deterministic · zero extra LLM · proactive recommendations
Why this exists

Every message your agent has ever exchanged lives in one array — re-sent to the LLM on every single call. After 30 turns, most of it is fixed errors, stale file reads, and dead reasoning. The model pays to read it. You pay to send it.

You're paying for dead weight

A 30-turn session re-sends 2.4M input tokens, half of them useless. At GPT-4o pricing across 5,000 sessions, that's $6,500/month wasted on context that does nothing.
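Why the total grows so fast: if every call re-sends the full history, call k carries k turns of context, so the session total is a running sum rather than a flat per-turn cost. The sketch below is a back-of-the-envelope model, not ContextPrune code; the fixed `tokens_per_turn` value is an assumption chosen for illustration.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent when every call re-sends the full history.

    Assumption: each turn adds a fixed `tokens_per_turn` to the history.
    Real sessions vary turn by turn.
    """
    # Call k re-sends k turns of history, so the total is
    # tokens_per_turn * (1 + 2 + ... + turns).
    return tokens_per_turn * turns * (turns + 1) // 2


if __name__ == "__main__":
    total = cumulative_input_tokens(turns=30, tokens_per_turn=5_000)
    print(f"{total:,} input tokens")  # 2,325,000 input tokens
```

At an assumed 5,000 tokens per turn, 30 turns already lands in the same range as the 2.4M figure above.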

The model loses focus

Buried instructions get ignored. Your agent repeats itself, contradicts earlier turns, forgets the task. This is the "Lost in the Middle" effect — and bloated context makes it worse.

Then you hit the cliff

Around 65–75% utilisation, behaviour breaks sharply. Most teams panic-clear the whole context and start over, losing the good with the bad.

Compress. Analyze. Recommend.

What it does

Compress

Lean context, every call

Strip dead weight from your messages array deterministically — under 10ms, no LLM in the loop. 40–60% fewer input tokens with zero loss of critical context. The same input always produces the same output.

Analyze

See what your context actually costs

Track utilisation, fixed overhead, compression savings, and overflow risk across every session. Spot the bloat before it hits your bill or your output quality.

Recommend

Tuning advice that pays for itself

Ten pattern detectors evaluate your real traffic and surface exactly what's misconfigured — threshold settings, system prompt size, tool schema bloat — prioritised by impact. Not generic advice. Findings based on your sessions.

Maximum signal. Zero noise.

What you're guaranteed

What gets removed

  • Stale tool results — file reads no longer referenced
  • Resolved errors — stack traces from fixed bugs
  • Intermediate reasoning — collapsed to one line
  • Status updates and redundant confirmations
  • Duplicate tool calls — deduplicated to most recent

What is guaranteed to stay

  • System prompts and instructions — pinned, never touched
  • User corrections and overrides — preserved across all turns
  • The session's original goal
  • Any error that is still unresolved

A post-compression validator runs before anything is returned. Critical context is never lost.
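To make the two lists concrete, here is a minimal sketch of one removal rule (duplicate tool calls deduplicated to the most recent) followed by a validator that checks pinned messages survived. This is an illustration under stated assumptions, not ContextPrune's actual algorithm: the message shape, the `call_key` field, and both function names are hypothetical.

```python
from typing import Any

Message = dict[str, Any]


def prune_sketch(messages: list[Message]) -> list[Message]:
    """Keep only the most recent result for each distinct tool call.

    Hypothetical shape: tool messages carry a "call_key" that identifies
    the tool plus its arguments (e.g. "read_file:app.py").
    """
    # First pass: record the index of the latest result per call key.
    latest: dict[str, int] = {}
    for i, msg in enumerate(messages):
        if msg.get("role") == "tool" and "call_key" in msg:
            latest[msg["call_key"]] = i

    # Second pass: drop tool results that a later identical call superseded.
    kept: list[Message] = []
    for i, msg in enumerate(messages):
        superseded = (
            msg.get("role") == "tool"
            and "call_key" in msg
            and latest[msg["call_key"]] != i
        )
        if not superseded:
            kept.append(msg)
    return kept


def validate(original: list[Message], pruned: list[Message]) -> bool:
    """Check that every system and user message survived, in order."""
    def pinned(ms: list[Message]) -> list[Message]:
        return [m for m in ms if m.get("role") in ("system", "user")]

    return pinned(original) == pinned(pruned)
```

Both passes are plain loops over the array, so the same input always yields the same output, which is the property the deterministic claim rests on.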

Built for production

  • <10ms: p99 processing latency
  • 0ms: local skip when context is small
  • 40–60%: typical input token reduction
  • 100%: deterministic, no LLM in the pipeline

PII redaction built in

Enterprise ready

18+ patterns — credentials, SSNs, API keys, credit cards, phone numbers, IPs — stripped locally before any data leaves your machine. Restored in your response.
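The strip-then-restore flow can be sketched as a reversible substitution: each match is swapped for a placeholder before the request goes out, and the placeholders are swapped back in the response. The two regexes, the placeholder format, and the function names below are assumptions for illustration only; they are not the 18+ patterns ContextPrune ships.

```python
import re

# Two illustrative patterns only (assumed for this sketch).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace matches with placeholders; return the text and a restore map."""
    mapping: dict[str, str] = {}  # placeholder -> original value
    seen: dict[str, str] = {}     # original value -> placeholder

    for label, pattern in PATTERNS.items():
        def substitute(match: re.Match, label: str = label) -> str:
            value = match.group(0)
            if value not in seen:
                placeholder = f"[{label}_{len(mapping) + 1}]"
                seen[value] = placeholder
                mapping[placeholder] = value
            return seen[value]

        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into a response."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text
```

The restore map never leaves the process in this sketch, so the provider only ever sees the placeholders.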

Built for AI engineers running LLMs in production

Who it's for

AI engineers building agentic workflows

Multi-step agents accumulate context fast. ContextPrune keeps each step lean without manual intervention — even across 200-turn sessions.

Teams managing API costs at scale

Token costs compound quietly. ContextPrune pays for itself in the first month for any team running significant LLM volume.

Platform engineers building LLM infrastructure

Drop ContextPrune into your middleware layer and handle context management once, for every service behind it.
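One way to picture the middleware pattern is a wrapper that prunes the messages before every outgoing call, so individual services never handle context management themselves. The sketch below is a generic illustration: `with_pruning` and `call_llm` are hypothetical names, and `prune` is passed in as a parameter so the example does not assume anything about the SDK beyond the one-argument call shown in the install snippets.

```python
from typing import Any, Callable

Message = dict[str, Any]


def with_pruning(
    call_llm: Callable[[list[Message]], Any],
    prune: Callable[[list[Message]], list[Message]],
) -> Callable[[list[Message]], Any]:
    """Wrap an LLM call so messages are pruned before every request."""
    def wrapped(messages: list[Message]) -> Any:
        # Prune first, then forward the leaner array to the real client.
        return call_llm(prune(messages))

    return wrapped
```

In a real service, `prune` would be the function imported from the SDK and `call_llm` would be whatever client function the service already uses.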

One SDK. Four languages.

Install once. Works the same way across your entire stack.

Drop-in replacement for your messages array. No configuration required to get started.

Python

pip install contextprune

from contextprune import prune
messages = prune(messages)

TypeScript

npm install contextprune

import { prune } from 'contextprune'
const pruned = await prune(messages)

Go

go get github.com/grapine-ai/contextprune

pruned, err := contextprune.Prune(messages) // check err before using pruned

Rust

contextprune = "0.1"  # in Cargo.toml, under [dependencies]

let messages = contextprune::prune(messages)?;

Built in the open

The ContextPrune SDK is open source. The core compression algorithm and all four language SDKs are on GitHub — free to use, inspect, and contribute to.

Your context window is mostly garbage.

And you're paying to send it on every call. ContextPrune fixes that — deterministically, in under 10ms, with no extra LLM in the loop.
