How to Manage Claude Tokens
Advanced Techniques

Chunking long documents

Don't pass the whole PDF

Passing a 50-page PDF in one request is expensive and often wasteful: most of that context is irrelevant to any single question. Chunking breaks the document into pieces so that each request carries only the sections relevant to the query.

Basic chunking strategy

  • Split documents into 500–1,000 token chunks
  • Use embeddings to find the most relevant chunks for a given query
  • Only pass the top 2–3 chunks in the actual Claude request
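
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: the function names are made up for this example, whitespace-separated words stand in for tokens (a real tokenizer will count differently), and `embed` is a toy bag-of-words vector standing in for a real embedding model.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, max_tokens: int = 800) -> list[str]:
    """Split text into chunks of at most max_tokens words.

    Assumption: one whitespace-separated word is roughly one token.
    Real tokenizers differ, so treat max_tokens as a rough budget.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 3) -> str:
    """Assemble a prompt that contains only the top-k chunks."""
    context = "\n\n---\n\n".join(top_chunks(query, chunks, k))
    return (
        "Answer using only the excerpts below.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```

In practice you would replace `embed` with calls to an embedding model and compute the chunk vectors once at indexing time, not on every query.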

This retrieve-then-pass pattern is the foundation of RAG (retrieval-augmented generation). For document Q&A workloads it can reduce input tokens by 80–95%, depending on how large the document is relative to the chunks you send.
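
To see where a figure in that range can come from, the arithmetic can be sketched as below. The document sizes are hypothetical, chosen only to illustrate the calculation, and the sketch ignores the tokens used by the question and instructions themselves.

```python
def input_token_savings(doc_tokens: int, chunk_tokens: int, top_k: int) -> float:
    """Fraction of input tokens saved by sending top_k chunks, not the whole document.

    Assumes every chunk sent is chunk_tokens long and ignores prompt overhead.
    """
    sent_tokens = min(doc_tokens, chunk_tokens * top_k)
    return (doc_tokens - sent_tokens) / doc_tokens


# Hypothetical document sizes, sending three 1,000-token chunks each time.
for doc_tokens in (15_000, 30_000, 60_000):
    saved = input_token_savings(doc_tokens, chunk_tokens=1_000, top_k=3)
    print(f"{doc_tokens:>6} tokens in document: {saved:.0%} fewer input tokens")
```

With three 1,000-token chunks, the saving is 80% for a 15,000-token document, 90% at 30,000 tokens, and 95% at 60,000 tokens. The larger the document relative to what you send, the bigger the reduction.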

operator note

Use Supabase with the pgvector extension to store chunk embeddings and retrieve the most similar chunks before calling Claude. This is a production-grade pattern.
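
A retrieval query against pgvector might look like the sketch below. The table name `document_chunks` and its `content` and `embedding` columns are hypothetical, as are the function names. It assumes a standard Python DB-API connection to the Supabase Postgres database (for example one opened with psycopg) and a query embedding that was already computed elsewhere. The `<=>` operator is pgvector's cosine distance, so ordering by it ascending returns the closest chunks first.

```python
from typing import Any, Sequence


def to_pgvector_literal(embedding: Sequence[float]) -> str:
    """Format a list of floats as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def fetch_top_chunks(conn: Any, query_embedding: Sequence[float], k: int = 3) -> list[str]:
    """Return the text of the k chunks closest to query_embedding.

    Assumes a hypothetical table document_chunks(content text, embedding vector)
    and a DB-API connection that uses %s placeholders, as psycopg does.
    """
    sql = (
        "SELECT content FROM document_chunks "
        "ORDER BY embedding <=> %s::vector "  # cosine distance, smallest first
        "LIMIT %s"
    )
    cur = conn.cursor()
    try:
        cur.execute(sql, (to_pgvector_literal(query_embedding), k))
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()
```

The returned strings can then be joined into the context portion of the Claude request, in place of the full document.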

Changelog · 1
  • Initial release — 5 sections, 11 lessons.