How to Manage Claude Tokens
◇ Advanced Techniques
Chunking long documents
Don't pass the whole PDF
Passing a 50-page PDF in a single request is expensive and often wasteful: most of that context is irrelevant to the question being asked. Chunking splits the document into smaller pieces so that only the relevant ones are passed to Claude.
Basic chunking strategy
- Split documents into 500–1,000 token chunks
- Use embeddings to find the most relevant chunks for a given query
- Only pass the top 2–3 chunks in the actual Claude request
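The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it approximates tokens by whitespace-separated words, and `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model. The function names are made up for this example.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into chunks of at most max_tokens words.

    Assumption: word count is used as a rough proxy for model tokens.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts (not a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vec = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:k]
```

In use, only the output of `top_chunks(question, split_into_chunks(document))` would go into the prompt, alongside the question itself. Swapping `toy_embed` for a real embedding model changes the quality of the ranking but not the overall shape of the pipeline.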
This chunk-and-retrieve approach is the foundation of RAG (retrieval-augmented generation). Compared with sending the full document, it can reduce input tokens by 80–95% for document Q&A workloads.
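To see where a figure in that range can come from, here is a rough back-of-the-envelope calculation. The 500 tokens per page is an assumed value chosen for illustration, not a measurement; real PDFs vary widely, and the estimate ignores the question, system prompt, and output tokens.

```python
def input_token_reduction(full_doc_tokens: int, chunk_tokens: int, chunks_sent: int) -> float:
    """Fraction of document input tokens saved by sending only retrieved chunks."""
    # Never count more tokens sent than the document actually contains.
    sent = min(chunk_tokens * chunks_sent, full_doc_tokens)
    return 1 - sent / full_doc_tokens


# Hypothetical figures: 50 pages at roughly 500 tokens per page.
full_doc_tokens = 50 * 500  # 25,000 tokens for the whole PDF

# Send the top 3 chunks of 1,000 tokens each: 3,000 tokens instead of 25,000.
saving = input_token_reduction(full_doc_tokens, chunk_tokens=1_000, chunks_sent=3)
print(f"{saving:.0%}")  # 88%
```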
operator note
Use Supabase with the pgvector extension to store embeddings and retrieve the relevant chunks before passing them to Claude. This is a common production-grade pattern.
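As a rough sketch of what the retrieval step might look like against pgvector, the helper below builds a parameterized similarity query. The table `document_chunks` and its columns `content` and `embedding` are hypothetical names, and the `%s` placeholders assume a psycopg-style Postgres driver. `<=>` is pgvector's cosine distance operator, so ascending order returns the closest chunks first.

```python
def build_top_k_query(query_embedding: list[float], k: int = 3) -> tuple[str, tuple]:
    """Build a parameterized pgvector query for the k nearest chunks.

    Assumes a hypothetical table document_chunks(content, embedding),
    where embedding is a pgvector `vector` column.
    """
    # pgvector accepts vectors as text literals of the form '[0.1,0.2,0.3]'.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    sql = (
        "SELECT content "
        "FROM document_chunks "
        "ORDER BY embedding <=> %s::vector "  # cosine distance, smallest first
        "LIMIT %s"
    )
    return sql, (vector_literal, k)


sql, params = build_top_k_query([0.1, 0.2, 0.3], k=2)
```

The resulting `sql` and `params` would be passed to a database cursor's `execute` call; the connection setup is omitted here because it depends on the project's configuration.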
Changelog · 1
- Initial release — 5 sections, 11 lessons.