
Chunking Strategies for RAG

How you split documents determines RAG quality. Learn the five chunking strategies and when to use each one.

Tags: RAG · Architecture · Best Practices

The Core Idea

Chunking is the single most impactful decision for RAG quality. Get it wrong, and no amount of better embeddings or LLMs will save you.

Most RAG tutorials treat chunking as an afterthought: "just split by 500 tokens." This works for demos but fails in production because different documents need different chunking strategies.

Why Chunking Matters

The Retrieval Problem

When a user asks a question, your retriever must find the right chunk. This requires:

  1. The chunk contains the answer — Can't find what isn't there
  2. The chunk is semantically coherent — Embedding captures the meaning
  3. The chunk is appropriately sized — Not too big, not too small

Bad chunking breaks all three.

The Size Dilemma

Chunks too small:

  • Lose context needed to understand meaning
  • "The answer is yes" — yes to what?
  • Retrieve many fragments, can't synthesize

Chunks too large:

  • Embedding averages too much meaning together
  • "This chunk is about everything and nothing"
  • Context window fills with irrelevant text

The sweet spot depends on the document type.

The Five Chunking Strategies

Strategy 1: Fixed-Size Chunking

How it works: Split every N tokens/characters, with optional overlap.

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one concrete tokenizer choice

def tokenize(text: str) -> list[int]:
    return enc.encode(text)

def detokenize(tokens: list[int]) -> str:
    return enc.decode(tokens)

def fixed_size_chunk(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    tokens = tokenize(text)
    chunks = []
    # Stride so consecutive chunks share `overlap` tokens; overlap < chunk_size.
    for i in range(0, len(tokens), chunk_size - overlap):
        chunks.append(detokenize(tokens[i:i + chunk_size]))
    return chunks

Pros:

  • Simple, predictable
  • Works for any document
  • Easy to tune

Cons:

  • Ignores document structure
  • Splits mid-sentence, mid-paragraph
  • Semantically incoherent boundaries

Best for: Uniform text without clear structure (logs, transcripts).

Typical settings: 256-512 tokens, 10-20% overlap.
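
A quick usage sketch (the file name is illustrative):

# Hypothetical usage: ~384-token chunks with ~12% overlap for an unstructured transcript.
with open("meeting_transcript.txt") as f:
    chunks = fixed_size_chunk(f.read(), chunk_size=384, overlap=48)
print(f"{len(chunks)} chunks")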


Strategy 2: Semantic Chunking

How it works: Split when the topic/meaning changes.

def semantic_chunk(text: str, threshold: float = 0.5):
    # split_sentences, embed, and cosine_similarity are placeholders for your
    # sentence splitter and embedding model of choice.
    sentences = split_sentences(text)
    if not sentences:
        return []
    embeddings = [embed(s) for s in sentences]

    chunks = []
    current_chunk = [sentences[0]]

    for i in range(1, len(sentences)):
        similarity = cosine_similarity(embeddings[i], embeddings[i - 1])

        if similarity < threshold:
            # Topic shift detected, start new chunk
            chunks.append(" ".join(current_chunk))
            current_chunk = [sentences[i]]
        else:
            current_chunk.append(sentences[i])

    chunks.append(" ".join(current_chunk))
    return chunks

Pros:

  • Respects meaning boundaries
  • Better embedding quality
  • Fewer retrieval misses

Cons:

  • Computationally expensive
  • Variable chunk sizes
  • Threshold tuning required

Best for: Articles, documentation, narrative text.

Typical settings: threshold 0.3-0.7 depending on topic density.
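
One refinement worth knowing (an assumption here, not part of the pseudocode above): derive the threshold from the data instead of fixing it, breaking at the lowest decile of adjacent-sentence similarities. This reuses the cosine_similarity placeholder from the sketch above.

import numpy as np

def adaptive_threshold(embeddings, percentile: float = 10.0) -> float:
    if len(embeddings) < 2:
        return 0.0
    # Similarity between each adjacent sentence pair
    sims = [cosine_similarity(embeddings[i], embeddings[i - 1])
            for i in range(1, len(embeddings))]
    # Any drop below this value is treated as a topic boundary
    return float(np.percentile(sims, percentile))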


Strategy 3: Document Structure Chunking

How it works: Use document structure (headings, sections) as boundaries.

def structure_chunk(document: Document):
    # Document, Chunk, and MAX_CHUNK_SIZE come from your own parsing layer.
    chunks = []

    for section in document.sections:
        # Each major section is a chunk
        if len(section.content) < MAX_CHUNK_SIZE:
            chunks.append(Chunk(
                text=section.content,
                metadata={
                    "heading": section.heading,
                    "level": section.level,
                    "parent": section.parent_heading
                }
            ))
        else:
            # Section too large, split by subsections or paragraphs
            for subsection in section.split_by_structure():
                chunks.append(Chunk(
                    text=subsection.content,
                    metadata={...}  # same shape as above, filled per subsection
                ))

    return chunks

Pros:

  • Preserves author's intended organization
  • Rich metadata for filtering
  • Natural semantic coherence

Cons:

  • Requires structured documents
  • Section sizes vary wildly
  • Parsing complexity

Best for: Technical docs, manuals, wikis, legal documents.

Implementation tip: Parse markdown/HTML structure explicitly.
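
A minimal sketch of that tip, assuming ATX-style # headings and a regex-based parse:

import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$", re.MULTILINE)

def split_markdown_sections(md: str) -> list[dict]:
    # One section per heading; each keeps its level, heading text, and body.
    matches = list(HEADING.finditer(md))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(md)
        sections.append({
            "level": len(m.group(1)),
            "heading": m.group(2).strip(),
            "content": md[m.end():end].strip(),
        })
    return sections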


Strategy 4: Recursive Chunking

How it works: Start with large chunks, recursively split until target size.

def recursive_chunk(text: str, target_size: int = 500, separators: list = None):
    if separators is None:
        separators = ["\n\n", "\n", ". ", " "]

    if len(tokenize(text)) <= target_size:
        return [text]

    for separator in separators:
        if separator in text:
            chunks = []
            for split in text.split(separator):
                if split:  # skip empties from consecutive separators
                    chunks.extend(recursive_chunk(split, target_size, separators))
            return chunks

    # No separator applies, force a fixed-size split
    return fixed_size_chunk(text, target_size)

Pros:

  • Adapts to document structure
  • Prefers natural boundaries
  • Handles mixed content well

Cons:

  • Still may split mid-thought
  • Separator priority is heuristic
  • Can create tiny chunks

Best for: General-purpose default, mixed document types.

Typical separators: ["\n\n", "\n", ". ", ", ", " "]


Strategy 5: Agentic Chunking

How it works: Use an LLM to decide chunk boundaries.

def agentic_chunk(text: str):
    # llm is any completion client; parse_json_chunks should validate the
    # model's output and handle malformed JSON.
    prompt = f"""
    Divide this text into coherent chunks. Each chunk should:
    1. Contain one complete idea or topic
    2. Be understandable without other chunks
    3. Be 100-500 words

    Return a JSON array of chunks.

    Text:
    {text}
    """

    response = llm.complete(prompt)
    return parse_json_chunks(response)

Pros:

  • Human-quality boundaries
  • Handles complex structures
  • Can add summaries/metadata

Cons:

  • Expensive (LLM call per document)
  • Slow
  • Non-deterministic

Best for: High-value documents worth the cost, pre-processing pipelines.

Cost example: $0.01-0.10 per page with GPT-4.

Choosing the Right Strategy

Decision Matrix

Document Type   | Recommended Strategy     | Why
----------------|--------------------------|-------------------------------------
Technical docs  | Structure                | Natural sections, headings critical
Legal documents | Structure + small chunks | Precision required
Articles/blogs  | Semantic or Recursive    | Topic flow matters
Chat logs       | Fixed-size               | No structure to exploit
Code            | Structure (by function)  | Syntax boundaries critical
Books           | Chapter → Semantic       | Multi-level structure
PDFs (mixed)    | Recursive                | Handle tables, images, text
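
For the Code row, a minimal sketch of function-level chunking using Python's ast module (for other languages you'd reach for a parser like tree-sitter):

import ast

def chunk_python_source(source: str) -> list[str]:
    # One chunk per top-level function or class; ast gives exact boundaries.
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]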

The Hybrid Approach

In production, combine strategies:

def hybrid_chunk(document: Document):
    # 1. First, split by structure if available
    if document.has_structure():
        sections = [c.text for c in structure_chunk(document)]
    else:
        sections = [document.text]

    # 2. For each section still over budget, apply recursive chunking
    chunks = []
    for section in sections:
        if len(tokenize(section)) > MAX_CHUNK_SIZE:
            chunks.extend(recursive_chunk(section, target_size=400))
        else:
            chunks.append(section)

    # 3. Add overlap for continuity (sketched below)
    chunks = add_overlap(chunks, overlap_tokens=50)

    return chunks
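
add_overlap isn't defined above; a minimal token-level sketch using the tokenize/detokenize helpers from Strategy 1:

def add_overlap(chunks: list[str], overlap_tokens: int = 50) -> list[str]:
    # Prepend the tail of the previous chunk to each subsequent chunk.
    if not chunks:
        return []
    overlapped = [chunks[0]]
    for prev, curr in zip(chunks, chunks[1:]):
        tail = detokenize(tokenize(prev)[-overlap_tokens:])
        overlapped.append(tail + " " + curr)
    return overlapped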

Chunking Best Practices

1. Preserve Context with Metadata

Include surrounding context in chunk metadata:

chunk = Chunk(
    text="The return policy allows 30-day returns.",
    metadata={
        "source": "handbook.pdf",
        "section": "Chapter 4: Returns",
        "page": 23,
        "preceding_heading": "Customer Policies",
        "document_summary": "Employee handbook covering HR policies..."
    }
)

This helps LLMs understand context even when chunks are retrieved in isolation.

2. Add Contextual Headers

Prepend section context to each chunk:

def add_context_header(chunk: str, section: str, document: str) -> str:
    return f"[{document} > {section}]\n{chunk}"

# Before: "Returns are accepted within 30 days."
# After: "[Employee Handbook > Return Policy]\nReturns are accepted within 30 days."

This helps embeddings capture the full meaning.

3. Handle Tables and Lists

Tables and lists need special treatment:

def chunk_table(table: Table) -> list[Chunk]:
    # In practice, pick one option, or index both and dedupe at query time.
    chunks = []

    # Option A: Serialize as markdown
    chunks.append(Chunk(
        text=table.to_markdown(),
        type="table"
    ))

    # Option B: Natural-language summary (llm is a placeholder client)
    summary = llm.complete(f"Summarize this table: {table.to_markdown()}")
    chunks.append(Chunk(
        text=summary,
        type="table_summary",
        original_table=table.to_markdown()
    ))

    return chunks

4. Test with Eval Set

Build an eval set to measure retrieval quality:

eval_set = [
    {
        "query": "What is the return policy?",
        "relevant_chunks": ["handbook_chunk_23", "handbook_chunk_24"],
        "irrelevant_chunks": ["handbook_chunk_1", "handbook_chunk_50"]
    },
    ...
]

def evaluate_chunking(chunks, eval_set):
    retriever = build_retriever(chunks)
    
    metrics = {"precision": [], "recall": []}
    for test in eval_set:
        results = retriever.search(test["query"], k=5)
        
        retrieved_ids = {r.id for r in results}
        relevant_ids = set(test["relevant_chunks"])
        
        precision = len(retrieved_ids & relevant_ids) / len(retrieved_ids)
        recall = len(retrieved_ids & relevant_ids) / len(relevant_ids)
        
        metrics["precision"].append(precision)
        metrics["recall"].append(recall)
    
    return {k: sum(v)/len(v) for k, v in metrics.items()}

5. Iterate and Measure

Chunking is not a one-time decision:

  1. Start with recursive chunking (good default)
  2. Build eval set from real user queries
  3. Measure retrieval quality
  4. Identify failure patterns
  5. Try alternative strategies
  6. A/B test in production (see the comparison sketch below)
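
A hedged sketch of that loop, comparing two strategies on the same eval set (corpus is your raw document text; the strategy names are illustrative):

strategies = {
    "fixed_512": lambda t: fixed_size_chunk(t, chunk_size=512, overlap=50),
    "recursive_400": lambda t: recursive_chunk(t, target_size=400),
}

for name, chunker in strategies.items():
    scores = evaluate_chunking(chunker(corpus), eval_set)
    print(f"{name}: precision={scores['precision']:.2f}, recall={scores['recall']:.2f}")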

Common Mistakes

Mistake 1: One-Size-Fits-All

Using 500-token fixed chunks for everything. Different content needs different strategies.

Mistake 2: Ignoring Overlap

Zero overlap means context can be split across chunks. Always use 10-20% overlap.

Mistake 3: Chunking Before Cleaning

Chunk clean text, not raw HTML/markdown with artifacts.
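
A minimal cleaning pass, assuming BeautifulSoup is available:

from bs4 import BeautifulSoup

def clean_html(raw_html: str) -> str:
    soup = BeautifulSoup(raw_html, "html.parser")
    # Drop non-content elements before extracting text
    for tag in soup(["script", "style", "nav", "footer"]):
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)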

Mistake 4: No Metadata

Chunks without source/section metadata make debugging impossible.

Mistake 5: Not Testing

Evaluating chunking "by feel" instead of with metrics.

Conclusion

Chunking is where RAG systems are won or lost.

The rules:

  1. Match strategy to document type
  2. Preserve structure and context
  3. Test with real queries
  4. Iterate based on metrics

Get chunking right, and retrieval quality follows. Get it wrong, and nothing else matters.


The best RAG engineers spend 50% of their time on chunking. The worst spend 0% and wonder why retrieval fails.

What's your chunking strategy?


Abhinav Mahajan

AI Product & Engineering Leader

Building AI systems that work in production. These frameworks come from real experience shipping enterprise AI products.
