The Core Idea
Chunking is the most impactful decision for RAG quality. Get it wrong, and no amount of better embeddings or LLMs will save you.
Most RAG tutorials treat chunking as an afterthought: "just split by 500 tokens." This works for demos but fails in production because different documents need different chunking strategies.
Why Chunking Matters
The Retrieval Problem
When a user asks a question, your retriever must find the right chunk. This requires:
- The chunk contains the answer — Can't find what isn't there
- The chunk is semantically coherent — Embedding captures the meaning
- The chunk is appropriately sized — Not too big, not too small
Bad chunking breaks all three.
The Size Dilemma
Chunks too small:
- Lose context needed to understand meaning
- "The answer is yes" — yes to what?
- Retrieval returns many fragments the LLM can't synthesize
Chunks too large:
- Embedding averages too much meaning together
- "This chunk is about everything and nothing"
- Context window fills with irrelevant text
The sweet spot depends on the document type.
The Five Chunking Strategies
Strategy 1: Fixed-Size Chunking
How it works: Split every N tokens/characters, with optional overlap.
```python
def fixed_size_chunk(text: str, chunk_size: int = 500, overlap: int = 50):
    tokens = tokenize(text)
    chunks = []
    step = chunk_size - overlap  # advance by less than chunk_size so chunks overlap
    for i in range(0, len(tokens), step):
        chunks.append(detokenize(tokens[i:i + chunk_size]))
    return chunks
```
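The `tokenize` and `detokenize` helpers above are placeholders. A minimal sketch using the tiktoken library (the package and encoding choice are assumptions; use whatever tokenizer matches your embedding model):

```python
import tiktoken

# cl100k_base is just a common default encoding
_enc = tiktoken.get_encoding("cl100k_base")

def tokenize(text: str) -> list[int]:
    return _enc.encode(text)

def detokenize(tokens: list[int]) -> str:
    return _enc.decode(tokens)
```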
Pros:
- Simple, predictable
- Works for any document
- Easy to tune
Cons:
- Ignores document structure
- Splits mid-sentence, mid-paragraph
- Semantically incoherent boundaries
Best for: Uniform text without clear structure (logs, transcripts).
Typical settings: 256-512 tokens, 10-20% overlap.
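A quick usage sketch with those typical settings (the file name is hypothetical):

```python
text = open("meeting_transcript.txt").read()
chunks = fixed_size_chunk(text, chunk_size=512, overlap=64)  # ~12% overlap
print(f"{len(chunks)} chunks")
```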
Strategy 2: Semantic Chunking
How it works: Split when the topic/meaning changes.
```python
def semantic_chunk(text: str, threshold: float = 0.5):
    sentences = split_sentences(text)
    if not sentences:
        return []
    embeddings = [embed(s) for s in sentences]
    chunks = []
    current_chunk = [sentences[0]]
    for i in range(1, len(sentences)):
        similarity = cosine_similarity(embeddings[i], embeddings[i - 1])
        if similarity < threshold:
            # Topic shift detected, start a new chunk
            chunks.append(" ".join(current_chunk))
            current_chunk = [sentences[i]]
        else:
            current_chunk.append(sentences[i])
    chunks.append(" ".join(current_chunk))
    return chunks
```
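`split_sentences`, `embed`, and `cosine_similarity` are assumed helpers. One possible sketch using sentence-transformers and NumPy (the packages and model name are assumptions, not requirements):

```python
import re
import numpy as np
from sentence_transformers import SentenceTransformer

# Any sentence-embedding model works; this one is a common lightweight default
_model = SentenceTransformer("all-MiniLM-L6-v2")

def split_sentences(text: str) -> list[str]:
    # Naive splitter; a library like nltk or spaCy is more robust
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def embed(sentence: str) -> np.ndarray:
    return _model.encode(sentence)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```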
Pros:
- Respects meaning boundaries
- Better embedding quality
- Fewer retrieval misses
Cons:
- Computationally expensive
- Variable chunk sizes
- Threshold tuning required
Best for: Articles, documentation, narrative text.
Typical settings: threshold 0.3-0.7 depending on topic density.
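Because the right threshold varies by corpus, a quick diagnostic is to sweep a few values over a sample document (`sample_text` is a placeholder) and see how the chunk count changes:

```python
for t in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(t, len(semantic_chunk(sample_text, threshold=t)))
```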
Strategy 3: Document Structure Chunking
How it works: Use document structure (headings, sections) as boundaries.
```python
def structure_chunk(document: Document):
    chunks = []
    for section in document.sections:
        # Each major section is a chunk
        if len(section.content) < MAX_CHUNK_SIZE:
            chunks.append(Chunk(
                text=section.content,
                metadata={
                    "heading": section.heading,
                    "level": section.level,
                    "parent": section.parent_heading,
                },
            ))
        else:
            # Section too large, split by subsections or paragraphs
            for subsection in section.split_by_structure():
                chunks.append(Chunk(
                    text=subsection.content,
                    metadata={  # mirror the fields used above
                        "heading": subsection.heading,
                        "level": subsection.level,
                        "parent": section.heading,
                    },
                ))
    return chunks
```
Pros:
- Preserves author's intended organization
- Rich metadata for filtering
- Natural semantic coherence
Cons:
- Requires structured documents
- Section sizes vary wildly
- Parsing complexity
Best for: Technical docs, manuals, wikis, legal documents.
Implementation tip: Parse markdown/HTML structure explicitly.
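For markdown, the parsing step can be as simple as splitting on heading lines. A minimal sketch (it ignores edge cases such as headings inside code fences):

```python
import re

def parse_markdown_sections(text: str) -> list[dict]:
    """Split markdown into sections, one per heading."""
    sections = []
    current = {"heading": None, "level": 0, "content": []}
    for line in text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            if current["heading"] or current["content"]:
                sections.append(current)
            current = {"heading": m.group(2), "level": len(m.group(1)), "content": []}
        else:
            current["content"].append(line)
    sections.append(current)
    for s in sections:
        s["content"] = "\n".join(s["content"]).strip()
    return sections
```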
Strategy 4: Recursive Chunking
How it works: Try the coarsest separator first (paragraphs, then lines, then sentences), recursively splitting any piece that is still over the target size.
```python
def recursive_chunk(text: str, target_size: int = 500, separators: list = None):
    if separators is None:
        separators = ["\n\n", "\n", ". ", " "]
    if len(tokenize(text)) <= target_size:
        return [text]
    for i, separator in enumerate(separators):
        if separator in text:
            chunks = []
            # Note: splitting drops the separator itself
            for split in text.split(separator):
                if split:  # skip empty pieces from consecutive separators
                    chunks.extend(recursive_chunk(split, target_size, separators[i:]))
            return chunks
    # No separator applies, force a fixed-size split
    return fixed_size_chunk(text, target_size)
```

(The original list of separators often ends with `""` for a character-level fallback, but Python's `str.split("")` raises a ValueError, so here the fallback is an explicit call to `fixed_size_chunk`.)
Pros:
- Adapts to document structure
- Prefers natural boundaries
- Handles mixed content well
Cons:
- Still may split mid-thought
- Separator priority is heuristic
- Can create tiny chunks
Best for: General-purpose default, mixed document types.
Typical separators: ["\n\n", "\n", ". ", ", ", " "]
Strategy 5: Agentic Chunking
How it works: Use an LLM to decide chunk boundaries.
```python
def agentic_chunk(text: str, context: str = None):
    prompt = f"""
    Divide this text into coherent chunks. Each chunk should:
    1. Contain one complete idea or topic
    2. Be understandable without other chunks
    3. Be 100-500 words

    Return a JSON array of chunks.

    Text:
    {text}
    """
    response = llm.complete(prompt)
    return parse_json_chunks(response)
```
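Here `llm` stands for whatever client you use. `parse_json_chunks` can be a thin wrapper around `json.loads` (a sketch that assumes the model returned a bare JSON array):

```python
import json

def parse_json_chunks(response: str) -> list[str]:
    # Real pipelines should also strip markdown fences and retry on bad JSON
    chunks = json.loads(response)
    if not isinstance(chunks, list):
        raise ValueError("expected a JSON array of chunks")
    return chunks
```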
Pros:
- Human-quality boundaries
- Handles complex structures
- Can add summaries/metadata
Cons:
- Expensive (LLM call per document)
- Slow
- Non-deterministic
Best for: High-value documents worth the cost, pre-processing pipelines.
Cost example: $0.01-0.10 per page with GPT-4.
Choosing the Right Strategy
Decision Matrix
| Document Type | Recommended Strategy | Why |
|---|---|---|
| Technical docs | Structure | Natural sections, headings critical |
| Legal documents | Structure + small chunks | Precision required |
| Articles/blogs | Semantic or Recursive | Topic flow matters |
| Chat logs | Fixed-size | No structure to exploit |
| Code | Structure (by function) | Syntax boundaries critical |
| Books | Chapter → Semantic | Multi-level structure |
| PDFs (mixed) | Recursive | Handle tables, images, text |
The Hybrid Approach
In production, combine strategies:
```python
def hybrid_chunk(document: Document):
    # 1. First, split by structure if available
    if document.has_structure():
        sections = [chunk.text for chunk in structure_chunk(document)]
    else:
        sections = [document.text]
    # 2. Recursively split any section that is still too large
    chunks = []
    for section in sections:
        if len(tokenize(section)) > MAX_CHUNK_SIZE:
            chunks.extend(recursive_chunk(section, target_size=400))
        else:
            chunks.append(section)
    # 3. Add overlap for continuity across chunk boundaries
    return add_overlap(chunks, overlap_tokens=50)
```
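`add_overlap` is left undefined above. One simple implementation prepends the tail of each previous chunk, reusing the `tokenize`/`detokenize` helpers sketched earlier:

```python
def add_overlap(chunks: list[str], overlap_tokens: int = 50) -> list[str]:
    if not chunks:
        return []
    result = [chunks[0]]
    for prev, curr in zip(chunks, chunks[1:]):
        # Carry the last overlap_tokens of the previous chunk into the next
        tail = detokenize(tokenize(prev)[-overlap_tokens:])
        result.append(tail + " " + curr)
    return result
```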
Chunking Best Practices
1. Preserve Context with Metadata
Include surrounding context in chunk metadata:
```python
chunk = Chunk(
    text="The return policy allows 30-day returns.",
    metadata={
        "source": "handbook.pdf",
        "section": "Chapter 4: Returns",
        "page": 23,
        "preceding_heading": "Customer Policies",
        "document_summary": "Employee handbook covering HR policies...",
    },
)
```
This helps LLMs understand context even when chunks are retrieved in isolation.
2. Add Contextual Headers
Prepend section context to each chunk:
```python
def add_context_header(chunk: str, section: str, document: str) -> str:
    return f"[{document} > {section}]\n{chunk}"

# Before: "Returns are accepted within 30 days."
# After:  "[Employee Handbook > Return Policy]\nReturns are accepted within 30 days."
```
This helps embeddings capture the full meaning.
3. Handle Tables and Lists
Tables and lists need special treatment:
```python
def chunk_table(table: Table) -> list[Chunk]:
    chunks = []
    # Option A: serialize the table as markdown
    chunks.append(Chunk(
        text=table.to_markdown(),
        type="table",
    ))
    # Option B: index a natural-language summary alongside it
    summary = llm.complete(f"Summarize this table: {table.to_markdown()}")
    chunks.append(Chunk(
        text=summary,
        type="table_summary",
        original_table=table.to_markdown(),
    ))
    return chunks
```
4. Test with Eval Set
Build an eval set to measure retrieval quality:
```python
eval_set = [
    {
        "query": "What is the return policy?",
        "relevant_chunks": ["handbook_chunk_23", "handbook_chunk_24"],
        "irrelevant_chunks": ["handbook_chunk_1", "handbook_chunk_50"],
    },
    # ...more cases drawn from real user queries
]

def evaluate_chunking(chunks, eval_set):
    retriever = build_retriever(chunks)
    metrics = {"precision": [], "recall": []}
    for test in eval_set:
        results = retriever.search(test["query"], k=5)
        retrieved_ids = {r.id for r in results}
        relevant_ids = set(test["relevant_chunks"])
        precision = len(retrieved_ids & relevant_ids) / len(retrieved_ids)
        recall = len(retrieved_ids & relevant_ids) / len(relevant_ids)
        metrics["precision"].append(precision)
        metrics["recall"].append(recall)
    return {k: sum(v) / len(v) for k, v in metrics.items()}
```
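Wiring it together (`chunks` here comes from whichever strategy you are testing, and `build_retriever` is assumed to be part of your pipeline):

```python
metrics = evaluate_chunking(chunks, eval_set)
print(f"precision@5: {metrics['precision']:.2f}  recall@5: {metrics['recall']:.2f}")
```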
5. Iterate and Measure
Chunking is not a one-time decision:
- Start with recursive chunking (good default)
- Build eval set from real user queries
- Measure retrieval quality
- Identify failure patterns
- Try alternative strategies
- A/B test in production
Common Mistakes
Mistake 1: One-Size-Fits-All
Using 500-token fixed chunks for everything. Different content needs different strategies.
Mistake 2: Ignoring Overlap
Zero overlap means context can be split across chunks. Always use 10-20% overlap.
Mistake 3: Chunking Before Cleaning
Chunk clean text, not raw HTML/markdown with artifacts.
Mistake 4: No Metadata
Chunks without source/section metadata make debugging impossible.
Mistake 5: Not Testing
Evaluating chunking "by feel" instead of with metrics.
Conclusion
Chunking is where RAG systems are won or lost.
The rules:
- Match strategy to document type
- Preserve structure and context
- Test with real queries
- Iterate based on metrics
Get chunking right, and retrieval quality follows. Get it wrong, and nothing else matters.
The best RAG engineers spend 50% of their time on chunking. The worst spend 0% and wonder why retrieval fails.
What's your chunking strategy?