Chunking: Splitting Documents for AI Processing

Written by Cobus van der Westhuizen, reviewed by Wynand van der Westhuizen, fact-checked by Lenata Oosthuizen. Last reviewed July 2026.

What Is Chunking?

Chunking is the process of dividing a large document or body of text into smaller, discrete segments before those segments are processed by an AI model. Because language models have a finite token limit for what they can read in a single operation, content that exceeds that limit must be split before it can be indexed, embedded, or retrieved. The resulting segments are called chunks.

Chunking sits at the input end of most AI document pipelines. Once a document is split into chunks, each chunk is independently converted into an embedding, a numerical representation of its meaning. Those embeddings are stored in a vector index. When a user submits a query, the system retrieves the chunks whose embeddings are most similar to the query embedding, then passes those chunks to a language model to generate a response. This is the core mechanic of retrieval-augmented generation.
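The embed, store, and retrieve steps described above can be sketched in a few lines. This is a toy illustration, not a production pipeline: the `embed` function below is a bag-of-words stand-in for a real embedding model, the three sample chunks are invented for the example, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Embed every chunk, then return the top_k chunks most similar to the query."""
    index = [(chunk, embed(chunk)) for chunk in chunks]  # the "vector index"
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Invented sample chunks, for illustration only.
chunks = [
    "Water damage from burst pipes is covered under the buildings section.",
    "Theft claims require a police case number within 48 hours.",
    "The excess for windscreen claims is a flat fee.",
]
top = retrieve("What is the excess on a windscreen claim?", chunks, top_k=1)
print(top)
```

In a real system the retrieved chunks would then be passed to a language model as context. A learned embedding model would also match on meaning rather than on shared words, which this toy version cannot do.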

The way a document is chunked has a significant effect on retrieval quality. If chunks are too large, they cover too many topics at once, and the resulting embedding is a blurred average of many ideas rather than a precise representation of any single one. When the system tries to retrieve the most relevant chunk for a specific query, an oversized chunk may rank poorly because its embedding does not closely match any particular query. If chunks are too small, they lose context, and an individual sentence pulled out of a paragraph may be meaningless without the surrounding content.

Chunking strategy is therefore a genuine technical and editorial decision. Common approaches include fixed-size chunking, where every chunk contains the same number of tokens; sentence-level chunking, where natural sentence boundaries are respected; and semantic chunking, where a model identifies natural topic shifts and splits at those points. Many pipelines also implement chunk overlap, where adjacent chunks share a portion of their content to preserve context across boundaries.
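As a minimal sketch of fixed-size chunking with overlap, the function below slides a window of `size` tokens across a token list and steps forward by `size - overlap` each time. The name `chunk_fixed` is hypothetical, and whitespace-separated words stand in for model tokens, which is an assumption; a real pipeline would count tokens with the tokenizer of its embedding model.

```python
def chunk_fixed(tokens: list[str], size: int, overlap: int = 0) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Assumption: words approximate tokens for the purpose of the example.
words = "t0 t1 t2 t3 t4 t5 t6 t7 t8 t9".split()
for chunk in chunk_fixed(words, size=4, overlap=1):
    print(chunk)
```

With `size=4` and `overlap=1`, each chunk begins with the last token of the chunk before it, which is the boundary-preserving effect overlap is meant to provide.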

Chunking In Practice

A Cape Town-based short-term insurance provider wanted to build an internal knowledge tool so that its consultants could query policy documents during client calls. The company's policy documents were long and complex, some running to forty or fifty pages with multiple sections covering different claim types, exclusions, and excess structures.

In an initial attempt, the documents were chunked at a fixed size of 1,000 tokens with no overlap. When consultants queried the tool about specific exclusions, the retrieved chunks frequently started or ended mid-clause, cutting off precisely the information needed. The tool's answers were incomplete and sometimes misleading because the relevant clause was split across two chunks, and only one of those chunks was retrieved.

The engineering team switched to a sentence-aware chunking approach with a target of 400 tokens per chunk and a 15 percent overlap. They also restructured the source documents so that each section had a clear heading followed by a self-contained explanation. After reindexing with the new chunks, retrieval accuracy improved and consultant satisfaction with the tool increased. The change in source document structure was as important as the change in chunk size, because well-organised source content produces better chunks.
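One way a sentence-aware chunker with a token target and overlap might look is sketched below. It is not the team's actual implementation, which the case study does not describe. The function names are hypothetical, sentences are split with a simple punctuation regex, and tokens are approximated by whitespace-separated words; all three are assumptions made to keep the example self-contained.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def count_tokens(text: str) -> int:
    """Assumption: one whitespace-separated word is roughly one token."""
    return len(text.split())


def chunk_sentences(text: str, target_tokens: int = 400, overlap_ratio: float = 0.15) -> list[str]:
    """Pack whole sentences into chunks near target_tokens, repeating trailing sentences as overlap."""
    overlap_budget = round(target_tokens * overlap_ratio)
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = count_tokens(sentence)
        if current and current_len + n > target_tokens:
            chunks.append(" ".join(current))
            # Carry trailing sentences into the next chunk while they fit the overlap budget.
            # The first sentence is never carried, so a chunk is never repeated in full.
            carried: list[str] = []
            carried_len = 0
            for prev in reversed(current[1:]):
                m = count_tokens(prev)
                if carried_len + m > overlap_budget:
                    break
                carried.insert(0, prev)
                carried_len += m
            current, current_len = carried, carried_len
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
for chunk in chunk_sentences(sample, target_tokens=6, overlap_ratio=0.5):
    print(chunk)
```

Because chunks only break at sentence boundaries, the target is a soft limit: a chunk can run slightly over when the carried overlap plus the next sentence exceeds it, and a single very long sentence becomes its own oversized chunk.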

For web content teams, this has a direct parallel. Pages that are structured with clear, focused headings and sections that each address a single idea chunk more cleanly than pages that mix multiple topics under a single heading. Structured, focused content is not just good for readers and traditional SEO. It is also good for how AI systems process and retrieve that content.
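To illustrate why heading structure matters, the sketch below splits a page into one chunk per heading. It assumes the page is available as Markdown with `#`-style headings, which is an assumption for the example; the function name `chunk_by_heading` and the sample page are hypothetical.

```python
import re

# Matches Markdown headings such as "# Title" or "## Subsection".
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*$")


def chunk_by_heading(page: str) -> list[dict[str, str]]:
    """Return one chunk per heading, pairing the heading with the text beneath it."""
    chunks: list[dict[str, str]] = []
    heading = ""
    body: list[str] = []
    for line in page.splitlines():
        match = HEADING.match(line)
        if match:
            text = "\n".join(body).strip()
            if heading or text:
                chunks.append({"heading": heading, "text": text})
            heading, body = match.group(1), []
        else:
            body.append(line)
    # Flush the final section.
    text = "\n".join(body).strip()
    if heading or text:
        chunks.append({"heading": heading, "text": text})
    return chunks


# Invented sample page, for illustration only.
page = """# Excess structures
The excess is the amount a client pays towards a claim.

## Windscreen claims
Windscreen claims carry a flat excess.
"""
for section in chunk_by_heading(page):
    print(section)
```

A page whose sections each cover one idea produces chunks that stand on their own. A page that mixes several topics under one heading would produce a single chunk covering all of them, which is the blurred-embedding problem described earlier.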

Chunking for comprehension

In content design, chunking has a second, older meaning: breaking information into small, self-contained units so that it is easier to understand, remember and use. The idea comes from cognitive psychology, where chunking describes how people group information into meaningful units to cope with the limits of working memory, and it applies directly to how content should be structured.

On the web, people scan rather than read exhaustively. Content broken into clear sections under descriptive headings, with short paragraphs, lists and logical grouping, is far easier to consume than an undifferentiated wall of text. When each chunk addresses one idea, a reader can grasp the structure, find what they want, and absorb points one at a time.

This is a foundational principle of writing and information design for the web. Presenting information in well-organised chunks respects how people actually read and process content, which improves comprehension, retention and usability.

Chunking and AI retrieval

Chunking has taken on additional importance with AI search and retrieval because of how these systems process content. Retrieval-based AI, which grounds its answers in fetched sources, breaks content into chunks, retrieves the chunks most relevant to a question, and uses them to compose and cite an answer. Whether your content is quoted therefore depends partly on how it chunks. Content organised into clear, self-contained segments, each answering a specific point in full, is easier for these systems to retrieve and quote accurately than ideas scattered across a page or dependent on surrounding context.

The reassuring point is that chunking well for AI overlaps almost entirely with chunking well for human readers. Clear, self-contained sections under descriptive, question-shaped headings, each making sense on its own, serve both a scanning person and a retrieval system. The AI context has raised the profile of chunking, but it does not require a new technique. It reinforces the same discipline of breaking content into coherent, self-contained units, now valued for machine extraction as well as human comprehension, which is why chunking has become part of the vocabulary of optimising content for AI search.

FAQ

Does chunking affect how well AI systems answer questions?

Yes. Poor chunking produces embeddings that mix multiple topics, making retrieval less precise. Well-defined chunks that each address a single coherent idea yield more accurate embeddings, which means the retrieval step returns more relevant content and the AI model produces better answers.

What chunk size works best for most use cases?

There is no universal optimal size. A common starting point is 300 to 500 tokens with a 10 to 20 percent overlap between adjacent chunks. The right size depends on document length, query type, and the embedding model used. Testing with representative queries is the most reliable way to tune chunk size.
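The arithmetic behind these starting values can be sketched as follows. With fixed-size chunks, each new chunk advances by the chunk size minus the overlap, so the overlap percentage directly affects how many chunks a document produces and how much must be embedded and stored. The function name `chunk_count` and the 10,000-token document length are assumptions chosen for illustration.

```python
import math


def chunk_count(doc_tokens: int, chunk_size: int, overlap_ratio: float) -> int:
    """Number of fixed-size chunks needed to cover a document with the given overlap."""
    if doc_tokens <= 0:
        return 0
    overlap = round(chunk_size * overlap_ratio)
    stride = chunk_size - overlap  # how far each new chunk advances
    if stride <= 0:
        raise ValueError("overlap_ratio must leave a positive stride")
    return max(1, math.ceil((doc_tokens - overlap) / stride))


# Hypothetical 10,000-token document across the commonly cited starting range.
for size in (300, 400, 500):
    for ratio in (0.10, 0.20):
        print(f"size={size} overlap={ratio:.0%} chunks={chunk_count(10_000, size, ratio)}")
```

A count like this only shows the cost side of the trade-off. Retrieval quality still has to be checked by running representative queries against each configuration.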

What chunk size works best for written content?

There is no single ideal length. What matters is that each chunk is a coherent, self-contained unit that fully addresses one point and makes sense on its own. For both readers and AI, sections focused on a single idea under a clear heading work well, whether that is a short paragraph or a few paragraphs.
