What Is Chunking?

Chunking is the process of dividing a large document or body of text into smaller, discrete segments before those segments are processed by an AI model. Language models and embedding models can only accept a limited number of tokens in a single call, a limit usually called the context window, so content that exceeds that limit must be split before it can be indexed, embedded, or retrieved. The resulting segments are called chunks.

Chunking sits at the input end of most AI document pipelines. Once a document is split into chunks, each chunk is independently converted into an embedding, a numerical vector that represents its meaning. Those embeddings are stored in a vector index. When a user submits a query, the system embeds the query in the same way, retrieves the chunks whose embeddings are most similar to the query embedding, then passes those chunks to a language model to generate a response. This is the core mechanic of retrieval-augmented generation (RAG).
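The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the embed function is a toy bag-of-words stand-in for a real embedding model, the sample chunks are invented, and the names embed, cosine, and retrieve are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    Assumption: a real pipeline would call an embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    # The list of (chunk, vector) pairs plays the role of the vector index.
    index = [(chunk, embed(chunk)) for chunk in chunks]
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Invented sample chunks, for illustration only.
chunks = [
    "Theft claims require a police case number.",
    "Flood damage is excluded unless the optional cover is selected.",
    "The excess for windscreen claims is a fixed amount.",
]
top = retrieve("Is flood damage excluded?", chunks, top_k=1)
```

In a real system the retrieved chunks would then be placed in the prompt sent to the language model. The ranking logic stays the same whichever embedding model is swapped in for the toy one.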

The way a document is chunked has a significant effect on retrieval quality. If chunks are too large, they cover too many topics at once, and the resulting embedding is a blurred average of many ideas rather than a precise representation of any single one. When the system tries to retrieve the most relevant chunk for a specific query, an oversized chunk may rank poorly because its embedding does not closely match any particular query. If chunks are too small, they lose context, and an individual sentence pulled out of a paragraph may be meaningless without the surrounding content.

Chunking strategy is therefore a genuine technical and editorial decision. Common approaches include fixed-size chunking, where every chunk contains the same number of tokens; sentence-level chunking, where natural sentence boundaries are respected; and semantic chunking, where a model identifies natural topic shifts and splits at those points. Many pipelines also implement chunk overlap, where adjacent chunks share a portion of their content to preserve context across boundaries.
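Fixed-size chunking with overlap is the simplest of these approaches to show in code. The sketch below assumes whitespace-separated words are an acceptable stand-in for tokens; a real pipeline would count tokens with the tokenizer of its embedding model. The function name chunk_fixed and its defaults are illustrative.

```python
def chunk_fixed(text, chunk_size=400, overlap=60):
    """Split text into chunks of chunk_size tokens that share overlap tokens.

    Assumption: tokens are approximated by whitespace-separated words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    tokens = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        # Stop once the window has reached the end of the document,
        # so the last chunk is not followed by a fragment of pure overlap.
        if start + chunk_size >= len(tokens):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
small_chunks = chunk_fixed(sample, chunk_size=4, overlap=1)
```

With a chunk size of 4 and an overlap of 1, each chunk begins with the last word of the previous one. The same function ignores sentence boundaries entirely, which is the weakness that sentence-level and semantic chunking try to address.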

Chunking In Practice

A Cape Town-based short-term insurance provider wanted to build an internal knowledge tool so that its consultants could query policy documents during client calls. The company's policy documents were long and complex, some running to forty or fifty pages with multiple sections covering different claim types, exclusions, and excess structures.

In an initial attempt, the documents were chunked at a fixed size of 1,000 tokens with no overlap. When consultants queried the tool about specific exclusions, the retrieved chunks frequently started or ended mid-clause, cutting off precisely the information needed. The tool's answers were incomplete and sometimes misleading because the relevant clause was split across two chunks, and only one of those chunks was retrieved.

The engineering team switched to a sentence-aware chunking approach with a target of 400 tokens per chunk and a 15 percent overlap, which works out to roughly 60 tokens shared between adjacent chunks. They also restructured the source documents so that each section had a clear heading followed by a self-contained explanation. After reindexing with the new chunks, retrieval accuracy improved and consultant satisfaction with the tool increased. The change in source document structure was as important as the change in chunk size, because well-organised source content produces better chunks.
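A sentence-aware chunker of the kind described here might look like the sketch below. It is not the team's actual implementation, which the case study does not describe. It assumes a naive punctuation-based sentence splitter, whitespace-separated words as a proxy for tokens, and hypothetical function names. Overlap is applied in whole sentences, so no chunk starts or ends mid-sentence.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Assumption: good enough for illustration; it will mis-split
    abbreviations and numbered clauses.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_by_sentence(text, target_tokens=400, overlap_ratio=0.15):
    """Group whole sentences into chunks of about target_tokens words."""
    overlap_budget = int(target_tokens * overlap_ratio)
    chunks, current, current_len = [], [], 0

    for sentence in split_sentences(text):
        n = len(sentence.split())
        if current and current_len + n > target_tokens:
            chunks.append(" ".join(current))
            # Carry trailing sentences into the next chunk, up to the budget.
            carried, carried_len = [], 0
            for prev in reversed(current):
                prev_len = len(prev.split())
                if carried_len + prev_len > overlap_budget:
                    break
                carried.insert(0, prev)
                carried_len += prev_len
            current, current_len = carried, carried_len
        current.append(sentence)
        current_len += n

    if current:
        chunks.append(" ".join(current))
    return chunks


# Invented policy wording, for illustration only.
policy = (
    "Cover A applies to theft. Cover B applies to fire. "
    "Flood is excluded. Excess is payable on all claims."
)
policy_chunks = chunk_by_sentence(policy, target_tokens=10, overlap_ratio=0.5)
```

Because the target is a soft limit, a single sentence longer than target_tokens becomes its own oversized chunk rather than being cut. For legal or policy wording that trade-off usually favours keeping the clause intact.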

For web content teams, this has a direct parallel. Pages built around clear, focused headings, where each section addresses a single idea, split into cleaner chunks than pages that mix multiple topics under a single heading. Structured, focused content is not just good for readers and traditional SEO. It is also good for how AI systems process and retrieve that content.

FAQ

Does chunking affect how well AI systems answer questions?

Yes. Poor chunking produces embeddings that mix multiple topics, making retrieval less precise. Well-defined chunks that each address a single coherent idea yield more accurate embeddings, which means the retrieval step returns more relevant content and the AI model produces better answers.

What chunk size works best for most use cases?

There is no universal optimal size. A common starting point is 300 to 500 tokens with a 10 to 20 percent overlap between adjacent chunks. The right size depends on document length, query type, and the embedding model used. Testing with representative queries is the most reliable way to tune chunk size.
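The arithmetic behind these numbers is easy to check when comparing candidate settings. The sketch below estimates how many chunks a document produces for a given chunk size and overlap percentage, assuming fixed-size windows; the function names are hypothetical, and sentence-aware chunkers will give slightly different counts.

```python
import math


def overlap_tokens(chunk_size, overlap_ratio):
    """Number of tokens shared by adjacent chunks."""
    return round(chunk_size * overlap_ratio)


def estimate_chunk_count(doc_tokens, chunk_size, overlap_ratio):
    """Estimate the chunk count for fixed-size windows with overlap."""
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be at least 0 and below 1")
    if doc_tokens <= 0:
        return 0
    if doc_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap_tokens(chunk_size, overlap_ratio)
    # One full first chunk, then one more chunk per step until the end.
    return 1 + math.ceil((doc_tokens - chunk_size) / step)


# A hypothetical 20,000-token document at 400 tokens with 15 percent overlap.
count = estimate_chunk_count(20_000, chunk_size=400, overlap_ratio=0.15)
```

Smaller chunks and larger overlaps both increase the number of embeddings to store and search, so the count is worth checking alongside retrieval quality when testing with representative queries.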
