Enterprise RAG: What Breaks at Scale

Read the blog for long-form explanations (plain language, written like a serious article). Notebook export is the auto-generated HTML from Jupyter—useful for code cells. GitHub opens the .ipynb to run or edit locally.

Blog (~10k words): Part 1 · Part 2 · Part 3
1. Not all vectors are equal
Chunking (fixed, semantic, hierarchical) and embedding models—explained for students and production engineers. Start with Part 1.
Notebook HTML export → Run on GitHub →
Blog (~10k words): Part 1 · Part 2 · Part 3
2. The right chunk, wrong context
Structural chunking, boundaries, and why “relevant” fragments still miss the exception clause. Start with Part 1; run the notebook for code.
Notebook HTML export → Run on GitHub →
Blog (~10k words): Part 1 · Part 2 · Part 3
3. When dense search misses keywords
Hybrid BM25 + vectors, fusion (RRF), and logging which leg saved the query. Start with Part 1.
Notebook HTML export → Run on GitHub →
Blog (~10k words): Part 1 · Part 2 · Part 3
4. Cross-encoder reranking
Retrieve wide, rerank narrow: quality vs latency vs GPU memory. Start with Part 1.
Notebook HTML export → Run on GitHub →
Blog (~10k words): Part 1 · Part 2 · Part 3
5. ACL at query time
Enforce authorization in retrieval—filters, audits, cross-tenant tests. Start with Part 1.
Notebook HTML export → Run on GitHub →
Blog (~10k words): Part 1 · Part 2 · Part 3
6. Semantic cache & invalidation
Similarity + version tags, TTL, false hits—cache without lying. Start with Part 1.
Notebook HTML export → Run on GitHub →
Blog (~10k words): Part 1 · Part 2 · Part 3
7. Stale index & tombstones
Incremental updates, stable IDs, deletes, ingestion lag. Start with Part 1.
Notebook HTML export → Run on GitHub →
Blog (~10k words): Part 1 · Part 2 · Part 3
8. Grounded or confidently wrong
Cheap faithfulness checks, abstention, and rubrics—before the LLM sounds sure. Start with Part 1.
Notebook HTML export → Run on GitHub →
Blog (~10k words): Part 1 · Part 2 · Part 3
9. RAG is not one metric
Recall@k, slicing, golden sets—retrieval vs generation eval. Start with Part 1.
Notebook HTML export → Run on GitHub →
Blog (~10k words): Part 1 · Part 2 · Part 3
10. Prompt injection & PII
Untrusted chunks, layered defenses, PII boundaries. Start with Part 1.
Notebook HTML export → Run on GitHub →