Enterprise RAG: What Breaks at Scale

Each topic comes in three forms. The blog gives long-form explanations in plain language. The notebook export is the auto-generated HTML from Jupyter, useful for reading the code cells. The GitHub link opens the .ipynb so you can run or edit it locally.

  1. Not all vectors are equal

    Chunking (fixed, semantic, hierarchical) and embedding models, explained for students and production engineers. Start with Part 1.

    Blog (~10k words): Part 1 · Part 2 · Part 3
    Notebook HTML export → Run on GitHub →
  2. The right chunk, wrong context

    Structural chunking, boundaries, and why “relevant” fragments still miss the exception clause. Start with Part 1; run the notebook for code.

    Blog (~10k words): Part 1 · Part 2 · Part 3
    Notebook HTML export → Run on GitHub →
  3. When dense search misses keywords

    Hybrid BM25 + vectors, fusion (RRF), and logging which leg saved the query. Start with Part 1.

    Blog (~10k words): Part 1 · Part 2 · Part 3
    Notebook HTML export → Run on GitHub →
  4. Cross-encoder reranking

    Retrieve wide, rerank narrow: quality vs latency vs GPU memory. Start with Part 1.

    Blog (~10k words): Part 1 · Part 2 · Part 3
    Notebook HTML export → Run on GitHub →
  5. ACL at query time

    Enforce authorization in retrieval: filters, audits, cross-tenant tests. Start with Part 1.

    Blog (~10k words): Part 1 · Part 2 · Part 3
    Notebook HTML export → Run on GitHub →
  6. Semantic cache & invalidation

    Similarity + version tags, TTL, false hits: cache without lying. Start with Part 1.

    Blog (~10k words): Part 1 · Part 2 · Part 3
    Notebook HTML export → Run on GitHub →
  7. Stale index & tombstones

    Incremental updates, stable IDs, deletes, ingestion lag. Start with Part 1.

    Blog (~10k words): Part 1 · Part 2 · Part 3
    Notebook HTML export → Run on GitHub →
  8. Grounded or confidently wrong

    Cheap faithfulness checks, abstention, and rubrics, applied before the LLM sounds sure. Start with Part 1.

    Blog (~10k words): Part 1 · Part 2 · Part 3
    Notebook HTML export → Run on GitHub →
  9. RAG is not one metric

    Recall@k, slicing, golden sets: retrieval vs generation eval. Start with Part 1.

    Blog (~10k words): Part 1 · Part 2 · Part 3
    Notebook HTML export → Run on GitHub →
  10. Prompt injection & PII

    Untrusted chunks, layered defenses, PII boundaries. Start with Part 1.

    Blog (~10k words): Part 1 · Part 2 · Part 3
    Notebook HTML export → Run on GitHub →