Data pipeline + retrieval architecture
This covers document parsing, a chunking strategy that respects semantic structure, embedding model selection, hybrid retrieval that combines BM25 keyword search with dense vector search, reranking of the top-K candidates, and a refresh policy that handles new and stale documents. The retrieval system is the single largest determinant of output quality in most RAG applications.
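
To make the hybrid retrieval step concrete, here is a minimal sketch of one common way to merge a BM25 ranking with a dense-vector ranking before reranking: reciprocal rank fusion (RRF). The text above does not name a fusion method, so RRF is an assumption, as are the function name `reciprocal_rank_fusion`, the constant `k=60` (a commonly used default), and the document IDs, which are made up for illustration. The two input lists stand in for whatever your keyword and vector indexes return.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=None):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. Documents ranked highly by several
    retrievers accumulate the largest totals.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)

    # Highest fused score first; ties broken by ID so output is deterministic.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_k is None else fused[:top_k]


# Hypothetical results from the two retrievers, best match first.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_b", "doc_d", "doc_a"]

# The fused top-K is what would be handed to the reranker.
candidates = reciprocal_rank_fusion([bm25_hits, dense_hits], top_k=3)
print(candidates)
```

RRF uses only rank positions, so it sidesteps the problem that BM25 scores and vector similarities sit on different scales. A weighted sum of normalized scores is another option if you want to tune the balance between keyword and semantic matches.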
