PR Summary
What problems was I solving
Before this change, ordinary document uploads and edits in the Documents Service did not automatically update Brain's knowledge corpus; Brain corpora were rebuilt only through domain onboarding. In addition, Brain RAG-Fusion retrieval was not filtered to a published corpus version, so it could return chunks from superseded or incomplete corpus states.
What user-facing changes did I ship
Document uploads and edits now automatically update Brain's default knowledge corpus. Brain-grounded retrieval is restricted to the latest published corpus version, which reduces stale or partial corpus results. During the transition, tenants without a published Brain corpus fall back to document retrieval so answers remain available.
How I implemented it
The PR wires Documents Service ingestion to trigger, on a best-effort basis, a new tenant-scoped documentCorpusUpdateBatchWorkflow after a document is marked ready. The batch workflow coalesces document-version signals, starts per-document claim extraction early while the batch is still open, and reuses existing pending/processing/completed extractions to avoid duplicate model work. Each batch is bounded by an idle debounce, a maximum duration, and a maximum count, and the workflow continues-as-new to limit history.
Completed batch extractions feed the existing document-to-corpus pipeline, which now publishes complete corpus manifests by merging changed document sources with unchanged current corpus sources. Corpus publication now stages default_corpus_version_ids membership in Turbopuffer before flipping knowledge_corpora.current_version_id.
Brain RAG-Fusion retrieval loads the latest published corpus version and filters keyword/semantic Turbopuffer searches to that version, falling back to Documents retrieval when no corpus has been published.
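The publish and retrieval ordering described above can be illustrated with a small in-memory sketch. This is not the code in the PR: InMemoryIndex, merge_manifest, publish, retrieve, and the "doc@version" chunk id format are hypothetical names chosen for illustration, and a plain dict stands in for Turbopuffer and the knowledge_corpora row. Only default_corpus_version_ids and current_version_id come from the description.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for the Turbopuffer index plus the knowledge_corpora row."""

    # chunk id -> corpus version ids it belongs to (models default_corpus_version_ids)
    membership: dict[str, set[str]] = field(default_factory=dict)
    # models knowledge_corpora.current_version_id; None means nothing is published
    current_version_id: Optional[str] = None


def merge_manifest(current: dict[str, str], changed: dict[str, str]) -> dict[str, str]:
    """Build a complete manifest: unchanged current sources plus the changed ones.

    Both maps go from document id to document version id (an assumed shape).
    """
    merged = dict(current)   # keep every source that did not change
    merged.update(changed)   # changed documents replace their old entry
    return merged


def publish(index: InMemoryIndex, version_id: str, manifest: dict[str, str]) -> None:
    """Stage membership for every chunk first, then flip the current pointer."""
    # Step 1: stage. Readers still filter on the old version, so they see no change.
    for doc_id, doc_version in manifest.items():
        chunk_id = f"{doc_id}@{doc_version}"  # assumed: one chunk per source
        index.membership.setdefault(chunk_id, set()).add(version_id)
    # Step 2: flip. Membership is already complete when the new version becomes visible.
    index.current_version_id = version_id


def retrieve(index: InMemoryIndex) -> tuple[str, list[str]]:
    """Return chunks in the published version, or signal the Documents fallback."""
    if index.current_version_id is None:
        return ("documents", [])  # no published corpus: caller uses document retrieval
    current = index.current_version_id
    chunks = sorted(c for c, versions in index.membership.items() if current in versions)
    return ("brain", chunks)
```

In this sketch, publishing a second version after one document changes leaves the old chunk in the index, but retrieve no longer returns it because it is not a member of the current version. Staging before the flip is what keeps a reader from observing a version whose membership is only partly written.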
Description for the changelog
Implemented ARC-2356 so document uploads and edits automatically feed Brain's default knowledge corpus through a tenant-scoped, coalesced batch workflow with early, idempotent claim extraction. Corpus publishes now merge changed document sources into the current published manifest and stage Turbopuffer corpus-membership metadata before flipping the current version. Brain RAG-Fusion retrieval is now filtered to the latest published corpus version. A post-release source-chunk re-index is required for tenants with existing Brain corpora.