Case Study
November 15, 2024

Building an Enterprise AI Assistant with RAG

Exploring how to build a secure RAG-based assistant for enterprise knowledge retrieval with permissions and auditability.

Tags: RAG, Enterprise AI, Security, Knowledge Systems

Overview

This project explores building an enterprise AI assistant on a RAG (Retrieval-Augmented Generation) architecture. The goal is to understand how to:

  1. Connect to internal knowledge sources securely
  2. Retrieve relevant context from multiple systems
  3. Respect existing access controls (permission-aware retrieval)
  4. Provide audit trails for enterprise compliance
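
Taken together, a single query flows through those four steps in order. Here is a minimal sketch of that flow in Python, with every component passed in as a placeholder callable; none of these names are the real modules:

```python
from typing import Callable

def answer_query(
    user_id: str,
    question: str,
    retrieve: Callable,            # pulls candidate chunks from the connected sources
    filter_permissions: Callable,  # drops anything this user cannot read
    generate: Callable,            # produces a grounded answer plus citations
    audit: Callable,               # records the query, sources, and response
) -> dict:
    candidates = retrieve(question)
    allowed = filter_permissions(user_id, candidates)
    answer, citations = generate(question, allowed)
    audit(user_id=user_id, query=question, sources=citations, response=answer)
    return {"answer": answer, "citations": citations}
```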

Key Challenges

Building enterprise AI systems requires thinking beyond the demo:

  • How do you handle permissions when users query across systems?
  • What's the right chunking strategy for different document types?
  • How do you make retrieval fast enough for real-time interaction?
  • How do you build eval sets to measure quality?

Architecture Highlights

  • Connector Gateway: Unified interface to SharePoint, Confluence, internal APIs
  • Embedding Pipeline: Automated chunking, vectorization, and indexing with metadata preservation
  • Retrieval Engine: Hybrid search (semantic + keyword) with permission filtering
  • LLM Integration: AWS Bedrock with guardrails and response validation
  • Audit Layer: Full traceability of queries, sources, and responses
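
To make the hybrid search concrete, here is a simplified ranking sketch. It is illustrative only: the keyword score is a stand-in for a real BM25 index, chunk embeddings are assumed to be precomputed, and names like hybrid_rank and the alpha weight are hypothetical rather than the production API. The permission filter described under "What I Learned" would also be applied to the candidate set.

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms present in the chunk; a crude stand-in for BM25."""
    terms = set(query.lower().split())
    return len(terms & set(text.lower().split())) / max(len(terms), 1)

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0

def hybrid_rank(query: str, query_vec: list[float], chunks: list[dict],
                alpha: float = 0.6, top_k: int = 5) -> list[dict]:
    """Blend semantic and keyword relevance; chunks carry 'text' and a precomputed 'embedding'."""
    scored = [
        (alpha * cosine(query_vec, c["embedding"])
         + (1 - alpha) * keyword_score(query, c["text"]), c)
        for c in chunks
    ]
    return [c for _, c in sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]]
```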

Technical Stack

  • Backend: Python (FastAPI), AWS Lambda for connectors
  • Vector Store: Pinecone with metadata filtering
  • LLM: Claude (via AWS Bedrock) with prompt engineering for grounded responses
  • Auth: SSO integration with role-based access control (RBAC)
  • UI: React-based chat interface with citation links
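
For the LLM piece, a pared-down grounded-answer call through Bedrock might look like the sketch below. The model ID, region, and prompt wording are illustrative choices, and the guardrails and response-validation steps the real system applies are omitted:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def grounded_answer(question: str, context_chunks: list[str]) -> str:
    """Ask Claude to answer strictly from the retrieved context, citing sources as [n]."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context_chunks))
    prompt = (
        "Answer the question using only the sources below. Cite sources as [n]. "
        "If the answer is not in the sources, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative model ID
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```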

What I Learned

1. Permissions are non-negotiable. The assistant had to respect existing access controls from day one. We implemented permission passthrough: every document retrieved was filtered against the user's actual permissions in the source systems.
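
A minimal sketch of that filter, assuming a simple group-based ACL lookup (in the real system the check goes back to the source system's permission API rather than a local dict):

```python
def filter_by_permissions(user_groups: set[str], retrieved: list[dict],
                          acls: dict[str, set[str]]) -> list[dict]:
    """Keep only chunks whose source document the user is allowed to read."""
    return [
        chunk for chunk in retrieved
        if user_groups & acls.get(chunk["doc_id"], set())
    ]

# Example: a user in "engineering" sees the runbook but not the HR policy.
retrieved = [{"doc_id": "runbook-12", "text": "..."}, {"doc_id": "hr-policy-3", "text": "..."}]
acls = {"runbook-12": {"engineering"}, "hr-policy-3": {"hr"}}
print(filter_by_permissions({"engineering"}, retrieved, acls))
```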

2. Chunking strategy matters. Naive chunking (fixed 512-token windows) produced terrible results. We switched to semantic chunking that preserved context boundaries (sections, paragraphs) and saw a 30% improvement in retrieval relevance.
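
A simplified version of that chunker packs whole paragraphs up to a token budget instead of cutting at a fixed offset. Token counting here is a crude whitespace split; the real pipeline would use the embedding model's tokenizer:

```python
def semantic_chunks(document: str, max_tokens: int = 512) -> list[str]:
    """Split on paragraph boundaries and pack paragraphs into chunks without breaking them."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        length = len(para.split())  # rough token count
        if current and current_len + length > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)  # an oversized paragraph simply becomes its own chunk
        current_len += length
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```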

3. Citations build trust. Users needed to verify AI responses, so we added inline citations linking back to the source documents. This single feature drove adoption more than any other.
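
The linking itself can be as small as mapping the [n] markers the model emits back to source metadata. This sketch assumes the sources list is ordered to match the numbering used in the prompt, and the field names are illustrative:

```python
import re

def attach_citation_links(answer: str, sources: list[dict]) -> str:
    """Replace [n] markers in the answer with links to the cited source documents."""
    def link(match: re.Match) -> str:
        idx = int(match.group(1)) - 1
        if 0 <= idx < len(sources):
            return f"[{sources[idx]['title']}]({sources[idx]['url']})"
        return match.group(0)  # leave unrecognized markers untouched

    return re.sub(r"\[(\d+)\]", link, answer)
```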

4. Eval-driven iteration. We built an eval harness with 200+ question-answer pairs curated from real user queries. Every architecture change had to improve the eval scores; this prevented "vibes-based" optimization.
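
The core retrieval check was simple: does the known source document show up in the top-k results? A stripped-down version, with an illustrative eval-case shape and the retriever passed in as a callable:

```python
from typing import Callable

def retrieval_recall_at_k(eval_set: list[dict], retrieve: Callable, k: int = 5) -> float:
    """Share of eval questions whose expected document appears in the top-k retrieved chunks.

    Each eval case is assumed to look like {"question": ..., "expected_doc_id": ...},
    and retrieve(question, k) returns ranked chunks carrying a 'doc_id' field.
    """
    hits = sum(
        1 for case in eval_set
        if any(r["doc_id"] == case["expected_doc_id"]
               for r in retrieve(case["question"], k))
    )
    return hits / len(eval_set) if eval_set else 0.0
```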

Outcome

The assistant became the primary interface for internal knowledge discovery. Teams use it for onboarding, policy lookups, technical troubleshooting, and cross-team knowledge sharing. The platform is now expanding to support additional use cases like contract analysis and incident response workflows.

🚀 Let's Build Together

Interested in Similar Work?

I'm available for consulting, technical advisory, and collaborative projects. Let's discuss how I can help with your AI initiatives.