Case Study
November 15, 2024

Building an Enterprise AI Assistant with RAG

Exploring how to build a secure RAG-based assistant for enterprise knowledge retrieval with permissions and auditability.

Tags: RAG, Enterprise AI, Security, Knowledge Systems

Overview

This project explores building an enterprise AI assistant on a RAG (Retrieval-Augmented Generation) architecture. The goal is to understand how to:

  1. Connect to internal knowledge sources securely
  2. Retrieve relevant context from multiple systems
  3. Respect existing access controls (permission-aware retrieval)
  4. Provide audit trails for enterprise compliance
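
Taken together, a single query flows through those four steps in order. Here is a minimal sketch of that flow in Python, with every component passed in as a placeholder callable; none of these names are the real modules:

```python
from typing import Callable

def answer_query(
    user_id: str,
    question: str,
    retrieve: Callable,            # pulls candidate chunks from the connected sources
    filter_permissions: Callable,  # drops anything this user cannot read
    generate: Callable,            # produces a grounded answer plus citations
    audit: Callable,               # records the query, sources, and response
) -> dict:
    candidates = retrieve(question)
    allowed = filter_permissions(user_id, candidates)
    answer, citations = generate(question, allowed)
    audit(user_id=user_id, query=question, sources=citations, response=answer)
    return {"answer": answer, "citations": citations}
```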

Key Challenges

Building enterprise AI systems requires thinking beyond the demo:

  • How do you handle permissions when users query across systems?
  • What's the right chunking strategy for different document types?
  • How do you make retrieval fast enough for real-time interaction?
  • How do you build eval sets to measure quality?

Architecture Highlights

  • Connector Gateway: Unified interface to SharePoint, Confluence, internal APIs
  • Embedding Pipeline: Automated chunking, vectorization, and indexing with metadata preservation
  • Retrieval Engine: Hybrid search (semantic + keyword) with permission filtering
  • LLM Integration: AWS Bedrock with guardrails and response validation
  • Audit Layer: Full traceability of queries, sources, and responses
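
To make the hybrid search concrete, here is a simplified ranking sketch. It is illustrative only: the keyword score is a stand-in for a real BM25 index, chunk embeddings are assumed to be precomputed, and names like hybrid_rank and the alpha weight are hypothetical rather than the production API. The permission filter described under "What I Learned" would also be applied to the candidate set.

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms present in the chunk; a crude stand-in for BM25."""
    terms = set(query.lower().split())
    return len(terms & set(text.lower().split())) / max(len(terms), 1)

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0

def hybrid_rank(query: str, query_vec: list[float], chunks: list[dict],
                alpha: float = 0.6, top_k: int = 5) -> list[dict]:
    """Blend semantic and keyword relevance; chunks carry 'text' and a precomputed 'embedding'."""
    scored = [
        (alpha * cosine(query_vec, c["embedding"])
         + (1 - alpha) * keyword_score(query, c["text"]), c)
        for c in chunks
    ]
    return [c for _, c in sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]]
```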

Technical Stack

  • Backend: Python (FastAPI), AWS Lambda for connectors
  • Vector Store: Pinecone with metadata filtering
  • LLM: Claude (via AWS Bedrock) with prompt engineering for grounded responses
  • Auth: SSO integration with role-based access control (RBAC)
  • UI: React-based chat interface with citation links
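
For the LLM piece, a pared-down grounded-answer call through Bedrock might look like the sketch below. The model ID, region, and prompt wording are illustrative choices, and the guardrails and response-validation steps the real system applies are omitted:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def grounded_answer(question: str, context_chunks: list[str]) -> str:
    """Ask Claude to answer strictly from the retrieved context, citing sources as [n]."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context_chunks))
    prompt = (
        "Answer the question using only the sources below. Cite sources as [n]. "
        "If the answer is not in the sources, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative model ID
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```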

What I Learned

1. Permissions are non-negotiable. The assistant had to respect existing access controls from day one. We implemented permission passthrough: every document retrieved was filtered against the user's actual permissions in the source systems.
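
A minimal sketch of that filter, assuming a simple group-based ACL lookup (in the real system the check goes back to the source system's permission API rather than a local dict):

```python
def filter_by_permissions(user_groups: set[str], retrieved: list[dict],
                          acls: dict[str, set[str]]) -> list[dict]:
    """Keep only chunks whose source document the user is allowed to read."""
    return [
        chunk for chunk in retrieved
        if user_groups & acls.get(chunk["doc_id"], set())
    ]

# Example: a user in "engineering" sees the runbook but not the HR policy.
retrieved = [{"doc_id": "runbook-12", "text": "..."}, {"doc_id": "hr-policy-3", "text": "..."}]
acls = {"runbook-12": {"engineering"}, "hr-policy-3": {"hr"}}
print(filter_by_permissions({"engineering"}, retrieved, acls))
```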

2. Chunking strategy matters. Naive chunking (fixed 512-token windows) produced terrible results. We switched to semantic chunking that preserved context boundaries (sections, paragraphs) and saw a 30% improvement in retrieval relevance.
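
A simplified version of that chunker packs whole paragraphs up to a token budget instead of cutting at a fixed offset. Token counting here is a crude whitespace split; the real pipeline would use the embedding model's tokenizer:

```python
def semantic_chunks(document: str, max_tokens: int = 512) -> list[str]:
    """Split on paragraph boundaries and pack paragraphs into chunks without breaking them."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        length = len(para.split())  # rough token count
        if current and current_len + length > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)  # an oversized paragraph simply becomes its own chunk
        current_len += length
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```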

3. Citations build trust. Users needed to verify AI responses, so we added inline citations linking back to the source documents. This single feature drove adoption more than any other.
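
The linking itself can be as small as mapping the [n] markers the model emits back to source metadata. This sketch assumes the sources list is ordered to match the numbering used in the prompt, and the field names are illustrative:

```python
import re

def attach_citation_links(answer: str, sources: list[dict]) -> str:
    """Replace [n] markers in the answer with links to the cited source documents."""
    def link(match: re.Match) -> str:
        idx = int(match.group(1)) - 1
        if 0 <= idx < len(sources):
            return f"[{sources[idx]['title']}]({sources[idx]['url']})"
        return match.group(0)  # leave unrecognized markers untouched

    return re.sub(r"\[(\d+)\]", link, answer)
```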

4. Eval-driven iteration. We built an eval harness with 200+ question-answer pairs curated from real user queries. Every architecture change had to improve the eval scores; this prevented "vibes-based" optimization.
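
The core retrieval check was simple: does the known source document show up in the top-k results? A stripped-down version, with an illustrative eval-case shape and the retriever passed in as a callable:

```python
from typing import Callable

def retrieval_recall_at_k(eval_set: list[dict], retrieve: Callable, k: int = 5) -> float:
    """Share of eval questions whose expected document appears in the top-k retrieved chunks.

    Each eval case is assumed to look like {"question": ..., "expected_doc_id": ...},
    and retrieve(question, k) returns ranked chunks carrying a 'doc_id' field.
    """
    hits = sum(
        1 for case in eval_set
        if any(r["doc_id"] == case["expected_doc_id"]
               for r in retrieve(case["question"], k))
    )
    return hits / len(eval_set) if eval_set else 0.0
```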

Outcome

The assistant became the primary interface for internal knowledge discovery. Teams use it for onboarding, policy lookups, technical troubleshooting, and cross-team knowledge sharing. The platform is now expanding to support additional use cases like contract analysis and incident response workflows.

🚀 Let's Build Together

Interested in Similar Work?

I'm available for consulting, technical advisory, and collaborative projects. Let's discuss how I can help with your AI initiatives.