Okidoki.chat: Enterprise AI Chat Widget
Sub-200ms RAG responses, voice mode, and video calls—shipped in 3 weeks

The Challenge
Most AI chat widgets are either too generic (can't answer brand-specific questions) or too slow (users wait 5+ seconds per response).
What enterprises actually need:
- Instant answers from their own content
- Voice and video for complex conversations
- Production-ready reliability
- Easy integration
The constraint: Build it in 3 weeks to prove the concept could ship fast.
What I Built
Okidoki.chat - Enterprise AI chat widget with RAG, voice, and video. Shipped November 2025.
Core capabilities:
- Sub-200ms responses from brand-specific content
- Voice mode powered by Gemini Live API
- Video calls for complex support scenarios
- Meeting transcription with automatic summaries
- Automated content ingestion from websites
- White-label deployment for enterprises
This isn't a demo—it's running in production on this website right now. Try it in the chat widget below.
Live: okidoki.chat
Technical Architecture
The hard part wasn't integrating AI models—it was making RAG fast enough for real-time chat.
Performance approach:
- Build-time processing - Heavy RAG computation happens when content changes, not on every query
- Intelligent caching - Redis stores processed embeddings and common query patterns
- Edge deployment - Vercel Edge Functions eliminate cold starts
- Parallel processing - Multiple AI providers (Groq for speed, GPT-4 for accuracy)
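To make the build-time split concrete, here's a simplified sketch (not the production code) of what query-time retrieval looks like once embeddings are precomputed: a single scoring pass over cached vectors, with no embedding-model call and no document parsing on the hot path. The `Chunk` type and function names are illustrative.

```typescript
// Illustrative sketch: embeddings are computed when content changes (build
// time); at query time we only score the query against the cached vectors.

type Chunk = { text: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Query-time work is one pass over precomputed vectors -- cheap enough
// to stay well inside a sub-200ms budget for modest content sets.
function topK(queryEmbedding: number[], index: Chunk[], k: number): Chunk[] {
  return [...index]
    .sort(
      (x, y) =>
        cosine(queryEmbedding, y.embedding) -
        cosine(queryEmbedding, x.embedding),
    )
    .slice(0, k);
}
```

The key design choice is that the expensive part (embedding the corpus) never runs per request; only the query itself needs a fresh embedding.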
Stack:
- Next.js 14 + TypeScript - Full-stack application
- Gemini Live - Real-time voice conversations
- Groq - Ultra-fast LLM inference (sub-second)
- Daily.co - WebRTC video infrastructure
- AssemblyAI - Meeting transcription
- Redis - Vector caching layer
- Vercel Edge - Global deployment
The result: Response times that feel instant, not "AI slow."
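The caching layer can be approximated with a tiny TTL cache keyed by normalized query text. In production this role is played by Redis; the in-memory stand-in below is a sketch of the idea, and the injected clock is there purely to make the behavior easy to verify.

```typescript
// Illustrative stand-in for the Redis layer: a TTL cache keyed by
// normalized query. Real deployments would use Redis with an EX expiry.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private ttlMs: number,
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (this.now() > entry.expiresAt) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Normalizing queries raises the hit rate for common question patterns.
const normalize = (q: string) => q.trim().toLowerCase().replace(/\s+/g, " ");
```

Normalization matters more than it looks: "What is pricing?" and "  what IS pricing " should hit the same cache entry.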
Development Speed
Timeline breakdown:
- Week 1: Core RAG pipeline, basic chat UI
- Week 2: Voice mode integration, video calls
- Week 3: Content scraping, deployment, polish
How I moved fast:
- Leveraged existing AI APIs instead of building from scratch
- Used Next.js for rapid full-stack development
- Deployed on Vercel for zero infrastructure setup
- Focused on core value proposition first, polish second
Key insight: The fastest way to validate AI products is to ship them. Real user feedback beats internal testing every time.
Production Metrics
Current status:
- Production deployment with paying customers
- Used on this website (try it below!)
- Multiple enterprise trials in progress
- Zero downtime since launch
What this proves:
- Complex AI systems can ship fast
- RAG performance can match user expectations
- Multi-modal AI (text + voice + video) can work together seamlessly
What I Learned
On RAG performance: Most RAG implementations are slow because they process everything at query time. Moving computation to build time is the difference between 5-second responses and sub-200ms responses.
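The build-time half of that split can be sketched in a few lines: content is chunked (and embedded, not shown) once, whenever it changes. The chunking parameters below are illustrative, not the actual values used.

```typescript
// Illustrative build-time step: split content into overlapping chunks.
// Embedding calls would run here too -- once per content change,
// never at query time.
function chunkText(text: string, size: number, overlap: number): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

Overlapping chunks cost a little extra storage but prevent answers from being split across chunk boundaries.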
On AI product development: Ship fast, learn fast. I've seen too many AI projects die in "research mode." Okidoki went from idea to production in three weeks, and that's the pace AI products now demand.
On technical tradeoffs: Using Groq for speed + GPT-4 for accuracy (parallel calls) costs more but delivers better UX. Performance is a feature.
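One way to combine a fast provider with an accurate one is a latency-budgeted race: give the accurate model a head start up to the budget, and fall back to whichever answer lands first. This is a sketch of that pattern, not the exact policy Okidoki uses; the provider calls are simulated with timers.

```typescript
// Illustrative pattern: race a fast provider (e.g. Groq) against an
// accurate one (e.g. GPT-4), preferring accuracy within a latency budget.
type Answer = { provider: string; text: string };

function delay<T>(ms: number, value: T): Promise<T> {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

async function answerWithBudget(
  fast: Promise<Answer>,
  accurate: Promise<Answer>,
  budgetMs: number,
): Promise<Answer> {
  // Give the accurate model up to `budgetMs` to respond first...
  const winner = await Promise.race([accurate, delay(budgetMs, null)]);
  if (winner !== null) return winner;
  // ...then take whichever answer arrives first.
  return Promise.race([fast, accurate]);
}
```

Both requests are fired in parallel, so the worst case is the fast provider's latency plus the budget, never the sum of both models' latencies.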
Try It Yourself
This website uses Okidoki.chat. Open the chat widget (bottom right) to:
- Ask questions about my experience
- Test the voice mode
- See sub-200ms RAG responses in action
Building enterprise AI products? I can help you ship faster. Let's talk.