Building kb.miin.ai
White Paper: A Personal Knowledge Base Build
I built kb.miin.ai as a searchable, web-based knowledge base that turns a Google Workspace folder into a private retrieval system with semantic search, source references, OCR, spreadsheet parsing, and a clean public-facing interface. The project started as a local prototype, then moved into a production Next.js deployment on Vercel, with Supabase acting as the vector store and Gemini 2.5 Flash generating answers from retrieved excerpts. The big lesson: modern AI infrastructure is accessible if you break the build into small, testable pieces, and when in doubt, ask your favorite coding agent!
Tech Stack
The final system combines a static personal website, a dynamic knowledge-base app, Google Drive ingestion, Gemini models, and Supabase retrieval. Codex automated much of the implementation work: scaffolding the Next.js app, writing the Google Drive sync pipeline, switching from local Ollama to Gemini, shaping the Supabase schema, updating the UI, pushing code to GitHub, and helping debug deployment and authentication issues along the way.
Application: Next.js App Router, TypeScript, custom CSS, GitHub, and Vercel.
Models: Gemini 2.5 Flash for answers and Gemini Embedding for retrieval vectors.
Data Layer: Supabase Postgres with pgvector, document source tables, chunk tables, and RPC search.
Automation: Codex for code generation, refactors, Git operations, local testing, and deployment support.
What I Built
1. A Google-style search surface
The front end began as a minimal Google-like search page, then evolved into a branded page with a contained layout that matches the visual language of miin.ai: dark mode, pill buttons, block-M logos, and a clean search field.
2. A retrieval-augmented answer flow
User questions are embedded, matched against stored document chunks in Supabase, and then passed to Gemini 2.5 Flash with only the retrieved excerpts. The response includes concise source references without exposing direct file links.
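The last step of that flow, handing Gemini only the retrieved excerpts and returning link-free source references, can be sketched as follows. This is a minimal illustration, not the app's actual code: the RetrievedChunk shape, the function names, and the prompt wording are all assumptions.

```typescript
// Hypothetical shape of a chunk returned by the vector search.
// Field names are assumptions, not the project's schema.
interface RetrievedChunk {
  sourceTitle: string; // human-readable document name, never a file URL
  content: string;
  similarity: number;
}

// Build a prompt that restricts the model to the retrieved excerpts and
// numbers each one so the answer can cite it as [1], [2], ...
function buildPrompt(question: string, chunks: RetrievedChunk[]): string {
  const excerpts = chunks
    .map((c, i) => `[${i + 1}] ${c.sourceTitle}\n${c.content.trim()}`)
    .join("\n\n");
  return [
    "Answer the question using only the excerpts below.",
    "Cite excerpts by their bracketed number.",
    "If the excerpts do not contain the answer, say so.",
    "",
    "Excerpts:",
    excerpts,
    "",
    `Question: ${question.trim()}`,
  ].join("\n");
}

// Source references shown to the user: titles only, de-duplicated,
// in the order they were retrieved.
function sourceReferences(chunks: RetrievedChunk[]): string[] {
  const titles = chunks.map((c) => c.sourceTitle);
  return titles.filter((title, i) => titles.indexOf(title) === i);
}
```

Keeping the prompt builder a pure function makes it easy to unit-test without calling the model, and returning titles instead of URLs is one simple way to cite sources without exposing file links.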
3. A local sync pipeline
The ingestion process runs locally against the Google Drive Knowledge Base folder. It reads Google Docs, PDFs, Word documents, Sheets, Excel files, plain text, and images. Image OCR and Excel parsing were added after the initial version so the index could cover more real workspace content.
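One way to route that mix of file types is to dispatch on the MIME type that Google Drive reports for each file. The sketch below assumes that approach. The Extractor labels and the pickExtractor name are hypothetical, and only the modern Office formats are listed.

```typescript
// Hypothetical extractor labels; the real pipeline may name or split these differently.
type Extractor =
  | "google-doc"
  | "google-sheet"
  | "pdf"
  | "docx"
  | "xlsx"
  | "text"
  | "image-ocr"
  | "skip";

// Exact MIME types as reported by Google Drive for each supported format.
const EXACT_MIME_TYPES: Record<string, Extractor> = {
  "application/vnd.google-apps.document": "google-doc",
  "application/vnd.google-apps.spreadsheet": "google-sheet",
  "application/pdf": "pdf",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
  "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
  "text/plain": "text",
};

// Choose an extractor for a file, sending any image/* type to OCR
// and skipping everything that is not recognized.
function pickExtractor(mimeType: string): Extractor {
  if (Object.prototype.hasOwnProperty.call(EXACT_MIME_TYPES, mimeType)) {
    return EXACT_MIME_TYPES[mimeType];
  }
  if (mimeType.indexOf("image/") === 0) {
    return "image-ocr";
  }
  return "skip";
}
```

Returning "skip" for unknown types keeps a sync run from failing on folders, videos, or other files the index cannot use.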
4. A production deployment
The personal profile remains on GitHub Pages, while the knowledge base runs as a dynamic Next.js app on Vercel at kb.miin.ai. GoDaddy DNS maps the subdomain to Vercel, and Vercel reads Supabase and Gemini environment variables at runtime.
Tradeoffs
The first design considered Ollama with a local Gemma model. That was attractive because it kept inference local and avoided hosted model costs, but it created production complexity: a public site would need a persistent machine capable of running the model. Moving to Gemini 2.5 Flash reduced hosting burden, simplified Vercel deployment, and made the system easier to operate.
Supabase was selected as the vector store because it combines relational metadata, pgvector search, and simple server-side access. GitHub Pages was retained for the static profile site, but the knowledge base needed a dynamic runtime, so Vercel became the right fit. For Google auth, I avoided long-lived service-account JSON keys and used Application Default Credentials for local sync, which is safer but occasionally requires re-authentication.
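To make the pgvector search concrete, here is an in-memory sketch of the ranking it performs, assuming cosine similarity is the metric (pgvector's `<=>` operator returns cosine distance, which is 1 minus this value). The project's actual RPC function may use a different metric or an index. StoredChunk and topK are hypothetical names.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical stored row: a chunk id plus its embedding.
interface StoredChunk {
  id: number;
  embedding: number[];
}

// Score every chunk against the query embedding and keep the k best.
function topK(
  query: number[],
  chunks: StoredChunk[],
  k: number
): { id: number; similarity: number }[] {
  return chunks
    .map((c) => ({ id: c.id, similarity: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```

In production this scan happens inside Postgres, so the app only receives the top matches along with their relational metadata.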
Implementation Notes
These are the key command groups and setup steps behind the project. The exact values for secrets stay in local or hosted environment variables, not in the repository.
# Project setup
cd "/path/to/your/knowledge-base-app"
npm install
cp .env.example .env.local

# Google auth: refresh Application Default Credentials with Drive read access
gcloud auth application-default revoke
gcloud auth application-default login \
  --client-id-file="$HOME/Downloads/knowledge-site-oauth-client.json" \
  --scopes=https://www.googleapis.com/auth/drive.readonly,https://www.googleapis.com/auth/cloud-platform

# Run supabase/schema.sql in the Supabase SQL editor first.
# Full sync, then a sync of only the files missing from the index,
# then the same limited to 2 files for a quick test run
npm run sync:drive
SYNC_ONLY_MISSING=1 npm run sync:drive
SYNC_ONLY_MISSING=1 SYNC_MAX_FILES=2 npm run sync:drive

# Local checks and dev server
npm run lint
npm run typecheck
npm run build
npm run dev

# Push to GitHub
git add .
git commit -m "Build your-domain knowledge base app"
git push origin main
# Vercel production env vars:
# GEMINI_API_KEY
# SUPABASE_URL
# SUPABASE_SERVICE_ROLE_KEY
# GEMINI_CHAT_MODEL=gemini-2.5-flash
# GEMINI_EMBED_MODEL=gemini-embedding-001
# GEMINI_EMBED_DIMENSIONS=768
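The variables listed above can be validated once at startup so that a missing secret fails loudly instead of surfacing as a confusing runtime error. The sketch below is an assumption about how that might look. KbConfig and loadConfig are hypothetical names, and using the listed model values as defaults is my choice, not something the project confirms.

```typescript
// Hypothetical typed view of the environment variables listed above.
interface KbConfig {
  geminiApiKey: string;
  supabaseUrl: string;
  supabaseServiceRoleKey: string;
  chatModel: string;
  embedModel: string;
  embedDimensions: number;
}

type Env = Record<string, string | undefined>;

// Read and validate configuration from an environment-like object.
function loadConfig(env: Env): KbConfig {
  // Secrets have no default: fail fast if one is absent or empty.
  const required = (name: string): string => {
    const value = env[name];
    if (!value) {
      throw new Error(`Missing environment variable: ${name}`);
    }
    return value;
  };

  // Assumed default: the dimension value shown in the list above.
  const dims = Number(env.GEMINI_EMBED_DIMENSIONS || "768");
  if (!isFinite(dims) || Math.floor(dims) !== dims || dims <= 0) {
    throw new Error("GEMINI_EMBED_DIMENSIONS must be a positive integer");
  }

  return {
    geminiApiKey: required("GEMINI_API_KEY"),
    supabaseUrl: required("SUPABASE_URL"),
    supabaseServiceRoleKey: required("SUPABASE_SERVICE_ROLE_KEY"),
    chatModel: env.GEMINI_CHAT_MODEL || "gemini-2.5-flash",
    embedModel: env.GEMINI_EMBED_MODEL || "gemini-embedding-001",
    embedDimensions: dims,
  };
}
```

In a Next.js server route this would typically be called as loadConfig(process.env). Because the service role key bypasses row-level security, it should only ever be read in server-side code.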
Content updates do not require a Vercel deploy. Once the app is live, syncing new files locally updates Supabase, and the production app reads the latest indexed content at query time.
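The incremental sync variants shown in the commands can be pictured as a small selection step that runs before any file is downloaded. The sketch below infers the meaning of SYNC_ONLY_MISSING and SYNC_MAX_FILES from their names. The types and function names are hypothetical, and the real script may behave differently.

```typescript
// Hypothetical minimal view of a Drive file and of the sync flags.
interface DriveFile {
  id: string;
  name: string;
}

interface SyncOptions {
  onlyMissing: boolean;
  maxFiles: number | null; // null means no limit
}

// Parse the two flags from an environment-like object.
// Assumed semantics: "1" enables only-missing mode; the max is a positive integer.
function optionsFromEnv(env: Record<string, string | undefined>): SyncOptions {
  const max = Number(env.SYNC_MAX_FILES);
  return {
    onlyMissing: env.SYNC_ONLY_MISSING === "1",
    maxFiles: env.SYNC_MAX_FILES && isFinite(max) && max >= 1 ? Math.floor(max) : null,
  };
}

// Decide which files to process: optionally drop files whose ids are
// already in the index, then optionally cap the batch size.
function selectFilesToSync(
  files: DriveFile[],
  indexedIds: string[],
  options: SyncOptions
): DriveFile[] {
  const candidates = options.onlyMissing
    ? files.filter((f) => indexedIds.indexOf(f.id) === -1)
    : files.slice();
  return options.maxFiles === null
    ? candidates
    : candidates.slice(0, options.maxFiles);
}
```

A small cap such as SYNC_MAX_FILES=2 is useful for checking that auth, extraction, and embedding all work before committing to a full run.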