Building kb.miin.ai

White Paper: A Personal Knowledge Base Build

Summary

I built kb.miin.ai as a searchable, web-based knowledge base that turns a Google Workspace folder into a private retrieval system with semantic search, source references, OCR, spreadsheet parsing, and a clean public-facing interface. The project started as a local prototype and then moved to a production Next.js deployment on Vercel, with Supabase as the vector store and Gemini 2.5 Flash generating answers from retrieved excerpts. The big lesson: modern AI infrastructure is accessible if you break the build into small, testable pieces. When in doubt, ask your favorite coding agent!

Tech Stack

The final system combines a static personal website, a dynamic knowledge-base app, Google Drive ingestion, Gemini models, and Supabase retrieval. Codex automated much of the implementation work: scaffolding the Next.js app, writing the Google Drive sync pipeline, switching from local Ollama to Gemini, shaping the Supabase schema, updating the UI, pushing code to GitHub, and helping debug deployment and authentication issues along the way.

Source: Google Workspace. Knowledge Base folder with PDFs, Docs, Sheets, images, and Excel files.
Ingest: Local sync script. Parses files, extracts text, OCRs images, and chunks documents.
Embed: Gemini Embeddings. Creates 768-dimension vectors for semantic matching.
Retrieve: Supabase pgvector. Stores chunks and returns the closest matches for a user query.
Answer: Next.js + Gemini. Vercel hosts the search UI, and Gemini 2.5 Flash answers with references.

Application

Next.js App Router, TypeScript, custom CSS, GitHub, and Vercel.

Models

Gemini 2.5 Flash for answers and Gemini Embedding for retrieval vectors.

Data Layer

Supabase Postgres with pgvector, document source tables, chunk tables, and RPC search.
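To make "closest matches" concrete, here is a minimal in-memory sketch of what a vector search does conceptually. It assumes cosine similarity, which is one of the distance measures pgvector supports; the actual measure, the StoredChunk shape, and the function names are illustrative assumptions rather than the project's schema or RPC. In production this ranking happens inside Postgres, not in application code.

```typescript
// Hypothetical shape of a stored chunk; field names are assumptions.
interface StoredChunk {
  id: string;
  sourceName: string;
  content: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors (1 = same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank every chunk against the query embedding and keep the k best.
function topMatches(query: number[], chunks: StoredChunk[], k: number) {
  return chunks
    .map((chunk) => ({
      chunk,
      similarity: cosineSimilarity(query, chunk.embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```

A brute-force scan like this is fine for a few hundred chunks; the reason to keep the search in the database is that it can use a vector index and return only the top rows.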

Automation

Codex for code generation, refactors, Git operations, local testing, and deployment support.

What I Built

1. A Google-style search surface

The front end began as a minimal Google-like search page, then evolved into a branded page with a contained layout that matches the visual language of miin.ai: dark mode, pill buttons, block-M logos, and a clean search field.

2. A retrieval-augmented answer flow

User questions are embedded and matched against stored document chunks in Supabase. The question is then passed to Gemini 2.5 Flash together with only the retrieved excerpts, so the model answers from indexed content rather than from general knowledge. The response includes concise source references without exposing direct file links.
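The prompt-assembly step of this flow can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the RetrievedExcerpt shape, the function names, and the instruction wording are all hypothetical. The point it shows is that the model sees only numbered excerpts, and the references carry display names rather than file links.

```typescript
// Hypothetical shape of a retrieved excerpt; names are assumptions.
interface RetrievedExcerpt {
  sourceName: string; // display name only, never a file URL
  content: string;
}

// Build a prompt that restricts the model to the retrieved excerpts.
function buildPrompt(question: string, excerpts: RetrievedExcerpt[]): string {
  const context = excerpts
    .map((e, i) => `[${i + 1}] ${e.sourceName}\n${e.content.trim()}`)
    .join("\n\n");
  return [
    "Answer the question using only the excerpts below.",
    "Cite excerpts by their bracketed number.",
    "If the excerpts do not contain the answer, say so.",
    "",
    "Excerpts:",
    context,
    "",
    `Question: ${question.trim()}`,
  ].join("\n");
}

// Reference lines shown under the answer, numbered to match the prompt.
function buildReferences(excerpts: RetrievedExcerpt[]): string[] {
  return excerpts.map((e, i) => `[${i + 1}] ${e.sourceName}`);
}
```

Keeping the numbering identical in the prompt and the reference list means a citation like [2] in the answer can be resolved to a source name without the model ever seeing a link.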

3. A local sync pipeline

The ingestion process runs locally against the Google Drive Knowledge Base folder. It reads Google Docs, PDFs, Word documents, Sheets, Excel files, plain text, and images. Image OCR and Excel parsing were added after the initial version so the index could cover more real workspace content.
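The chunking step at the end of ingestion can be sketched like this. It is a minimal example assuming fixed-size character windows with overlap; the function name, the default sizes, and the character-based strategy are assumptions, and the real pipeline may split on sentences, tokens, or headings instead.

```typescript
// Split extracted text into overlapping windows so that a sentence cut at
// one chunk boundary still appears whole in the neighboring chunk.
function chunkText(text: string, maxChars = 1000, overlap = 200): string[] {
  if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
    throw new Error("Require maxChars > 0 and 0 <= overlap < maxChars");
  }
  // Collapse runs of whitespace left over from PDF or OCR extraction.
  const normalized = text.replace(/\s+/g, " ").trim();
  const chunks: string[] = [];
  const step = maxChars - overlap;
  for (let start = 0; start < normalized.length; start += step) {
    chunks.push(normalized.slice(start, start + maxChars));
    // Stop once a window has reached the end of the text.
    if (start + maxChars >= normalized.length) break;
  }
  return chunks;
}
```

Each chunk would then be embedded and stored alongside its source name, so retrieval can return both the excerpt and the reference.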

4. A production deployment

The personal profile remains on GitHub Pages, while the knowledge base runs as a dynamic Next.js app on Vercel at kb.miin.ai. GoDaddy DNS maps the subdomain to Vercel, and Vercel reads Supabase and Gemini environment variables at runtime.

Tradeoffs

The first design considered Ollama with a local Gemma model. That was attractive because it kept inference local and avoided hosted model costs, but it created production complexity: a public site would need a persistent machine capable of running the model. Moving to Gemini 2.5 Flash reduced hosting burden, simplified Vercel deployment, and made the system easier to operate.

Supabase was selected as the vector store because it combines relational metadata, pgvector search, and simple server-side access. GitHub Pages was retained for the static profile site, but the knowledge base needed a dynamic runtime, so Vercel became the right fit. For Google auth, I avoided long-lived service-account JSON keys and used Application Default Credentials for local sync, which is safer but occasionally requires re-authentication.

Implementation Notes

These are the key command groups and setup steps behind the project. The exact values for secrets stay in local or hosted environment variables, not in the repository.

Local app setup
cd "/path/to/your/knowledge-base-app"
npm install
cp .env.example .env.local

Google Drive authentication for local sync
gcloud auth application-default revoke
gcloud auth application-default login \
  --client-id-file="$HOME/Downloads/knowledge-site-oauth-client.json" \
  --scopes=https://www.googleapis.com/auth/drive.readonly,https://www.googleapis.com/auth/cloud-platform

Supabase schema and content sync
# Run supabase/schema.sql in the Supabase SQL editor first.

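# Full sync of the Knowledge Base folder.
npm run sync:drive
# Only files not yet in the index.
SYNC_ONLY_MISSING=1 npm run sync:drive
# Only missing files, capped at two per run; handy as a quick smoke test.
SYNC_ONLY_MISSING=1 SYNC_MAX_FILES=2 npm run sync:drive

Local testing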
npm run lint
npm run typecheck
npm run build
npm run dev

Deploy and link the sites
git add .
git commit -m "Build your-domain knowledge base app"
git push origin main

# Vercel production env vars:
# GEMINI_API_KEY
# SUPABASE_URL
# SUPABASE_SERVICE_ROLE_KEY
# GEMINI_CHAT_MODEL=gemini-2.5-flash
# GEMINI_EMBED_MODEL=gemini-embedding-001
# GEMINI_EMBED_DIMENSIONS=768

Content updates do not require a Vercel deploy. Once the app is live, syncing new files locally updates Supabase, and the production app reads the latest indexed content at query time.
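Because the production app depends on the environment variables listed above, it helps to fail fast when one is missing. The following is a hedged sketch of such a check; the KbConfig type and loadConfig function are hypothetical names, and only the variable names themselves come from the list above.

```typescript
// Hypothetical typed view of the environment; only the variable names
// read below come from the deployment notes.
interface KbConfig {
  geminiApiKey: string;
  supabaseUrl: string;
  supabaseServiceRoleKey: string;
  chatModel: string;
  embedModel: string;
  embedDimensions: number;
}

// Read and validate configuration from an env-like record.
function loadConfig(env: Record<string, string | undefined>): KbConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing environment variable: ${name}`);
    return value;
  };
  const rawDims = required("GEMINI_EMBED_DIMENSIONS");
  const embedDimensions = Number(rawDims);
  if (!Number.isInteger(embedDimensions) || embedDimensions <= 0) {
    throw new Error(`GEMINI_EMBED_DIMENSIONS must be a positive integer, got "${rawDims}"`);
  }
  return {
    geminiApiKey: required("GEMINI_API_KEY"),
    supabaseUrl: required("SUPABASE_URL"),
    supabaseServiceRoleKey: required("SUPABASE_SERVICE_ROLE_KEY"),
    chatModel: required("GEMINI_CHAT_MODEL"),
    embedModel: required("GEMINI_EMBED_MODEL"),
    embedDimensions,
  };
}
```

In a Next.js server route this would be called as loadConfig(process.env). The dimension check matters because the embedding size has to agree with the vector column in the Supabase schema; a mismatch would otherwise surface only as a failed insert or query. The service role key should only ever be read in server-side code.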