almessadi.

Project Case Study

Gemini Embedding 2 MCP Server

A local MCP server for multimodal retrieval, built to give AI tools private search over code, documents, images, audio, and video without shipping data to a hosted vector store.

The Local Context Problem

AI agents are still bad at one practical thing developers need every day: working safely against a local corpus of code and documents without pushing that entire corpus into a hosted third-party system.

I built this MCP server to make local retrieval practical. The goal was straightforward:

  • keep the indexed corpus on the local machine
  • support more than plain text
  • give AI tools a stable retrieval interface

Architecture

The server runs locally as a Python process and binds Gemini's embedding model to a local ChromaDB instance at ~/.gemini_mcp_db. That means the vector store stays on disk under the developer's control instead of becoming another hosted dependency.
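To make the local-first storage model concrete, here is a minimal sketch of the storage path and a plausible record shape. The `~/.gemini_mcp_db` path is from the article; the record fields (id, embedding, metadata) are my assumption about what a ChromaDB upsert for one indexed chunk would carry, not the server's actual schema:

```python
from pathlib import Path

# Persistent store location from the article: kept under the user's home
# directory so the index never leaves the machine.
DB_PATH = Path("~/.gemini_mcp_db").expanduser()

def make_record(file_path: str, embedding: list[float]) -> dict:
    """Hypothetical record shape for one indexed chunk.

    ChromaDB upserts take ids, embeddings, and metadatas; storing the
    source path as metadata lets a search hit point back to the file
    it came from.
    """
    return {
        "id": file_path,                    # stable id: the file's own path
        "embedding": embedding,             # vector from Gemini's embedding model
        "metadata": {"source": file_path},  # lets results link back to disk
    }
```

Keeping the vector store as a plain on-disk directory means backup, inspection, and deletion are ordinary file operations rather than API calls against a hosted service.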

The interesting part is the ingestion path. The system does not just handle text files. It can process images, audio, video, and PDFs, using Gemini's multimodal capabilities to preserve more of the original signal than a naive text-only extraction pipeline would.
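The ingestion dispatch can be sketched as routing on MIME type. The handler names below are hypothetical labels for the strategies described above, not the server's real function names; the routing logic itself is just the standard-library `mimetypes` lookup:

```python
import mimetypes

# Hypothetical dispatch table: each MIME family goes to a different
# Gemini ingestion strategy instead of a text-only extractor.
HANDLERS = {
    "text": "embed_text",
    "image": "describe_then_embed",
    "audio": "transcribe_then_embed",
    "video": "transcribe_then_embed",
    "application/pdf": "extract_pages_then_embed",
}

def pick_handler(path: str) -> str:
    """Route a file to an ingestion strategy based on its guessed MIME type."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return "embed_text"  # unknown type: fall back to plain text
    if mime in HANDLERS:     # exact match, e.g. application/pdf
        return HANDLERS[mime]
    # otherwise match on the MIME family (image/*, audio/*, video/*)
    return HANDLERS.get(mime.split("/")[0], "embed_text")
```

Routing before extraction is what lets an image or audio file reach a multimodal model intact, instead of being flattened into whatever text a generic extractor can scrape out.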

The Real Engineering Problem

Local RAG sounds easy until an agent starts re-indexing the same directories over and over. That is where cost, latency, and quota exhaustion show up.

To control that, I added:

  • wildcard blacklisting for irrelevant directories
  • MD5-based deduplication for unchanged files
  • a local-first storage model to keep indexing predictable
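The first two controls can be sketched in a few lines of standard-library Python. The blacklist patterns here are illustrative defaults (the article names wildcard blacklisting but not the exact patterns), and the hash cache is a plain dict standing in for whatever the server persists:

```python
import hashlib
from fnmatch import fnmatch
from pathlib import Path

# Illustrative blacklist: directories an agent should never index.
BLACKLIST = ["*/node_modules/*", "*/.git/*", "*/__pycache__/*"]

def is_blacklisted(path: str) -> bool:
    """True if the path matches any wildcard blacklist pattern."""
    return any(fnmatch(path, pattern) for pattern in BLACKLIST)

def needs_reindex(path: Path, seen_hashes: dict[str, str]) -> bool:
    """Skip files whose MD5 digest is unchanged since the last run.

    A cache hit means no new embedding call, so repeated indexing of
    the same directory costs nothing in API quota.
    """
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    if seen_hashes.get(str(path)) == digest:
        return False  # unchanged content: reuse the stored embedding
    seen_hashes[str(path)] = digest
    return True
```

The effect is that a second pass over an already-indexed tree degenerates into cheap local hash checks, which is what keeps an over-eager agent from burning quota on identical files.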

This project matters to me because it treats AI tooling as infrastructure. The win is not a flashy demo. The win is giving local agents a retrieval layer that is private, practical, and hard to accidentally abuse.