almessadi.

Project Case Study

Gemini Embedding 2 MCP Server

A local MCP server for multimodal retrieval, built to give AI tools private search over code, documents, images, audio, and video without shipping data to a hosted vector store.

The Local Context Problem

AI agents are still bad at one practical thing developers need every day: working safely against a local corpus of code and documents without pushing that entire corpus into a hosted third-party system.

I built this MCP server to make local retrieval practical. The goal was straightforward:

  • keep the indexed corpus on the local machine
  • support more than plain text
  • give AI tools a stable retrieval interface

Architecture

The server runs locally as a Python process and binds Gemini's embedding model to a local ChromaDB instance at ~/.gemini_mcp_db. That means the vector store stays on disk under the developer's control instead of becoming another hosted dependency.
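To make the local-first storage model concrete, here is a minimal sketch of the storage path and a plausible record shape. The `~/.gemini_mcp_db` path is from the article; the record fields (id, embedding, metadata) are my assumption about what a ChromaDB upsert for one indexed chunk would carry, not the server's actual schema:

```python
from pathlib import Path

# Persistent store location from the article: kept under the user's home
# directory so the index never leaves the machine.
DB_PATH = Path("~/.gemini_mcp_db").expanduser()

def make_record(file_path: str, embedding: list[float]) -> dict:
    """Hypothetical record shape for one indexed chunk.

    ChromaDB upserts take ids, embeddings, and metadatas; storing the
    source path as metadata lets a search hit point back to the file
    it came from.
    """
    return {
        "id": file_path,                    # stable id: the file's own path
        "embedding": embedding,             # vector from Gemini's embedding model
        "metadata": {"source": file_path},  # lets results link back to disk
    }
```

Keeping the vector store as a plain on-disk directory means backup, inspection, and deletion are ordinary file operations rather than API calls against a hosted service.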

The interesting part is the ingestion path. The system does not just handle text files. It can process images, audio, video, and PDFs, using Gemini's multimodal capabilities to preserve more of the original signal than a naive text-only extraction pipeline would.
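The ingestion dispatch can be sketched as routing on MIME type. The handler names below are hypothetical labels for the strategies described above, not the server's real function names; the routing logic itself is just the standard-library `mimetypes` lookup:

```python
import mimetypes

# Hypothetical dispatch table: each MIME family goes to a different
# Gemini ingestion strategy instead of a text-only extractor.
HANDLERS = {
    "text": "embed_text",
    "image": "describe_then_embed",
    "audio": "transcribe_then_embed",
    "video": "transcribe_then_embed",
    "application/pdf": "extract_pages_then_embed",
}

def pick_handler(path: str) -> str:
    """Route a file to an ingestion strategy based on its guessed MIME type."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return "embed_text"  # unknown type: fall back to plain text
    if mime in HANDLERS:     # exact match, e.g. application/pdf
        return HANDLERS[mime]
    # otherwise match on the MIME family (image/*, audio/*, video/*)
    return HANDLERS.get(mime.split("/")[0], "embed_text")
```

Routing before extraction is what lets an image or audio file reach a multimodal model intact, instead of being flattened into whatever text a generic extractor can scrape out.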

The Real Engineering Problem

Local RAG sounds easy until an agent starts re-indexing the same directories over and over. That is where cost, latency, and quota exhaustion show up.

To control that, I added:

  • wildcard blacklisting for irrelevant directories
  • MD5-based deduplication for unchanged files
  • a local-first storage model to keep indexing predictable
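The first two controls can be sketched in a few lines of standard-library Python. The blacklist patterns here are illustrative defaults (the article names wildcard blacklisting but not the exact patterns), and the hash cache is a plain dict standing in for whatever the server persists:

```python
import hashlib
from fnmatch import fnmatch
from pathlib import Path

# Illustrative blacklist: directories an agent should never index.
BLACKLIST = ["*/node_modules/*", "*/.git/*", "*/__pycache__/*"]

def is_blacklisted(path: str) -> bool:
    """True if the path matches any wildcard blacklist pattern."""
    return any(fnmatch(path, pattern) for pattern in BLACKLIST)

def needs_reindex(path: Path, seen_hashes: dict[str, str]) -> bool:
    """Skip files whose MD5 digest is unchanged since the last run.

    A cache hit means no new embedding call, so repeated indexing of
    the same directory costs nothing in API quota.
    """
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    if seen_hashes.get(str(path)) == digest:
        return False  # unchanged content: reuse the stored embedding
    seen_hashes[str(path)] = digest
    return True
```

The effect is that a second pass over an already-indexed tree degenerates into cheap local hash checks, which is what keeps an over-eager agent from burning quota on identical files.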

This project matters to me because it treats AI tooling as infrastructure. The win is not a flashy demo. The win is giving local agents a retrieval layer that is private, practical, and hard to accidentally abuse.