Late last year, I shared the story of a personal obsession: building an AI system grounded in my podcast history. I had hundreds of hours of audio—conversations that had shaped my thinking—trapped in MP3 files. I wanted to set them free. I wanted to be able to ask my library questions, find half-remembered quotes, and synthesize ideas across years of listening.
So, I built a system. And like many “v1” engineering projects, it was a triumph of brute force.
It was a classic Retrieval-Augmented Generation (RAG) pipeline, hand-assembled from the open-source parts bin. I had a reliable tool called podgrab acting as my scout, faithfully downloading every new episode. But downstream from that sat a pile of custom code to chop transcripts into bite-sized chunks. I had an embedding model to turn those chunks into vectors. And sitting at the center of it all was a vector database (ChromaDB) that I had to host, manage, and maintain.
It worked, but it was fragile. I didn’t even have a proper deployment setup; I ran the whole thing from a tmux session, with different panes for the ingestion watcher, the vector database, and the API server. It felt like keeping a delicate machine humming by hand. Every time I wanted to tweak the retrieval logic or—heaven forbid—change the embedding model, I was looking at a weekend of re-indexing and refactoring. I had built a memory for my podcasts, but I had also built myself a part-time job as a database administrator.
Then, a few weeks ago, I saw this announcement from the Gemini team.
They were launching File Search, a tool that promised to collapse my entire precarious stack into a single API call. The promise was bold: a fully managed RAG system. No vector DB to manage. No manual chunking strategies to debate. No embedding pipelines to debug. You just upload the files, and the model handles the rest.
I remember reading the documentation and feeling that specific, electric tingle that hits you when you realize the “hard problem” you’ve been solving is no longer a hard problem. It wasn’t just an update; it was permission to stop doing the busy work. I was genuinely excited—not just to write new code, but to tear down the old stuff.
Sometimes, it’s actually more fun to delete code than it is to write it.
The first step was the migration. I wrote a script to push my archive—over 18,000 podcast transcripts—into the new system. It took a while to run, but when it finished, everything was just… there. Searchable. Grounded. Ready.
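For the curious, the migration script was barely more than a loop. Here’s a minimal sketch of its shape, assuming the google-genai Python SDK and its documented File Search flow; the store name, the transcripts/ directory, and the polling interval are stand-ins, not the real script:

```python
import time
from pathlib import Path

from google import genai

client = genai.Client()  # picks up the API key from the environment

# Create a File Search store once; Google owns the index from here on.
store = client.file_search_stores.create(
    config={"display_name": "podcast-transcripts"}  # hypothetical name
)

# Push every transcript into the store. With ~18,000 files, this loop
# is the part that "took a while to run".
for path in sorted(Path("transcripts").glob("*.txt")):
    op = client.file_search_stores.upload_to_file_search_store(
        file=str(path),
        file_search_store_name=store.name,
        config={"display_name": path.stem},  # shows up in citations
    )
    # Each upload is a long-running operation; wait for indexing to finish.
    while not op.done:
        time.sleep(5)
        op = client.operations.get(op)

print(f"Imported everything into {store.name}")
```

Notice what isn’t in there: no chunking, no embedding calls, no database writes.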
That was the signal I needed. I opened my editor and started deleting code I had painstakingly written just last year. Podgrab stayed—it was doing its job perfectly—but everything else was on the chopping block.
- I deleted the chromadb dependency and the local storage management. Gone.
- I deleted the custom logic for sliding-window text chunking. Gone.
- I deleted the manual embedding generation code. Gone.
- I deleted the old web app and a dozen stagnant prototypes that were cluttering up the repo. Gone.
I watched my codebase shrink by hundreds of lines. The complexity didn’t just move; it evaporated. And it was more than a cleanup: it was a fresh start. I wasn’t patching an old system anymore; I was building a new one, free of the decisions I’d made a year ago.
In its place, I wrote a new, elegant ingestion script. It does one thing: it takes the transcripts generated from the files podgrab downloads and uploads them to the Gemini File Search store. That’s it. Google handles the indexing, the storage, and the retrieval.
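In code, “one thing” really does mean one call. A sketch, again assuming the google-genai SDK; the environment variable holding the store name is my own convention, not part of File Search:

```python
import os
import time

from google import genai

client = genai.Client()

# Name of the store created during the migration; the variable name
# here is illustrative, not part of the real script.
STORE_NAME = os.environ["PODCAST_FILE_SEARCH_STORE"]

def ingest_transcript(path: str) -> None:
    """Upload one new transcript; Google handles chunking, embedding, indexing."""
    op = client.file_search_stores.upload_to_file_search_store(
        file=path,
        file_search_store_name=STORE_NAME,
        config={"display_name": os.path.basename(path)},
    )
    while not op.done:
        time.sleep(5)
        op = client.operations.get(op)
```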
With the heavy lifting gone, I was free to rethink the application itself. I built a new central brain for the project, a lightweight service I call mcp_server.py (implementing the Model Context Protocol).
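The server itself is tiny. Here’s a sketch of its shape, using the official MCP Python SDK’s FastMCP helper; the tool name and docstring are illustrative, and the rag.answer handoff is explained next:

```python
# mcp_server.py (a sketch of the shape, not the file verbatim)
from mcp.server.fastmcp import FastMCP

from rag import answer  # rag.py makes the actual Gemini call

mcp = FastMCP("podcast-rag")

@mcp.tool()
def ask_podcasts(question: str) -> str:
    """Answer a question grounded in the podcast transcript library."""
    return answer(question)

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```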
Previously, my server was bogged down with the mechanics of how to find data. Now, mcp_server.py simply hands a user’s query to my rag.py module. That module doesn’t need to be a database client anymore; it just configures the Gemini FileSearch tool and gets out of the way. The model itself, grounded by the tool, does the retrieval, the synthesis, and even the citation.
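And rag.py boils down to the tool configuration, following the SDK’s documented File Search usage; the model name and the store lookup are placeholders:

```python
# rag.py (sketch)
import os

from google import genai
from google.genai import types

client = genai.Client()
STORE_NAME = os.environ["PODCAST_FILE_SEARCH_STORE"]  # placeholder lookup

def answer(question: str) -> str:
    """Hand the query to Gemini; the FileSearch tool does retrieval and grounding."""
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=question,
        config=types.GenerateContentConfig(
            tools=[
                types.Tool(
                    file_search=types.FileSearch(
                        file_search_store_names=[STORE_NAME]
                    )
                )
            ]
        ),
    )
    # Citations come back in the response's grounding metadata alongside the text.
    return response.text
```

That tools list is the entire retrieval pipeline now.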
The difference is profound. The “RAG” part of my application—the part that used to consume 80% of my engineering effort—is now just a feature I use, like a spell checker or a date parser.
This shift is bigger than my podcast project. It changes the calculus for every new idea I have. Previously, if I wanted to build a grounded AI tool for a different context—say, for my project notes or my email archives—I would hesitate. I’d think about the boilerplate, the database setup, the chunking logic. Now? I can spin up a robust, grounded system in an hour.
My podcast agent is now smarter, faster, and much cheaper to run. But the best part? I’m not a database administrator anymore. I’m just a builder again.
You can try out the new system yourself at podcast-rag.hutchison.org or check out the code on GitHub.