GitHub issues transforming into glowing skill cards floating above a laptop screen.

Bundled Skills in Gemini Scribe

The feature that became Bundled Skills started with a GitHub issues page.

I wrote and maintain Gemini Scribe, an Obsidian plugin that puts a Gemini-powered agent inside your vault. Thousands of people use it, and they have questions. People would open discussions and issues asking how to configure completions, how to set up projects, what settings were available. I was answering the same questions over and over, and it hit me: the agent itself should be able to answer these. It has access to the vault. It can read files. Why am I the bottleneck for questions about my own plugin?

So I built a skill. I took the same documentation source that powers the plugin’s website, packaged it up as a set of instructions the agent could load on demand, and suddenly users could just ask the agent directly. “How do I set up completions?” “What settings are available?” The agent would pull in the right slice of documentation and give a grounded answer. The docs on the web and the docs the agent reads are built from the same source. There is no separate knowledge base to keep in sync.

That first skill opened a door. I was already using custom skills in my own vault to improve how the agent worked with Bases and frontmatter properties. Once I had the bundled skills mechanism in place, I started looking at those personal skills differently. The ones I had built for myself around Obsidian-specific tasks were not just useful to me. They would be useful to anyone running Gemini Scribe. So I started migrating them from my vault into the plugin as built-in skills.

With the latest version of Gemini Scribe, the plugin now ships with four built-in skills. In a future post I will walk through how to create your own custom skills, but first I want to explain what ships out of the box and why this approach works.

Four Skills Out of the Box

That first skill became gemini-scribe-help, and it is still the one I am most proud of conceptually. The plugin’s own documentation lives inside the same skill system as everything else. No special case, no separate knowledge base. The agent answers questions about itself using the same mechanism it uses for any other task.

The second skill I built was obsidian-bases. I wanted the agent to be good at creating Bases (Obsidian’s take on structured data views), but it kept getting the configuration wrong. Filters, formulas, views, grouping: there is a lot of surface area and the syntax is particular. So I wrote a skill that guides the agent through creating and configuring Bases from scratch, including common patterns like task trackers and project dashboards. Instead of me correcting the agent’s output every time, I describe what I want and the agent builds it right the first time.

Next came audio-transcription. This one has a fun backstory. Audio transcription was one of the oldest outstanding bugs in the repo. People wanted to use it with Obsidian’s native audio recording, but the results were poor. In this release, fixes around binary file uploads meant the model could finally receive audio files properly. Once that was working, I realized I did not need to write any more code to get good transcriptions. I just needed to give the agent good instructions. The skill guides it through producing structured notes with timestamps, speaker labels, and summaries. It turns a messy audio file into a clean, searchable note, and the fix was not code but context.

The fourth is obsidian-properties. Working with note properties (the YAML frontmatter at the top of every Obsidian note) sounds trivial until you are doing it across hundreds of notes. The agent would make inconsistent choices about property types, forget to use existing property names, or create duplicates. This skill makes it reliable at creating, editing, and querying properties consistently, which matters enormously if you are using Obsidian as a serious knowledge management system.

The pattern behind all four is the same. I watched the agent struggle with something specific to Obsidian, and instead of accepting that as a limitation of the model, I wrote a skill to fix it.

Why Not Just Use the System Prompt

You might be wondering why I did not just shove all of this into the system prompt. I wrote about this problem in detail in Managing the Agent’s Attention, but the short version is that system prompts are a “just-in-case” strategy. You load up the agent with everything it might need at the start of the conversation, and as you add more instructions, they start competing with each other for the model’s attention. Researchers call this the “Lost in the Middle” problem: models pay disproportionate attention to the beginning and end of their context, and everything in between gets diluted. If I packed all four skills’ worth of instructions into the system prompt, each one would make the others less effective. Every new skill I add would degrade the ones already there.

Skills avoid this entirely. The agent always knows which skills are available (it gets a short name and description for each one), but only loads the full instructions when it actually needs them. When a skill activates, its instructions land in the most recent part of the conversation, right before the model starts reasoning. Only one skill’s instructions are competing for attention at a time, and they are sitting in the highest-attention position in the context window.
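To make the load-on-demand idea concrete, here is a minimal sketch of how a lazy skill registry could work. The names (`Skill`, `SkillRegistry`, `activate`) are my illustrations, not Gemini Scribe’s actual internals:

```typescript
// Sketch of lazy skill loading: the model always sees a compact listing
// of names and descriptions, but a skill's full instructions only enter
// the context when an activate_skill tool call asks for them.
interface Skill {
  name: string;
  description: string;             // always visible to the model
  loadInstructions: () => string;  // full body, fetched on demand
}

class SkillRegistry {
  private skills = new Map<string, Skill>();

  register(skill: Skill): void {
    this.skills.set(skill.name, skill);
  }

  // The cheap, always-present index: one line per skill.
  listing(): string {
    return Array.from(this.skills.values())
      .map((s) => `${s.name}: ${s.description}`)
      .join("\n");
  }

  // Invoked for an activate_skill tool call; only now do the full
  // instructions land in the (high-attention) recent context.
  activate(name: string): string {
    const skill = this.skills.get(name);
    if (!skill) throw new Error(`Unknown skill: ${name}`);
    return skill.loadInstructions();
  }
}
```

The key property is that `listing()` costs a handful of tokens per skill no matter how large the skill bodies grow, so adding a new skill does not dilute the others.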

There is a second benefit that surprised me. Because skills activate through the activate_skill tool call, you can watch the agent load them. In the agent session, you see exactly when a skill is activated and which one it chose. This gives you something that system prompts never do: observability. If the agent is not following your instructions, you can check whether it actually activated the skill. If it activated the skill but still got something wrong, you know the problem is in the skill’s instructions, not in the agent’s attention. That feedback loop is what lets you iterate and improve your skills over time. You are no longer guessing whether the agent read your instructions. You can see it happen.

Skills follow the open agentskills.io specification, and this matters more than it might seem. We have seen significant standardization around this spec across the industry in 2026. That means skills are portable. If you have been using skills with another agent, you can bring them into Gemini Scribe and they will work. If you build skills in Gemini Scribe, you can take them with you. They are not a proprietary format tied to one tool. They are Markdown files with a bit of YAML frontmatter, designed to be human-readable, version-controllable, and portable across any agent that supports the spec.
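For a rough sense of what “Markdown with a bit of YAML frontmatter” means in practice, here is a hypothetical skill file. The body text is invented, and I am assuming only the `name` and `description` frontmatter fields; consult the spec for the full set:

```markdown
---
name: obsidian-bases
description: Guide the agent through creating and configuring Obsidian Bases.
---

# Creating a Base

When the user asks for a Base, first clarify which notes it should
cover, then build the view configuration step by step: filters,
formulas, grouping, and finally the display settings.
```

Because it is just a text file, it diffs cleanly in version control and travels between any agents that support the spec.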

What Comes Next

The four built-in skills are just the beginning. When I decide what to build next, I think about skills in four categories. First, there are skills that give the agent domain knowledge about Obsidian itself, things like Bases and properties where the model’s general training is not specific enough. Second, there are skills that help the agent use Gemini Scribe’s own tools effectively. The plugin has capabilities like deep research, image generation, semantic search, and session recall, and each of those benefits from a skill that teaches the agent when and how to use them well. Third, there are skills that bring entirely new capabilities to the agent, like audio transcription. And fourth, there is user support: the help skill that started this whole process, making sure people can get answers without leaving their vault.

The next version of Gemini Scribe will add built-in skills for semantic search, deep research, image generation, and session recall. The skills system is also designed to be extended by users. In a future post I will walk through creating your own custom skills, both by hand and by asking the agent to build them for you.

For now, the takeaway is simple. A general-purpose model knows a lot, but it does not know your tools. When I watched the agent struggle with Obsidian Bases or produce flat transcripts or make a mess of note properties, I could have accepted those as limitations. Instead, I wrote skills to close the gap. The model’s knowledge is broad. Skills make it deep.

A bird's-eye view of a winding river of glowing green GitHub contribution tiles flowing across a dark landscape, with bright yellow-green flames rising from clusters of the brightest tiles, while a lone figure sits at a laptop at the edge of the mosaic under a distant skyline of code-filled windows.

4255 Contributions – A Year of Building in the Open

I was staring at my GitHub profile the other day when a number caught my eye. 4,255. That’s how many contributions GitHub has recorded for me over the past year. I sat with it for a moment, doing the quick mental math: that’s close to twelve contributions every single day, weekends included. The shape of the year looked just as striking. I showed up on 332 of the 366 days in the window, 91% of them, and at one point put together a 113-day streak without a gap. It felt like a lot. It felt like proof of something I hadn’t been able to articulate until I saw it rendered as a green heatmap on a screen.

About a year ago, I wrote about my decision to move back to individual contributor work after years in leadership roles. I talked about missing the flow state, the direct feedback loop of writing code and watching it work. What I didn’t know at the time was just how dramatically that shift would show up in the data. 4,255 contributions is the quantitative answer to the question I was trying to answer qualitatively in that post: what happens when you give a builder back the time to build?

The Shape of a Year

Numbers by themselves are just numbers. What makes them interesting is the shape they take when you zoom in. My year wasn’t a single monolithic effort on one project. It was a constellation of interconnected work, each project feeding into the next, each one teaching me something that made the others better.

The largest body of work was on Gemini CLI, Google’s open-source AI agent for the terminal. This project alone accounts for a significant chunk of those contributions, spanning everything from core feature development to building the Policy Engine that governs how the agent interacts with your system. But the contributions weren’t just code. A huge portion of my time went into code reviews, issue triage, and community engagement. Working on a repository with over 100,000 stars means that every merged PR has real impact, and every review is a conversation with developers around the world.

Then there was Gemini Scribe, my Obsidian plugin that started as a weekend experiment and grew into a tool with 302 stars and a community of writers who depend on it. Over the past year, I shipped a major 3.0 release, built agent mode, and iterated constantly on the rewrite features that make it useful for daily writing. In fact, this very blog post was drafted in the tool I built, which is a strange and satisfying loop.

Alongside these larger efforts, I shipped a handful of small, sharp tools that I needed for my own workflows. The GitHub Activity Reporter is one I’ve written about before, a utility that uses AI to transform raw GitHub data into narrative summaries for performance reviews and personal reflection. More recently, I built the Workspace extension for Gemini CLI and a deep research extension that lets you conduct multi-step research from the terminal. Each of these tools was born from a specific itch, and each turned out to be useful to more people than I expected. The Workspace extension alone has gathered 510 stars.

The Rhythm of Building

One thing the contribution graph doesn’t capture is the rhythm behind the numbers. My weeks developed a cadence over the year that I didn’t plan but that emerged naturally. Mornings were for deep work on Gemini CLI, the kind of focused system design and implementation that benefits from a fresh mind. Afternoons were for reviews and community work, responding to issues, providing feedback on PRs, and engaging with the developers building on top of our tools. Evenings and weekends were where the personal projects lived: Gemini Scribe, the extensions, and whatever new idea was rattling around in my head.

This rhythm is something I couldn’t have had in my previous role. When your calendar is stacked with meetings from nine to five, the creative work gets squeezed into the margins. Now, the creative work is the whole page. That’s the real story behind 4,255 contributions. It’s not about productivity metrics or GitHub gamification. It’s about what happens when you align your time with the work that energizes you.

What Surprised Me

A few things caught me off guard when I looked back at the year.

First, the ratio of code to “everything else” wasn’t what I expected. I assumed the majority of my contributions would be commits. In reality, a massive portion was reviews, comments, and issue management. On Gemini CLI alone I logged 205 reviews over the year. This was especially true as my role on that project evolved from pure contributor to something closer to a technical steward. Reviewing a complex PR, asking the right questions, and helping someone refine their approach takes just as much skill as writing the code yourself. Sometimes more.

Second, the personal projects had more reach than I anticipated. When I wrote about building personal software, I was mostly thinking about tools I built for myself. But Gemini Scribe has real users who file real bugs and request real features. The Workspace extension took off because it solved a problem that a lot of Gemini CLI users were hitting. Building in the open means you discover an audience you didn’t know was there.

Third, and this is the one I keep coming back to, the year felt shorter than 4,255 contributions would suggest. Flow state compresses time. When you’re deep in a problem, hours feel like minutes. I remember entire weekends spent in the codebase that felt like an afternoon. That compression is, for me, the clearest signal that I made the right call in going back to IC work.

Fourth, and this is the one I never would have predicted until I charted it out: the weekend, not the weekday, turned out to be my most productive window by a wide margin. Saturdays averaged 14.7 contributions, Sundays 14.5, and Thursday, the day I’d have guessed was safest, came in last at 8.3. The busiest single day of the entire year was a Saturday, December 20, when I shipped 89 contributions into podcast-rag, rebuilding the web upload flow, adding episode management to the admin dashboard, and migrating email delivery over to Resend, all in one afternoon. I didn’t plan for the weekends to become the engine. They just did, because that’s where the personal projects live, and the personal projects are where the work is loudest, most direct, and most free of interruption. A day with no meetings on it, I’ve come to realize, is worth more than I ever gave it credit for.

Looking Forward

I don’t know what next year’s number will be, and I’m not particularly interested in making it bigger. The number is a side effect, not a goal. What I care about is continuing to work on problems that matter, in the open, with people who push me to think more clearly. The AI-first developer model I wrote about over a year ago is now just how I work every day. The agents I’m building are the collaborators I’m building with, and both keep getting better.

If you’re someone who’s been thinking about a similar shift, whether it’s moving back to IC work, contributing to open source, or just carving out more time for the work that lights you up, I’d encourage you to try it. You might be surprised by what a year of focused building can produce. I certainly was.

A focused workspace at a desk in a vast library, with nearby shelves illuminated and distant shelves visible but softened, a pair of sunglasses resting on the desk

Scoping AI Context with Projects in Gemini Scribe

My son has a friend who likes to say, “born to dilly dally, forced to lock in.” I’ve started to think that describes AI agents in a large Obsidian vault perfectly.

My vault is a massive, sprawling entity. It holds nearly two decades of thoughts, ranging from deep dives into LLM architecture to my kids’ school syllabi and the exact dimensions needed for an upcoming home remodeling project. When I first introduced Gemini Scribe, the agent’s ability to explore all of that was a feature. I could ask it to surface surprising connections across topics, and it would. But as I’ve leaned harder into Scribe as a daily partner, both at home and at work, the dilly dallying became a real problem. My work vault has thousands of files with highly overlapping topics. It’s not a surprise that the agent might jump from one topic to another, or get confused about what we’re working on at any given time. When I asked the agent to help me structure a paragraph about agentic workflows, I didn’t want it pulling in notes from my jazz guitar practice.

I could have created a new, isolated vault just for my blog writing. I tried that briefly, but I immediately found myself copying data back and forth. I was duplicating Readwise syncs, moving research papers, and fracturing my knowledge base. That wasn’t efficient, and it certainly wasn’t fun. The problem wasn’t that the agent could see too much. The problem was glare. I needed sunglasses, not blinders. I needed to force the agent to lock in.

So, I built Projects in Gemini Scribe.

A project defines scope without acting as a gatekeeper

Fundamentally, a project in Gemini Scribe is a way to focus the agent’s attention without locking it out of anything. It defines a primary area of work, but the rest of the vault is still there. Think of it like sitting at a desk in the engineering section of a library. Those are the shelves you browse by default, the ones within arm’s reach. But if you know the call number for a book in the history section, nobody stops you from walking over and grabbing it. You can even leave a stack of books from other sections on your desk ahead of time if you know you’ll need them. If you’ve followed along with the evolution of Scribe from plugin to platform, you’ll recognize this as a natural extension of the agent’s growing capabilities.

The core mechanism is remarkably simple. Any Markdown file in your vault can become a project by adding a specific tag to its YAML frontmatter.

```yaml
---
tags:
  - gemini-scribe/project
name: Letters From Silicon Valley
skills:
  - writing-coach
permissions:
  delete_file: deny
---
```

Once tagged, that file’s parent directory becomes the project root. From that point on, when an agent session is linked to the project, its discovery tools are automatically scoped to that directory and its subfolders. Under the hood, the plugin intercepts API calls to tools like list_files and find_files_by_content, transparently prepending the project root to the search paths. The practical difference is immediate. Before projects, I could be working on a blog post about agent memory systems and the agent would surface notes from a completely unrelated project that happened to use similar terminology. Now I can load up a project and work with the agent hand in hand, confident it won’t get distracted by similar ideas or overlapping vocabulary from other corners of the vault.
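As a sketch of the path-prepending idea (the function and field names here are my own, not the plugin’s actual code), the interception can be as small as this:

```typescript
// Sketch of project scoping: discovery tools get their search paths
// rewritten so they start at the project root. With no active project,
// paths pass through unchanged.
interface Project {
  root: string; // directory containing the tagged project file
}

function scopePath(project: Project | null, searchPath: string): string {
  if (!project) return searchPath;
  // Treat "." (vault root) as the project root itself.
  const relative = searchPath === "." ? "" : searchPath;
  return [project.root, relative].filter(Boolean).join("/");
}
```

So a `list_files` call for `drafts` inside a project rooted at `Blog/agent-memory` would actually search `Blog/agent-memory/drafts`, while a session with no project still searches the whole vault.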

The project file serves as both configuration and context

The project file itself serves a dual purpose. It acts as both configuration and context. The frontmatter handles the configuration, allowing me to explicitly limit which skills the agent can use or override global permission settings. For example, denying file deletions for a critical writing project is a simple but effective safety net. But the real power is in customizing the agent’s behavior per project. For my creative writing, I actually don’t want the agent to write at all. I want it to read, critique, and discuss, but the words on the page need to be mine. Projects let me turn off the writing skill entirely for that context while leaving it fully enabled for my blog work. The same agent, shaped differently depending on what I’m working on.

Everything below the frontmatter is treated as context. Whatever I write in the body of the project note is injected directly into the agent’s system prompt, acting much like an additional, localized set of instructions. The global agent instructions are still respected, but the project instructions provide the specific context needed for that particular workspace. This is similar in spirit to how I’ve previously discussed treating prompts as code, where the instructions you give an agent deserve the same rigor and iteration as any other piece of software.
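The “frontmatter is configuration, body is context” split might look something like the following sketch. The names and the exact prompt layout are assumptions for illustration:

```typescript
// Sketch: fold a project note's body into the system prompt as a
// localized layer on top of the global instructions.
interface ProjectNote {
  frontmatter: Record<string, unknown>; // configuration (skills, permissions)
  body: string;                         // context, injected into the prompt
}

function buildSystemPrompt(globalInstructions: string, project: ProjectNote | null): string {
  if (!project || project.body.trim() === "") return globalInstructions;
  // Global instructions still apply; the project body is appended as
  // project-specific guidance.
  return `${globalInstructions}\n\n## Project instructions\n${project.body.trim()}`;
}
```

The point of the structure is that the global prompt is never replaced, only extended, so a project can shape behavior without forking the agent’s baseline instructions.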

This is where the sunglasses metaphor really holds. The agent’s discovery tools, things like list_files and find_files_by_content, are scoped to the project folder. That’s the glare reduction. But the agent’s ability to read files is completely unrestricted. If I am working on a technical post and need to reference a specific architectural note stored in my main Notes folder, I have two options. I can ask the agent to go grab it, or I can add a wikilink or embed to the project file’s body and the agent will have it available from the start. One is like walking to the history section yourself. The other is like leaving that book on your desk before you sit down. Either way, the knowledge is accessible. The project just keeps the agent from rummaging through every shelf on its own. This builds directly on the concepts of agent attention I explored in Managing AI Agent Attention.

Session continuity keeps the agent focused across your vault

One of the more powerful aspects of this system is how it interacts with session memory. When I start a new chat, Gemini Scribe looks at the active file. If that file lives within a project folder, the session is automatically linked to that project. This is a direct benefit of the supercharged chat history work that landed earlier in the plugin’s life.

This linkage is stable for the lifetime of the session. I can navigate around my vault, opening files completely unrelated to the project, and the agent will remain focused on the project’s context and instructions. This means I don’t have to constantly remind the agent of the rules of the road. The project configuration persists across the entire conversation.
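The linkage rule itself is simple enough to sketch: find the project whose root folder contains the active file, preferring the deepest match if projects nest. This is my illustration of the behavior, not the plugin’s code:

```typescript
// Sketch: link a new session to the project enclosing the active file.
// Deepest matching root wins when project folders are nested.
function linkSession(activeFile: string, projectRoots: string[]): string | null {
  const matches = projectRoots
    .filter((root) => activeFile === root || activeFile.startsWith(root + "/"))
    .sort((a, b) => b.length - a.length); // deepest first
  return matches[0] ?? null;
}
```

Once the link is made at session start, it is held for the session’s lifetime, which is why navigating to unrelated files afterward does not change the agent’s focus.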

Furthermore, session recall allows the agent to look back at past conversations. When I ask about prior work or decisions related to a specific project, the agent can search its history, utilizing the project linkage to find the most relevant past interactions. This creates a persistent working environment that feels much more like a collaboration than a simple transaction.

Structuring projects effectively requires a few simple practices

To get the most out of projects, I’ve found a few practices to be particularly effective.

First, lean into the folder-based structure. Place the project file at the root of the folder containing the relevant work. Everything underneath it is automatically in scope. This feels natural if you already organize your vault by topic or project, which many Obsidian users do.

Second, start from the defaults and adjust as the project demands. Out of the box, a new project inherits the agent’s standard skills and permissions, which is a sensible baseline for most work. From there, you tune. If you find the agent reaching for tools that don’t make sense in a given context, narrow the allowed skills in the frontmatter. If a project needs extra safety, tighten the permissions. The creative writing example I mentioned earlier came about exactly this way. I started with the defaults, realized I wanted the agent as a reader and critic rather than a co-writer, and adjusted accordingly. This aligns with the broader principle I’ve written about when discussing building responsible agents: the right guardrails are the ones shaped by the actual work.

Finally, treat the project body as a living document. As the project evolves, update the instructions and external links to ensure the agent always has the most current and relevant context. It’s a simple mechanism, but it fundamentally changes how I interact with an AI embedded in a large knowledge base. It allows me to keep my single, massive vault intact, while giving the agent the precise focus it needs to be genuinely helpful.

A beam of white light enters a translucent geometric crystal and refracts into three distinct colored beams — red, green, and blue — each passing through a different abstract geometric shape against a dark navy background.

MCP Isn’t Dead, You Just Aren’t the Target Audience

I was debugging a connection issue between Gemini Scribe and the Google Calendar integration in my Workspace MCP server last month when a friend sent me a link. “Have you seen this? MCP is dead apparently.” It was Eric Holmes’ post, MCP is dead. Long live the CLI, which had just hit the top of Hacker News. I read it while waiting for a server restart, which felt appropriate.

His argument is clean and persuasive: CLI tools are simpler, more reliable, and battle-tested. LLMs are trained on millions of man pages and Stack Overflow answers, so they already know how to use gh and kubectl and aws. MCP introduces flaky server processes, opinionated authentication, and an all-or-nothing permissions model. His conclusion is that companies should ship a good API, then a good CLI, and skip MCP entirely.

I agree with about half of that. And the half I agree with is the part that doesn’t matter.

The Shell is a Privilege

Holmes is writing from the perspective of a developer sitting in a terminal. From that vantage point, everything he says is correct. If your agent is Claude Code or Gemini CLI, running in a shell session on your laptop with your credentials loaded, then yes, gh pr view is faster and more capable than any MCP wrapper around the GitHub API. I made exactly this observation in my own post on the Internet of Agents. Simon Willison said as much in his year-end review, noting that for coding agents, “the best possible tool for any situation is Bash.”

But here’s the thing: not every agent has a shell. And not every agent is an interactive coding assistant.

I wrote in Everything Becomes an Agent that the agentic pattern is showing up everywhere: classifiers that need to call tools, data pipelines that need to make decisions, background processes that orchestrate workflows without a human watching. The “MCP is dead” argument treats agents as though they are all developer tools running in a terminal session. That’s one pattern, and it’s the pattern that gets the most attention because developers are writing the blog posts. But the agentic shift is much broader than that.

I’ve been building Gemini Scribe for nearly a year and a half now. It’s an AI agent that lives inside Obsidian, a note-taking application built on Electron. On desktop, Gemini Scribe runs in the renderer process of a sandboxed app. It has no terminal. It has no $PATH. It cannot reliably shell out to gh or kubectl or anything else. Its entire world is the Obsidian plugin API, the vault on disk, and whatever external capabilities I wire up for it. And on mobile, the constraints are even tighter. Obsidian runs on iOS and Android, where there is no shell at all, no subprocess spawning, no local binary execution. The app sandbox on mobile is absolute. If your answer to “how does an agent use tools?” begins with “just call the CLI,” you’ve already lost half your user base.

When I wanted Gemini Scribe to be able to read my Google Calendar, search my email, or pull context from Google Drive, I didn’t have the option of “just use the CLI.” There is no gcal CLI that runs inside a browser runtime. There is no gmail binary I can spawn from an Electron sandbox, let alone from an iPhone. MCP gave me a way to expose those capabilities through a protocol that works over stdio or HTTP, regardless of where my agent happens to be running.

The same is true of my Podcast RAG system. The query agent runs on the server, orchestrating retrieval, re-ranking, and synthesis in a Python process that has no interactive shell session. I could wire up every capability as a bespoke function call, and in some cases I do. But when I want that same retrieval pipeline to be accessible from Gemini CLI on my laptop, from Gemini Scribe in Obsidian, and from the web frontend, MCP gives me one implementation that serves all three. The alternative is writing and maintaining three separate integration layers.

Or consider a less obvious case: a background agent that monitors a codebase for security vulnerabilities and files tickets when it finds them. This agent runs on a schedule, not in response to a human typing a command. It needs to read files from a repository, query a vulnerability database, and create issues in a project tracker. You could give it a shell, but you shouldn’t. An autonomous agent running unattended with shell access is a privilege escalation vector. A crafted comment in a pull request, a malicious string in a dependency manifest, any of these could become a prompt injection that turns bash into an attack surface. Structured tool protocols are the natural interface for this kind of autonomous workflow precisely because they constrain what the agent can do. The agent gets read_file and create_issue, not bash -c. The narrower the interface, the smaller the blast radius.
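The narrow-interface idea is easy to express in code. Here is a sketch of an allow-list toolbox for an unattended agent; the names are illustrative, not from any particular framework:

```typescript
// Sketch: an autonomous agent can only invoke explicitly registered
// tools. There is no fallback to a shell, so a prompt injection that
// asks for `bash -c ...` simply has nothing to call.
type Tool = (args: Record<string, string>) => string;

class ConstrainedToolbox {
  private tools = new Map<string, Tool>();

  allow(name: string, tool: Tool): void {
    this.tools.set(name, tool);
  }

  call(name: string, args: Record<string, string>): string {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Tool not permitted: ${name}`);
    return tool(args);
  }
}
```

Register only `read_file` and `create_issue`, and the blast radius of a compromised prompt is limited to reading files and filing tickets, which is the whole point.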

The N-by-M Problem Doesn’t Go Away

Holmes frames MCP as solving a problem that doesn’t exist. CLIs already work, so why add a protocol?

But CLIs work for a very specific topology: one human (or one human-like agent) driving one tool at a time through a shell. The moment you step outside that topology, CLIs stop being the answer.

Even if every service had a CLI (and Holmes is right that more should), you still have the consumer problem. A CLI is consumable by exactly one kind of agent: one with shell access. The moment you need that same capability accessible from an Electron plugin, a mobile app, a server-side orchestrator, and a terminal agent, you’re back to writing integration code for each consumer. MCP lets you write the server once and expose it to all of them through a common protocol.
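The arithmetic behind that claim is worth making explicit. A back-of-the-envelope sketch:

```typescript
// With N agent runtimes and M capabilities, point-to-point integration
// needs N×M bespoke adapters; a shared protocol needs N client
// implementations plus M servers.
function adaptersNeeded(agents: number, capabilities: number, sharedProtocol: boolean): number {
  return sharedProtocol ? agents + capabilities : agents * capabilities;
}
```

Four consumers (an Electron plugin, a mobile app, a server-side orchestrator, a terminal agent) and ten capabilities means forty integrations without a protocol and fourteen with one, and the gap widens as either number grows.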

This is the same insight behind LSP, which I wrote about in the context of ACP. Before LSP, every editor had to implement its own Python linter, its own Go formatter, its own TypeScript type-checker. The N-by-M integration problem was a nightmare. LSP didn’t replace the underlying tools. It standardized the interface between the tools and the editors. MCP does the same thing for the interface between capabilities and agents.

Holmes might respond that the N-by-M problem is overstated, that most developers just need one agent talking to a handful of tools. Fair enough for a personal workflow. But the industry isn’t building personal workflows. It’s building platforms where agents need to discover and compose capabilities dynamically, where the set of available tools changes based on the user’s permissions, their organization’s policies, and the context of the current task. That’s the world MCP is designed for.

Authentication is the Feature, Not the Bug

One of Holmes’ sharpest critiques is that MCP is “unnecessarily opinionated about auth.” CLI tools, he notes, use battle-tested flows like gh auth login and AWS SSO that work the same whether a human or an agent is driving.

This is true when the agent is acting as you. But the moment the agent stops acting as you and starts acting on behalf of other people, everything changes.

Imagine you’re building a product where an AI assistant helps your customers manage their calendars. Each customer has their own Google account. You cannot ask each of them to run gcloud auth login in a terminal. You need per-user OAuth tokens, tenant isolation, and an auditable record of every action the agent takes on each user’s behalf. This is not a niche enterprise concern. This is the basic architecture of any multi-tenant agent system.

Or think about something simpler: a shared documentation service protected by OAuth. Your team’s internal knowledge base, your company’s Confluence, your organization’s Google Drive. An agent that needs to search those resources on behalf of a user has to present that user’s credentials, not the developer’s, not a shared service account. This is a solved problem in the web world (every SaaS app does it), but it requires a protocol that understands identity delegation. curl with a hardcoded token doesn’t cut it.

MCP’s authentication specification isn’t trying to replace gh auth login for developers who already have credentials loaded. It’s trying to solve the problem of how an agent running in a hosted environment acquires and manages credentials for users who will never see a terminal. Dismissing this as unnecessary complexity is like dismissing HTTPS because curl works fine over HTTP on your local network.

Where I Actually Agree

I want to be clear that Holmes isn’t wrong about the pain points. MCP server initialization is genuinely flaky. I’ve lost hours to servers that didn’t start, connections that dropped, and state that got corrupted between restarts. The tooling is immature. The debugging experience is terrible. As I wrote in my post on the observability gap, the moment you rely on an agent for something that matters, you realize you’re flying blind. MCP’s opacity makes that worse.

And the context window overhead is real. Benchmarks from ScaleKit show that an MCP agent injecting 43 tool definitions consumed 44,026 tokens before doing any work, while a CLI agent doing the same task needed 1,365. When you’re paying per token, that’s not an abstraction tax you can ignore.

But these are maturity problems, not architecture problems. The early days of LSP were rough too. Language servers crashed, features were spotty, and half the community said “just use the built-in tooling.” The protocol won anyway, because the abstraction was right even when the implementation wasn’t.

The Bridge Pattern

Here’s what I think the mature answer looks like, and it’s neither “use MCP for everything” nor “use CLIs for everything.” It’s building your core capability as a shared library, then exposing it through multiple transports.

Think about how you’d design a tool that queries your internal knowledge base. The business logic (authentication, retrieval, re-ranking) lives in a Python module or a Go package. From that shared core, you generate three thin wrappers. A streaming HTTP MCP server for agents running in web runtimes and hosted environments. A local stdio MCP server for desktop agents like Gemini Scribe or Claude Desktop that communicate over standard input/output. And a CLI binary for developers who want to pipe results through jq or use it from Gemini CLI’s bash tool.

All three share the same code paths. A bug fix in the retrieval logic propagates everywhere. The auth layer adapts to context: the CLI reads your local credentials, the HTTP server handles OAuth tokens, and the stdio server inherits the host process’s permissions. You get the CLI’s simplicity where a shell exists, and MCP’s universality where it doesn’t.
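To make the shape of this concrete, here is a minimal Python sketch of the shared-core pattern. Everything in it is illustrative: the names (search_kb, cli_main, TOOL_SPEC) and the toy corpus are mine, not any real library’s API. The point is that the CLI face and the MCP tool face are both thin wrappers dispatching to one function.

```python
# A minimal sketch of the shared-core, multiple-transports pattern.
# Names here (search_kb, TOOL_SPEC) are illustrative, not a real API.

def search_kb(query: str, limit: int = 5) -> list[dict]:
    """Core capability: query the knowledge base and return ranked hits."""
    # Real logic (auth, retrieval, re-ranking) would live here.
    corpus = [
        {"title": "Onboarding guide", "text": "How to set up your dev environment"},
        {"title": "Release process", "text": "How we cut and ship releases"},
    ]
    hits = [doc for doc in corpus if query.lower() in doc["text"].lower()]
    return hits[:limit]

# Face 1: a CLI wrapper for humans (and shell-capable agents).
def cli_main(argv: list[str]) -> int:
    import argparse
    import json
    parser = argparse.ArgumentParser(prog="kb")
    parser.add_argument("query")
    parser.add_argument("--limit", type=int, default=5)
    args = parser.parse_args(argv)
    print(json.dumps(search_kb(args.query, args.limit)))
    return 0

# Face 2: a tool definition an MCP server (stdio or streaming HTTP) would
# register. The schema shape mirrors an MCP tool listing; the handler is
# the same core function the CLI calls.
TOOL_SPEC = {
    "name": "search_kb",
    "description": "Search the internal knowledge base.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
    "handler": search_kb,  # both transports dispatch to the same core
}
```

The wrappers stay boring on purpose: all the logic worth testing lives in the core, so a fix there reaches every transport for free.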

This isn’t hypothetical. It’s what I’m already doing. My gemini-utils library is the shared core: it handles file uploads, deep research, audio transcription, and querying against Gemini’s APIs. It exposes all of that as a set of CLI commands (research, transcribe, query, upload) that I use directly from the terminal every day. But when I wanted those same research capabilities available to Gemini CLI as an agent tool, I built gemini-cli-deep-research, an extension that wraps the same underlying library as an MCP service. The core logic is shared. The CLI is for me at a terminal. The MCP server is for agents that need to invoke deep research as a tool in a larger workflow. Same capability, different transports, each suited to its context.

I think this is the pattern that tool developers should be building toward. The best agent tools of the next few years won’t be “MCP servers” or “CLI tools.” They’ll be capability libraries with multiple faces.

The Real Question

The CLI-vs-MCP debate, as Tobias Pfuetze argued, is the wrong fight. The question isn’t “which is better?” It’s “where does each one belong?”

For a developer in a terminal with their own credentials, driving a coding agent? Use the CLI. It’s faster, cheaper, and the agent already knows how. Holmes is right about that.

For an agent embedded in an application runtime without shell access? For a multi-tenant platform where the agent acts on behalf of users who will never open a terminal? For a system where you need one capability implementation discoverable by multiple heterogeneous agent hosts? That’s where MCP earns its complexity.

And for the tool developer who wants to serve all of these audiences? Build the core once, expose it three ways: CLI, stdio MCP, and streaming HTTP MCP. Let the runtime decide.

The mistake is assuming that because your agent has a shell, every agent has a shell. The terminal is one runtime among many. And as agents move from developer tools into products that serve non-technical users, the fraction of agents that can rely on a $PATH and a .bashrc is going to shrink rapidly.

MCP isn’t dead. It’s just not for you yet. But it might be soon.

A laptop sits on a dark wooden desk under the warm glow of an Edison bulb; above the screen, a stream of glowing, holographic research papers and data visualizations cascades downward like a waterfall, physically dissolving into lines of green and white markdown text as they enter the open terminal window.

Bringing Deep Research to the Terminal

I lost the report somewhere between browser tabs. One moment it was there in the Gemini app, a detailed deep research analysis on how AI agents communicate with each other, complete with citations and a synthesis I’d spent an hour reviewing. The next moment, gone. Along with the draft blog post I’d been weaving it into.

I was working on part nine of my Agentic Shift series, trying to answer the question of what happens when agents start talking to each other instead of just talking to us. The research was sprawling—academic papers on multi-agent systems, documentation from LangGraph and AutoGen, blog posts from researchers at DeepMind and OpenAI. I’d been using Gemini’s deep research feature in the app to help synthesize all of this, and it was genuinely useful. The AI would spend minutes thinking through the question, querying sources, building a structured report. But then I had to move that report into my text-based workflow. Copy, paste, reformat, lose formatting, copy again. Somewhere in that dance between the browser and my terminal, I lost everything.

I stared at the empty browser tab for a moment. I could start over, rerun the research in the Gemini app, be more careful about saving this time. But this wasn’t the first time I’d hit this friction. Every time I used deep research in the browser, I had to bridge two worlds: the app where the AI did its thinking, and the terminal where I actually write and build.

What looked like yak shaving was actually a prerequisite. I needed deep research capabilities in my terminal workflow, not just wanted them. I couldn’t keep jumping between environments. And I was in luck. Just a few weeks earlier, Google had announced that deep research was now available through the Gemini API. The capability I’d been using in the browser could be accessed programmatically.

When Features Live in the Wrong Place

I’m not going to pretend this was built based on demand from the community. I needed this. Specifically, I needed to stop context-switching between the Gemini app and my terminal, because every time I did, I was introducing friction and risk. The lost report was just the most recent symptom of a workflow that was fundamentally broken for how I work.

I live in the terminal. My notes are markdown files. My drafts are plain text. My build process, my git workflow, my entire development environment assumes I’m working with files and command-line tools. When I have to move work from a browser back into that environment, I’m not just inconvenienced—I’m fighting against the grain of everything else I do.

Deep research is powerful. It works. But living in a web app meant it was disconnected from the places where I actually needed it. Sure, other people might benefit from having this integrated into MCP-compatible tools, but that’s a nice side effect. The real reason I built this was simpler: I had to finish part nine of the Agentic Shift series, and I couldn’t do that without fixing my workflow first.

The Model Context Protocol made this possible. It’s an open standard for exposing tools and data sources to AI applications, whatever environment they run in. Google’s API gave me the primitives. I just needed to connect them to where I actually work.

Building the Missing Piece

The extension wraps Gemini’s deep research capabilities into the Model Context Protocol, which means it integrates seamlessly with Gemini CLI and any other MCP-compatible client. The architecture is deliberately simple, but it supports two distinct workflows depending on what you need.

The first workflow is straightforward: you have a research question, and you want a deep investigation. You can kick off research with a simple command, but if you use the bundled /deep-research:start slash command, the model first guides you through refining your question so you get the most out of deep research. The agent then spends tens of minutes—or as much time as it needs—planning the investigation, querying sources, and synthesizing findings into a detailed report with citations you can follow up on.

The second workflow is for when you want to ground the research in your own documents. You use /deep-research:store-create to set up a file search store, then /deep-research:store-upload to index your files. Once they’re uploaded, you have two options: you can include that dataset in the deep research process so the agent grounds its investigation in your specific sources, or you can query against it directly for a simpler RAG experience. This is the same File Search capability I wrote about in November when I rebuilt my Podcast RAG system, but now it’s accessible from the terminal as part of my normal workflow.

The extension maintains local state in a workspace cache, so you don’t have to remember arcane resource identifiers or lose track of running research jobs. The whole thing is designed to feel as natural as running a grep command or kicking off a build—it’s just another tool in the environment where I already work.

So did it actually work?

The first time I ran it, I asked for a deep dive into Stonehenge construction. I’d been reading Ken Follett’s novel Circle of Days and found myself curious about the scientific evidence behind the story: what do we actually know about how it was built, and by whom? I kicked off the query and watched something fascinating happen. The model understood that deep research takes time. Instead of just waiting silently, it kept checking in to see if the research was done, almost like checking the oven to see if dinner was ready.

Twenty minutes later, a markdown file appeared in my filesystem with a comprehensive research report, complete with citations to academic sources, isotope analysis, and archaeological evidence. I didn’t have to copy anything from a browser. I didn’t lose any formatting. It was just there, ready to reference. The report mentioned the Bell Beaker culture and what happened to the Neolithic builders around 2500 BCE, which sent me down another rabbit hole. I immediately ran a second research query on that transition. Same seamless experience. That’s when I knew this was exactly what I needed.

What This Actually Means

I think extensions like this represent something important about where AI development is heading. We’re past the proof-of-concept phase where every AI interaction is a magic trick. Now we’re in the phase where AI capabilities need to integrate into actual workflows—not replace them, but augment them in ways that feel natural.

This is what I wrote about in November when I talked about the era of Personal Software. We’ve crossed a threshold where building a bespoke tool is often faster—and certainly less frustrating—than trying to adapt your workflow to someone else’s software. I didn’t build this extension for the community. I built it because I needed it. I had lost work, and I needed to stop context-switching between environments. If other people find it useful, that’s a nice side effect, but it’s fundamentally software for an audience of one.

The key insight for me was that the Model Context Protocol isn’t just a technical standard; it’s a design pattern for making AI tools composable. Instead of building a monolithic research application with its own UI and workflow, I built a small, focused extension that does one thing well and plugs into the environment where I already work. That composability matters because it means the tool can evolve with my workflow rather than forcing my workflow to evolve around the tool.

There’s also something interesting happening with how we think about AI capabilities. Deep research isn’t about making the model smarter—it’s about giving it time and structure. The same model that gives you a superficial answer in three seconds can give you a genuinely insightful report if you let it think for tens of minutes and provide it with the right sources. We’re learning that intelligence isn’t just about raw capability; it’s about how you orchestrate that capability over time.

What Comes Next

The extension is live on GitHub now, and I’m using it daily for my own research workflows. The immediate next step is adding better control over the research format—right now you can specify broad categories like “Technical Deep Dive” or “Executive Brief,” but I want more granular control over structure and depth. I’m also curious about chaining multiple research tasks together, where the output of one investigation becomes the input for the next.

But the bigger question I’m sitting with is what other AI capabilities are hiding in plain sight, waiting for someone to make them accessible. Deep research was always there in the Gemini API; it just needed a wrapper that made it feel like a natural part of the development workflow. What else is out there?

If you want to try it yourself, you’ll need a Gemini API key (get one at ai.dev) and set the GEMINI_DEEP_RESEARCH_API_KEY environment variable. Deep research runs on Gemini 3.0 Pro, and you can find the current pricing here. It’s charged based on token consumption for the research process plus any tool usage fees.

Install the extension with:

gemini extensions install https://github.com/allenhutchison/gemini-cli-deep-research --auto-update

The full source is on GitHub.

As for me, I still need to finish part nine of the Agentic Shift series. But now I can get back to it with the confidence that I’m working in my preferred environment, with the tools I need accessible right from the terminal. Fair warning: once you start using AI for actual deep research, it’s hard to go back to the shallow stuff.

A close-up photograph on a wooden workbench shows a hand-carved wooden tool handle resting on a MacBook Pro keyboard. The handle transitions into a glowing blue and orange digital wireframe where it extends over the laptop's screen, which displays lines of green code. Wood shavings, chisels, and other traditional tools are scattered around the laptop. A warm desk lamp illuminates the scene from the right.

The Era of Personal Software

I was sitting in a coffee shop this afternoon, nursing a cappuccino and doing a quick triage of the GitHub repositories I maintain. It was supposed to be a quick check-in, but I was surprised to find a pile of issues I hadn’t seen before. They had slipped through the cracks of my notifications.

My immediate reaction wasn’t just annoyance; it was an itch to fix the process. I needed a way to monitor a configurable set of repos and get a consolidated report of new activity—something bespoke. For my smaller projects, I want to see everything. For the big, noisy ones, I only care if I’m assigned or mentioned.

So, I opened up my terminal. I fired up gemini cli and started describing what I needed.

Twenty minutes later, I had a working command-line tool. It did exactly what I described, filtering the noise exactly how I wanted. I ran it, verified the output, and added it to my daily workflow. I closed my laptop and went on with my day.

But on the walk home, I realized something strange had happened. Or rather, something hadn’t happened.

I never opened Google. I never searched GitHub for “activity monitor CLI.” I didn’t spend an hour trawling through “Top 10 GitHub Tools” blog posts, or installing three different utilities only to find out one was deprecated and the other required a subscription.

I just built the thing I needed and moved on.

We are entering the era of Personal Software. This is software written for an audience of one. It’s an application or a script built to solve a specific problem for a specific person, with no immediate intention of scaling, monetizing, or even sharing.

Looking back at my recent work, I realize I’ve been living in this category for a while. In many ways, this is the active evolution of the “Small Tools, Big Ideas” concept I explored earlier this year. Instead of just finding these sharp, focused tools, I’m now building them. Gemini Scribe started because I wanted a better way to write in Obsidian. Podcast RAG exists solely because I wanted to search my own podcast history. My github-activity-reporter from this afternoon? Pure personal necessity. Even adh-cli was just a sandbox for me to test ideas for the Gemini CLI.

We have crossed a threshold where building a bespoke application is often faster—and certainly less frustrating—than finding an off-the-shelf solution that mostly works. The friction of creation has dropped so low that it is now competing with the friction of discovery.

There is a profound freedom in this approach. When you build for an audience of one, the software does exactly what you want and nothing more. There is no feature bloat, no upsell, no UI clutter. You are the product manager, the engineer, and the customer. If your workflow changes next week, you don’t have to file a feature request and hope it gets upvoted; you just change the code. You don’t have to convince anyone else that your problem is worth solving.

But this freedom comes with a new kind of responsibility. When you step outside the walled garden of managed software, you are on your own. If you get stuck, there is no support ticket to file. If an API changes and breaks your tool, you are on the hook to fix it.

There is also the “trap of success.” Sometimes, your personal software is so useful that it accidentally becomes non-personal. Friends ask for it. Colleagues want to fork it. Suddenly, you aren’t just a user anymore; you’re a maintainer. You have to decide if you’re willing to take on the burden of supporting others, or if you’re comfortable saying, “This works for me, good luck to you.”

Not every problem is a nail for this particular hammer, of course. Over time, I’ve started to develop a rubric for what makes for good Personal Software.

The sweet spot is usually glue and logic. If you need to connect two APIs that don’t talk to each other, or parse a messy data export into a clean report, AI can write that script in seconds. My GitHub activity reporter is a perfect example: it’s just fetching data, filtering it against my specific rules, and printing text.
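The shape of that glue-and-logic script is simple enough to sketch. The rule names and event fields below are illustrative, not the actual tool’s schema; the point is how little code “filter against my specific rules” really is.

```python
# Illustrative sketch of the activity-reporter's filtering step:
# per-repo rules decide whether to keep every event ("all") or only
# those where I'm assigned or mentioned ("mentions").

def filter_events(events: list[dict], rules: dict[str, str], me: str) -> list[dict]:
    """Keep an event if its repo's rule is 'all', or if the rule is
    'mentions' and I'm assigned or mentioned."""
    kept = []
    for ev in events:
        rule = rules.get(ev["repo"], "mentions")  # noisy default: mentions only
        if rule == "all":
            kept.append(ev)
        elif rule == "mentions" and me in ev.get("assignees", []) + ev.get("mentions", []):
            kept.append(ev)
    return kept
```

Everything else in such a tool is fetching JSON from an API and printing text: exactly the kind of plumbing an AI can write in one pass.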

It’s also great for ephemeral workflows. If you have a task you need to do fifty times today but might never do again—like renaming a batch of files based on their content or scraping a specific webpage for research—building a throwaway tool is vastly superior to doing it manually.

Another fantastic category is quick web applications. We used to think of web apps as heavy projects requiring frameworks and hosting headaches. But modern platforms like Google Cloud Run or Vercel have made deployment trivial. Tools like Google AI Studio take this even further—offering a free “vibe coding” platform that can take you from a rough idea to a hosted application in minutes. My boxing workout app is a prime example: I didn’t write a line of infrastructure code; I just described the workout timer I needed, and it was live before I even put on my gloves.

Where Personal Software falls short is in infrastructure and security. I wouldn’t build my own password manager or roll my own encryption tools, no matter how good the model is. The stakes are too high, and the “audience of one” means there are no other eyes on the code to catch critical vulnerabilities. Similarly, if a problem requires a complex, interactive GUI or high-availability hosting, the maintenance burden usually outweighs the benefits of customization.

Despite the downsides, I find this shift fascinating. For decades, software development was an industrial process—building generic tools for mass consumption. Now, it’s becoming a craft again. We are returning to a time where we build our own tools, fitting the handle perfectly to our own grip.

So, I want to turn the question over to you. What are you building just for yourself? Are there small, nagging problems you’ve solved with a script only you will ever see? I’d love to hear about the kinds of personal software you’re creating in this new era. Let me know in the comments or reach out—I’m genuinely curious to see what handles you’re crafting.

A retro computer monitor displaying the Gemini CLI prompt "> Ask Gemini to scaffold a web app" inside a glowing neon blue and pink holographic wireframe box, representing a digital sandbox.

The Guardrails of Autonomy

I still remember the first time I let an LLM execute a shell command on my machine. It was a simple ls -la, but my finger hovered over the Enter key for a solid ten seconds.

There is a visceral, lizard-brain reaction to giving an AI that level of access. We all know the horror stories—or at least the potential horror stories. One hallucinated argument, one misplaced flag, and a helpful cleanup script becomes rm -rf /. This fear creates a central tension in what I call the Agentic Shift. We want agents to be autonomous enough to be useful—fixing a bug across ten files while we grab coffee—but safe enough to be trusted with the keys to the kingdom.

Until now, my approach with the Gemini CLI was the blunt instrument of “Human-in-the-Loop.” Any tool call with a side effect—executing shell commands, writing code, or editing files—required a manual y/n confirmation. It was safe, sure. But it was also exhausting.

I vividly remember asking Gemini to “fix all the linting errors in this project.” It brilliantly identified the issues and proposed edits for twenty different files. Then I sat there, hitting yyy… twenty times.

The magic evaporated. I wasn’t collaborating with an intelligent agent; I was acting as a slow, biological barrier for a very expensive macro. This feeling has a name—“Confirmation Fatigue”—and it’s the silent killer of autonomy. I realized I needed to move from micromanagement to strategic oversight. I didn’t want to stop the agent; I wanted to give it a leash.

The Policy Engine

The solution I’ve built is the Gemini CLI Policy Engine.

Think of it as a firewall for tool calls. It sits between the LLM’s request and your operating system’s execution. Every time the model reaches for a tool—whether it’s to read a file, run a grep command, or make a network request—the Policy Engine intercepts the call and evaluates it against a set of rules.

The system relies on three core actions:

  1. allow: The tool runs immediately.
  2. deny: The AI gets a “Permission denied” error.
  3. ask_user: The default manual approval.

A Hierarchy of Trust

The magic isn’t just in blocking or allowing things; it’s in the hierarchy. Instead of a flat list of rules, I built a tiered priority system that functions like layers of defense.

At the base, you have the Default Safety Net. These are the built-in rules that apply to everyone—basic common sense like “always ask before overwriting a file.”

Above that sits the User Layer, which is where I define my personal comfort zone. This allows me to customize the “personality” of my safety rails. On my personal laptop, I might be a cowboy, allowing git commands to run freely because I know I can always undo a bad commit. But on a production server, I might lock things down tighter than a vault.

Finally, at the top, is the Enterprise/Admin Layer. These are the immutable laws of physics for the agent. In an enterprise setting, this is where you ensure that no matter how “creative” the agent gets, it can never curl data to an external IP or access sensitive directories.
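The evaluation logic behind a tiered system like this can be sketched in a few lines. To be clear, this is a toy model of the concept, not the Gemini CLI’s actual implementation: gather rules from every layer, let the highest priority win, and fall back to ask_user when nothing matches.

```python
# Toy evaluator for tiered policy rules: highest-priority match wins,
# and the default safety net is ask_user. Conceptual only; not the
# Gemini CLI's real policy engine.

def evaluate(tool_name: str, command: str, rules: list[dict]) -> str:
    matches = [
        r for r in rules
        if r["toolName"] == tool_name
        and any(command.startswith(p) for p in r["commandPrefix"])
    ]
    if not matches:
        return "ask_user"  # default safety net: no rule means ask the human
    return max(matches, key=lambda r: r["priority"])["decision"]

RULES = [
    # User layer: read-only git commands run freely.
    {"toolName": "run_shell_command",
     "commandPrefix": ["git status", "git log", "git diff"],
     "decision": "allow", "priority": 100},
    # Admin layer: docker is off-limits no matter what lower layers say.
    {"toolName": "run_shell_command",
     "commandPrefix": ["docker"],
     "decision": "deny", "priority": 999},
]
```

Because deny rules sit at a higher priority, a permissive user rule can never override an admin prohibition; the layers compose by priority, not by ordering in a file.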

Safe Exploration

In practice, this means I can trust the agent to look, while still requiring confirmation before it touches anything. I generally trust the agent to check the repository status, review history, or check if the build passed. I don’t need to approve every git log or gh run list.

[[rule]]
toolName = "run_shell_command"
commandPrefix = [
  "git status",
  "git log",
  "git diff",
  "gh issue list",
  "gh pr list",
  "gh pr view",
  "gh run list"
]
decision = "allow"
priority = 100

Yolo Mode

Sometimes, I’m working in a sandbox and I just want speed. I can use the dedicated Yolo mode to take the training wheels off. There is a distinct feeling of freedom—and a slight thrill of danger—when you watch the terminal fly by, commands executing one after another.

However, even in Yolo mode, I want a final sanity check before I push code or open a PR. While Yolo mode is inherently permissive, I define specific high-priority rules to catch critical actions. I also explicitly block docker commands—I don’t want the agent spinning up (or spinning down) containers in the background without me knowing.

# Exception: Always ask before committing or creating a PR
[[rule]]
toolName = "run_shell_command"
commandPrefix = ["git commit", "gh pr create"]
decision = "ask_user"
priority = 900
modes = ["yolo"]

# Exception: Never run docker commands automatically
[[rule]]
toolName = "run_shell_command"
commandPrefix = "docker"
decision = "deny"
priority = 999
modes = ["yolo"]

The Hard Stop

And then there are the things that should simply never happen. I don’t care how confident the model is; I don’t want it rebooting my machine. These rules are the “break glass in case of emergency” protections that let me sleep at night.

[[rule]]
toolName = "run_shell_command"
commandRegex = "^(shutdown|reboot|kill)"
decision = "deny"
priority = 999

Decoupling Capability from Control

The significance of this feature goes beyond just saving me from pressing y. It fundamentally changes how we design agents.

I touched on this concept in my series on autonomous agents, specifically in Building Secure Autonomous Agents, where I argued that a “policy engine” is essential for scaling from one agent to a fleet. Now, I’m bringing that same architecture to the local CLI.

Previously, the conversation around AI safety often presented a binary choice: you could have a capable agent that was potentially dangerous, or a safe agent that was effectively useless. If I wanted to ensure the agent wouldn’t accidentally delete my home directory, the standard advice was to simply remove the shell tool. But that is a false choice. It confuses the tool with the intent. Removing the shell doesn’t just stop the agent from doing damage; it stops it from running tests, managing git, or installing packages—the very things I need it to do.

With the Policy Engine, I can give the agent powerful tools but wrap them in strict policies. I can give it access to kubectl, but only for get commands. I can let it edit files, but only on specific documentation sites.

This is how we bridge the gap between a fun demo and a production-ready tool. It allows me to define the sandbox in which the AI plays, giving me the confidence to let it run autonomously within those boundaries.

Defining Your Own Rules

The Policy Engine is available now in the latest release of Gemini CLI. You can dive into the full documentation here.

If you want to see exactly what rules are currently active on your system—including the built-in defaults and your custom additions—you can simply run /policies list from inside the Gemini CLI.

I’m currently running a mix of “Safe Exploration” and “Hard Stop” rules. It’s quieted the noise significantly while keeping my file system intact. I’d love to hear how you configure yours—are you a “deny everything” security maximalist, or are you running in full “allow” mode?

A stylized, dark digital illustration of an open laptop displaying lines of blue code. Floating above the laptop are three glowing, neon blue wireframe icons: a document on the left, a calendar in the center, and an envelope on the right. The icons appear to be formed from streams of digital particles rising from the laptop screen, symbolizing the integration of digital tools. The overall aesthetic is futuristic and high-tech, with dramatic lighting emphasizing the connection between the code and the applications.

Bringing the Office to the Terminal

There is a specific kind of friction that every developer knows. It’s the friction of the “Alt-Tab.”

You’re deep in the code, holding a complex mental model of a system in your head, when you realize you need to check a requirement. That requirement lives in a Google Doc. Or maybe you need to see if you have time to finish a feature before your next meeting. That information lives in Google Calendar.

So you leave the terminal. You open the browser. You navigate the tabs. You find the info. And in those thirty seconds, the mental model you were holding starts to evaporate. The flow is broken.

But it’s not just the context switch that kills your momentum—it’s the ambush. The moment you open that browser window, the red dots appear. Chat pings, new emails, unresolved comments on a doc you haven’t looked at in two days—they all clamor for your attention. Before you know it, the quick thing you needed to look up has morphed into an hour of answering questions and putting out fires. You didn’t just lose your place in the code; you lost your afternoon.

I’ve been thinking a lot about this friction lately, especially as I’ve moved more of my workflow into the Gemini CLI. If we want AI to be a true partner in our development process, it can’t just live in a silo. It needs access to the context of our work—and for most of us, that context is locked away in the cloud, in documents, chats, and calendars.

That’s why I built the Google Workspace extension for Gemini CLI.

Giving the Agent “Senses”

We often talk about AI agents in the abstract, but their utility is defined by their boundaries. An agent that can only see your code is a great coding partner. An agent that can see your code and your design documents and your team’s chat history? That’s a teammate.

This extension connects the Gemini CLI to the Google Workspace APIs, effectively giving your terminal-based AI a set of digital senses and hands. It’s not just about reading data; it’s about integrating that data into your active workflow.

Here is what that looks like in practice:

1. Contextual Coding

Instead of copying and pasting requirements from a browser window, you can now ask Gemini to pull the context directly.

“Find the ‘Project Atlas Design Doc’ in Drive, read the section on API authentication, and help me scaffold the middleware based on those specs.”

2. Managing the Day

I often get lost in work and lose track of time. Now, I can simply ask my terminal:

“Check my calendar for the rest of the day. Do I have any blocks of free time longer than two hours to focus on this migration?”

3. Seamless Communication

Sometimes you just need to drop a quick note without leaving your environment.

“Send a message to the ‘Core Eng’ chat space letting them know the deployment is starting now.”

The Accidental Product

Truth be told, I didn’t set out to build a product. When I first joined Google DeepMind, this was simply my “starter project.” My manager suggested I spend a few weeks experimenting with Google Workspace and our agentic capabilities, and the Gemini CLI seemed like the perfect sandbox for that kind of exploration.

I started building purely for myself, guided by my own daily friction. I wanted to see if I could check my calendar without leaving the terminal. Then I wanted to see if I could pull specs from a Doc. I followed the path of my own curiosity, adding tools one by one.

But when I shared this little experiment with a few colleagues, the reaction was immediate. They didn’t just think it was cool; they wanted to install it. That’s when I realized this wasn’t just a personal hack—it was a shared need. It snowballed from a few scripts into a full-fledged extension that we knew we had to ship.

Under the Hood

The extension is built as a Model Context Protocol (MCP) server, which means it runs locally on your machine. It uses your own OAuth credentials, so your data never passes through a third-party server. It’s direct communication between your local CLI and the Google APIs.

It currently supports a wide range of tools across the Workspace suite:

  • Docs & Drive: Search for files, read content, and even create new docs from markdown.
  • Calendar: List events, find free time, and schedule meetings.
  • Gmail: Search threads, read emails, and draft replies.
  • Chat: Send messages and list spaces.

Why This Matters

This goes back to the idea of “Small Tools, Big Ideas.” Individually, a command-line tool to read a calendar isn’t revolutionary. But when you combine that capability with the reasoning engine of a large language model, it becomes something else entirely.

It turns your terminal into a cockpit for your entire digital work life. It allows you to script interactions between your code and your company’s knowledge base. It reduces the friction of context switching, letting you stay where you are most productive.

If you want to try it out, the extension is open source and available now. You can install it directly into the Gemini CLI:

gemini extensions install https://github.com/gemini-cli-extensions/workspace

I’m curious to see how you all use this. Does it change your workflow? Does it keep you in the flow longer? Give it a spin and let me know.

A developer leans back in his chair with hands behind his head, smiling with relief. His monitor displays a large glowing "DELETE" button. In the background, a messy, tangled server rack is fading away, symbolizing the removal of complex infrastructure.

The Joy of Deleting Code: Rebuilding My Podcast Memory

Late last year, I shared the story of a personal obsession: building an AI system grounded in my podcast history. I had hundreds of hours of audio—conversations that had shaped my thinking—trapped in MP3 files. I wanted to set them free. I wanted to be able to ask my library questions, find half-remembered quotes, and synthesize ideas across years of listening.

So, I built a system. And like many “v1” engineering projects, it was a triumph of brute force.

It was a classic Retrieval-Augmented Generation (RAG) pipeline, hand-assembled from the open-source parts bin. I had a reliable tool called podgrab acting as my scout, faithfully downloading every new episode. But downstream from that was a complex RAG implementation to chop transcripts into bite-sized chunks. I had an embedding model to turn those chunks into vectors. And sitting at the center of it all was a vector database (ChromaDB) that I had to host, manage, and maintain.

It worked, but it was fragile. I didn’t even have a proper deployment setup; I ran the whole thing from a tmux session, with different panes for the ingestion watcher, the vector database, and the API server. It felt like keeping a delicate machine humming by hand. Every time I wanted to tweak the retrieval logic or—heaven forbid—change the embedding model, I was looking at a weekend of re-indexing and refactoring. I had built a memory for my podcasts, but I had also built myself a part-time job as a database administrator.
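To give a sense of the complexity I was maintaining: the chunking step was the classic sliding-window pattern, something like this generic sketch (not my exact code):

```python
def sliding_window_chunks(text, size=1000, overlap=200):
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Every knob here — chunk size, overlap, whether to respect sentence boundaries — was a decision I owned, and changing any of it meant re-indexing the whole archive.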

Then, a few weeks ago, I saw this announcement from the Gemini team.

They were launching File Search, a tool that promised to collapse my entire precarious stack into a single API call. The promise was bold: a fully managed RAG system. No vector DB to manage. No manual chunking strategies to debate. No embedding pipelines to debug. You just upload the files, and the model handles the rest.

I remember reading the documentation and feeling that specific, electric tingle that hits you when you realize the “hard problem” you’ve been solving is no longer a hard problem. It wasn’t just an update; it was permission to stop doing the busy work. I was genuinely excited—not just to write new code, but to tear down the old stuff.

Sometimes, it’s actually more fun to delete code than it is to write it.

The first step was the migration. I wrote a script to push my archive—over 18,000 podcast transcripts—into the new system. It took a while to run, but when it finished, everything was just… there. Searchable. Grounded. Ready.

That was the signal I needed. I opened my editor and started deleting code I had painstakingly written just last year. Podgrab stayed—it was doing its job perfectly—but everything else was on the chopping block.

  • I deleted the chromadb dependency and the local storage management. Gone.
  • I deleted the custom logic for sliding-window text chunking. Gone.
  • I deleted the manual embedding generation code. Gone.
  • I deleted the old web app and a dozen stagnant prototypes that were cluttering up the repo. Gone.

I watched my codebase shrink by hundreds of lines. The complexity didn’t just move; it evaporated. It was more than just a cleanup; it was a chance for a fresh start with new assumptions and fewer constraints. I wasn’t patching an old system anymore; I was building a new one, unconstrained by the decisions I made a year ago.

In its place, I wrote a new, elegant ingestion script. It does one thing: it takes the transcripts generated from the files podgrab downloads and uploads them to the Gemini File Search store. That’s it. Google handles the indexing, the storage, and the retrieval.
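Here is roughly what that script boils down to, assuming the google-genai SDK's File Search API; the store name and transcript directory are placeholders, not my actual configuration:

```python
from pathlib import Path

def find_new_transcripts(transcript_dir, uploaded):
    """List transcript files that have not been uploaded yet."""
    return sorted(
        p for p in Path(transcript_dir).glob("*.txt") if p.name not in uploaded
    )

def upload_transcripts(store_name, paths):
    # Requires `pip install google-genai` and a GEMINI_API_KEY.
    from google import genai
    client = genai.Client()
    for path in paths:
        # File Search handles chunking, embedding, and indexing server-side.
        client.file_search_stores.upload_to_file_search_store(
            file=str(path),
            file_search_store_name=store_name,
        )

if __name__ == "__main__":
    pending = find_new_transcripts("transcripts", uploaded=set())
    upload_transcripts("fileSearchStores/podcast-archive", pending)
```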

With the heavy lifting gone, I was free to rethink the application itself. I built a new central brain for the project, a lightweight service I call mcp_server.py (implementing the Model Context Protocol).

Previously, my server was bogged down with the mechanics of how to find data. Now, mcp_server.py simply hands a user’s query to my rag.py module. That module doesn’t need to be a database client anymore; it just configures the Gemini FileSearch tool and gets out of the way. The model itself, grounded by the tool, does the retrieval, the synthesis, and even the citation.
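The query side is similarly thin. A hedged sketch, again assuming the google-genai SDK, with placeholder model and store names:

```python
def build_query(question, max_len=2000):
    """Normalize whitespace and cap the length of a user question."""
    return " ".join(question.split())[:max_len]

def answer(question, store_name="fileSearchStores/podcast-archive"):
    # Requires `pip install google-genai` and a GEMINI_API_KEY. The model
    # does the retrieval, synthesis, and citation itself via the tool.
    from google import genai
    from google.genai import types
    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=build_query(question),
        config=types.GenerateContentConfig(
            tools=[types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[store_name],
                )
            )],
        ),
    )
    return response.text
```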

The difference is profound. The “RAG” part of my application—the part that used to consume 80% of my engineering effort—is now just a feature I use, like a spell checker or a date parser.

This shift is bigger than my podcast project. It changes the calculus for every new idea I have. Previously, if I wanted to build a grounded AI tool for a different context—say, for my project notes or my email archives—I would hesitate. I’d think about the boilerplate, the database setup, the chunking logic. Now? I can spin up a robust, grounded system in an hour.

My podcast agent is smarter now, faster, and much cheaper to run. But the best part? I’m not a database administrator anymore. I’m just a builder again.

You can try out the new system yourself at podcast-rag.hutchison.org or check out the code on GitHub.

A hand points to an open journal or report, bathed in a bright spotlight against a dark background. The left page contains a structured, numerical report, and the right page shows a coherent narrative summary. This visually represents the transformation of data into a story.

The Examined Life of a Developer

The Problem of Visibility

The air in the office seems to thin a little as September rolls around, a familiar tension settling in as we all turn to the task of documenting our work. For many of us, this is a straightforward process. Our internal tools—the issue tracker, the company’s homegrown SCM, the project trackers—are designed to capture and report on every line of code, every bug fixed, and every feature shipped.

But what about the work that happens outside of those well-lit walls?

Lately, our team has been deeply invested in open source, pouring countless hours into projects like the Gemini CLI. It’s exciting, valuable work. It builds our skills, strengthens the community, and provides a powerful public-facing tool. Yet, none of our internal reporting tools are wired to track the PRs I’ve reviewed, the issues I’ve triaged, or the new features I’ve authored in a public repository. It’s a classic modern engineering problem: your work is everywhere, but your metrics are only in one place.

I needed a way to bridge that gap. I wanted a comprehensive view of my contributions that didn’t just exist in a list of commits but told a story of my impact. I needed something that could remind me of the little things—the code reviews, the issue comments—that are often the most valuable part of open source collaboration. So, I did what any engineer would do: I built a small tool to solve a big problem. This led me to create the GitHub Activity Reporter.

From Utility to Narrative

My initial idea was modest. I wanted a script that could query the GitHub API for my activity across specific repositories and organizations. It would pull in my authored pull requests, issues I created, and even the “orphan” commits that aren’t yet tied to a PR. But as I started building it, my thinking shifted. A raw data dump is helpful for a spreadsheet, but for a performance review, you need a narrative. You need a story.
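The data-collection step is less exotic than it sounds: GitHub's search API can answer most of these questions with a well-formed query string. The sketch below is my own illustration of that pattern, not the tool's actual code:

```python
import json
import urllib.parse
import urllib.request

def build_search_query(user, repos, start, end, item_type="pr"):
    """Compose a GitHub search query for items authored in a date window."""
    repo_filter = " ".join(f"repo:{r}" for r in repos)
    return f"is:{item_type} author:{user} created:{start}..{end} {repo_filter}"

def search_items(query, token=None):
    # GET /search/issues covers both PRs and issues, selected via is:pr / is:issue.
    url = "https://api.github.com/search/issues?q=" + urllib.parse.quote(query)
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["items"]
```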

I’ve always been a believer in the philosophy of “small tools, big ideas.” I’ve found that some of the most profound solutions start with a simple, focused utility. In this case, the big idea wasn’t just to report on my activity but to give that activity a voice. By integrating with Google’s Gemini API, I realized I could transform a dry, structured report into a human-readable narrative. The tool could do the heavy lifting of data collection and then use the AI to tell a coherent, compelling story.
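The narrative step itself is mostly a matter of framing one prompt well. A hedged sketch using the google-genai SDK — the prompt wording and model name are illustrative, not the tool's exact configuration:

```python
def build_narrative_prompt(report_markdown):
    """Frame the structured report so the model writes a review-ready story."""
    return (
        "You are summarizing a week of GitHub activity for an engineer's "
        "self-review. Turn the structured report below into a coherent "
        "narrative that groups related work into themes:\n\n" + report_markdown
    )

def narrate(report_markdown):
    # Requires `pip install google-genai` and a GEMINI_API_KEY.
    from google import genai
    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=build_narrative_prompt(report_markdown),
    )
    return response.text
```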

To show you what that looks like, here is a report from a recent week on the Gemini CLI. The first part is the raw data straight from the activity report, and the second is the narrative generated by the AI.

A Week on Gemini CLI

Structured Report

# GitHub Activity Report for allenhutchison
**Period:** `2025-09-07` to `2025-09-13`
**Repositories:** google-gemini/gemini-cli

## 📝 Contributions
_Pull requests, issues, and commits authored by you_

### Pull Requests Authored
- [#8348](https://github.com/google-gemini/gemini-cli/pull/8348) - feat(cli): configure policy engine from existing settings _(open)_
  - [`fccd753`](https://github.com/google-gemini/gemini-cli/commit/fccd7530fb5574a726ef5db5fe8ad3f155474b3d) - feat(cli): configure policy engine from existing settings
- [#8078](https://github.com/google-gemini/gemini-cli/pull/8078) - feat: Tool Integration with PolicyEngine (PR 2 of #7231) _(open)_
  - [`e35ae54`](https://github.com/google-gemini/gemini-cli/commit/e35ae5425547abb492415f604378692795c89569) - feat(core): implement Tool Confirmation Message Bus foundation (#7231)
  - [`dccd03a`](https://github.com/google-gemini/gemini-cli/commit/dccd03a6d97c02a004359a52f80c0fada5318625) - fix(policy): address security issue in PolicyEngine argument matching
  - [`805270b`](https://github.com/google-gemini/gemini-cli/commit/805270bb1f9cb4ff7d70a5a8d639fac949dd0f5b) - fix(policy): prevent stack overflow from circular references in stableStringify
  - [`f2ea10a`](https://github.com/google-gemini/gemini-cli/commit/f2ea10a46adc8ae79fd47750e4cefe94bfcdc21d) - fix(policy-engine): address high-severity security issues in stableStringify
  - [`679f05e`](https://github.com/google-gemini/gemini-cli/commit/679f05eb336097b34d5c3881c5925349f33a5175) - fix(tests): resolve TypeScript build errors in policy-engine tests
  - ... and 7 more commits

### Issues Created
- No issues created during this period.

### Work in Progress
_Commits not yet part of a pull request_

#### `google-gemini/gemini-cli`
- [`ba85aa4`](https://github.com/google-gemini/gemini-cli/commit/ba85aa49c7661dde884255679f925c787a678757) - feat(core): Tool Confirmation Message Bus foundation (PR 1 of 3) (#7835)
- [`ef9469a`](https://github.com/google-gemini/gemini-cli/commit/ef9469a417b3631544e329b0845098a5b042c7f4) - feat(commands): Add new commands for docs, git, and PR review (#7853)

## 🔧 Maintainer Work
_Code reviews, issue triage, and community engagement_

### Pull Requests Reviewed
- [#8305](https://github.com/google-gemini/gemini-cli/pull/8305) - feat(cli) Custom Commands work in Non-Interactive/Headless Mode _(open)_
- [#7347](https://github.com/google-gemini/gemini-cli/pull/7347) - feat: Add a `--session-summary` flag _(closed)_
- [#5393](https://github.com/google-gemini/gemini-cli/pull/5393) - feat(core): Add side-effect metadata to tools for safer execution _(open)_
- [#4102](https://github.com/google-gemini/gemini-cli/pull/4102) - docs: Clarify import processor security model _(open)_
- [#2943](https://github.com/google-gemini/gemini-cli/pull/2943) - Always allow should be smart about subcommands using a safety analyzer _(open)_
- [#1396](https://github.com/google-gemini/gemini-cli/pull/1396) - docs: add screenshot to README _(closed)_
- [#5814](https://github.com/google-gemini/gemini-cli/pull/5814) - feat(cli): validate model names with precedence and concise startup logs _(closed)_
- [#8086](https://github.com/google-gemini/gemini-cli/pull/8086) - Add .geminiignore support to the glob tool. _(closed)_
- [#7660](https://github.com/google-gemini/gemini-cli/pull/7660) - feat: use largest windows runner for ci _(closed)_
- [#7850](https://github.com/google-gemini/gemini-cli/pull/7850) - feat: add cached string width function for performance optimization _(closed)_
- [#7913](https://github.com/google-gemini/gemini-cli/pull/7913) - Mention replacements for deprecated settings in settings.json _(closed)_

### Pull Requests Closed/Merged
- [#7853](https://github.com/google-gemini/gemini-cli/pull/7853) - feat(commands): Add new commands for docs, git, and PR review _(merged (author))_
- [#7835](https://github.com/google-gemini/gemini-cli/pull/7835) - feat(core): Tool Confirmation Message Bus foundation (PR 1 of 3) _(merged (author))_
- [#8086](https://github.com/google-gemini/gemini-cli/pull/8086) - Add .geminiignore support to the glob tool. _(merged (reviewed))_
- [#7913](https://github.com/google-gemini/gemini-cli/pull/7913) - Mention replacements for deprecated settings in settings.json _(merged (reviewed))_
- [#7850](https://github.com/google-gemini/gemini-cli/pull/7850) - feat: add cached string width function for performance optimization _(merged (reviewed))_
- [#7660](https://github.com/google-gemini/gemini-cli/pull/7660) - feat: use largest windows runner for ci _(closed (reviewed))_
- [#5814](https://github.com/google-gemini/gemini-cli/pull/5814) - feat(cli): validate model names with precedence and concise startup logs _(closed (reviewed))_

### Issue Engagement
- [#8022](https://github.com/google-gemini/gemini-cli/issues/8022) - Structured JSON Output _(mentioned, commented, closed)_
- [#7113](https://github.com/google-gemini/gemini-cli/issues/7113) - /setup-github returns 404 not found _(commented, open)_
- [#5435](https://github.com/google-gemini/gemini-cli/issues/5435) - Commands Should work in Non-Interactive Mode _(mentioned, commented, assigned, open)_
- [#7763](https://github.com/google-gemini/gemini-cli/issues/7763) - Release Failed for v0.3.2 || "N/A" on 2025-09-04 _(mentioned, closed)_
- [#3132](https://github.com/google-gemini/gemini-cli/issues/3132) - Support SubAgent architecture _(assigned, open)_

### Issues Closed
- [#8022](https://github.com/google-gemini/gemini-cli/issues/8022) - Structured JSON Output _(closed after commenting)_

---
_Report generated on 2025-09-13_

Narrative Summary

Gemini CLI: A Week of Enhanced Intelligence, Security, and Collaboration

This past week, allenhutchison made significant strides in advancing the google-gemini/gemini-cli, focusing on critical enhancements to the platform’s intelligent tooling, robust security, and developer productivity. Key accomplishments include laying the groundwork for a more configurable and secure Policy Engine, integrating intelligent tool confirmation mechanisms, introducing new commands to streamline developer workflows, and addressing several high-priority security vulnerabilities. Beyond direct contributions, active engagement in code reviews and issue management further solidified the project’s stability and fostered community collaboration.
Pioneering Safer AI Tooling with the Policy Engine

A major theme of the week’s work revolved around making the gemini-cli’s AI tools more intelligent, secure, and user-friendly, particularly through the Policy Engine. This component is vital for ensuring that AI-driven actions are executed safely, adhere to predefined rules, and respect user intent.

  • Configurable Policy Engine (PR #8348): Significant progress was made on a new feature that will allow the Policy Engine to be configured directly from existing settings. This feat(cli): configure policy engine from existing settings aims to simplify the setup and management of safety policies, making it easier for users to customize how their AI tools operate. While still under review, this PR is a key step towards a more adaptable and powerful security layer.
  • Intelligent Tool Integration and Confirmation (PR #8078, building on #7231 & #7835): This comprehensive pull request represents the second phase of a larger initiative to seamlessly integrate AI tools with the Policy Engine, enhancing user control and transparency.
      ◦ Message Bus Foundation: The work builds upon the feat(core): implement Tool Confirmation Message Bus foundation (PR #7835 and commit e35ae54), which establishes a core communication channel for tools to interact with the system and potentially seek user confirmation before executing sensitive actions. This is crucial for transparency and preventing unintended side effects.
      ◦ Web-Search Tool Integration: A concrete example of this integration is the feat(tools): integrate PolicyEngine with web-search tool (commit 2be4777), demonstrating how the Policy Engine will govern access and execution for external tools, starting with web searches.

Boosting Developer Productivity with New CLI Commands

Improving developer experience was also a priority, with the introduction of new commands designed to streamline common workflows directly within the CLI.

  • New Productivity Commands (PR #7853, merged): This impactful contribution added new commands for docs, git, and PR review. These commands empower developers to manage documentation, interact with Git repositories, and review pull requests without switching context, significantly enhancing workflow efficiency.
  • Non-Interactive Command Execution (PR #8305, reviewed): Related to issue #5435, work was reviewed to enable Custom Commands work in Non-Interactive/Headless Mode. This is crucial for enabling automation and scripting, allowing the CLI to be integrated into CI/CD pipelines or other automated systems without requiring manual intervention.

Fortifying Security and Stability

The week also saw a strong focus on enhancing the security and stability of the gemini-cli, particularly within the critical Policy Engine component.

  • Addressing Critical Policy Engine Vulnerabilities (PR #8078 commits): Several high-priority security fixes were implemented to safeguard the Policy Engine:
      ◦ fix(policy): address security issue in PolicyEngine argument matching (commit dccd03a) ensures that tool arguments are correctly and securely processed, preventing potential injection or manipulation.
      ◦ fix(policy-engine): address high-severity security issues in stableStringify (commit f2ea10a) and fix(policy-engine): address critical security issues and improve documentation (commit d693fbf) resolve vulnerabilities related to how data is serialized, preventing potential data integrity or exposure issues.
      ◦ fix(message-bus): use safeJsonStringify for error messages (commit c3c8de8) further hardens error handling to prevent sensitive information leaks.
  • Preventing Stack Overflow Issues: A crucial stability fix, fix(policy): prevent stack overflow from circular references in stableStringify (commit 805270b), was implemented to make the Policy Engine more robust and reliable, especially when dealing with complex or recursive data structures.
  • Ensuring Code Quality: Underlying infrastructure work, including fix(tests): resolve TypeScript build errors in policy-engine tests (commit 679f05e) and Fix lint (commit c35d83c), ensured the stability and maintainability of the codebase supporting these critical features.

Community Collaboration and Project Health

Beyond direct code contributions, allenhutchison actively engaged with the google-gemini/gemini-cli community, contributing to overall project health through diligent code reviews and issue management.

  • Active Code Review and Merged Contributions: Several pull requests from other contributors were reviewed, guiding them to successful merger or closure, demonstrating a commitment to code quality and collaboration:
      ◦ Enhanced Functionality: Reviewed and merged Add .geminiignore support to the glob tool (PR #8086), providing more granular control over file processing.
      ◦ Performance Optimization: Guided the merger of feat: add cached string width function for performance optimization (PR #7850), improving the CLI’s responsiveness.
      ◦ Improved User Guidance: Reviewed and merged Mention replacements for deprecated settings in settings.json (PR #7913), enhancing documentation for users.
      ◦ Infrastructure Improvements: Provided feedback on feat: use largest windows runner for ci (PR #7660) and feat(cli): validate model names with precedence and concise startup logs (PR #5814), contributing to more robust CI/CD and CLI startup.
      ◦ Reviewed several other open PRs, including features like --session-summary (PR #7347) and Add side-effect metadata to tools (PR #5393), and documentation updates (PR #4102).
  • Proactive Issue Management: Engaged with critical issues, demonstrating responsiveness to user feedback and project needs:
      ◦ Resolution: Closed issue #8022, “Structured JSON Output,” after providing input and confirming resolution.
      ◦ Guidance & Ownership: Commented on issue #7113 concerning a 404 error and was assigned to issue #3132, “Support SubAgent architecture,” indicating leadership on future architectural work. Active engagement on #5435, “Commands Should work in Non-Interactive Mode,” directly links to the ongoing work in PR #8305.

This week’s activity paints a clear picture of comprehensive development, combining forward-looking feature development with critical security and stability improvements. The ongoing work on the Policy Engine and Tool Confirmation Message Bus (PRs #8348, #8078) promises a more secure and intelligent gemini-cli, while merged features like new productivity commands (PR #7853) deliver immediate value to developers. Coupled with robust code reviews and issue management, these contributions significantly bolster the google-gemini/gemini-cli’s capabilities, security posture, and collaborative environment. As these open features progress, users can anticipate an even more powerful, trustworthy, and user-friendly command-line experience.

From Metrics to Mindful Reflection

Socrates famously said, “The unexamined life is not worth living.” This isn’t just a philosophical idea—it’s a fundamental principle for a thriving engineering practice. While the structured report is a great list of “what” you did, it doesn’t tell you “why” or “how” you did it. It doesn’t tell you if you achieved what you set out to do at the beginning of the week.

The true value of a tool like the GitHub Activity Reporter is not in presenting the raw data, but in prompting a deeper level of reflection. Looking at the AI’s narrative, you can ask yourself:

  • Did I focus on the right things? Did I get to the key feature I planned to build, or did other issues and distractions take over?
  • What were the blockers? Were there issues that consumed my time without leading to a merged PR or a closed issue? The report can help you identify these hidden bottlenecks.
  • What was the true impact? Did my contributions, reviews, and issue engagement genuinely move the project forward? Did they help other contributors, or were they just administrative work?

The AI’s ability to synthesize your actions into a cohesive story allows you to see the forest, not just the trees. It’s a powerful tool for a weekly check-in, an on-call handover, or a periodic self-assessment, helping you align your efforts with your goals and grow as a contributor.

A Quick Start

Ready to run the tool for yourself? After setting up your GitHub and Gemini API keys (see the repository’s README for details), you can generate a report just like the one above with a single command:

python github_report.py --start-date 2025-09-07 --repos your-org/your-repo --narrative

A Call to Reflection

The world of engineering is no longer confined to a single company’s walls. We contribute to open source, we collaborate across teams, and our work exists in many different places. This requires us to be more intentional about how we track and reflect on our contributions.

By building a simple tool that leverages AI to tell a compelling story, I’ve found a way to not just see my work, but to understand it. I encourage you to check out the GitHub Activity Reporter, run the tool for yourself, and discover how a small, focused utility can help you capture and reflect on your own narrative. The story of our work is one we all get to tell, and with the right tools, it becomes that much easier to make it count.