Great Video on Gemini Scribe and Obsidian

I was recently looking through the feedback in the Gemini Scribe repository when I noticed a few insightful comments from a user named Paul O’Malley. Curiosity got the better of me (I love seeing who is actually pushing the boundaries of the tools I build), so I took a look at his YouTube channel. I quickly found myself deep into a walkthrough titled “I Built a Second Brain That Organises Itself.”

What caught my eye wasn’t just another productivity system; we’ve all seen the “shiny new app” cycle that leads to digital bankruptcy. It was seeing Gemini Scribe being used as the engine for a fully automated Obsidian vault.

The Friction of Digital Maintenance

Paul hits on a fundamental truth: most systems fail because the friction of maintenance—the tagging, the filing, the constant admin—eventually outweighs the benefit. He argues that what we actually need is a system that “bridges the gap in our own executive function”.

In his setup, he uses Obsidian as the chassis because it relies on Markdown. I’ve long believed that Markdown is the native language of AI, and seeing it used here to create a “seamless bridge” between messy human thoughts and structured AI processing was incredibly satisfying.

Gemini Scribe as the Engine

It was a bit surreal to watch Paul walk through the installation of Gemini Scribe as the core engine for this self-organizing brain. He highlights a few features that I poured a lot of heart into:

  • Session History as Knowledge: By saving AI interactions as Markdown files, they become a searchable part of your knowledge base. You can actually ask the AI to reflect on past conversations to find patterns in your own thinking.
  • The Setup Wizard: He uses a “Setup Wizard” to convert the AI from a generic chatbot into a specialized system administrator. Through a conversational interview, the agent learns your profession and hobbies to tailor a project taxonomy (like the PARA method) specifically to you.
  • Agentic Automation: The video demonstrates the “Inbox Processor,” where the AI reads a raw note, gives it a proper title, applies tags, and physically moves it to the right folder.
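
The filing step Paul demonstrates is easy to picture in code. Here is a hedged sketch of what one "Inbox Processor" pass might look like: the title, tags, and destination would come from the model, but they are passed in here so the file handling is visible. The names are illustrative, not Gemini Scribe's actual implementation.

```python
# Sketch of one inbox-processing step: retitle a raw note, prepend
# frontmatter tags, and move it into the right folder. The model would
# supply title/tags/dest_dir; this only shows the mechanical part.
from pathlib import Path

def file_note(raw_note: Path, title: str, tags: list[str], dest_dir: Path) -> Path:
    """Rewrite a raw note under a proper title with tags, then remove the inbox copy."""
    body = raw_note.read_text()
    frontmatter = "---\ntags: [" + ", ".join(tags) + "]\n---\n"
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{title}.md"
    target.write_text(frontmatter + body)
    raw_note.unlink()  # the note now lives in its destination folder
    return target
```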

Beyond the Tool: A Human in the Loop

One thing Paul emphasized that really resonated with my own philosophy of Guiding the Agent’s Behavior is the “Human in the Loop”. When the agent suggests a change or creates a new command, it writes to a staging file first.

As Paul puts it, you are the boss and the AI is the junior employee—it can draft the contract, but you have to sign it before it becomes official. You always remain in control of the files that run your life.

Small Tools, Big Ideas

Seeing the Gemini CLI mentioned as a “cleaner and slightly more powerful” alternative for power users was another nice nod. It reinforces the idea that small, sharp tools can be composed into something transformative.

Building tools in a vacuum is one thing, but seeing them live in the wild, helping someone clear their “mental RAM” and close their loop at the end of the day, is one of the reasons I do this. It’s a reminder that the best technology doesn’t try to replace us; it just makes the foundations a little sturdier.

A photorealistic image shows an old wooden-handled hammer on a cluttered workbench transforming into a small, multi-armed mechanical robot with glowing blue eyes, holding various miniature tools.

Everything Becomes an Agent

I’ve noticed a pattern in my coding life. It starts innocently enough. I sit down to write a simple Python script, maybe something to tidy up my Obsidian vault or a quick CLI tool to query an API. “Keep it simple,” I tell myself. “Just input, processing, output.”

But then, the inevitable thought creeps in: It would be cool if the model could decide which file to read based on the user’s question.

Two hours later, I’m not writing a script anymore. I’m writing a while loop. I’m defining a tools array. I’m parsing JSON outputs and handing them back to the model. I’m building memory context windows.

I’m building an agent. Again.

(For those keeping track, my working definition of an “agent” is simple: a model running in a loop with access to tools. I explored this in depth in my Agentic Shift series, but that’s the core of it.)
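
That definition fits in a few lines of code. This is a minimal sketch of the loop, with the model call stubbed out as a callable so the shape is visible; a real version would call an LLM API and use its tool-calling format.

```python
# "A model running in a loop with access to tools", in miniature.
# call_model is a stand-in for a real LLM API call; it returns either
# a tool request ({"tool", "args"}) or a final answer ({"answer"}).

def read_file(path: str) -> str:
    """Tool: return the contents of a file."""
    with open(path) as f:
        return f.read()

TOOLS = {"read_file": read_file}

def run_agent(task: str, call_model) -> str:
    """Loop until the model stops asking for tools and produces an answer."""
    history = [{"role": "user", "content": task}]
    while True:
        step = call_model(history)
        if step.get("tool"):  # model chose a tool: run it, feed back the result
            result = TOOLS[step["tool"]](**step["args"])
            history.append({"role": "tool", "content": result})
        else:  # model is done
            return step["answer"]
```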

As I sit here writing this in January of 2026, I realize that almost every AI project I worked on last year ultimately became an agent. It feels like a law of nature: Every AI project, given enough time, converges on becoming an agent. In this post, I want to share some of what I’ve learned, and the cases where you might skip the intermediate steps and jump straight to building an agent.

The Gravitational Pull of Autonomy

This isn’t just feature creep. It’s a fundamental shift in how we interact with software. We are moving past the era of “smart typewriters” and into the era of “digital interns.”

Take Gemini Scribe, my plugin for Obsidian. When I started, it was a glorified chat window. You typed a prompt, it gave you text. Simple. But as I used it, the friction became obvious. If I wanted Scribe to use another note as context for a task, I had to take a specific action, usually creating a link to that note from the one I was working on, to make sure it was considered. I was managing the model’s context manually.

I was the “glue” code. I was the context manager.

The moment I gave Scribe access to the read_file tool, the dynamic changed. Suddenly, I wasn’t micromanaging context; I was giving instructions. “Read the last three meeting notes and draft a summary.” That’s not a chat interaction; that’s a delegation. And to support delegation, the software had to become an agent, capable of planning, executing, and iterating.

From Scripts to Sudoers

The Gemini CLI followed a similar arc. There were many of us on the team experimenting with Gemini on the command line. I was working on iterative refinement, where the model would ask clarifying questions to create deeper artifacts. Others were building the first agentic loops, giving the model the ability to run shell commands.

Once we saw how much the model could do with even basic tools, we were hooked. Suddenly, it wasn’t just talking about code; it was writing and executing it. It could run tests, see the failure, edit the file, and run the tests again. It was eye-opening how much we could get done as a small team.

But with great power comes great anxiety. As I explored in my Agentic Shift post on building guardrails and later in my post about the Policy Engine, I found myself staring at a blinking cursor, terrified that my helpful assistant might accidentally rm -rf my project.

This is the hallmark of the agentic shift: you stop worrying about syntax errors and start worrying about judgment errors. We had to build a “sudoers” file for our AI, a permission system that distinguishes between “read-only exploration” and “destructive action.” You don’t build policy engines for scripts; you build them for agents.

The Classifier That Wanted to Be an Agent

Last year, I learned to recognize a specific code smell: the AI classifier.

In my Podcast RAG project, I wanted users to search across both podcast descriptions and episode transcripts. Different databases, different queries. So I did what felt natural: I built a small classifier using Gemini Flash Lite. It would analyze the user’s question and decide: “Is this a description search or a transcript search?” Then it would call the appropriate function.

It worked. But something nagged at me. I had written a classifier to make a decision that a model is already good at making. Worse, the classifier was brittle. What if the user wanted both? What if their intent was ambiguous? I was encoding my assumptions about user behavior into branching logic, and those assumptions were going to be wrong eventually.

The fix was almost embarrassingly simple. I deleted the classifier and gave the agent two tools: search_descriptions and search_episodes. Now, when a user asks a question, the agent decides which tool (or tools) to use. It can search descriptions first, realize it needs more detail, and then dive into transcripts. It can do both in parallel. It makes the call in context, not based on my pre-programmed heuristics. (You can try it yourself at podcasts.hutchison.org.)
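
The "deleted classifier" boils down to two tool declarations. Here is what they might look like in the JSON-schema style most tool-calling APIs accept; the tool names are from the post, but the schemas and descriptions are my assumptions. The descriptions do the work the classifier used to do.

```python
# Illustrative declarations for the two search tools. The model reads
# the descriptions and decides, per query, which tool(s) to call.
SEARCH_TOOLS = [
    {
        "name": "search_descriptions",
        "description": "Search podcast descriptions. Best for finding shows "
                       "by topic, host, or genre.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "search_episodes",
        "description": "Search episode transcripts. Best for finding specific "
                       "discussions, quotes, or details within episodes.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]
```

The branching logic is gone; the decision lives in the tool descriptions, where the model can weigh it against the actual question.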

I saw the same pattern in Gemini Scribe. Early versions had elaborate logic for context harvesting, code that tried to predict which notes the user would need based on their current document and conversation history. I was building a decision tree for context, and it was getting unwieldy.

When I moved Scribe to a proper agentic architecture, most of that logic evaporated. The agent didn’t need me to pre-fetch context; it could use a read_file tool to grab what it needed, when it needed it. The complex anticipation logic was replaced by simple, reactive tool calls. The application got simpler and more capable at the same time.

Here’s the heuristic I’ve landed on: If you’re writing if/else logic to decide what the AI should do, you might be building a classifier that wants to be an agent. Deconstruct those branches into tools, give the agent really good descriptions of what those tools can do, and then let the model choose its own adventure.

You might be thinking: “What about routing queries to different models? Surely a classifier makes sense there.” I’m not so sure anymore. Even model routing starts to look like an orchestration problem, and a lightweight orchestrator with tools for accessing different models gives you the same flexibility without the brittleness. The question isn’t whether an agent can make the decision better than your code. It’s whether the agent, with access to the actual data in the moment, can make a decision at least as good as what you’re trying to predict when you’re writing the code. The agent has context you don’t have at development time.

The “Human-on-the-Loop”

We are transitioning from Human-in-the-Loop (where we manually approve every step) to Human-on-the-Loop (where we set the goals and guardrails, but let the system drive).

This shift is driven by a simple desire: we want partners, not just tools. As I wrote back in April about waiting for a true AI coding partner, a tool requires your constant attention. A hammer does nothing unless you swing it. But an agent? An agent can work while you sleep.

This freedom comes with a new responsibility: clarity. If your agent is going to work overnight, you need to make sure it’s working on something productive. You need to be precise about the goal, explicit about the boundaries, and thoughtful about what happens when things go wrong. Without the right guardrails, an agent can get stuck waiting for your input, and you’ll lose that time. Or worse, it can get sidetracked and spend hours on something that wasn’t what you intended.

The goal isn’t to remove the human entirely. It’s to move us from the execution layer to the supervision layer. We set the destination and the boundaries; the agent figures out the route. But we have to set those boundaries well.

Embracing the Complexity (Or Lack Thereof)

Here’s the counterintuitive thing: building an agent isn’t always harder than building a script. Yes, you have to think about loops, tool definitions, and context window management. But as my classifier example showed, an agentic architecture can actually delete complexity. All that brittle branching logic, all those edge cases I was trying to anticipate: gone. Replaced by a model that can reason about what it needs in the moment.

The real complexity isn’t in the code; it’s in the trust. You have to get comfortable with a system that makes decisions you didn’t explicitly program. That’s a different kind of engineering challenge, less about syntax, more about guardrails and judgment.

But the payoff is a system that grows with you. A script does exactly what you wrote it to do, forever. An agent does what you ask it to do, and sometimes finds better ways to do it than you’d considered.

So, if you find yourself staring at your “simple script” and wondering if you should give it a tools definition… just give in. You’re building an agent. It’s inevitable. You might as well enjoy the company.

A laptop sits on a dark wooden desk under the warm glow of an Edison bulb; above the screen, a stream of glowing, holographic research papers and data visualizations cascades downward like a waterfall, physically dissolving into lines of green and white markdown text as they enter the open terminal window.

Bringing Deep Research to the Terminal

I lost the report somewhere between browser tabs. One moment it was there in the Gemini app, a detailed deep research analysis on how AI agents communicate with each other, complete with citations and a synthesis I’d spent an hour reviewing. The next moment, gone. Along with the draft blog post I’d been weaving it into.

I was working on part nine of my Agentic Shift series, trying to answer the question of what happens when agents start talking to each other instead of just talking to us. The research was sprawling—academic papers on multi-agent systems, documentation from LangGraph and AutoGen, blog posts from researchers at DeepMind and OpenAI. I’d been using Gemini’s deep research feature in the app to help synthesize all of this, and it was genuinely useful. The AI would spend minutes thinking through the question, querying sources, building a structured report. But then I had to move that report into my text-based workflow. Copy, paste, reformat, lose formatting, copy again. Somewhere in that dance between the browser and my terminal, I lost everything.

I stared at the empty browser tab for a moment. I could start over, rerun the research in the Gemini app, be more careful about saving this time. But this wasn’t the first time I’d hit this friction. Every time I used deep research in the browser, I had to bridge two worlds: the app where the AI did its thinking, and the terminal where I actually write and build.

What looked like yak shaving was actually a prerequisite. I needed deep research capabilities in my terminal workflow, not just wanted them. I couldn’t keep jumping between environments. And I was in luck. Just a few weeks earlier, Google had announced that deep research was now available through the Gemini API. The capability I’d been using in the browser could be accessed programmatically.

When Features Live in the Wrong Place

I’m not going to pretend this was built based on demand from the community. I needed this. Specifically, I needed to stop context-switching between the Gemini app and my terminal, because every time I did, I was introducing friction and risk. The lost report was just the most recent symptom of a workflow that was fundamentally broken for how I work.

I live in the terminal. My notes are markdown files. My drafts are plain text. My build process, my git workflow, my entire development environment assumes I’m working with files and command-line tools. When I have to move work from a browser back into that environment, I’m not just inconvenienced—I’m fighting against the grain of everything else I do.

Deep research is powerful. It works. But living in a web app meant it was disconnected from the places where I actually needed it. Sure, other people might benefit from having this integrated into MCP-compatible tools, but that’s a nice side effect. The real reason I built this was simpler: I had to finish part nine of the Agentic Shift series, and I couldn’t do that without fixing my workflow first.

The Model Context Protocol made this possible. It’s a standard for exposing AI capabilities as tools that can plug into different environments. Google’s API gave me the primitives. I just needed to connect them to where I actually work.

Building the Missing Piece

The extension wraps Gemini’s deep research capabilities into the Model Context Protocol, which means it integrates seamlessly with Gemini CLI and any other MCP-compatible client. The architecture is deliberately simple, but it supports two distinct workflows depending on what you need.

The first workflow is straightforward: you have a research question, and you want a deep investigation. You can kick off research with a simple command, but if you use the bundled /deep-research:start slash command, the model actually guides you through a step to optimize your question to get the most out of deep research. The agent then spends tens of minutes—or as much time as it needs—planning the investigation, querying sources, and synthesizing findings into a detailed report with citations you can follow up on.

The second workflow is for when you want to ground the research in your own documents. You use /deep-research:store-create to set up a file search store, then /deep-research:store-upload to index your files. Once they’re uploaded, you have two options: you can include that dataset in the deep research process so the agent grounds its investigation in your specific sources, or you can query against it directly for a simpler RAG experience. This is the same File Search capability I wrote about in November when I rebuilt my Podcast RAG system, but now it’s accessible from the terminal as part of my normal workflow.

The extension maintains local state in a workspace cache, so you don’t have to remember arcane resource identifiers or lose track of running research jobs. The whole thing is designed to feel as natural as running a grep command or kicking off a build—it’s just another tool in the environment where I already work.

So did it actually work?

The first time I ran it, I asked for a deep dive into Stonehenge construction. I’d been reading Ken Follett’s novel Circle of Days and found myself curious about the scientific evidence behind the story: what do we actually know about how it was built, and who built it? I kicked off the query and watched something fascinating happen. The model understood that deep research takes time. Instead of just waiting silently, it kept checking in to see if the research was done, almost like checking the oven to see if dinner was ready.

Twenty minutes later, a markdown file appeared in my filesystem with a comprehensive research report, complete with citations to academic sources, isotope analysis, and archaeological evidence. I didn’t have to copy anything from a browser. I didn’t lose any formatting. It was just there, ready to reference. The report mentioned the Bell Beaker culture and what happened to the Neolithic builders around 2500 BCE, which sent me down another rabbit hole. I immediately ran a second research query on that transition. Same seamless experience. That’s when I knew this was exactly what I needed.
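
That "checking the oven" behavior is an ordinary polling loop. Here is a generic sketch of the pattern; the status function and its return shape are stand-ins for whatever a long-running research API actually returns, not the extension's real interface.

```python
# Poll a long-running job until it finishes, fails, or times out.
# get_status is a stand-in for an API call returning e.g.
# {"state": "running"} or {"state": "done", "report": "..."}.
import time

def wait_for_report(get_status, interval_s: float = 30.0, timeout_s: float = 3600.0):
    """Check the job periodically; return the report once it's done."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status["state"] == "done":
            return status["report"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "research failed"))
        time.sleep(interval_s)  # the oven isn't ready yet
    raise TimeoutError("research did not finish in time")
```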

What This Actually Means

I think extensions like this represent something important about where AI development is heading. We’re past the proof-of-concept phase where every AI interaction is a magic trick. Now we’re in the phase where AI capabilities need to integrate into actual workflows—not replace them, but augment them in ways that feel natural.

This is what I wrote about in November when I talked about the era of Personal Software. We’ve crossed a threshold where building a bespoke tool is often faster—and certainly less frustrating—than trying to adapt your workflow to someone else’s software. I didn’t build this extension for the community. I built it because I needed it. I had lost work, and I needed to stop context-switching between environments. If other people find it useful, that’s a nice side effect, but it’s fundamentally software for an audience of one.

The key insight for me was that the Model Context Protocol isn’t just a technical standard; it’s a design pattern for making AI tools composable. Instead of building a monolithic research application with its own UI and workflow, I built a small, focused extension that does one thing well and plugs into the environment where I already work. That composability matters because it means the tool can evolve with my workflow rather than forcing my workflow to evolve around the tool.

There’s also something interesting happening with how we think about AI capabilities. Deep research isn’t about making the model smarter—it’s about giving it time and structure. The same model that gives you a superficial answer in three seconds can give you a genuinely insightful report if you let it think for tens of minutes and provide it with the right sources. We’re learning that intelligence isn’t just about raw capability; it’s about how you orchestrate that capability over time.

What Comes Next

The extension is live on GitHub now, and I’m using it daily for my own research workflows. The immediate next step is adding better control over the research format—right now you can specify broad categories like “Technical Deep Dive” or “Executive Brief,” but I want more granular control over structure and depth. I’m also curious about chaining multiple research tasks together, where the output of one investigation becomes the input for the next.

But the bigger question I’m sitting with is what other AI capabilities are hiding in plain sight, waiting for someone to make them accessible. Deep research was always there in the Gemini API; it just needed a wrapper that made it feel like a natural part of the development workflow. What else is out there?

If you want to try it yourself, you’ll need a Gemini API key (get one at ai.dev) and set the GEMINI_DEEP_RESEARCH_API_KEY environment variable. Deep research runs on Gemini 3.0 Pro, and you can find the current pricing here. It’s charged based on token consumption for the research process plus any tool usage fees.

Install the extension with:

gemini extensions install https://github.com/allenhutchison/gemini-cli-deep-research --auto-update

The full source is on GitHub.

As for me, I still need to finish part nine of the Agentic Shift series. But now I can get back to it with the confidence that I’m working in my preferred environment, with the tools I need accessible right from the terminal. Fair warning: once you start using AI for actual deep research, it’s hard to go back to the shallow stuff.

A retro computer monitor displaying the Gemini CLI prompt "> Ask Gemini to scaffold a web app" inside a glowing neon blue and pink holographic wireframe box, representing a digital sandbox.

The Guardrails of Autonomy

I still remember the first time I let an LLM execute a shell command on my machine. It was a simple ls -la, but my finger hovered over the Enter key for a solid ten seconds.

There is a visceral, lizard-brain reaction to giving an AI that level of access. We all know the horror stories—or at least the potential horror stories. One hallucinated argument, one misplaced flag, and a helpful cleanup script becomes rm -rf /. This fear creates a central tension in what I call the Agentic Shift. We want agents to be autonomous enough to be useful—fixing a bug across ten files while we grab coffee—but safe enough to be trusted with the keys to the kingdom.

Until now, my approach with the Gemini CLI was the blunt instrument of “Human-in-the-Loop.” Any tool call with a side effect—executing shell commands, writing code, or editing files—required a manual y/n confirmation. It was safe, sure. But it was also exhausting.

I vividly remember asking Gemini to “fix all the linting errors in this project.” It brilliantly identified the issues and proposed edits for twenty different files. Then I sat there, hitting yyy… twenty times.

The magic evaporated. I wasn’t collaborating with an intelligent agent; I was acting as a slow, biological barrier for a very expensive macro. This feeling has a name—“Confirmation Fatigue”—and it’s the silent killer of autonomy. I realized I needed to move from micromanagement to strategic oversight. I didn’t want to stop the agent; I wanted to give it a leash.

The Policy Engine

The solution I’ve built is the Gemini CLI Policy Engine.

Think of it as a firewall for tool calls. It sits between the LLM’s request and your operating system’s execution. Every time the model reaches for a tool—whether it’s to read a file, run a grep command, or make a network request—the Policy Engine intercepts the call and evaluates it against a set of rules.

The system relies on three core actions:

  1. allow: The tool runs immediately.
  2. deny: The AI gets a “Permission denied” error.
  3. ask_user: The default manual approval.

A Hierarchy of Trust

The magic isn’t just in blocking or allowing things; it’s in the hierarchy. Instead of a flat list of rules, I built a tiered priority system that functions like layers of defense.

At the base, you have the Default Safety Net. These are the built-in rules that apply to everyone—basic common sense like “always ask before overwriting a file.”

Above that sits the User Layer, which is where I define my personal comfort zone. This allows me to customize the “personality” of my safety rails. On my personal laptop, I might be a cowboy, allowing git commands to run freely because I know I can always undo a bad commit. But on a production server, I might lock things down tighter than a vault.

Finally, at the top, is the Enterprise/Admin Layer. These are the immutable laws of physics for the agent. In an enterprise setting, this is where you ensure that no matter how “creative” the agent gets, it can never curl data to an external IP or access sensitive directories.

Safe Exploration

In practice, this means I can trust the agent to look but ask it to verify before it touches. I generally trust the agent to check the repository status, review history, or check if the build passed. I don’t need to approve every git log or gh run list.

[[rule]]
toolName = "run_shell_command"
commandPrefix = [
  "git status",
  "git log",
  "git diff",
  "gh issue list",
  "gh pr list",
  "gh pr view",
  "gh run list"
]
decision = "allow"
priority = 100
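
To make the priority semantics concrete, here is a small Python sketch of how rules like the one above could be resolved: collect every rule that matches the tool call, then let the highest-priority match win, falling back to manual approval. The field names mirror the config; the engine's actual resolution logic may differ.

```python
# Resolve a proposed tool call against prioritized rules.
# A rule matches if its toolName matches and the command starts with
# one of its prefixes; the highest-priority match decides.

def decide(rules: list[dict], tool_name: str, command: str = "") -> str:
    """Return 'allow', 'deny', or 'ask_user' for a proposed tool call."""
    matches = []
    for rule in rules:
        if rule["toolName"] != tool_name:
            continue
        prefixes = rule.get("commandPrefix", "")
        if isinstance(prefixes, str):  # accept a single prefix or a list
            prefixes = [prefixes]
        if any(command.startswith(p) for p in prefixes):
            matches.append(rule)
    if not matches:
        return "ask_user"  # default safety net: no rule means ask the human
    return max(matches, key=lambda r: r["priority"])["decision"]
```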

Yolo Mode

Sometimes, I’m working in a sandbox and I just want speed. I can use the dedicated yolo mode to take the training wheels off. There is a distinct feeling of freedom—and a slight thrill of danger—when you watch the terminal fly by, commands executing one after another.

However, even in Yolo mode, I want a final sanity check before I push code or open a PR. While Yolo mode is inherently permissive, I define specific high-priority rules to catch critical actions. I also explicitly block docker commands—I don’t want the agent spinning up (or spinning down) containers in the background without me knowing.

# Exception: Always ask before committing or creating a PR
[[rule]]
toolName = "run_shell_command"
commandPrefix = ["git commit", "gh pr create"]
decision = "ask_user"
priority = 900
modes = ["yolo"]

# Exception: Never run docker commands automatically
[[rule]]
toolName = "run_shell_command"
commandPrefix = "docker"
decision = "deny"
priority = 999
modes = ["yolo"]

The Hard Stop

And then there are the things that should simply never happen. I don’t care how confident the model is; I don’t want it rebooting my machine. These rules are the “break glass in case of emergency” protections that let me sleep at night.

[[rule]]
toolName = "run_shell_command"
commandRegex = "^(shutdown|reboot|kill)"
decision = "deny"
priority = 999

Decoupling Capability from Control

The significance of this feature goes beyond just saving me from pressing y. It fundamentally changes how we design agents.

I touched on this concept in my series on autonomous agents, specifically in Building Secure Autonomous Agents, where I argued that a “policy engine” is essential for scaling from one agent to a fleet. Now, I’m bringing that same architecture to the local CLI.

Previously, the conversation around AI safety often presented a binary choice: you could have a capable agent that was potentially dangerous, or a safe agent that was effectively useless. If I wanted to ensure the agent wouldn’t accidentally delete my home directory, the standard advice was to simply remove the shell tool. But that is a false choice. It confuses the tool with the intent. Removing the shell doesn’t just stop the agent from doing damage; it stops it from running tests, managing git, or installing packages—the very things I need it to do.

With the Policy Engine, I can give the agent powerful tools but wrap them in strict policies. I can give it access to kubectl, but only for get commands. I can let it edit files, but only on specific documentation sites.

This is how we bridge the gap between a fun demo and a production-ready tool. It allows me to define the sandbox in which the AI plays, giving me the confidence to let it run autonomously within those boundaries.

Defining Your Own Rules

The Policy Engine is available now in the latest release of Gemini CLI. You can dive into the full documentation here.

If you want to see exactly what rules are currently active on your system—including the built-in defaults and your custom additions—you can simply run /policies list from inside the Gemini CLI.

I’m currently running a mix of “Safe Exploration” and “Hard Stop” rules. It’s quieted the noise significantly while keeping my file system intact. I’d love to hear how you configure yours—are you a “deny everything” security maximalist, or are you running in full “allow” mode?

A stylized, dark digital illustration of an open laptop displaying lines of blue code. Floating above the laptop are three glowing, neon blue wireframe icons: a document on the left, a calendar in the center, and an envelope on the right. The icons appear to be formed from streams of digital particles rising from the laptop screen, symbolizing the integration of digital tools. The overall aesthetic is futuristic and high-tech, with dramatic lighting emphasizing the connection between the code and the applications.

Bringing the Office to the Terminal

There is a specific kind of friction that every developer knows. It’s the friction of the “Alt-Tab.”

You’re deep in the code, holding a complex mental model of a system in your head, when you realize you need to check a requirement. That requirement lives in a Google Doc. Or maybe you need to see if you have time to finish a feature before your next meeting. That information lives in Google Calendar.

So you leave the terminal. You open the browser. You navigate the tabs. You find the info. And in those thirty seconds, the mental model you were holding starts to evaporate. The flow is broken.

But it’s not just the context switch that kills your momentum—it’s the ambush. The moment you open that browser window, the red dots appear. Chat pings, new emails, unresolved comments on a doc you haven’t looked at in two days—they all clamor for your attention. Before you know it, the quick thing you needed to look up has morphed into an hour of answering questions and putting out fires. You didn’t just lose your place in the code; you lost your afternoon.

I’ve been thinking a lot about this friction lately, especially as I’ve moved more of my workflow into the Gemini CLI. If we want AI to be a true partner in our development process, it can’t just live in a silo. It needs access to the context of our work—and for most of us, that context is locked away in the cloud, in documents, chats, and calendars.

That’s why I built the Google Workspace extension for Gemini CLI.

Giving the Agent “Senses”

We often talk about AI agents in the abstract, but their utility is defined by their boundaries. An agent that can only see your code is a great coding partner. An agent that can see your code and your design documents and your team’s chat history? That’s a teammate.

This extension connects the Gemini CLI to the Google Workspace APIs, effectively giving your terminal-based AI a set of digital senses and hands. It’s not just about reading data; it’s about integrating that data into your active workflow.

Here is what that looks like in practice:

1. Contextual Coding

Instead of copying and pasting requirements from a browser window, you can now ask Gemini to pull the context directly.

“Find the ‘Project Atlas Design Doc’ in Drive, read the section on API authentication, and help me scaffold the middleware based on those specs.”

2. Managing the Day

I often get lost in work and lose track of time. Now, I can simply ask my terminal:

“Check my calendar for the rest of the day. Do I have any blocks of free time longer than two hours to focus on this migration?”
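Under the hood, answering that question is mostly interval arithmetic over the day's events. Here's a minimal, self-contained sketch of that gap-finding step (illustrative only, not the extension's actual code):

```python
from datetime import datetime, timedelta

def free_blocks(events, day_start, day_end, min_length):
    """Return gaps between busy events that are at least min_length long.

    events: list of (start, end) datetime pairs, possibly unsorted or overlapping.
    """
    blocks = []
    cursor = day_start
    for start, end in sorted(events):
        if start - cursor >= min_length:
            blocks.append((cursor, start))
        cursor = max(cursor, end)  # overlapping events don't move the cursor backward
    if day_end - cursor >= min_length:
        blocks.append((cursor, day_end))
    return blocks

day = datetime(2025, 6, 2)
events = [
    (day.replace(hour=10), day.replace(hour=11)),              # standup
    (day.replace(hour=11, minute=30), day.replace(hour=12)),   # 1:1
]
# The only gap of two hours or more is 12:00 to 17:00.
print(free_blocks(events, day.replace(hour=9), day.replace(hour=17),
                  timedelta(hours=2)))
```

The model's job is to translate the natural-language question into a call like this against the Calendar data, then phrase the answer back.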

3. Seamless Communication

Sometimes you just need to drop a quick note without leaving your environment.

“Send a message to the ‘Core Eng’ chat space letting them know the deployment is starting now.”

The Accidental Product

Truth be told, I didn’t set out to build a product. When I first joined Google DeepMind, this was simply my “starter project.” My manager suggested I spend a few weeks experimenting with Google Workspace and our agentic capabilities, and the Gemini CLI seemed like the perfect sandbox for that kind of exploration.

I started building purely for myself, guided by my own daily friction. I wanted to see if I could check my calendar without leaving the terminal. Then I wanted to see if I could pull specs from a Doc. I followed the path of my own curiosity, adding tools one by one.

But when I shared this little experiment with a few colleagues, the reaction was immediate. They didn’t just think it was cool; they wanted to install it. That’s when I realized this wasn’t just a personal hack—it was a shared need. It snowballed from a few scripts into a full-fledged extension that we knew we had to ship.

Under the Hood

The extension is built as a Model Context Protocol (MCP) server, which means it runs locally on your machine. It uses your own OAuth credentials, so your data never passes through a third-party server. It’s direct communication between your local CLI and the Google APIs.

It currently supports a wide range of tools across the Workspace suite:

  • Docs & Drive: Search for files, read content, and even create new docs from markdown.
  • Calendar: List events, find free time, and schedule meetings.
  • Gmail: Search threads, read emails, and draft replies.
  • Chat: Send messages and list spaces.

Why This Matters

This goes back to the idea of “Small Tools, Big Ideas.” Individually, a command-line tool to read a calendar isn’t revolutionary. But when you combine that capability with the reasoning engine of a large language model, it becomes something else entirely.

It turns your terminal into a cockpit for your entire digital work life. It allows you to script interactions between your code and your company’s knowledge base. It reduces the friction of context switching, letting you stay where you are most productive.

If you want to try it out, the extension is open source and available now. You can install it directly into the Gemini CLI:

gemini extensions install https://github.com/gemini-cli-extensions/workspace

I’m curious to see how you all use this. Does it change your workflow? Does it keep you in the flow longer? Give it a spin and let me know.

A developer leans back in his chair with hands behind his head, smiling with relief. His monitor displays a large glowing "DELETE" button. In the background, a messy, tangled server rack is fading away, symbolizing the removal of complex infrastructure.

The Joy of Deleting Code: Rebuilding My Podcast Memory

Late last year, I shared the story of a personal obsession: building an AI system grounded in my podcast history. I had hundreds of hours of audio—conversations that had shaped my thinking—trapped in MP3 files. I wanted to set them free. I wanted to be able to ask my library questions, find half-remembered quotes, and synthesize ideas across years of listening.

So, I built a system. And like many “v1” engineering projects, it was a triumph of brute force.

It was a classic Retrieval-Augmented Generation (RAG) pipeline, hand-assembled from the open-source parts bin. I had a reliable tool called podgrab acting as my scout, faithfully downloading every new episode. But downstream from that was a complex RAG implementation to chop transcripts into bite-sized chunks. I had an embedding model to turn those chunks into vectors. And sitting at the center of it all was a vector database (ChromaDB) that I had to host, manage, and maintain.

It worked, but it was fragile. I didn’t even have a proper deployment setup; I ran the whole thing from a tmux session, with different panes for the ingestion watcher, the vector database, and the API server. It felt like keeping a delicate machine humming by hand. Every time I wanted to tweak the retrieval logic or—heaven forbid—change the embedding model, I was looking at a weekend of re-indexing and refactoring. I had built a memory for my podcasts, but I had also built myself a part-time job as a database administrator.

Then, a few weeks ago, I saw this announcement from the Gemini team.

They were launching File Search, a tool that promised to collapse my entire precarious stack into a single API call. The promise was bold: a fully managed RAG system. No vector DB to manage. No manual chunking strategies to debate. No embedding pipelines to debug. You just upload the files, and the model handles the rest.

I remember reading the documentation and feeling that specific, electric tingle that hits you when you realize the “hard problem” you’ve been solving is no longer a hard problem. It wasn’t just an update; it was permission to stop doing the busy work. I was genuinely excited—not just to write new code, but to tear down the old stuff.

Sometimes, it’s actually more fun to delete code than it is to write it.

The first step was the migration. I wrote a script to push my archive—over 18,000 podcast transcripts—into the new system. It took a while to run, but when it finished, everything was just… there. Searchable. Grounded. Ready.

That was the signal I needed. I opened my editor and started deleting code I had painstakingly written just last year. Podgrab stayed—it was doing its job perfectly—but everything else was on the chopping block.

  • I deleted the chromadb dependency and the local storage management. Gone.
  • I deleted the custom logic for sliding-window text chunking. Gone.
  • I deleted the manual embedding generation code. Gone.
  • I deleted the old web app and a dozen stagnant prototypes that were cluttering up the repo. Gone.
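For the curious, the sliding-window chunking I deleted looked roughly like this (a simplified reconstruction, not the original code):

```python
def sliding_window_chunks(text, window=500, overlap=100):
    """Split text into overlapping chunks so a quote that spans a chunk
    boundary still lands whole in at least one chunk."""
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    step = window - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + window])
        if start + window >= len(text):
            break
    return chunks

transcript = "word " * 300  # 1,500 characters
chunks = sliding_window_chunks(transcript, window=500, overlap=100)
print(len(chunks))  # → 4
```

Every one of those chunks then had to be embedded and written to ChromaDB. All of that, and the tuning of `window` and `overlap` that went with it, is now the API's problem.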

I watched my codebase shrink by hundreds of lines. The complexity didn’t just move; it evaporated. It was more than just a cleanup; it was a chance for a fresh start with new assumptions and fewer constraints. I wasn’t patching an old system anymore; I was building a new one, unconstrained by the decisions I made a year ago.

In its place, I wrote a new, elegant ingestion script. It does one thing: it takes the transcripts generated from the files podgrab downloads and uploads them to the Gemini File Search store. That’s it. Google handles the indexing, the storage, and the retrieval.
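The new script really is little more than a loop. Here's a sketch of its shape; the transcript-collection helper is runnable as-is, while the upload calls are paraphrased from the File Search documentation as I remember it, so verify the exact method names against the google-genai SDK before copying:

```python
import tempfile
from pathlib import Path

def pending_transcripts(transcript_dir, already_uploaded):
    """Transcripts that podgrab's downloads have produced but that aren't
    in the File Search store yet."""
    return sorted(
        p for p in Path(transcript_dir).glob("*.txt")
        if p.name not in already_uploaded
    )

# The upload side, roughly (method names as I recall them from the docs,
# not guaranteed; requires google-genai and an API key):
#
#   client = genai.Client()
#   for path in pending_transcripts("transcripts", uploaded):
#       client.file_search_stores.upload_to_file_search_store(
#           file=str(path), file_search_store_name=store_name)
#
# No chunking, no embeddings, no ChromaDB: the store handles all of it.

with tempfile.TemporaryDirectory() as d:
    for name in ("ep1.txt", "ep2.txt", "ep3.txt"):
        Path(d, name).write_text("transcript")
    todo = pending_transcripts(d, already_uploaded={"ep1.txt"})
    print([p.name for p in todo])  # → ['ep2.txt', 'ep3.txt']
```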

With the heavy lifting gone, I was free to rethink the application itself. I built a new central brain for the project, a lightweight service I call mcp_server.py (implementing the Model Context Protocol).

Previously, my server was bogged down with the mechanics of how to find data. Now, mcp_server.py simply hands a user’s query to my rag.py module. That module doesn’t need to be a database client anymore; it just configures the Gemini FileSearch tool and gets out of the way. The model itself, grounded by the tool, does the retrieval, the synthesis, and even the citation.

The difference is profound. The “RAG” part of my application—the part that used to consume 80% of my engineering effort—is now just a feature I use, like a spell checker or a date parser.

This shift is bigger than my podcast project. It changes the calculus for every new idea I have. Previously, if I wanted to build a grounded AI tool for a different context—say, for my project notes or my email archives—I would hesitate. I’d think about the boilerplate, the database setup, the chunking logic. Now? I can spin up a robust, grounded system in an hour.

My podcast agent is smarter now, faster, and much cheaper to run. But the best part? I’m not a database administrator anymore. I’m just a builder again.

You can try out the new system yourself at podcast-rag.hutchison.org or check out the code on GitHub.

Abstract digital visualization of glowing lines and nodes converging on a central geometric shape labeled 'AGENTS.md', symbolizing interconnected AI systems and a unifying standard.

On Context, Agents, and a Path to a Standard

When we were first designing the Gemini CLI, one of the foundational ideas was the importance of context. For an AI to be a true partner in a software project, it can’t just be a stateless chatbot; it needs a “worldview” of the codebase it’s operating in. It needs to understand the project’s goals, its constraints, and its key files. This philosophy isn’t unique; many agentic tools use similar mechanisms. In our case, it led to the GEMINI.md context system (first introduced in this commit): a simple Markdown file that acts as a charter, guiding the AI’s behavior within a specific repository.

At its core, GEMINI.md is designed for clarity and flexibility. It gives developers a straightforward way to provide durable instructions and file context to the model. We also recognized that not every project is the same, so we made the system adaptable. For instance, if you prefer a different convention, you can easily change the name of your context file with a simple setting.

This approach has worked well, but I’ve always been mindful that bespoke solutions, however effective, can lead to fragmentation. In the open, collaborative world of software development, standards are the bridges that connect disparate tools into a cohesive ecosystem.

That’s why I’ve been following the emergence of the Agents.md specification with great interest. We have several open issues in the Gemini CLI repo (like #406 and #12345) from users asking for Agents.md support, so there’s clear community interest. The idea of a universal standard for defining an AI’s context is incredibly appealing. A shared format would mean that a context file written for one tool could work seamlessly in another, allowing developers to move between tools without friction. I would love for Gemini CLI to become a first-class citizen in that ecosystem.

However, as I’ve considered a full integration, I’ve run into a few hurdles—not just technical limitations, but patterns of use that a standard would need to address. This has led me to a more concrete set of proposals for what an effective standard would need.

So, what would it take to bridge this gap? With a few key additions, I believe Agents.md could become the robust standard we need. Here’s a breakdown of what that would require:

  1. A Standard for @file Includes: From my perspective, this is mandatory. In any large project, you need the ability to break down a monolithic context file into smaller, logical, and more manageable parts—much like a C/C++ #include. A simple @file directive, which GEMINI.md and some other systems support, would provide the modularity needed for real-world use.
  2. A Pragma System for Model-Specific Instructions: Developers will always want to optimize prompts for specific models. To accommodate this without sacrificing portability, the standard could introduce a pragma system. This could leverage standard Markdown callouts to tag instructions that only certain models should pay attention to, while others ignore them. For example:

    > [!gemini]
    > Gemini only instructions here

    > [!claude]
    > Claude only instructions here

    > [!codex]
    > Codex only instructions here
  3. Clear Direction on Context Hierarchy: We need clear rules for how an agentic application should discover and apply context. Based on my own work, I’d propose a hierarchical strategy. When an agent is invoked, it should read the context in its current directory and all parent directories. Then, when it’s asked to read a specific file, it should first apply the context from that file’s local directory before applying the broader, inherited context. This ensures that the most specific instructions are always considered first, creating a predictable and powerful system.
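To make the first two proposals concrete, here is a toy resolver for both: it inlines @file directives and keeps only the callout blocks addressed to the current model. This is a sketch of the proposed semantics, not any tool's actual implementation:

```python
import re
from pathlib import Path

def expand_includes(text, base_dir, depth=0):
    """Inline `@path/to/file.md` directives, recursively, with a depth guard."""
    if depth > 10:
        raise RecursionError("include chain too deep")
    out = []
    for line in text.splitlines():
        m = re.fullmatch(r"@(\S+\.md)", line.strip())
        if m:
            included = (Path(base_dir) / m.group(1)).read_text()
            out.append(expand_includes(included, base_dir, depth + 1))
        else:
            out.append(line)
    return "\n".join(out)

def filter_pragmas(text, model):
    """Keep `> [!model]` callout bodies only when they match `model`;
    pass everything outside callouts through unchanged."""
    out, keep = [], True
    for line in text.splitlines():
        m = re.match(r">\s*\[!(\w+)\]\s*$", line)
        if m:
            keep = (m.group(1) == model)
            continue  # the tag line itself is never emitted
        if line.startswith(">"):
            if keep:
                out.append(line.lstrip("> "))
            continue
        keep = True
        out.append(line)
    return "\n".join(out)

doc = ("Shared rules.\n> [!gemini]\n> Prefer tables.\n"
       "> [!claude]\n> Prefer prose.\nMore shared rules.")
print(filter_pragmas(doc, "gemini"))
```

Run with `model="gemini"`, this keeps the shared lines plus "Prefer tables."; with `model="claude"`, it keeps "Prefer prose." instead. The point is that both behaviors are cheap to implement, so the barrier to standardizing them is agreement, not engineering.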

If the Agents.md standard were to incorporate these three features, I believe it would unlock a new level of interoperability for AI developer tools. It would create a truly portable and powerful way to define AI context, and I would be thrilled to move Gemini CLI to a model of first-class support.

The future of AI-assisted development is collaborative, and shared standards are the bedrock of that collaboration. I’ve begun outreach to the Agents.md maintainers to discuss these proposals, and I’m optimistic that with community feedback, we can get there. If you have your own opinions on this, I’d love to hear them in the discussion on our repo.

A cute cartoon purple bear mascot is on a golden ribbon with "Gemini Scribe" written on it. The background is a collage of two photos: the top half shows the Sydney Opera House at sunset, and the bottom half shows a laptop on a table by a pool with the ocean in the distance.

What I Did On My Summer Vacation

Every year, like clockwork, the first assignment back at school was the same: a short essay on what you did over the summer. It was a ritual of sorts, a gentle reentry into the world of homework and deadlines, usually accompanied by a gallery of crayon drawings of camping trips and beach outings.

My summer had all the makings of a classic entry. There was a trip to Australia and Fiji. I could write about the impossible blue of the water in the South Pacific, or the iconic silhouette of the Sydney Opera House against a setting sun. I have the photos to prove it. It was, by all accounts, a proper vacation.

But if I’m being honest, my most memorable trip wasn’t to a beach or a city. It was a two-week detour into the heart of my own code, building something that had been quietly nagging at me for months. While my family slept and the ocean hummed outside our window, I was on a different kind of adventure: one that took place entirely on my laptop, fueled by hotel coffee and a persistent idea I couldn’t shake. I was building an agent for Gemini Scribe.

The Genesis of an Idea

So why spend a vacation hunched over a keyboard? Because an idea was bothering me. The existing chat mode in Gemini Scribe was useful, but it was fundamentally limited. It operated on a simple, one-shot basis: you’d ask a question, and it would give you an answer. It was a powerful tool for quick queries or generating text, but it wasn’t a true partner in the writing process. It was like having a brilliant research assistant who had no short-term memory.

My work on the Gemini CLI was a huge part of this. As we described in our announcement post, we built the CLI to be a powerful, open-source AI agent for developers. It brings a conversational, tool-based experience directly to the terminal, and it’s brilliant at what it does. But its success made me wonder: what would an agent look like if it wasn’t built for a developer’s terminal, but for a writer’s notebook?

I imagined an experience that was less about executing discrete commands and more about engaging in a continuous, creative dialogue. The CLI is perfect for scripting and automation, but I wanted to build an agent that could handle the messy, iterative, and often unpredictable process of thinking and writing. I needed a sandbox to explore these ideas—a place to build and break things without disrupting the focused, developer-centric mission of the Gemini CLI.

Gemini Scribe was the perfect answer. It was my own personal lab. I wanted to be able to give it complex, multi-step tasks that mirrored how I actually work, like saying, “Read these three notes, find the common themes, and then use that to draft an outline in this new file.” With the old system, that was impossible. I was the human glue, copying and pasting, managing the context, and stitching together the outputs from a dozen different prompts. The AI was smart, but it couldn’t act.

It was this friction, this gap between what the tool was and what it could be, that I couldn’t let go of. It wasn’t just about adding a new feature; it was about fundamentally changing my relationship with the software. I didn’t want a tool I could command; I wanted a partner I could collaborate with. And so, with the Pacific as my backdrop, I started to build it.

A Creative Detour in Paradise

This wasn’t a frantic sprint. It was the opposite: a project defined by having the time and space to explore. Looking back at the commit history from July is like re-watching a time-lapse of a building being constructed, but one with very civilized hours. The work began in earnest on July 7th with the foundational architecture, built during the quiet early mornings in our Sydney hotel room while my family was still asleep.

A panoramic view of the Sydney skyline at sunset, featuring the Sydney Opera House and surrounding waterfront, with boats on the harbor and city lights beginning to illuminate.

By July 11th, the project had found its rhythm. That was the day the agent got its hands, with the first real tools like google_search and move_file. I remember a focused afternoon of debugging, patiently working through the stubborn formatting requirements of the Google AI SDK’s functionDeclarations. There was no rush, just the satisfying puzzle of getting it right.

Much of the user experience work happened during downtime. From a lounge chair by the beach in Fiji on July 15th, I implemented the @mention system to make adding files to the agent’s context feel more natural. I built a collapsible context panel and polished the session history, all with the freedom to put the laptop down whenever I got tired or frustrated.

A laptop displaying the word 'GEMINI' on its screen, placed on a wooden table with a view of the ocean and palm trees in the background.

Of course, some challenges required deeper focus. On July 16th, I had to build a LoopDetector—a crucial safety net to keep the agent from getting stuck in an infinite execution cycle. I remember wrestling with that logic while looking out over the ocean, a surreal but incredibly motivating environment. The following days were spent calmly adding session-level settings and permissions.
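The core of that safety net fits in a few lines. Roughly, and in Python rather than the plugin's actual TypeScript, the idea is:

```python
from collections import deque

class LoopDetector:
    """Flags when the agent repeats the identical tool call too many times
    within a recent window, a cheap guard against infinite execution cycles."""
    def __init__(self, max_repeats=3, window=10):
        self.max_repeats = max_repeats
        self.recent = deque(maxlen=window)

    def record(self, tool_name, args):
        """Record a call; return True if the agent appears to be stuck."""
        call = (tool_name, tuple(sorted(args.items())))
        self.recent.append(call)
        return self.recent.count(call) >= self.max_repeats

detector = LoopDetector(max_repeats=3)
print(detector.record("read_file", {"path": "notes.md"}))  # → False
print(detector.record("read_file", {"path": "notes.md"}))  # → False
print(detector.record("read_file", {"path": "notes.md"}))  # → True: abort the run
```

The real version has more nuance around what counts as "the same" call, but the shape is just this: a bounded history and a repeat threshold.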

The final phase was about patiently testing and documenting. I wrote dozens of tests, updated the README, and fixed the small bugs that only reveal themselves through use. It was the process of turning a fun exploration into a polished, reliable feature. The first time I gave it a truly complex task—and watched it work, step-by-step, without a single hiccup—was the “aha!” moment. It felt like magic, born not from pressure, but from possibility.

What Agent Mode Really Is

So, what did all that creative exploration actually create? Agent Mode is a persistent, conversational partner for your writing. Instead of a one-off command, you now have a continuous session where the AI remembers what you’ve discussed and what it has done. It’s a research assistant and a writing partner rolled into one.

You can give it high-level goals, and it will figure out the steps to get there. It uses its tools to read your notes, search the web for new information, and even edit your files directly. When you give it a task, you can see its plan, watch it execute each step, and see the results in real-time.

It’s the difference between asking a librarian for a single book and having them join you at your table to help you research and write your entire paper. You can ask it to do things like, “Review my last three posts on AI, find the common threads, and draft an outline for a new post that combines those key themes.” Then you can watch it happen, all within your notes.

The Best Souvenirs

In the end, I came back with a tan and a camera roll full of beautiful photos. But the best souvenir from my trip was the one I built myself. For those of us who love to create, sometimes the most restorative thing you can do on a vacation is to find the time and space to build something you’re truly passionate about. It’s a reminder that the most exciting frontiers aren’t always on a map.

Agent Mode is now available in the latest version of Gemini Scribe. I’m incredibly excited about the new possibilities it opens up, and I can’t wait to see what you do with it. Please give it a try, and come join the conversation on GitHub to share your feedback and ideas. I’d love to hear what you think.

A cheerful, cartoon-style purple bear with a large head and big eyes is sitting at a desk, happily using a computer with a text editor open on the screen. A section of the text is highlighted.

A More Precise Way to Rewrite in Gemini Scribe

I’ve been remiss in posting updates, but I wanted to take a moment to highlight a significant enhancement to Gemini Scribe that streamlines the writing and editing process: the selection-based rewrite feature. This powerful tool replaced the previous full-file rewrite functionality, offering a more precise, intuitive, and safer way to collaborate with AI on your documents.

What’s New?

Instead of rewriting an entire file, you can now select any portion of your text and have the AI rewrite just that part based on your instructions. Whether you need to make a paragraph more concise, fix grammar in a sentence, or change the tone of a section, this new feature gives you surgical precision.

How It Works

Using the new feature is simple:

  1. Select the text you want to rewrite in your editor.
  2. Right-click on the selection and choose “Rewrite with Gemini” from the context menu, or trigger the command from the command palette.
  3. A dialog will appear showing you the selected text and asking for your instructions.
  4. Type in what you want to change (e.g., “make this more formal,” “simplify this concept,” or “fix spelling and grammar”), and the AI will get to work.
  5. The selected text is then replaced with the AI-generated version, while the rest of your document remains untouched.

Behind the scenes, the plugin sends the full content of your note to the AI for context, with special markers indicating the selected portion. This allows the AI to maintain the style, tone, and flow of your document, ensuring the rewritten text fits in seamlessly.
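In other words, the prompt the model sees is the whole note with the selection fenced off by markers, and the response is spliced back over exactly that span. A Python sketch of those two steps (the plugin itself is TypeScript, and the marker strings here are invented for illustration):

```python
SELECTION_START = "<<<SELECTION>>>"
SELECTION_END = "<<<END_SELECTION>>>"

def mark_selection(note, sel_start, sel_end):
    """Wrap the selected character range in markers so the model sees the
    full note for context but is asked to rewrite only the marked span."""
    if not (0 <= sel_start <= sel_end <= len(note)):
        raise ValueError("selection out of range")
    return (note[:sel_start] + SELECTION_START
            + note[sel_start:sel_end] + SELECTION_END + note[sel_end:])

def apply_rewrite(note, sel_start, sel_end, rewritten):
    """Splice the AI's rewritten text over the original selection,
    leaving the rest of the document untouched."""
    return note[:sel_start] + rewritten + note[sel_end:]

note = "Intro. This bit is clunky. Outro."
print(mark_selection(note, 7, 26))
print(apply_rewrite(note, 7, 26, "This part reads clunkily."))
```

Because the replacement is a pure string splice over a known range, nothing outside the selection can change, which is where the safety guarantee comes from.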

Why This is Better

The previous rewrite feature was an all-or-nothing affair, which could sometimes lead to unexpected changes or loss of content. This new selection-based approach is a major improvement for several reasons:

  • Precision and Control: You have complete control over what gets rewritten, down to a single word.
  • Safety: There’s no risk of accidentally overwriting parts of your document you wanted to keep.
  • Iterative Workflow: It encourages a more iterative and collaborative workflow. You can refine your document section by section, making small, incremental improvements.
  • Speed and Efficiency: It’s much faster to rewrite a small selection than an entire document, making the process more interactive and fluid.

This new feature is designed to feel like a natural extension of the editing process, making AI-assisted writing more of a partnership.

A Note on the ‘Rewrite’ Checkbox

I’ve received some feedback about the removal of the “rewrite” checkbox from the normal mode. I want to thank you for that feedback and address it directly. There are a couple of key reasons why I decided to remove this feature in favor of the new selection-based rewriting.

First, I found it difficult to get predictable results with the old mechanism. The model would sometimes overwrite the entire file unexpectedly, which made the feature unreliable and risky to use. I personally rarely used it for this reason.

Second, the new Agent Mode provides a much more reliable way to replicate the old functionality. If you want to rewrite an entire file, you can simply add the file to your Agent session and describe the changes you want the AI to make. The Agent will then edit the entire file for you, giving you a more controlled and predictable outcome.

While I understand that change can be disruptive, I’m confident that the new selection-based rewriting and the Agent Mode offer a superior and safer experience. I’m always looking for ways to improve the plugin, so please continue to share your thoughts and feedback on how you’re using the new features.

The Future is Agent-ic

Ultimately, over the next several iterations of Gemini Scribe, I’ll be moving more and more functionality into Agent Mode and merging the existing Gemini Chat Mode experience into it. I’m hoping this addresses a lot of the feedback I’ve received over the last nine months and creates something even more powerful for interacting with your notes. More on Agent Mode in a coming post.

I’m really excited about this new direction for Gemini Scribe, and I believe it will make the plugin an even more powerful tool for writers and note-takers. Please give it a try and let me know what you think!

Gemini Scribe Supercharged: A Faster, More Powerful Workflow Awaits

It’s been a little while since I last wrote about Gemini Scribe, and that’s because I’ve been deep in the guts of the plugin, tearing things apart and putting them back together in ways that make the whole experience faster, smoother, and just plain better.

One of the first things that pushed me back into the code was the rhythm of the interaction itself. Every time I typed a prompt and hit enter, I found myself waiting—watching the spinner, watching the time pass, watching the thought in my head cool off while the AI gathered its response. It didn’t feel like a conversation. It felt like submitting a form.

That’s fixed now. As of version 2.2.0, Gemini Scribe streams responses in real-time. You see the words as they’re generated, line by line, without the long pause in between. It makes a difference. The back-and-forth becomes more fluid, more natural. It pulls you into the interaction rather than holding you at arm’s length. And once I started using it this way, I couldn’t go back.

But speed was only part of it. I also wanted more control. I’ve been using custom prompts more and more in my own workflow—not just as one-off instructions, but as reusable templates for different kinds of writing tasks. And the old prompt system, while functional, wasn’t built for that kind of use.

So I rewrote it.

Version 3.0.0 introduces a completely revamped custom prompt system. You can now create and manage your prompts right from the Command Palette. That means no more hunting through settings or copying from other notes—just hit the hotkey, type what you need, and move on. Prompts are now tracked in your chat history too, so you can always see exactly what triggered a particular response. It’s a small thing, but it brings a kind of transparency to the process that I’ve found surprisingly useful.

All of this is sitting on top of a much sturdier foundation than before. A lot of the internal work in these recent releases has been about making Gemini Scribe more stable and more integrated with the rest of the Obsidian ecosystem. Instead of relying on low-level file operations, the plugin now uses the official Obsidian APIs for everything. That shift makes it more compatible with other plugins and more resilient overall. The migration from the old system happens automatically in the background—you shouldn’t even notice it, except in the way things just work better.

There’s also a new “Advanced Settings” panel for those who like to tinker. In version 3.1.0, I added dynamic model introspection, which means Gemini Scribe now knows what the model it’s talking to is actually capable of. If you’re using a Gemini model that supports temperature or top-p adjustments, the plugin will surface those controls and tune their ranges appropriately. Defaults are shown, sliders are adjusted per-model, and you get more precise control without the guesswork.
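If I remember the public Model resource correctly, the metadata returned for a model includes fields like temperature, maxTemperature, and topP, and the settings panel just maps those onto slider ranges. A rough illustration of that mapping, using a hand-written metadata dict rather than a live API call (treat the field names as assumptions and check the API reference):

```python
def slider_config(model_info):
    """Turn model metadata into per-model slider settings, omitting
    controls the model doesn't advertise support for."""
    sliders = {}
    if "temperature" in model_info:
        sliders["temperature"] = {
            "min": 0.0,
            "max": model_info.get("maxTemperature", 1.0),
            "default": model_info["temperature"],
        }
    if "topP" in model_info:
        sliders["topP"] = {"min": 0.0, "max": 1.0,
                           "default": model_info["topP"]}
    return sliders

# Values here are made up for illustration.
info = {"temperature": 1.0, "maxTemperature": 2.0, "topP": 0.95}
print(slider_config(info))
```

A model that doesn't report topP simply doesn't get a topP slider, which is the "no guesswork" part.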

None of these changes happened overnight. They came out of weeks of using the plugin, noticing friction, and wondering how to make things feel lighter. I’ve also spent a fair bit of time fixing bugs, adding retry logic for occasional API hiccups, and sanding off the rough edges that show up only after hours of use. This version is faster, smarter, and more comfortable to live in.

There’s still more to come. Now that the architecture is solid and the foundation is in place, I’m starting to explore ways to make Gemini Scribe even more integrated with your notes—tighter context handling, more intelligent follow-ups, and better tools for shaping long-form writing. But that’s a story for another day.

For now, if you’ve been using Gemini Scribe, update to the latest version from the community plugins tab and try out the new features. And if you’ve got ideas, feedback, or just want to follow along as things evolve, come join the conversation on GitHub. I’d love to hear what you think.