A central glowing geometric blue-white sphere, representing an AI brain, with five white outlined icons orbiting it. The icons represent a wrench (tools), a magnifying glass over a document (search/knowledge), a calendar with a checkmark (scheduling), a bar graph (data/execution), and a computer cursor over a window (digital interaction).

An Agent’s Toolkit

Welcome back to The Agentic Shift, our shared journey mapping the evolution of AI from passive creator to active partner. So far, we’ve carefully assembled the core components of our agent. We’ve given it senses to perceive its digital world, a brain to think and reason, and a memory to learn and recall.

Our agent is now a brilliant observer, but an observer is all it is. It can understand its environment, formulate complex plans, and remember every detail, but there’s a crucial piece missing. It’s like a master chef who has conceptualized the perfect dish but has no knives to chop or stove to cook. An agent that can only perceive, think, and remember is still trapped in its own mind. To be useful, it must be able to act.

This is where tools come in. Tools are the agent’s hands, allowing it to bridge the gap between its internal reasoning and the external digital world. In this post, we’ll finally step into the workshop and give our agent the ability to interact with its environment. We’ll explore the fundamental loop that governs its actions, the art of crafting a tool it can understand, and the common implements that empower agents to help with everything from coding to scheduling your next meeting.

The Suggestion, Not the Command

Before we break down the loop that governs tool use, we need to internalize the single most important concept in building safe agents: the AI model never executes code directly. This is a bright red line, a fundamental safety principle. When a model “uses a tool,” it isn’t running a program; it’s generating a highly structured piece of text—a suggestion—that our application code can choose to act upon.

Let’s return to our analogy of the master chef. The chef (the LLM) decides it’s time to sear the scallops. They don’t walk over to the stove and turn it on themselves. Instead, they call out to a trusted kitchen assistant (our application code), “Set the front burner to high heat.”

That verbal command is the tool call. It contains a clear intent (set_burner_heat) and specific parameters (burner: 'front', setting: 'high').

It’s the kitchen assistant’s job to interpret this command, walk over to the physical stove, and turn the knob. The assistant then reports back, “The burner is on and heating up.” With this new observation from the outside world, the chef can proceed to the next step in the recipe. The power lies in this clean separation of duties: the chef has the creative intelligence, but the assistant has the hands-on ability to interact with the world. In AI agents, this separation is how we maintain control, security, and reliability. The LLM suggests, and our application executes.

The Four-Step Recipe

At its heart, an agent’s ability to use a tool follows a simple, elegant recipe. It’s a dance between the AI’s brain (the LLM) and the application code that hosts it, a programmatic loop that follows a “Think-Act-Observe” cycle. Because our chef only suggests the next step, the kitchen assistant is always in control of the execution, making the entire process safe and reliable.

This recipe has four key steps:

  1. Provide Tools and a Prompt: The application gives the LLM the user’s request, but it also provides a “menu” of available tools, complete with detailed descriptions of what each one does.
  2. Get a Tool Call Suggestion: The LLM analyzes the request and the menu. If it decides a tool is needed, it fills out a structured “order form” (a FunctionCall) specifying which tool to use and what arguments to provide.
  3. Execute the Tool: Our application receives this order form, validates it, and then—in its own secure workshop—executes the actual function.
  4. Return the Result: The application takes the result from the tool and hands it back to the LLM, allowing it to synthesize a final, factually grounded answer for the user.

This loop transforms the agent from a pure conversationalist into a system that can take concrete, observable actions to achieve a goal.
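Stripped of any particular SDK, the recipe can be sketched as a short loop. Everything below is illustrative, not a real API: the stubbed model, the tool names, and the message shapes are stand-ins for the LLM and your application code.

```python
# A minimal sketch of the Think-Act-Observe loop with a stubbed model.
# The "model" here is scripted; a real agent would call an LLM API.

def fake_model(messages):
    """Stand-in for the LLM: first suggests a tool call, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        # Step 2: the model fills out a structured "order form".
        return {"tool_call": {"name": "get_current_temperature",
                              "args": {"location": "London"}}}
    # Once an observation is in context, it produces a grounded answer.
    return {"text": "It's 14°C in London right now."}

def get_current_temperature(location):
    return {"temperature_c": 14, "location": location}

TOOLS = {"get_current_temperature": get_current_temperature}

def run_agent(user_request):
    messages = [{"role": "user", "content": user_request}]
    while True:
        reply = fake_model(messages)                 # Steps 1-2: prompt + suggestion
        call = reply.get("tool_call")
        if call is None:
            return reply["text"]                     # No tool needed: final answer
        result = TOOLS[call["name"]](**call["args"])  # Step 3: *we* execute
        messages.append({"role": "tool", "content": result})  # Step 4: return result

print(run_agent("What's the temperature in London?"))
```

Note that the model never touches `TOOLS` itself; the application owns the dispatch table, which is exactly the chef-and-assistant separation described above.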

When the Recipe Goes Wrong

The four-step recipe describes the ideal path, but in the real world, kitchens are messy. What happens when the kitchen assistant tries to light the stove and the gas is out? A good assistant doesn’t just stop; they report the problem back to the chef.

This is the essence of error handling in AI agents. If our application tries to execute a tool and it fails—perhaps an external API is down or a file isn’t found—it’s crucial that it doesn’t crash. Instead, it should catch the error and pass a clear, descriptive error message back to the model as the “observation” in the next step of the loop.

When the LLM receives an error message (e.g., “Error: API timed out”), it can use its reasoning ability to decide what to do next. It might suggest retrying the tool, trying a different one, or simply informing the user that it can’t complete the request. This is what transforms an agent from a fragile automaton into a resilient problem-solver.
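A minimal sketch of this pattern, with a deliberately failing stand-in tool (the function names and observation format here are illustrative):

```python
# Wrap tool execution so failures become observations, not crashes.

def execute_tool(tools, name, args):
    """Run a tool, converting any failure into an error observation."""
    if name not in tools:
        return {"status": "error", "message": f"Error: unknown tool '{name}'"}
    try:
        return {"status": "ok", "result": tools[name](**args)}
    except Exception as e:
        # The error message becomes the model's next observation.
        return {"status": "error", "message": f"Error: {type(e).__name__}: {e}"}

def flaky_search(query):
    raise TimeoutError("API timed out")

obs = execute_tool({"search": flaky_search}, "search", {"query": "news"})
print(obs["message"])  # → Error: TimeoutError: API timed out
```

The key point is that `execute_tool` always returns, so the loop keeps running and the model gets a chance to retry, switch tools, or explain the failure to the user.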

The Multi-Tasking Chef

As agents become more sophisticated, so does their ability to multitask. Modern LLMs can suggest calling multiple tools at the same time, like a master chef telling an assistant, “Start searing the scallops and begin chopping the parsley.”

If a user asks a complex question like, “What’s the weather in London and what’s the top news story in Paris?” a capable agent can recognize these are two separate, independent tasks. In its “order form,” it can list two tool calls: one for the weather API and one for a news API. Our application can then execute these two calls concurrently. This parallel execution is far more efficient than a one-at-a-time approach, leading to faster responses and a more fluid user experience.
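On the application side, executing two independent suggestions concurrently can be as simple as a standard thread pool. The tools below are stand-ins that just sleep to simulate network latency:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def get_weather(city):
    time.sleep(0.1)  # simulate a network call
    return f"Weather in {city}: 14°C"

def get_top_news(city):
    time.sleep(0.1)
    return f"Top story in {city}: (stub headline)"

# Two independent tool calls from a single model turn.
calls = [(get_weather, {"city": "London"}),
         (get_top_news, {"city": "Paris"})]

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(fn, **args) for fn, args in calls]
    results = [f.result() for f in futures]  # total wait ≈ one call, not two

print(results)
```

Both observations then go back to the model together, so it can answer the compound question in a single follow-up turn.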

Following the Whole Recipe

The true power of an agent is revealed when it moves beyond single commands and starts executing a whole recipe, one step at a time. This is called tool chaining, and it’s how agents tackle complex, multi-step tasks. The core idea is simple: the output from one tool becomes the input for the next.

This is achieved by running the “Think-Act-Observe” loop multiple times. Consider a request like, “Find the latest project update email from Jane, summarize it, and send a ‘Got it, thanks!’ reply.” An agent would tackle this by chaining tools together:

  1. Loop 1: The agent first calls the search_email tool with the query “latest project update from Jane.” The tool returns the full text of the email.
  2. Loop 2: With the email content now in its context, the agent’s next thought is to summarize it. It calls a summarize_text tool, passing in the email’s content. The tool returns a concise summary.
  3. Loop 3: Now, holding the summary, the agent knows it just needs to confirm receipt. It calls the send_email tool, with the recipient set to Jane and the body as “Got it, thanks!”

This ability to chain actions—where each step informs the next—is what elevates an agent from a simple command-executor to a true problem-solver.
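The three loops above can be sketched with stand-in tools and a scripted "model" whose plan is hard-coded. In a real agent, the LLM would produce each call only after seeing the previous observation; everything here is illustrative:

```python
# Stand-in tools: each one's output feeds the next call.
def search_email(query):
    return "From: Jane\nSubject: Project update\nBody: The migration is done."

def summarize_text(text):
    return "Jane reports the migration is complete."

def send_email(to, body):
    return f"Sent to {to}: {body}"

TOOLS = {"search_email": search_email,
         "summarize_text": summarize_text,
         "send_email": send_email}

def scripted_model(step, last_observation):
    """Scripted plan; a real LLM would reason its way to each call."""
    plan = [
        ("search_email", lambda obs: {"query": "latest project update from Jane"}),
        ("summarize_text", lambda obs: {"text": obs}),   # uses Loop 1's output
        ("send_email", lambda obs: {"to": "Jane", "body": "Got it, thanks!"}),
    ]
    if step >= len(plan):
        return None  # plan complete
    name, make_args = plan[step]
    return name, make_args(last_observation)

observation = None
for step in range(10):  # a cap guards against runaway loops
    suggestion = scripted_model(step, observation)
    if suggestion is None:
        break
    name, args = suggestion
    observation = TOOLS[name](**args)
    print(f"Loop {step + 1}: {name} -> {observation!r}")
```

The shape to notice is that nothing special wires the tools together: chaining falls out of running the same loop repeatedly, with each observation landing in the context the model sees next.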

Giving the Agent a Menu

An agent can only use the tools it understands. This is why defining a tool correctly is one of the most critical aspects of building a reliable agent. A well-defined tool is a contract between our code and the model’s reasoning, and it has three parts:

  • Name: A unique, simple identifier, like get_current_weather.
  • Description: A clear, natural language explanation of what the tool does. This is the most important part, as the description is effectively the prompt for the tool; a vague description will lead to a confused agent.
  • Schema: A rigid, machine-readable contract that defines the exact parameters the tool needs, their data types, and which ones are required. This schema is a powerful guardrail against the model “hallucinating” or inventing parameters that don’t exist.

Let’s look at a practical example from the Google Gemini API, defining a tool to get the weather.

```python
# Based on the official Google AI for Developers documentation
from google import genai
from google.genai import types

# Define the function declaration for the model
weather_function = {
    "name": "get_current_temperature",
    "description": "Gets the current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city name, e.g. San Francisco",
            },
        },
        "required": ["location"],
    },
}

# Configure the client and tools
client = genai.Client()  # assumes GEMINI_API_KEY is set in your environment
tools = types.Tool(function_declarations=[weather_function])
config = types.GenerateContentConfig(tools=[tools])

# Send request with function declarations
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the temperature in London?",
    config=config,
)

# Check for a function call
if response.candidates[0].content.parts[0].function_call:
    function_call = response.candidates[0].content.parts[0].function_call
    print(f"Function to call: {function_call.name}")
    print(f"Arguments: {function_call.args}")
    # In a real app, you would call your function here:
    # result = get_current_temperature(**function_call.args)
else:
    print("No function call found in the response.")
    print(response.text)
```

You can find this complete example in the official documentation. Here, we define the tool’s contract—its name, description, and schema—using a standard Python dictionary. This is then passed to the model, which can now intelligently decide when and how to ask for the temperature. The final part of the example shows how our application can inspect the model’s response to see if it suggested a function call. If it did, our code can then take that suggestion and execute the real-world function, completing the loop.
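The last step of the recipe, returning the result, can be sketched as a continuation of this example. This assumes the same `client`, `config`, and `response` objects from above; the `get_current_temperature` implementation is a stand-in of our own, not part of the SDK:

```python
# Execute the suggested function ourselves, then hand the result back
# to the model so it can compose a grounded final answer.
def get_current_temperature(location: str) -> dict:
    # Stand-in implementation; a real app would call a weather API.
    return {"temperature": 15, "unit": "Celsius", "location": location}

result = get_current_temperature(**function_call.args)

# Wrap the result as a function-response part and continue the conversation.
function_response_part = types.Part.from_function_response(
    name=function_call.name,
    response={"result": result},
)

follow_up = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Content(role="user", parts=[
            types.Part(text="What's the temperature in London?")]),
        response.candidates[0].content,  # the model's tool-call turn
        types.Content(role="user", parts=[function_response_part]),
    ],
    config=config,
)
print(follow_up.text)
```

With the observation in its context, the model's final text answer is grounded in the tool's actual output, closing the Think-Act-Observe loop.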

The Mechanic’s Workbench

So, how many tools can an agent have? While modern models like Gemini 2.5 Flash have enormous context windows that can technically accommodate hundreds of tool definitions, that raw capacity is misleading. The true bottleneck isn’t fitting the tools into the context; it’s the model’s ability to reason effectively when faced with an overwhelming number of choices.

Think of it like a mechanic’s workbench. A massive bench can hold every tool imaginable, but if it’s cluttered with dozens of nearly identical wrenches, the mechanic will waste time and is more likely to grab the wrong one. The problem isn’t a lack of space; it’s the cognitive load of making the right choice. A smaller, well-organized bench with only the tools needed for the job is far more efficient.

Giving a model too many tools creates a similar “paradox of choice,” leading to ambiguity and reduced accuracy. The signal of the correct tool can get lost in the noise of all the possible tools. The best practice is to be judicious. Equip your agent with a focused set of high-quality, distinct tools relevant to its core purpose.

Tools of the Trade

While you can create a tool for almost any API, a few common patterns have emerged, often tailored to the type of work the agent is designed for.

For the Coder

Agents designed to help with software development need tools to interact with their environment just like a human developer would. These include File System Tools (read_file, write_file) to read and modify code, and the powerful Shell Command (execute_shell_command), which allows an agent to run terminal commands to do things like install dependencies or run tests.

For the Knowledge Worker

Agents for productivity and office work focus on automating communication, scheduling, and information retrieval. Common tools include an Email tool to send messages, a Calendar tool to parse fuzzy requests and create events, and a Document Search tool to perform semantic search over a private knowledge base.

Building Trust

As we grant agents more powerful tools—from sending emails to executing code—building in layers of trust and safety becomes paramount. Giving an agent the key to a powerful tool without oversight is like hiring a new kitchen assistant and not checking their work. Three critical patterns for building this trust are:

  • Sandboxing: This ensures that powerful tools run in a secure, isolated environment where they can’t affect the broader system.
  • The Human-in-the-Loop: This pattern requires user confirmation for sensitive actions. The agent formulates a plan and waits for a “yes” before executing it.
  • Policy Engines: This is a more automated form of oversight—a set of programmable rules the application checks before executing a tool, like a policy that an agent is never allowed to delete more than five files at a time.
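The human-in-the-loop and policy-engine patterns can be sketched as a gate that runs before any tool call is executed. The tool names, policy rule, and messages here are illustrative:

```python
# A simple policy check plus a human confirmation gate.

SENSITIVE_TOOLS = {"send_email", "delete_files", "execute_shell_command"}
MAX_FILES_PER_DELETE = 5

def policy_allows(name, args):
    """Programmable rules checked before execution (the policy engine)."""
    if name == "delete_files" and len(args.get("paths", [])) > MAX_FILES_PER_DELETE:
        return False, "Policy: cannot delete more than 5 files at a time."
    return True, "ok"

def guarded_execute(tools, name, args, confirm=input):
    allowed, reason = policy_allows(name, args)
    if not allowed:
        return f"Blocked. {reason}"
    if name in SENSITIVE_TOOLS:  # human-in-the-loop for sensitive actions
        answer = confirm(f"Agent wants to run {name}({args}). Proceed? [y/N] ")
        if answer.strip().lower() != "y":
            return "Cancelled by user."
    return tools[name](**args)

# Example: the policy blocks a bulk delete outright, no human needed.
print(guarded_execute({}, "delete_files", {"paths": ["a"] * 6}))
```

Because the gate lives in the application, not the model, the LLM can suggest whatever it likes; nothing sensitive runs without passing the policy and, where required, the user.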

These concepts are so foundational to building responsible agents that we’ll be dedicating a future post in this series, Part 6: Putting Up the Guardrails, to a much deeper exploration.

From Queries to Workflows

Taking a step back, it’s worth reflecting on the profound shift that tools represent. For years, our primary interaction with large language models was conversational—we would ask a question, and the model would answer using the vast but static knowledge it was trained on. It was a dialogue.

Tools are the mechanism that transforms this dialogue into a workflow. They are what allow us to move from asking “How do I book a flight from SFO to JFK?” to simply saying, “Book me a flight from SFO to JFK.” The focus shifts from information retrieval to task completion. This is the essence of the agentic shift: moving from a model that knows to an agent that does.

Conclusion

Tools are the bridge between the agent’s mind and the world. They are what allow it to move from reasoning to acting, transforming it from a passive information source into an active partner. The simple, secure loop of “Think-Act-Observe” is the engine that drives this process, and the clarity with which we define an agent’s tools is what determines its reliability.

Now that our agent has a body, a brain, a memory, and hands, the next question is: how do we guide its behavior? How do we write the high-level instructions that shape its personality, its goals, and the constraints under which it operates? That’s the art of the system prompt, and it’s exactly where we’re heading in Part 5: Guiding the Agent’s Behavior. The foundation is laid; now we can start to bring our agent to life.

A cute cartoon purple bear mascot is on a golden ribbon with "Gemini Scribe" written on it. The background is a collage of two photos: the top half shows the Sydney Opera House at sunset, and the bottom half shows a laptop on a table by a pool with the ocean in the distance.

What I Did On My Summer Vacation

Every year, like clockwork, the first assignment back at school was the same: a short essay on what you did over the summer. It was a ritual of sorts, a gentle reentry into the world of homework and deadlines, usually accompanied by a gallery of crayon drawings of camping trips and beach outings.

My summer had all the makings of a classic entry. There was a trip to Australia and Fiji. I could write about the impossible blue of the water in the South Pacific, or the iconic silhouette of the Sydney Opera House against a setting sun. I have the photos to prove it. It was, by all accounts, a proper vacation.

But if I’m being honest, my most memorable trip wasn’t to a beach or a city. It was a two-week detour into the heart of my own code, building something that had been quietly nagging at me for months. While my family slept and the ocean hummed outside our window, I was on a different kind of adventure: one that took place entirely on my laptop, fueled by hotel coffee and a persistent idea I couldn’t shake. I was building an agent for Gemini Scribe.

The Genesis of an Idea

So why spend a vacation hunched over a keyboard? Because an idea was bothering me. The existing chat mode in Gemini Scribe was useful, but it was fundamentally limited. It operated on a simple, one-shot basis: you’d ask a question, and it would give you an answer. It was a powerful tool for quick queries or generating text, but it wasn’t a true partner in the writing process. It was like having a brilliant research assistant who had no short-term memory.

My work on the Gemini CLI was a huge part of this. As we described in our announcement post, we built the CLI to be a powerful, open-source AI agent for developers. It brings a conversational, tool-based experience directly to the terminal, and it’s brilliant at what it does. But its success made me wonder: what would an agent look like if it wasn’t built for a developer’s terminal, but for a writer’s notebook?

I imagined an experience that was less about executing discrete commands and more about engaging in a continuous, creative dialogue. The CLI is perfect for scripting and automation, but I wanted to build an agent that could handle the messy, iterative, and often unpredictable process of thinking and writing. I needed a sandbox to explore these ideas—a place to build and break things without disrupting the focused, developer-centric mission of the Gemini CLI.

Gemini Scribe was the perfect answer. It was my own personal lab. I wanted to be able to give it complex, multi-step tasks that mirrored how I actually work, like saying, “Read these three notes, find the common themes, and then use that to draft an outline in this new file.” With the old system, that was impossible. I was the human glue, copying and pasting, managing the context, and stitching together the outputs from a dozen different prompts. The AI was smart, but it couldn’t act.

It was this friction, this gap between what the tool was and what it could be, that I couldn’t let go of. It wasn’t just about adding a new feature; it was about fundamentally changing my relationship with the software. I didn’t want a tool I could command; I wanted a partner I could collaborate with. And so, with the Pacific as my backdrop, I started to build it.

A Creative Detour in Paradise

This wasn’t a frantic sprint. It was the opposite: a project defined by having the time and space to explore. Looking back at the commit history from July is like re-watching a time-lapse of a building being constructed, but one with very civilized hours. The work began in earnest on July 7th with the foundational architecture, built during the quiet early mornings in our Sydney hotel room while my family was still asleep.

A panoramic view of the Sydney skyline at sunset, featuring the Sydney Opera House and surrounding waterfront, with boats on the harbor and city lights beginning to illuminate.

By July 11th, the project had found its rhythm. That was the day the agent got its hands, with the first real tools like google_search and move_file. I remember a focused afternoon of debugging, patiently working through the stubborn formatting requirements of the Google AI SDK’s functionDeclarations. There was no rush, just the satisfying puzzle of getting it right.

Much of the user experience work happened during downtime. From a lounge chair by the beach in Fiji on July 15th, I implemented the @mention system to make adding files to the agent’s context feel more natural. I built a collapsible context panel and polished the session history, all with the freedom to put the laptop down whenever I got tired or frustrated.

A laptop displaying the word 'GEMINI' on its screen, placed on a wooden table with a view of the ocean and palm trees in the background.

Of course, some challenges required deeper focus. On July 16th, I had to build a LoopDetector—a crucial safety net to keep the agent from getting stuck in an infinite execution cycle. I remember wrestling with that logic while looking out over the ocean, a surreal but incredibly motivating environment. The following days were spent calmly adding session-level settings and permissions.

The final phase was about patiently testing and documenting. I wrote dozens of tests, updated the README, and fixed the small bugs that only reveal themselves through use. It was the process of turning a fun exploration into a polished, reliable feature. The first time I gave it a truly complex task—and watched it work, step-by-step, without a single hiccup—was the “aha!” moment. It felt like magic, born not from pressure, but from possibility.

What Agent Mode Really Is

So, what did all that creative exploration actually create? Agent Mode is a persistent, conversational partner for your writing. Instead of a one-off command, you now have a continuous session where the AI remembers what you’ve discussed and what it has done. It’s a research assistant and a writing partner rolled into one.

You can give it high-level goals, and it will figure out the steps to get there. It uses its tools to read your notes, search the web for new information, and even edit your files directly. When you give it a task, you can see its plan, watch it execute each step, and see the results in real-time.

It’s the difference between asking a librarian for a single book and having them join you at your table to help you research and write your entire paper. You can ask it to do things like, “Review my last three posts on AI, find the common threads, and draft an outline for a new post that combines those key themes.” Then you can watch it happen, all within your notes.

The Best Souvenirs

In the end, I came back with a tan and a camera roll full of beautiful photos. But the best souvenir from my trip was the one I built myself. For those of us who love to create, sometimes the most restorative thing you can do on a vacation is to find the time and space to build something you’re truly passionate about. It’s a reminder that the most exciting frontiers aren’t always on a map.

Agent Mode is now available in the latest version of Gemini Scribe. I’m incredibly excited about the new possibilities it opens up, and I can’t wait to see what you do with it. Please give it a try, and come join the conversation on GitHub to share your feedback and ideas. I’d love to hear what you think.

An antique-style fantasy map titled "The Journey of Innovation." It shows a winding, dashed red line charting a complex path through conceptual territories like "The Mountains of Code," "The Sea of Management," and "The Startup Archipelago." The path ends very near its starting point, illustrating a full-circle journey.

Full Circle

My calendar looks different these days. The back-to-back blocks of 1:1s, strategy reviews, and planning sessions have given way to long, uninterrupted stretches of quiet. That quiet has been the most significant change—it’s brought back time to think, a noticeable drop in stress, and a genuine enjoyment in my work that I hadn’t realized was fading. It’s why, after years of leading teams, I’ve deliberately moved back to a role as an individual contributor.

This shift has changed my day-to-day work, but one thing that remains constant is the time I spend mentoring colleagues and contacts, helping them navigate their own career questions. In those conversations, my own journey often comes up, and I hear a familiar question: “You were leading large teams… why the change?” Some have even wondered if I was leaving the company (I’m not). It’s a question with more than one answer, and I realized this post is my way of exploring them fully—for everyone who has asked, and for anyone else thinking about their own path.

It’s a fair question, and the simple answer is that my career has always been guided by a desire to learn and experience things more deeply. It’s never been a straight line up the leadership ladder; I’ve moved between managing and building several times. Each shift was a deliberate choice to go where I felt I could learn the most. This recent move—from a Senior Director role in Cloud AI to a Distinguished Engineer in Google DeepMind—is just the latest example of that pattern: a deliberate step toward the work that feels most urgent and exciting right now.

That motivation started early. My move from Indiana University to Cisco wasn’t just for a job; it was to understand what Silicon Valley was really about. When the dot-com bubble burst, I saw it as a chance to experience something new and jumped into the startup world, working on the foundational tech for what would become the 802.11n and 802.11s WiFi standards. I was learning a ton, but I knew my growth had plateaued. That’s when a friend asked me to consider Google. It was October 2004, just after the IPO, and Google seemed like a magical place. I said yes without knowing what team I’d join. I just wanted to see what it was all about.

My Google journey began in March of 2005 on the municipal WiFi project in Mountain View, but soon took me to London as one of our first engineers in that office. After building out the test engineering team, I moved into Ads and had my first real chance to work with machine learning at Google, working on systems for multivariate ad optimization. From there, I moved back to the US and eventually found my way to Google Maps and Street View.

That was a dream job. I spent nearly a decade in Geo, starting on a team of two working on the launch pipeline and serving infrastructure. Over time, my responsibilities grew, and I had the privilege of leading teams working on everything from the “time machine” feature for historic imagery to 3D reconstruction, imaging hardware, machine learning, and augmented reality. Through it all, I had the chance to learn, explore, and contribute alongside people who became some of my dearest friends.

In 2019, a different kind of challenge appeared. My manager was asked to build a new product area, and I offered to help as his Chief of Staff. I wanted to learn how Google was managed as a business—how decisions were made and how organizations were designed at a macro scale. After two years in that role, I moved back into a technology leadership role, helping with the formation of Core ML.

It was after all of this that I started to realize something important: I missed having my own technical contributions. I missed the flow state, that feeling of time dissolving as you wrestle with a complex problem. I missed the direct feedback loop of writing a piece of code, running it, and seeing it work. I wanted to build my own ideas again.

That feeling connected directly back to my college days. I was an AI major at Indiana University in the 90s, and throughout my career, I had kept coming back to machine learning—in Ads, in Geo, in Core ML. With the explosion of generative AI in 2022, I knew exactly where I wanted to spend my time. More than anything, I wanted to apply these powerful new models to solve real-world problems.

This led me to the ML Developer team in Cloud, leading the Kaggle, Colab, and Gemini API teams. It was a smaller team with a mature leadership bench, which gave me more time to build my own projects—many of which have been chronicled on this blog. As the team evolved, I began contributing to internal projects as well, which culminated in the launch of Gemini CLI, where I was one of the core contributors from the beginning.

Working on Gemini CLI, I realized I was finally doing the exact kind of work I had been craving. When an opportunity came up to move to Google DeepMind and focus full-time on AI Agents and the future of Gemini CLI, I knew it was the right next step.

People often ask me why I’ve been at Google for over 20 years. The answer is simple: it has always been a place of discovery. It’s had its ups and downs, of course. There have been times I’ve considered leaving and times I’ve disliked my situation. But I’ve been lucky enough to move around and keep things fresh, working on projects in mobile, search, maps, technical infrastructure, cloud, and AI. Where else can you get exposed to so much in one place? The fantastic Acquired podcast is currently doing a series on Google (1, 2), and hearing those stories reminded me of how fortunate I’ve been to occasionally get a preview of the future. While a journey like this requires hard work, it also requires being in the right place at the right time. Right now, I feel like I’m in the perfect place for whatever comes next.

This move isn’t just about returning to code. It’s about being in the driver’s seat for the next evolution of software development, where our primary collaboration is with intelligent agents. For a builder, there’s no more exciting place to be. I’m home.

A cheerful, cartoon-style purple bear with a large head and big eyes is sitting at a desk, happily using a computer with a text editor open on the screen. A section of the text is highlighted.

A More Precise Way to Rewrite in Gemini Scribe

I’ve been remiss in posting updates, but I wanted to take a moment to highlight a significant enhancement to Gemini Scribe that streamlines the writing and editing process: the selection-based rewrite feature. This powerful tool replaced the previous full-file rewrite functionality, offering a more precise, intuitive, and safer way to collaborate with AI on your documents.

What’s New?

Instead of rewriting an entire file, you can now select any portion of your text and have the AI rewrite just that part based on your instructions. Whether you need to make a paragraph more concise, fix grammar in a sentence, or change the tone of a section, this new feature gives you surgical precision.

How It Works

Using the new feature is simple:

  1. Select the text you want to rewrite in your editor.
  2. Right-click on the selection and choose “Rewrite with Gemini” from the context menu, or trigger the command from the command palette.
  3. A dialog will appear showing you the selected text and asking for your instructions.
  4. Type in what you want to change (e.g., “make this more formal,” “simplify this concept,” or “fix spelling and grammar”), and the AI will get to work.
  5. The selected text is then replaced with the AI-generated version, while the rest of your document remains untouched.

Behind the scenes, the plugin sends the full content of your note to the AI for context, with special markers indicating the selected portion. This allows the AI to maintain the style, tone, and flow of your document, ensuring the rewritten text fits in seamlessly.
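The marker idea can be sketched in a few lines. The marker strings and function names below are illustrative guesses, not the plugin's actual implementation:

```python
# Wrap the selection in markers before sending the whole note for context,
# then splice the model's rewrite back into the original text.

START, END = "<<<REWRITE_START>>>", "<<<REWRITE_END>>>"

def build_prompt(note, sel_start, sel_end, instruction):
    """Send the full note, with the selection marked, plus the instruction."""
    marked = note[:sel_start] + START + note[sel_start:sel_end] + END + note[sel_end:]
    return (f"Rewrite only the text between {START} and {END} "
            f"({instruction}). Keep the surrounding style and tone.\n\n{marked}")

def apply_rewrite(note, sel_start, sel_end, rewritten):
    """Replace only the selected span; the rest of the note is untouched."""
    return note[:sel_start] + rewritten + note[sel_end:]

note = "Intro paragraph. this sentance has typos. Closing paragraph."
start, end = note.index("this"), note.index("typos.") + len("typos.")
print(build_prompt(note, start, end, "fix spelling and grammar"))
print(apply_rewrite(note, start, end, "This sentence has no typos."))
```

Because the model sees the whole note but the application only ever replaces the marked span, a wandering rewrite can't clobber the rest of the document.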

Why This is Better

The previous rewrite feature was an all-or-nothing affair, which could sometimes lead to unexpected changes or loss of content. This new selection-based approach is a major improvement for several reasons:

  • Precision and Control: You have complete control over what gets rewritten, down to a single word.
  • Safety: There’s no risk of accidentally overwriting parts of your document you wanted to keep.
  • Iterative Workflow: It encourages a more iterative and collaborative workflow. You can refine your document section by section, making small, incremental improvements.
  • Speed and Efficiency: It’s much faster to rewrite a small selection than an entire document, making the process more interactive and fluid.

This new feature is designed to feel like a natural extension of the editing process, making AI-assisted writing more of a partnership.

A Note on the ‘Rewrite’ Checkbox

I’ve received some feedback about the removal of the “rewrite” checkbox from the normal mode. I want to thank you for that feedback and address it directly. There are a couple of key reasons why I decided to remove this feature in favor of the new selection-based rewriting.

First, I found it difficult to get predictable results with the old mechanism. The model would sometimes overwrite the entire file unexpectedly, which made the feature unreliable and risky to use. I personally rarely used it for this reason.

Second, the new Agent Mode provides a much more reliable way to replicate the old functionality. If you want to rewrite an entire file, you can simply add the file to your Agent session and describe the changes you want the AI to make. The Agent will then edit the entire file for you, giving you a more controlled and predictable outcome.

While I understand that change can be disruptive, I’m confident that the new selection-based rewriting and the Agent Mode offer a superior and safer experience. I’m always looking for ways to improve the plugin, so please continue to share your thoughts and feedback on how you’re using the new features.

The Future is Agent-ic

Ultimately, over the next several iterations of Gemini Scribe, I’ll be moving more and more functionality to the Agent Mode and merging the experience from the existing Gemini Chat Mode into the Agent. I’m hoping that this addresses a lot of feedback I’ve received over the last nine months for this plugin and creates something that is even more powerful for interacting with your notes. More on Agent Mode in a coming post.

I’m really excited about this new direction for Gemini Scribe, and I believe it will make the plugin an even more powerful tool for writers and note-takers. Please give it a try and let me know what you think!

Gemini Scribe Supercharged: A Faster, More Powerful Workflow Awaits

It’s been a little while since I last wrote about Gemini Scribe, and that’s because I’ve been deep in the guts of the plugin, tearing things apart and putting them back together in ways that make the whole experience faster, smoother, and just plain better.

One of the first things that pushed me back into the code was the rhythm of the interaction itself. Every time I typed a prompt and hit enter, I found myself waiting—watching the spinner, watching the time pass, watching the thought in my head cool off while the AI gathered its response. It didn’t feel like a conversation. It felt like submitting a form.

That’s fixed now. As of version 2.2.0, Gemini Scribe streams responses in real-time. You see the words as they’re generated, line by line, without the long pause in between. It makes a difference. The back-and-forth becomes more fluid, more natural. It pulls you into the interaction rather than holding you at arm’s length. And once I started using it this way, I couldn’t go back.

But speed was only part of it. I also wanted more control. I’ve been using custom prompts more and more in my own workflow—not just as one-off instructions, but as reusable templates for different kinds of writing tasks. And the old prompt system, while functional, wasn’t built for that kind of use.

So I rewrote it.

Version 3.0.0 introduces a completely revamped custom prompt system. You can now create and manage your prompts right from the Command Palette. That means no more hunting through settings or copying from other notes—just hit the hotkey, type what you need, and move on. Prompts are now tracked in your chat history too, so you can always see exactly what triggered a particular response. It’s a small thing, but it brings a kind of transparency to the process that I’ve found surprisingly useful.

All of this is sitting on top of a much sturdier foundation than before. A lot of the internal work in these recent releases has been about making Gemini Scribe more stable and more integrated with the rest of the Obsidian ecosystem. Instead of relying on low-level file operations, the plugin now uses the official Obsidian APIs for everything. That shift makes it more compatible with other plugins and more resilient overall. The migration from the old system happens automatically in the background—you shouldn’t even notice it, except in the way things just work better.

There’s also a new “Advanced Settings” panel for those who like to tinker. In version 3.1.0, I added dynamic model introspection, which means Gemini Scribe now knows what the model it’s talking to is actually capable of. If you’re using a Gemini model that supports temperature or top-p adjustments, the plugin will surface those controls and tune their ranges appropriately. Defaults are shown, sliders are adjusted per-model, and you get more precise control without the guesswork.
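To make per-model introspection concrete, here is a small Python sketch. The capability table and field names are invented for the example, not the plugin's real data:

```python
# Hypothetical shape of the data returned by model introspection;
# the model names, ranges, and fields are illustrative.
MODEL_CAPS = {
    "gemini-2.5-pro": {"temperature": (0.0, 2.0, 1.0), "top_p": (0.0, 1.0, 0.95)},
    "legacy-model":   {"temperature": (0.0, 1.0, 0.7)},  # no top_p support
}

def slider_config(model: str, param: str):
    """Return (min, max, default) for a parameter, or None if the model
    doesn't support it, in which case the UI would hide the control."""
    return MODEL_CAPS.get(model, {}).get(param)
```

With a lookup like this, the settings panel can show a slider only when the parameter exists, and clamp its range to what the specific model accepts.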

None of these changes happened overnight. They came out of weeks of using the plugin, noticing friction, and wondering how to make things feel lighter. I’ve also spent a fair bit of time fixing bugs, adding retry logic for occasional API hiccups, and sanding off the rough edges that show up only after hours of use. This version is faster, smarter, and more comfortable to live in.
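The retry logic for API hiccups generally follows a standard pattern: retry transient failures with exponential backoff and a little jitter. A generic sketch in Python (the defaults and exception type are assumptions, not the plugin's actual settings):

```python
import random
import time

def with_retries(call, max_attempts=3, base_delay=0.5):
    """Retry a flaky call with exponential backoff plus jitter.
    Parameters here are illustrative defaults."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts; surface the error to the caller
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
```

The jitter keeps many clients from retrying in lockstep; the exponential delay gives a struggling API room to recover.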

There’s still more to come. Now that the architecture is solid and the foundation is in place, I’m starting to explore ways to make Gemini Scribe even more integrated with your notes—tighter context handling, more intelligent follow-ups, and better tools for shaping long-form writing. But that’s a story for another day.

For now, if you’ve been using Gemini Scribe, update to the latest version from the community plugins tab and try out the new features. And if you’ve got ideas, feedback, or just want to follow along as things evolve, come join the conversation on GitHub. I’d love to hear what you think.

Unlocking the Future of Coding: Introducing the Gemini CLI

Back in April, I wrote about waiting for the true AI coding partner. I articulated a vision for an AI that transcends mere code generation, one that truly understands context, acts autonomously within our development environments, and collaborates with us iteratively. Today, I’m thrilled to announce a significant step towards that vision: the launch of the Gemini CLI.

For too long, AI coding assistance has felt disconnected from the work itself. While dedicated AI-powered IDEs like Cursor have made great strides, the common experience still involves copy-pasting code into a separate interface or stepping away from the editor to get suggestions. That breaks flow, loses context, and frankly, isn’t how truly collaborative partners work. We need an AI that lives where we live—in the terminal, within our projects, and deeply integrated into our workflow.

This is precisely what the Gemini CLI sets out to achieve. It’s not just a fancy chatbot for your command line; it’s an experimental interface designed to bring the power of Gemini directly into your development loop, enabling intelligent, contextual, and actionable AI assistance.

It’s for this very reason that I’ve been quite heads-down over the last few months, working with a super talented team to bring this application to life. It has genuinely been one of my most fun experiences at Google in the 20+ years that I’ve been here, and I feel incredibly fortunate to have had the chance to collaborate with such brilliant people across the company.

The Power of Small Tools, Amplified by AI

In May, I explored the concept of small tools, big ideas. The premise was simple: complex problems are often best tackled by composing many small, powerful, and specialized tools. This philosophy is at the very heart of the Gemini CLI’s design.

Instead of a monolithic AI trying to do everything at once, the Gemini CLI empowers Gemini with a suite of familiar command-line tools. Imagine an AI that can:

  • Read and Write Files: Using read_file and write_file, it can inspect your codebase, understand existing logic, and propose modifications directly to your files.
  • Navigate Your Project: With list_directory and grep, it can explore your project structure, locate relevant files, or find specific patterns across your repository, just like you would.
  • Execute Shell Commands: The run_shell_command tool allows Gemini to execute commands, build your project, run tests, or even interact with external services, providing real-time feedback.
  • Search the Web: Need to look up an API, debug an error message, or find best practices? The google_web_search tool lets Gemini leverage the vastness of the internet to inform its responses and actions.
  • Edit with Precision: Beyond simple file writes, the edit_file tool allows for granular, diff-based modifications, ensuring changes are precise and reviewable.

This approach means Gemini isn’t guessing; it’s acting. It’s using the same building blocks you use every day, but with its powerful reasoning capabilities to orchestrate them towards your goals.
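That distinction is worth showing in code: the model only ever emits a structured suggestion, and the application decides whether to run it. A minimal Python sketch of such a dispatch (the JSON shape and registry are assumptions for illustration, not the CLI's internals):

```python
import json
import os
from pathlib import Path

# Hypothetical registry: only functions the application explicitly exposes.
TOOLS = {
    "read_file": lambda path: Path(path).read_text(encoding="utf-8"),
    "list_directory": lambda path: sorted(os.listdir(path)),
}

def execute_suggestion(model_output: str):
    """Parse the model's structured suggestion and run it only if the tool
    is registered; nothing the model says is ever executed directly."""
    call = json.loads(model_output)
    name, args = call["name"], call.get("args", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**args)
```

In a real agent loop, an approval step would sit between parsing and execution, which is exactly where the human-in-the-loop control described below lives.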

A Truly Contextual and Collaborative Partner

The Gemini CLI maintains a persistent session, remembering your conversation history, the files it has examined, and the results of previous tool executions. This “conversational memory” and contextual understanding are critical. It allows for a natural, iterative back-and-forth, where the AI builds on prior interactions and its understanding of your project state.

You can ask Gemini to:

  • “Find all JavaScript files in this directory that import React.” (Leveraging list_directory and grep).
  • “Refactor this component to use hooks.” (Involving read_file, edit_file, and potentially run_shell_command to run tests).
  • “What’s the best way to implement X in Python given these files?” (Using read_file to understand your existing code and google_web_search for best practices).

The workflow is truly interactive. Gemini proposes actions, and you have the power to approve them or guide it further. This human-in-the-loop design ensures you’re always in control, fostering a collaborative partnership rather than a black-box operation.

Built by Gemini CLI, For Everyone

It’s particularly exciting to share that this project was started by a small and scrappy team, and we leveraged Gemini CLI itself to help write Gemini CLI. Many of us now work almost exclusively within Gemini CLI, often using our IDEs only for viewing diffs.

And while its origins are in coding, Gemini CLI is incredibly versatile for many tasks outside of traditional development. Personally, I love using it to manage my home lab, to bulk rename and reformat files for my podcast project, and to generally act as a seamless go-between for anything complicated in GitHub. Increasingly, I’ve also been using Gemini CLI with Obsidian to understand and extract insights from my vault. With over 9000 files in my work vault alone, Gemini CLI lets me ask questions of the entire vault and even make large refactoring-style changes across the entire thing.

Beyond Today: Extensibility

One of the most exciting aspects of the Gemini CLI, and a direct nod to the “small tools, big ideas” philosophy, is its extensibility. The underlying architecture allows developers to define custom tools. This means you can teach Gemini to interact with your specific internal systems, proprietary APIs, or niche development tools. The possibilities are endless, transforming Gemini into an AI assistant perfectly tailored to your unique development environment.
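In its simplest form, a custom tool is just a name, a human-readable description, a parameter schema the model can read, and a function the application runs. A hypothetical sketch (the declaration shape is an assumption, not the CLI's actual extension API):

```python
# Illustrative custom-tool declaration; the real extension API may differ.
def make_tool(name, description, parameters, fn):
    """Bundle a JSON-schema-style parameter description with the function
    that actually performs the work."""
    return {"name": name, "description": description,
            "parameters": parameters, "run": fn}

deploy_tool = make_tool(
    "deploy_staging",  # hypothetical internal tool
    "Deploy the current branch to the staging environment.",
    {"type": "object",
     "properties": {"branch": {"type": "string"}},
     "required": ["branch"]},
    lambda branch: f"deployed {branch} to staging",  # placeholder action
)
```

The description and schema are what the model sees when deciding whether and how to call the tool; the function is what your code actually executes.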

Get Started Today

The Gemini CLI represents a significant leap forward in bringing intelligent AI assistance directly to where developers work most effectively: the command line. It’s a practical realization of the “true AI coding partner” vision, built on the principle that small, well-designed tools can achieve big ideas when orchestrated by a powerful intelligence.

Ready to try it out? Head over to the Gemini CLI GitHub repository to get started. Explore the commands, experiment with its capabilities, and let’s shape the future of AI-powered development together.

I’m incredibly excited about what this means for developer productivity and the evolving role of AI in our daily coding lives. Let me know what you build with it!

How Throwaway AI Experiments Lead to Better Code

Over the past few months, I’ve accidentally discovered a new rhythm when coding with AI—and it has reshaped my approach significantly. It wasn’t something I planned or found in a manual. Instead, it emerged naturally through my experiments as I kept noticing consistent patterns whenever I used AI models to build new features. What started as casual exploration has evolved into a trusted process: vibe, vibe again, and then build. Each step plays a distinct role, and together they’ve transformed how I move from a rough idea to functional software.

I first noticed this pattern while developing new features for Gemini Scribe, my Obsidian plugin. I was exploring ways to visualize the file context tree for an upcoming update. Out of curiosity, I gave Cursor an open brief—virtually no guidance from me at all. I simply wanted to see how the model would respond when left entirely to its own devices. I wasn’t disappointed. The model produced a surprisingly creative user interface and intriguing visualization approaches. The first visualization was a modal dialog showing all files in the tree with a simple hierarchy. It wasn’t ready to ship, but I vividly remember feeling genuine excitement at the unexpected creativity the model demonstrated. The wiring was messy, and there were integration gaps, but it sparked ideas I wouldn’t have reached on my own.

Encouraged by this, I initiated a second round—this time with more structure. I took insights from the initial attempt and guided the model with clearer prompts and a deliberate breakdown of the problem. Again, the model delivered: this time, a new panel on the right-hand side that displayed the hierarchy and allowed users to click directly to any note included in the file context. This feature was genuinely intriguing, closely aligning with the functional design I envisioned. Between these two experiments, I gathered valuable insights on shaping the feature, making it more useful, and improving my future interactions with the model.

These experiences have crystallized into the three phases of my workflow:

  • Max vibe: Completely open-ended exploration to find creative possibilities.
  • Refined vibe: Targeted experimentation guided by learnings from the first round.
  • Build: Structured, focused development leveraging accumulated insights.

The first step—“max vibe and throw away”—is about unleashing the model’s creativity with maximum freedom. No constraints, no polish, just pure experimentation. It’s a discovery phase, surfacing both clever ideas and beautiful disasters. I spend roughly an hour here, take notes, then discard the output entirely. This early output is for exploration, not production.

Next comes “vibe with more detail and throw away again.” Equipped with insights from the initial exploration, I return to the model with a detailed plan, breaking the project into smaller, clearer steps. It’s still exploratory but more refined. This output remains disposable, maintaining fluidity in my thinking and preventing premature attachment to early drafts.

Only after these two rounds do I transition into production mode. At this point, experimentation gives way to deliberate building. Using my notes, I craft precise prompts and break the project into clear, manageable tasks. By now, the route forward is clear and defined. The resulting code, refined and polished, makes it to production, enriched by earlier explorations.

Interestingly, the realization of this workflow was born out of initial frustration. The first time I tried solely prompting for a specific feature set, my codebase became so tangled and problematic that I gave up on fixing it and threw it away entirely. That sense of frustration was pivotal—it highlighted how valuable it is to assume the first two tries would be disposable experiments rather than final products.

Stepping back, this workflow feels more like a structured series of experiments rather than a partnership or conversation. While I appreciate the creative input from AI models, I don’t yet see this approach as the true AI coding partner I envision. Instead, these models currently serve as tools that help me explore possibilities, challenge assumptions, and provide fresh perspectives. I discard many branches along the way, but the journey itself remains immensely valuable.

AI isn’t just writing code—it’s changing how I approach problems and explore solutions. The journey has become as valuable as the destination.

If you’ve been experimenting with AI in your projects, I’d love to hear about your rhythm and discoveries. Have you found your own version of “vibe and build”? Drop me a note—I’d love to learn how others navigate this fascinating new landscape.

Waiting for the True AI Coding Partner

Lately, I’ve been reflecting on a concept that’s been gaining traction in the developer community: “vibe coding.” The term gained prominence after computer scientist Andrej Karpathy discussed it in October 2024, referring to a programming approach leveraging large language models to generate code from natural language descriptions. Karpathy described this method as a conversational interaction with AI, where he would “just see stuff, say stuff, run stuff, and copy-paste stuff, and it mostly works.” (Original tweet)

Over the past six months, I’ve immersed myself in tools like Gemini 2.5 Pro, Cursor, GitHub Copilot, and various chat interfaces to different models. These tools have significantly enhanced my productivity—not by producing flawless code outright, but by fundamentally reshaping my development process.

My workflow has undergone a notable transformation. While these AI tools excel in exploration and ideation, I seldom deploy code exactly as generated. Instead, I engage in a dynamic, iterative process—rapidly prototyping multiple approaches, assessing their viability, and identifying potential pitfalls. This method resembles sketching preliminary drafts before committing to a final piece, fostering intuition and insight into the problem space.

This approach is liberating. Since initial iterations are AI-generated, discarding them carries no remorse. This contrasts sharply with my previous experiences, where manually written code often led to a sunk-cost mentality, making it harder to abandon flawed directions. Now, if a prototype doesn’t meet expectations, I simply move on, unencumbered.

The time savings are substantial. Achieving a functional prototype now takes mere hours, whereas previously, it might have required days. This efficiency opens avenues for broader exploration, allowing me to entertain ideas that once seemed too risky or time-intensive. I can experiment with numerous approaches before lunch and dedicate the afternoon to refining the most promising one.

However, this methodology leans heavily on experience. Decades of coding, familiarity with design patterns, system architecture, and honed intuition guide me in effectively interacting with the AI—knowing what to request, interpreting outputs, and steering the development process. Without this foundation, the experience might feel less like a collaborative dance and more like navigating without a map. This experience isn’t just about knowing syntax; it’s about the architectural intuition to guide the AI, the pattern recognition to spot plausible-but-wrong suggestions, and the judgment to know when a generated snippet is truly production-ready versus merely a starting point.

The landscape of AI-assisted development continues to evolve with the emergence of tools like Bolt.new and Gemini Canvas. These platforms take vibe coding a step further by enabling the creation of full web applications through natural language prompts. With Bolt.new, users describe their envisioned application, and the AI generates the corresponding code, providing a workspace for further customization. Gemini Canvas offers an interactive environment where users can draft, edit, and preview web applications in real-time, toggling seamlessly between running applications and underlying code.

While these advancements are impressive, they also evoke a sense of caution. Relying heavily on AI for code generation raises questions about the quality, security, and maintainability of resulting applications. It’s crucial to balance the convenience of AI assistance with thorough review and understanding of produced code, ensuring adherence to best practices and alignment with project requirements.

In essence, tools like Bolt.new and Gemini Canvas are transforming the development landscape, making it more accessible and efficient. However, they also underscore the importance of maintaining a critical eye and an active role in the development process to deliver robust and reliable applications. Even so, I believe this is just the beginning. As these tools improve, the line between prototyping and production will blur further. For now, I’m embracing this rhythm—coding by feel, following the vibes, and letting curiosity lead the way.

Yet, I don’t think we’ve seen the end of this trend. If anything, we’re just getting started. Frontier labs are intensely focused on enhancing coding and, crucially, machine learning engineering capabilities within their models. OpenAI explicitly discussed this when introducing MLE-Bench last October, building and evaluating AI tools specifically to assist with complex, domain-specific tasks performed by their own machine learning engineers. Improving AI’s ability to contribute to ML workflows accelerates internal research and development cycles, creating a powerful feedback loop.

But for all the progress in the models themselves, the interface still lags behind. Tools like Cursor, Copilot, or AI-assisted editors in VS Code remain designed around source code as the primary focus. Yet when I’m deeply immersed, the code increasingly feels secondary. The conversation is paramount. Until interfaces evolve to truly center conversation, we might only scratch the surface of this collaborative paradigm.

The tools need to catch up with this shift. The interface should support conversation as the primary modality. Karpathy’s original idea resonates here—I don’t want a passive assistant or a fancy text editor. What I want is closer to sitting beside a truly skilled engineer, working through problems together.

It reminds me of pair programming practices common in Google’s early days. We’d sit side by side at one computer, each with our own keyboard and mouse, either person able to take over at any moment. There’s a great description of this dynamic in The New Yorker’s piece on the friendship that made Google huge. That spirit of collaborative problem-solving—real-time, fluid, and deeply interactive—is exactly what I desire from my AI partner.

That’s my goal—not an editor, not a command-line interface masquerading as a co-pilot, but a true thought partner. Someone—or rather, something—up to date on the latest trends, fluent in new languages, well-versed in modern idioms and design patterns. A partner who sees what I see, tracks what I’m pointing at, and understands half-formed questions I ask aloud. We’re not there yet. But I can feel it getting closer, and I’m genuinely excited about where this path leads.

But that’s just my perspective after diving into this world. What about yours? I’d love to hear your thoughts in the comments. What aspects of ‘vibe coding’ resonate with you, and what challenges have you faced? What’s working well in your AI-assisted workflow, and what improvements or future directions are you most excited about?

Gemini Scribe Update: Let’s Talk About How Your Chat History is Now Supercharged!

Hey everyone! If you’re using the Gemini Scribe plugin for Obsidian, you’re already experiencing the power of having Google’s Gemini AI right inside your notes. It’s a fantastic way to boost your note-taking and content creation. And guess what? I’ve just rolled out a major update that makes things even better!

Major Changes: A New Way to Store Your Chat History

The biggest change in this update is how Gemini Scribe handles your chat history. I’ve moved away from storing it in a database and switched to using Markdown files instead. This means your chat history now lives right alongside your notes, making your data more portable and easier to back up. I’m also introducing a new system where each note’s chat history is stored in its own separate file within the gemini-scribe folder, which will keep your Obsidian vault nice and tidy. These history files are automatically linked to the notes they came from, providing better context and making navigating your information a breeze.
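To make the layout concrete, a history file could be rendered as a plain Markdown note whose frontmatter links back to its source. The field names and format below are illustrative, not the plugin's exact schema:

```python
def history_markdown(note_path: str, turns: list) -> str:
    """Render a chat history as a standalone Markdown file whose
    frontmatter links back to the source note. Field names are
    illustrative, not the plugin's actual schema."""
    lines = ["---", f'source: "[[{note_path}]]"', "---", ""]
    for role, text in turns:  # turns is a list of (role, text) pairs
        lines.append(f"**{role}**: {text}")
        lines.append("")
    return "\n".join(lines)
```

Because the result is ordinary Markdown containing a wikilink, the connection back to the note would surface in Obsidian's backlinks and graph view with no extra machinery.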

Cool New Features and Improvements:

Don’t worry about a thing! The plugin will automatically move your existing chat history from the old database to the new Markdown files. This happens behind the scenes, requires no effort from you, and ensures that none of your existing chat history is lost in the process. I’ve also added a couple of new commands to give you more control: If you ever need to, you can manually trigger the migration of your chat history with the “Migrate Database History to Markdown” command. Once you’ve confirmed that everything has been migrated successfully, you can use the “Clear Old Database History” command to safely remove the old database.

I’ve also made some technical improvements to make everything run more smoothly, including improved history file management, automatic cleanup of orphaned history files, more robust history file naming, and better error handling and recovery. I want this update to be as seamless as possible for you. You’ll get clear notifications about the migration status, so you’ll always know what’s going on. If, for some reason, the automatic migration doesn’t work, you have the option to do it manually. And you can verify that everything has been migrated correctly before you clear out the old database.

I’ve also taken care to ensure backward compatibility: Your old database data will stick around until you explicitly remove it. If the automatic migration doesn’t work, you can always use the manual migration option. And rest assured, all your existing chat history will be preserved during this transition. Finally, I’ve also taken care of a few pesky bugs: I’ve fixed issues with how history files are handled when you rename notes, improved error handling for history operations, and made some tweaks to better handle those rare edge cases during history migration.

A Few Important Notes:

  • This update changes how your chat history is stored.
  • The migration process is automatic and safe.
  • Please verify that your history has been properly migrated before clearing the old database.
  • The old database will be preserved until you explicitly clear it using the new command.
  • This version also includes access to the new Gemini 2.5 Pro model!

I’m excited about these changes, and I encourage you to update to the latest version of Gemini Scribe to experience the improvements firsthand. As always, I value your feedback and suggestions as I continue to make the plugin even better. Let me know what you think!

Introducing Gemini Scribe: Your AI Writing Assistant for Obsidian

What if you could collaborate with an AI writing partner directly within Obsidian? Imagine brainstorming ideas, refining outlines, and polishing your prose with the help of a powerful AI assistant. Meet Gemini Scribe, a new Obsidian plugin designed to bring the power of AI collaboration to your note-taking workflow. Inspired by my own experience using AI to transform my writing process, Gemini Scribe is now available in the Obsidian community plugin library. I’m excited to share this tool that makes AI collaboration an integral part of your writing journey in Obsidian.

Why Gemini Scribe?

I’ve been using AI models to write everything from presentations and blog posts to technical documentation for a while now. But a pivotal moment came when I was recently asked to give an impromptu talk about my writing process. I decided to walk the audience through how I collaborated with an AI model to develop the very presentation I had given earlier that day. Starting with a simple idea, I showed them how the model helped me brainstorm, refine, and polish my outline, ultimately shaping the talk they had seen. The audience’s enthusiastic response was a little surprising. They saw the power of AI collaboration firsthand, and many asked how they could integrate it into their own writing workflows. That’s the driving force behind Gemini Scribe. It’s designed to bring that same seamless AI collaboration directly into Obsidian, empowering you to supercharge your writing process just like I do.

Key Features of Gemini Scribe

Context-Aware AI Chat
Engage in dynamic conversations with Gemini directly within Obsidian. Gemini Scribe analyzes your current note’s content, providing contextually relevant responses and insights. This means you can ask questions, brainstorm ideas, and get writing assistance tailored specifically to the topic at hand. Imagine exploring different arguments for an essay or generating creative variations of a paragraph, all without leaving your Obsidian workflow.

Summarization Made Easy
Quickly generate concise summaries for any note in your vault. Gemini Scribe distills the core ideas of your notes into single-sentence summaries, automatically storing them in the document’s frontmatter. This makes it easier to review your notes at a glance, organize your ideas, and quickly recall key information. Imagine instantly summarizing meeting notes, research articles, or even long-form blog posts, all with a single click.
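Storing a summary in frontmatter amounts to inserting or updating one YAML field. A simplified Python sketch of the idea (real frontmatter handling would use a proper YAML parser; this is only an illustration):

```python
def set_summary(note_text: str, summary: str) -> str:
    """Insert or update a `summary` field in YAML frontmatter.
    A simplified sketch; only handles flat `key: value` frontmatter."""
    if note_text.startswith("---\n"):
        head, _, body = note_text[4:].partition("\n---\n")
        fields = dict(
            line.split(": ", 1) for line in head.splitlines() if ": " in line
        )
        fields["summary"] = summary
        rebuilt = "\n".join(f"{k}: {v}" for k, v in fields.items())
        return f"---\n{rebuilt}\n---\n{body}"
    # No frontmatter yet: create a block with just the summary.
    return f"---\nsummary: {summary}\n---\n{note_text}"
```

Keeping the summary in frontmatter means Obsidian treats it as structured metadata, so it stays visible at the top of the note and out of the prose itself.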

Collaborative Writing Assistance
Collaborate with Gemini to enhance every stage of your writing process. Brainstorm new ideas, refine existing drafts, and even generate entire sections of text with AI-powered assistance. As you interact with Gemini, your draft can be updated automatically, streamlining your workflow and allowing you to see the changes in real-time. Whether you’re outlining a new blog post, expanding on a research paper, or polishing a creative piece, Gemini Scribe can help you overcome writer’s block, explore different perspectives, and produce high-quality content more efficiently. Imagine effortlessly expanding on bullet points, transforming outlines into prose, or receiving targeted suggestions for improving your writing style, all within your active document.

How to Get Started

  1. Install Gemini Scribe:
    • In Obsidian, go to Settings → Community Plugins and search for “Gemini Scribe.”
    • Click Install and then Enable.
  2. Set Up the Plugin:
    • Obtain your Gemini API key from Google AI Studio.
    • In Obsidian, open the Gemini Scribe plugin settings.
    • Enter your API key and configure any optional settings, such as your preferred model or summarization preferences.
  3. Start Using Gemini Scribe:
    • Open the chat interface via the command palette (Cmd/Ctrl + P) and type “Gemini Scribe: Open Chat.” This allows you to interact with the AI assistant, using your current note as context.
    • Summarize notes quickly with the “Gemini Scribe: Summarize Active File” command, also accessible via the command palette.
    • Start collaborating with Gemini to brainstorm ideas, refine drafts, and enhance your writing within your active document.

What’s Next for Gemini Scribe?

Gemini Scribe is just getting started. I’m already planning exciting new features, including:

  • Audio Note Transcription: Import audio recordings and have them automatically transcribed into your Obsidian notes. This will make it easier than ever to capture ideas on the go and integrate them into your workflow.
  • Advanced Search and Recall: Quickly find relevant information across your entire vault using AI-powered search. Imagine being able to instantly surface notes related to a specific topic, even if they don’t use the exact same keywords.
  • Intelligent Predictive Text: Get context-aware text suggestions as you type, similar to a modern IDE. This will help you write faster and more efficiently by predicting your next sentence based on the content of your current note and your overall writing style, going beyond the current collaborative writing assistance to offer intelligent suggestions even when you’re not actively interacting with the AI assistant.

Your feedback is invaluable in shaping the future of Gemini Scribe. Please share your thoughts on these features and any other ideas you have by joining the discussion on GitHub. I’m excited to see how Gemini Scribe evolves with your input!

Try Gemini Scribe Today

If you’re an Obsidian user looking to elevate your note-taking and writing workflow, give Gemini Scribe a try. You can find it in the Obsidian community plugin library.

Thank you for your support—I’m excited to see how Gemini Scribe transforms your creative process!