A conceptual illustration of an AI agent's anatomy. A central glowing orb represents the core agent, surrounded by three icons: an eye for Perception, a brain for Reasoning, and a hand for Action. Thin lines connect the icons to the center, symbolizing an interconnected system.

The Anatomy of an AI Agent

Welcome back to The Agentic Shift. This series is my attempt to map the new territory of agentic AI as it unfolds—a shift as fundamental as the move from desktop to mobile. We’re on a journey to understand how AI is evolving from a passive tool that creates to an active partner that does. Together, we’ll dissect the anatomy of an agent, explore how it thinks and remembers, examine the tools it uses to act, and grapple with the challenges of guiding it safely.

In our first post, we introduced this new age of agents. Now, it’s time to get our hands dirty and look under the hood.

From Maps to Navigators

I love maps. I always have. As a kid, I’d spread them out on the floor, tracing roads with my finger, just to understand the shape of a place. I love the ritual of folding them just right. For years, I kept a stack of them in my car. I even had the incredible fortune to work on Google Maps for nearly a decade.

Given all that, you’d think my sense of direction would be impeccable. Well, it isn’t. I could get lost in a paper bag with one opening. For me, a map is a beautiful tool for understanding, but a terrible one for navigating. It gives you all the data, but you have to do the hard work of figuring out where you are, where you’re going, and what to do when you inevitably take a wrong turn.

A GPS navigator, on the other hand, is a different beast entirely. It’s an active partner. You give it a goal—“Get me to the airport”—and it takes on the cognitive load. It doesn’t use AI in the way we’re going to be talking about it in this series, but it has the key characteristics of an agentic system. It senses the current state of the world through traffic data. It thinks about the most efficient path. And it uses its tools to act, giving you turn-by-turn directions. If it senses a problem, it proactively finds another way.

That leap—from a static tool to an active, goal-oriented partner—is the very essence of the “agentic shift.” And just like a GPS, an AI agent is defined by its fundamental anatomy: how it perceives its world, how it thinks, and how it acts.

Defining the Agent: More Than a Smart Tool

Before we go any further, let’s address the elephant in the room. The term “AI agent” is, as technologist Simon Willison has noted, “infuriatingly vague.” Different people use it to mean different things. For some, it’s an “LLM autonomously using tools in a loop.” For others, it’s a system that can “plan an approach and then run tools… until a goal is achieved.”

For our purposes in this series, we’ll establish a simple, core principle: an agent isn’t just a model; it’s a system built around a model. It’s a complete entity with distinct parts that work together. To understand it, we need to look at its three anatomical pillars:

  • Perception: The Senses
  • Reasoning/Cognition: The Brain
  • Action: The Hands

But why is this happening now? After all, we’ve had automation and bots for years. The difference lies in a powerful technological convergence. First, the “brain” got a massive upgrade; recent large models are capable of genuine reasoning and planning. Second, the digital world has become almost universally accessible via APIs, giving the agent’s “senses” and “hands” a world of information to perceive and a universe of tools to act upon. This combination is what makes the current moment so transformative.

The Anatomy, Piece by Piece

Let’s break down what each of these parts actually does.

Perception (The Senses)

First, how does an agent understand its environment? When we talk about an agent’s senses, we’re not talking about cameras or microphones. An AI agent’s environment is digital. Its perception comes from its ability to access information through APIs, data streams, and file systems. It might “see” the latest financial data by calling a stock market API, or “read” a user’s notes by accessing a local file. This is its window into the digital world.

Reasoning/Cognition (The Brain)

At the heart of every agent is its brain: a large model. This is the component that takes the information from its senses, considers the overall goal, and creates a plan. The model is the decision-maker. In Part 2 of this series, we’ll dive deep into how it thinks using different cognitive patterns, and in Part 3, we’ll explore the critical role of memory. For now, just know this is the part that makes the choices.

Action (The Hands)

An agent that can perceive and think is still just an observer. To be an agent, it must be able to do things. The agent’s “hands” are the tools it has been given. These tools are almost always APIs that allow it to perform actions: writing to a file, sending an email, searching the web, or running a piece of code. This is where the agent moves from thinking to acting, and it creates a dynamic feedback loop: the agent acts, perceives the results of that action, and then reasons about what to do next. This cycle is the engine of an agent. The concept is so central that we’ll dedicate Part 4 entirely to the agent’s ‘toolkit’ and Part 5 to the art of writing the instructions that guide its actions.

The “Agentic” Spark: What Makes It Different?

These three parts—perception, reasoning, and action—are the building blocks. But what truly makes a system agentic are the emergent properties that come from combining them:

  • Autonomy: It can operate without constant, step-by-step human intervention. It doesn’t need to be told how to do something, just what the goal is. This doesn’t make the human irrelevant; it changes the nature of our collaboration from micromanagement to high-level direction.
  • Goal-Orientation: It’s driven by a high-level objective, not just a single command. The goal isn’t “search for flights”; it’s “plan my business trip to Singapore.”
  • Proactivity: It can take initiative. Like the GPS that reroutes you around traffic, an agent can adapt its plan when it perceives changes in its environment.

This combination of autonomy and proactivity is incredibly powerful, but it also introduces new challenges we have to solve. In Part 6, we’ll discuss how to build in the necessary guardrails to ensure agents act safely and securely.

A Simple Agent in Action: The Weather Forecaster

Let’s tie this all together with a simple example. Imagine an agent whose goal is to answer the question: “Will I need an umbrella tomorrow?”

  1. Goal: The agent is given its objective.
  2. Perception: It uses its senses—a weather API—to get the forecast for your location.
  3. Reasoning: Its brain processes the data it perceived: “80% chance of precipitation.” It connects this data to the goal and concludes that rain is likely and an umbrella would be useful.
  4. Action: It uses its hands—a notification API—to send a message to your phone: “Looks like rain tomorrow, don’t forget your umbrella!”
  5. Loop: The action is complete. The agent now waits, ready to perceive new information or receive a new goal.

This simple loop is the foundation of every agent, from this basic forecaster to the most complex systems being built today.
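The five steps above can be sketched in a few lines of Python. This is only an illustration: the weather and notification APIs are stubbed out as plain functions, and every name and threshold here is invented for the example.

```python
# A minimal sketch of the perceive-reason-act loop. The "APIs" are stubs;
# all names and the 50% rain threshold are hypothetical.

def perceive(location: str) -> dict:
    """Senses: fetch tomorrow's forecast (stub standing in for a weather API)."""
    return {"location": location, "precipitation_chance": 0.8}

def reason(goal: str, observation: dict):
    """Brain: connect the observation to the goal and decide what to do."""
    if observation["precipitation_chance"] >= 0.5:
        return "Looks like rain tomorrow, don't forget your umbrella!"
    return None  # nothing worth acting on

def act(message: str) -> str:
    """Hands: deliver the message (stub standing in for a notification API)."""
    return f"notified: {message}"

def run_agent(goal: str, location: str):
    observation = perceive(location)       # 2. Perception
    decision = reason(goal, observation)   # 3. Reasoning
    if decision:
        return act(decision)               # 4. Action
    return None                            # 5. Loop: wait for new input

result = run_agent("Will I need an umbrella tomorrow?", "Seattle")
```

Swap the stubs for real API calls and wrap `run_agent` in a scheduler, and you have the skeleton of the forecaster described above.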

Conclusion: The Foundation is Set

So, what is an AI agent? At its core, it’s a system with a reasoning brain (a large model) connected to a digital environment through a dynamic loop of perception and action.

Understanding this anatomy isn’t just an academic exercise. For anyone looking to build, manage, or work alongside these new systems, this is the essential first step. It gives us a shared language and a mental model for everything that follows.

Now that we’ve assembled the basic anatomy of an agent, the rest of this series will be about bringing it to life. In Part 2, we’ll explore the fascinating ways an agent thinks, and from there, we’ll cover everything from memory and tools to safety and even how multiple agents can collaborate to solve complex problems. The foundation is set, and the exciting part is just beginning.

Abstract digital art of a glowing, multifaceted geometric shape at the center of a sparse network diagram on a dark background.

The Agentic Shift: Welcome to the Age of Agents

Throughout my career, I’ve had the privilege of witnessing a few of those rare, ground-shifting moments in technology. I saw the rise of the internet transform from a niche academic network into a global utility with high-speed access for all. I watched the personal computer evolve from an expensive hobbyist’s toy into a commodity that billions of people rely on every day, a shift that was fundamentally enabled by the advent of cloud computing. The cloud moved the heavy lifting of computing off our desktops and into vast, remote data centers, completely changing how we build and deliver software. Then came the mobile revolution, shrinking the PC into our pockets and connecting us to a constant stream of information. Hand-in-hand with this was the rise of social media, which turned the internet into a dynamic, two-way medium for human connection and communication. Today, we are standing on the cusp of another such fundamental shift, driven by artificial intelligence and the new design patterns emerging alongside it.

For the past couple of years, we’ve been captivated by what generative AI can create. We prompt, and it writes, draws, or codes. It’s a powerful, but ultimately passive, partnership. We give the command; it generates the response. But the wave that’s arriving now is different. It’s defined by what AI can do.

We are moving from passive assistance to active, autonomous execution. An AI agent isn’t just a sophisticated tool waiting for a command; it’s a partner given a mission. It can independently plan, use tools, and adapt its strategy to achieve a goal. As Bill Gates put it, “Agents are not only going to change how everyone interacts with computers. They’re also going to upend the software industry.” It’s a fundamental re-architecting of our relationship with machines.

This series, “The Agentic Shift,” is my attempt to map this new territory as it unfolds. It’s for the builders, the thinkers, the product managers, and the business leaders who are curious about where this is all heading. It’s for anyone who senses this shift and wants to understand it from the ground up.

Together, we’ll go on a journey. We’ll start by dissecting the basic anatomy of an agent—what makes it tick? From there, we’ll explore how agents think, remember, and use their digital “hands” through tools and APIs. We’ll cover the practical art of guiding their behavior, putting up essential guardrails, and choosing the right frameworks to build on.

Finally, we’ll look at how agents collaborate with each other and what it takes to move them from a prototype to a production system, all while grappling with the critical questions of ethics and responsibility.

It’s an ambitious road ahead, but a necessary one to travel. The age of agents is here. Let’s explore it together.

Series Table of Contents

A visual representation of programming's evolution, starting from a physical circuit board at the base, flowing upwards into streams of code, and culminating in an abstract, glowing neural network at the top.

The Last Programmers

A recent article by Xipu Li, “The Last Programmers,” caught my eye on Hacker News this morning, and it puts a sharp point on a feeling that’s been quietly circling in the back of my head for a while. Li argues that we are the last generation of people who will manually translate ideas into code. It’s a bold claim, but one paragraph, in particular, resonated with me.

He observes his younger colleagues, who seem to code by conversing with AI, and notes how the old guard might see them as “lazy” or “soft.” But he pushes back with an insight that feels like a fundamental law of our craft:

“But here’s what I’ve realized watching them: they’re not lazy. They’re just following the natural path that technology has always followed. Every major advancement in programming has been about abstracting away complexity so humans can focus on higher-level problems. We moved from machine code to assembly to high-level languages to frameworks to libraries. Each step made things ‘easier’ and each step had people complaining that developers were getting soft.”

This is it, right here. The entire history of software development is a story of climbing a ladder of abstraction. Each new rung—from the raw bits of machine code to the expressive power of modern frameworks—was built to hide complexity, freeing us up to think about bigger, more human problems.

And at every step of that climb, there were gatekeepers who insisted that “real” programming was being lost. The assembly wizards scoffed at C, believing true mastery meant wrestling with the machine on its own terms. The C veterans, in turn, often viewed those using Python or JavaScript with suspicion, as if the convenience of garbage collection was a crutch for those who couldn’t handle the raw power of manual memory management.

Today, we’re seeing the same story play out with the rise of AI-assisted development, or “vibe coding.” The idea of talking to an AI to build software feels like the next logical, almost inevitable, rung on that ladder. It allows us to stand even further from the implementation details and closer to the core of what we’re trying to solve. The focus shifts from how the code is written to what the code must achieve.

This is the very territory I’m exploring in my upcoming series on AI agents. When we treat AI not just as a code generator but as a true partner in the development process, the level of abstraction shifts dramatically. Our role begins to morph from meticulous coder to systems thinker and architect—someone who guides intelligent agents to build, test, and deploy. We are no longer just writing instructions; we are defining outcomes.

This transition won’t be seamless. It demands new skills: the art of the prompt, the discipline of high-level system design, and a deep, intuitive feel for collaborating with non-human partners. But this isn’t a sign of the craft weakening. It’s a sign of it maturing.

By embracing this next layer of abstraction, we aren’t getting softer. We’re just getting started on the real work.

An abstract digital illustration representing the concept of context, with a central glowing orb surrounded by interconnected geometric shapes and lines, symbolizing the relationships that create meaning.

The Unseen Thread of Context

A friend of mine, Chris Perry, wrote a thought-provoking piece recently on how context is the crucial missing ingredient holding AI back. It’s an idea that has been echoing in my own work, and his post and our recent discussions have crystallized it for me. In our rush toward more powerful models and more capable agents, we sometimes forget that true intelligence isn’t just about processing power; it’s about memory, continuity, and the unseen thread of context that connects one moment to the next.

This hits home for me. In my own experiments with AI, whether it’s building a podcast archive or an AI coding partner, some of the most significant breakthroughs have come not from raw model capability, but from finding better ways to ground the AI in a specific reality. An AI can write great code, but it’s useless if it doesn’t remember the architectural decisions we made ten minutes ago. It can answer a question, but it can’t be a true partner if it starts every conversation from scratch.

Chris is right to point out that the next frontier is enabling AI to carry context across tasks and sessions. This is the difference between a clever tool and a genuine collaborator. A tool performs a single function, cleanly and efficiently. A collaborator remembers our history, understands our goals, and anticipates our needs. It knows not just what we’re asking, but why we’re asking it. That persistence of memory is what transforms a series of isolated interactions into a meaningful, evolving dialogue.

In a world of increasingly complex, agentic systems, this thread of context is everything. It’s what will allow our AI partners to move from being simple problem-solvers to something far more valuable: partners in thought.

An antique-style fantasy map titled "The Journey of Innovation." It shows a winding, dashed red line charting a complex path through conceptual territories like "The Mountains of Code," "The Sea of Management," and "The Startup Archipelago." The path ends very near its starting point, illustrating a full-circle journey.

Full Circle

My calendar looks different these days. The back-to-back blocks of 1:1s, strategy reviews, and planning sessions have given way to long, uninterrupted stretches of quiet. That quiet has been the most significant change—it’s brought back time to think, a noticeable drop in stress, and a genuine enjoyment in my work that I hadn’t realized was fading. It’s why, after years of leading teams, I’ve deliberately moved back to a role as an individual contributor.

This shift has changed my day-to-day work, but one thing that remains constant is the time I spend mentoring colleagues and contacts, helping them navigate their own career questions. In those conversations, my own journey often comes up, and I hear a familiar question: “You were leading large teams… why the change?” Some have even wondered if I was leaving the company (I’m not). It’s a question with more than one answer, and I realized this post is my way of exploring them fully—for everyone who has asked, and for anyone else thinking about their own path.

It’s a fair question, and the simple answer is that my career has always been guided by a desire to learn and experience things more deeply. It’s never been a straight line up the leadership ladder; I’ve moved between managing and building several times. Each shift was a deliberate choice to go where I felt I could learn the most. This recent move—from a Senior Director role in Cloud AI to a Distinguished Engineer in Google DeepMind—is just the latest example of that pattern: a deliberate step toward the work that feels most urgent and exciting right now.

That motivation started early. My move from Indiana University to Cisco wasn’t just for a job; it was to understand what Silicon Valley was really about. When the dot-com bubble burst, I saw it as a chance to experience something new and jumped into the startup world, working on the foundational tech for what would become the 802.11n and 802.11s WiFi standards. I was learning a ton, but I knew my growth had plateaued. That’s when a friend asked me to consider Google. It was October 2004, just after the IPO, and Google seemed like a magical place. I said yes without knowing what team I’d join. I just wanted to see what it was all about.

My Google journey began in March of 2005 on the municipal WiFi project in Mountain View, but soon took me to London as one of our first engineers in that office. After building out the test engineering team, I moved into Ads and had my first real chance to work with machine learning at Google, working on systems for multivariate ad optimization. From there, I moved back to the US and eventually found my way to Google Maps and Street View.

That was a dream job. I spent nearly a decade in Geo, starting on a team of two working on the launch pipeline and serving infrastructure. Over time, my responsibilities grew, and I had the privilege of leading teams working on everything from the “time machine” feature for historic imagery to 3D reconstruction, imaging hardware, machine learning, and augmented reality. Through it all, I had the chance to learn, explore, and contribute alongside people who became some of my dearest friends.

In 2019, a different kind of challenge appeared. My manager was asked to build a new product area, and I offered to help as his Chief of Staff. I wanted to learn how Google was managed as a business—how decisions were made and how organizations were designed at a macro scale. After two years in that role, I moved back into a technology leadership role, helping with the formation of Core ML.

It was after all of this that I started to realize something important: I missed having my own technical contributions. I missed the flow state, that feeling of time dissolving as you wrestle with a complex problem. I missed the direct feedback loop of writing a piece of code, running it, and seeing it work. I wanted to build my own ideas again.

That feeling connected directly back to my college days. I was an AI major at Indiana University in the 90s, and throughout my career, I had kept coming back to machine learning—in Ads, in Geo, in Core ML. With the explosion of generative AI in 2022, I knew exactly where I wanted to spend my time. More than anything, I wanted to apply these powerful new models to solve real-world problems.

This led me to the ML Developer team in Cloud, leading the Kaggle, Colab, and Gemini API teams. It was a smaller team with a mature leadership bench, which gave me more time to build my own projects—many of which have been chronicled on this blog. As the team evolved, I began contributing to internal projects as well, which culminated in the launch of Gemini CLI, where I was one of the core contributors from the beginning.

Working on Gemini CLI, I realized I was finally doing the exact kind of work I had been craving. When an opportunity came up to move to Google DeepMind and focus full-time on AI Agents and the future of Gemini CLI, I knew it was the right next step.

People often ask me why I’ve been at Google for over 20 years. The answer is simple: it has always been a place of discovery. It’s had its ups and downs, of course. There have been times I’ve considered leaving and times I’ve disliked my situation. But I’ve been lucky enough to move around and keep things fresh, working on projects in mobile, search, maps, technical infrastructure, cloud, and AI. Where else can you get exposed to so much in one place? The fantastic Acquired podcast is currently doing a series on Google (1, 2), and hearing those stories reminded me of how fortunate I’ve been to occasionally get a preview of the future. While a journey like this requires hard work, it also requires being in the right place at the right time. Right now, I feel like I’m in the perfect place for whatever comes next.

This move isn’t just about returning to code. It’s about being in the driver’s seat for the next evolution of software development, where our primary collaboration is with intelligent agents. For a builder, there’s no more exciting place to be. I’m home.

A cheerful, cartoon-style purple bear with a large head and big eyes is sitting at a desk, happily using a computer with a text editor open on the screen. A section of the text is highlighted.

A More Precise Way to Rewrite in Gemini Scribe

I’ve been remiss in posting updates, but I wanted to take a moment to highlight a significant enhancement to Gemini Scribe that streamlines the writing and editing process: the selection-based rewrite feature. This powerful tool replaced the previous full-file rewrite functionality, offering a more precise, intuitive, and safer way to collaborate with AI on your documents.

What’s New?

Instead of rewriting an entire file, you can now select any portion of your text and have the AI rewrite just that part based on your instructions. Whether you need to make a paragraph more concise, fix grammar in a sentence, or change the tone of a section, this new feature gives you surgical precision.

How It Works

Using the new feature is simple:

  1. Select the text you want to rewrite in your editor.
  2. Right-click on the selection and choose “Rewrite with Gemini” from the context menu, or trigger the command from the command palette.
  3. A dialog will appear showing you the selected text and asking for your instructions.
  4. Type in what you want to change (e.g., “make this more formal,” “simplify this concept,” or “fix spelling and grammar”), and the AI will get to work.
  5. The selected text is then replaced with the AI-generated version, while the rest of your document remains untouched.

Behind the scenes, the plugin sends the full content of your note to the AI for context, with special markers indicating the selected portion. This allows the AI to maintain the style, tone, and flow of your document, ensuring the rewritten text fits in seamlessly.
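To make the idea concrete, here is a rough sketch of how a selection-based rewrite can be assembled. The marker strings and function names are invented for illustration; the plugin’s actual markers and prompt wording are internal details.

```python
# Hypothetical sketch: send the whole note for context, with markers
# (invented here) delimiting the selected span, then splice the model's
# rewrite back in over just that span.
START, END = "<<<REWRITE-START>>>", "<<<REWRITE-END>>>"

def build_rewrite_prompt(note: str, sel_start: int, sel_end: int, instruction: str) -> str:
    """Wrap the selection in markers so the model sees it in full context."""
    marked = note[:sel_start] + START + note[sel_start:sel_end] + END + note[sel_end:]
    return (
        f"Rewrite only the text between {START} and {END}, "
        f"following this instruction: {instruction}\n\n{marked}"
    )

def apply_rewrite(note: str, sel_start: int, sel_end: int, rewritten: str) -> str:
    """Replace only the selection; the rest of the note stays untouched."""
    return note[:sel_start] + rewritten + note[sel_end:]
```

Because only the marked span is ever replaced, the surrounding document can’t be clobbered, which is exactly the safety property the feature is after.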

Why This is Better

The previous rewrite feature was an all-or-nothing affair, which could sometimes lead to unexpected changes or loss of content. This new selection-based approach is a major improvement for several reasons:

  • Precision and Control: You have complete control over what gets rewritten, down to a single word.
  • Safety: There’s no risk of accidentally overwriting parts of your document you wanted to keep.
  • Iterative Workflow: It encourages a more iterative and collaborative workflow. You can refine your document section by section, making small, incremental improvements.
  • Speed and Efficiency: It’s much faster to rewrite a small selection than an entire document, making the process more interactive and fluid.

This new feature is designed to feel like a natural extension of the editing process, making AI-assisted writing more of a partnership.

A Note on the ‘Rewrite’ Checkbox

I’ve received some feedback about the removal of the “rewrite” checkbox from the normal mode. I want to thank you for that feedback and address it directly. There are a couple of key reasons why I decided to remove this feature in favor of the new selection-based rewriting.

First, I found it difficult to get predictable results with the old mechanism. The model would sometimes overwrite the entire file unexpectedly, which made the feature unreliable and risky to use. I personally rarely used it for this reason.

Second, the new Agent Mode provides a much more reliable way to replicate the old functionality. If you want to rewrite an entire file, you can simply add the file to your Agent session and describe the changes you want the AI to make. The Agent will then edit the entire file for you, giving you a more controlled and predictable outcome.

While I understand that change can be disruptive, I’m confident that the new selection-based rewriting and the Agent Mode offer a superior and safer experience. I’m always looking for ways to improve the plugin, so please continue to share your thoughts and feedback on how you’re using the new features.

The Future is Agent-ic

Ultimately, over the next several iterations of Gemini Scribe, I’ll be moving more and more functionality to the Agent Mode and merging the experience from the existing Gemini Chat Mode into the Agent. I’m hoping that this addresses a lot of feedback I’ve received over the last nine months for this plugin and creates something that is even more powerful for interacting with your notes. More on Agent Mode in a coming post.

I’m really excited about this new direction for Gemini Scribe, and I believe it will make the plugin an even more powerful tool for writers and note-takers. Please give it a try and let me know what you think!

Gemini Scribe Supercharged: A Faster, More Powerful Workflow Awaits

It’s been a little while since I last wrote about Gemini Scribe, and that’s because I’ve been deep in the guts of the plugin, tearing things apart and putting them back together in ways that make the whole experience faster, smoother, and just plain better.

One of the first things that pushed me back into the code was the rhythm of the interaction itself. Every time I typed a prompt and hit enter, I found myself waiting—watching the spinner, watching the time pass, watching the thought in my head cool off while the AI gathered its response. It didn’t feel like a conversation. It felt like submitting a form.

That’s fixed now. As of version 2.2.0, Gemini Scribe streams responses in real-time. You see the words as they’re generated, line by line, without the long pause in between. It makes a difference. The back-and-forth becomes more fluid, more natural. It pulls you into the interaction rather than holding you at arm’s length. And once I started using it this way, I couldn’t go back.
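The difference between the two modes is easy to show in miniature. The generator below just simulates a model emitting chunks; it is not the plugin’s actual implementation.

```python
# Simulated streaming: a blocking call returns the whole reply at once,
# while a streaming call yields chunks as they arrive and the UI can
# render each one immediately.
from typing import Iterator

def fake_stream(reply: str, chunk_size: int = 4) -> Iterator[str]:
    """Pretend to be a model producing the reply a few characters at a time."""
    for i in range(0, len(reply), chunk_size):
        yield reply[i:i + chunk_size]

def render_streaming(chunks: Iterator[str]) -> str:
    shown = ""
    for chunk in chunks:
        shown += chunk  # in a real UI, the display updates here, chunk by chunk
    return shown

text = render_streaming(fake_stream("Streaming feels conversational."))
```

The end result is identical text either way; what changes is that the reader starts seeing it immediately instead of after the full generation finishes.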

But speed was only part of it. I also wanted more control. I’ve been using custom prompts more and more in my own workflow—not just as one-off instructions, but as reusable templates for different kinds of writing tasks. And the old prompt system, while functional, wasn’t built for that kind of use.

So I rewrote it.

Version 3.0.0 introduces a completely revamped custom prompt system. You can now create and manage your prompts right from the Command Palette. That means no more hunting through settings or copying from other notes—just hit the hotkey, type what you need, and move on. Prompts are now tracked in your chat history too, so you can always see exactly what triggered a particular response. It’s a small thing, but it brings a kind of transparency to the process that I’ve found surprisingly useful.

All of this is sitting on top of a much sturdier foundation than before. A lot of the internal work in these recent releases has been about making Gemini Scribe more stable and more integrated with the rest of the Obsidian ecosystem. Instead of relying on low-level file operations, the plugin now uses the official Obsidian APIs for everything. That shift makes it more compatible with other plugins and more resilient overall. The migration from the old system happens automatically in the background—you shouldn’t even notice it, except in the way things just work better.

There’s also a new “Advanced Settings” panel for those who like to tinker. In version 3.1.0, I added dynamic model introspection, which means Gemini Scribe now knows what the model it’s talking to is actually capable of. If you’re using a Gemini model that supports temperature or top-p adjustments, the plugin will surface those controls and tune their ranges appropriately. Defaults are shown, sliders are adjusted per-model, and you get more precise control without the guesswork.
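One way to picture model introspection is a capability table consulted before building the settings UI. The table, model names, and ranges below are entirely made up for illustration; the real plugin queries the model’s reported capabilities.

```python
# Hypothetical sketch of per-model controls: look up what a model
# supports and surface only those sliders, with its own (min, max, default).
MODEL_CAPS = {
    "example-pro-model":  {"temperature": (0.0, 2.0, 1.0), "top_p": (0.0, 1.0, 0.95)},
    "example-mini-model": {"temperature": (0.0, 1.0, 0.7)},
}

def controls_for(model: str) -> dict:
    """Return {param: (min, max, default)} for the sliders the UI should show."""
    return MODEL_CAPS.get(model, {})

# A model that doesn't support top_p simply never gets a top_p slider.
```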

None of these changes happened overnight. They came out of weeks of using the plugin, noticing friction, and wondering how to make things feel lighter. I’ve also spent a fair bit of time fixing bugs, adding retry logic for occasional API hiccups, and sanding off the rough edges that show up only after hours of use. This version is faster, smarter, and more comfortable to live in.

There’s still more to come. Now that the architecture is solid and the foundation is in place, I’m starting to explore ways to make Gemini Scribe even more integrated with your notes—tighter context handling, more intelligent follow-ups, and better tools for shaping long-form writing. But that’s a story for another day.

For now, if you’ve been using Gemini Scribe, update to the latest version from the community plugins tab and try out the new features. And if you’ve got ideas, feedback, or just want to follow along as things evolve, come join the conversation on GitHub. I’d love to hear what you think.

Unlocking the Future of Coding: Introducing the Gemini CLI

Back in April, I wrote about waiting for the true AI coding partner. I articulated a vision for an AI that transcends mere code generation, one that truly understands context, acts autonomously within our development environments, and collaborates with us iteratively. Today, I’m thrilled to announce a significant step towards that vision: the launch of the Gemini CLI.

For too long, AI coding assistance has felt disconnected. While dedicated AI-powered IDEs like Cursor have made great strides, the common experience still involves copy-pasting code into a separate interface to get suggestions. That breaks flow, loses context, and frankly, isn’t how truly collaborative partners work. We need an AI that lives where we live—in the terminal, within our projects, and deeply integrated into our workflow.

This is precisely what the Gemini CLI sets out to achieve. It’s not just a fancy chatbot for your command line; it’s an experimental interface designed to bring the power of Gemini directly into your development loop, enabling intelligent, contextual, and actionable AI assistance.

It’s for this very reason that I’ve been quite heads-down over the last few months, working with a super talented team to bring this application to life. It has genuinely been one of my most fun experiences at Google in the 20+ years that I’ve been here, and I feel incredibly fortunate to have had the chance to collaborate with such brilliant people across the company.

The Power of Small Tools, Amplified by AI

In May, I explored the concept of small tools, big ideas. The premise was simple: complex problems are often best tackled by composing many small, powerful, and specialized tools. This philosophy is at the very heart of the Gemini CLI’s design.

Instead of a monolithic AI trying to do everything at once, the Gemini CLI empowers Gemini with a suite of familiar command-line tools. Imagine an AI that can:

  • Read and Write Files: Using read_file and write_file, it can inspect your codebase, understand existing logic, and propose modifications directly to your files.
  • Navigate Your Project: With list_directory and grep, it can explore your project structure, locate relevant files, or find specific patterns across your repository, just like you would.
  • Execute Shell Commands: The run_shell_command tool allows Gemini to execute commands, build your project, run tests, or even interact with external services, providing real-time feedback.
  • Search the Web: Need to look up an API, debug an error message, or find best practices? The google_web_search tool lets Gemini leverage the vastness of the internet to inform its responses and actions.
  • Edit with Precision: Beyond simple file writes, the edit_file tool allows for granular, diff-based modifications, ensuring changes are precise and reviewable.

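To make the orchestration concrete, here is a toy dispatch loop of my own — not the Gemini CLI's actual implementation — showing the general shape: each capability is an ordinary function registered under a name, and the reasoning layer invokes tools by name with structured arguments:

```python
import os

# Toy sketch of tool dispatch (my own illustration, not the Gemini CLI's
# actual code): tools are plain functions in a registry, and the model's
# reasoning layer calls them by name, receiving either a result or an
# error it can react to on the next step.

def list_directory(path="."):
    return sorted(os.listdir(path))

def grep(pattern, text):
    return [line for line in text.splitlines() if pattern in line]

TOOLS = {"list_directory": list_directory, "grep": grep}

def dispatch(tool_name, **kwargs):
    """Run a named tool; unknown names produce an error the model can see."""
    if tool_name not in TOOLS:
        return {"error": f"unknown tool: {tool_name}"}
    return {"result": TOOLS[tool_name](**kwargs)}
```

The error path matters: feeding failures back to the model, rather than crashing, is what lets it recover and try a different tool.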
This approach means Gemini isn’t guessing; it’s acting. It’s using the same building blocks you use every day, but with its powerful reasoning capabilities to orchestrate them towards your goals.

A Truly Contextual and Collaborative Partner

The Gemini CLI maintains a persistent session, remembering your conversation history, the files it has examined, and the results of previous tool executions. This “conversational memory” and contextual understanding are critical. It allows for a natural, iterative back-and-forth, where the AI builds on prior interactions and its understanding of your project state.

You can ask Gemini to:

  • “Find all JavaScript files in this directory that import React.” (Leveraging list_directory and grep.)
  • “Refactor this component to use hooks.” (Involving read_file, edit_file, and potentially run_shell_command to run tests.)
  • “What’s the best way to implement X in Python given these files?” (Using read_file to understand your existing code and google_web_search for best practices.)

The workflow is truly interactive. Gemini proposes actions, and you have the power to approve them or guide it further. This human-in-the-loop design ensures you’re always in control, fostering a collaborative partnership rather than a black-box operation.
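The combination of persistent history and an approval gate can be sketched in a few lines — again, my own illustration of the pattern, not the Gemini CLI's code:

```python
# Minimal sketch of a human-in-the-loop session (illustrative only, not the
# Gemini CLI's actual implementation): every proposed action is recorded in
# the session history, and nothing executes without explicit approval.

class Session:
    def __init__(self, approver):
        self.history = []          # persistent record of proposals and outcomes
        self.approver = approver   # callable: description -> bool

    def propose(self, description, action):
        """Ask the human to approve an action before executing it."""
        approved = self.approver(description)
        result = action() if approved else None
        self.history.append({"proposal": description,
                             "approved": approved,
                             "result": result})
        return result
```

Because declined proposals are still recorded, the model can see what was rejected and adjust its next suggestion accordingly.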

Built by Gemini CLI, For Everyone

It’s particularly exciting to share that this project was started by a small and scrappy team, and we leveraged Gemini CLI itself to help write Gemini CLI. Many of us now work almost exclusively within Gemini CLI, often using our IDEs only for viewing diffs.

And while its origins are in coding, Gemini CLI is incredibly versatile for many tasks outside of traditional development. Personally, I love using it to manage my home lab, to bulk rename and reformat files for my podcast project, and to generally act as a seamless go-between for anything complicated in GitHub. Increasingly, I’ve also been using Gemini CLI with Obsidian to understand and extract insights from my vault. With over 9000 files in my work vault alone, Gemini CLI lets me ask questions of the entire vault and even make large refactoring-style changes across the entire thing.

Beyond Today: Extensibility

One of the most exciting aspects of the Gemini CLI, and a direct nod to the “small tools, big ideas” philosophy, is its extensibility. The underlying architecture allows developers to define custom tools. This means you can teach Gemini to interact with your specific internal systems, proprietary APIs, or niche development tools. The possibilities are endless, transforming Gemini into an AI assistant perfectly tailored to your unique development environment.

Get Started Today

The Gemini CLI represents a significant leap forward in bringing intelligent AI assistance directly to where developers work most effectively: the command line. It’s a practical realization of the “true AI coding partner” vision, built on the principle that small, well-designed tools can achieve big ideas when orchestrated by a powerful intelligence.

Ready to try it out? Head over to the Gemini CLI GitHub repository to get started. Explore the commands, experiment with its capabilities, and let’s shape the future of AI-powered development together.

I’m incredibly excited about what this means for developer productivity and the evolving role of AI in our daily coding lives. Let me know what you build with it!

Small tools, Big Ideas

It’s a strange time to love simple things. Everywhere I look, the future seems to be rushing toward bigger models, smarter systems, and more complex layers of automation. The story of modern technology is often told as a relentless climb toward more: more intelligence, more capability, more speed. And yet, in the quiet corners of my own work, I keep finding myself drawn back to something much older and simpler. A clean note in a vault. A script with a single, clear purpose. A search box that just works. These small tools, which once felt ordinary, now feel almost radical in their elegance. In a world where everything is getting smarter, I’m finding unexpected joy in the tools that stay beautifully dumb.

Lately, I’ve been thinking a lot about Simon Willison’s llm tool — a little Python utility that gives you a command-line interface for large language models. It doesn’t hide the complexity behind a thousand settings or a shiny UI. It just gives you a simple, direct line to the model, letting you wire it into your workflows however you want. His files-to-prompt tool is another one I admire: an almost absurdly minimal way to push files into a prompt template for LLMs. Both tools feel like reminders that power doesn’t have to mean complexity. Sometimes the most transformative tools are the ones that stay small, sharp, and focused.
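The files-to-prompt idea is simple enough to approximate in a few lines of stdlib Python. To be clear, this is my own rough sketch of the concept, not Simon’s actual tool:

```python
from pathlib import Path

# Rough stdlib approximation of the files-to-prompt concept (not Simon
# Willison's actual tool): gather files and wrap each in a labeled block
# so an LLM can tell where one file ends and the next begins.

def files_to_prompt(paths):
    """Concatenate files into one prompt string, each wrapped with its path."""
    parts = []
    for p in map(Path, paths):
        parts.append(f'<file path="{p}">\n{p.read_text()}\n</file>')
    return "\n".join(parts)
```

Small as it is, this is the whole trick: structure the input so the model knows what it is looking at, then get out of the way.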

This same idea keeps showing up for me in other places too. I’ve been spending more time with tmux lately — not a simple tool in the sense of being easy, but a simple one in its spirit. It doesn’t try to be clever or guess what I want. It gives me a set of building blocks: panes, sessions, terminals — and lets me compose my environment exactly how I like it. Once you internalize its grammar, you realize that you’re no longer fighting your tools. You’re building with them.

In my podcast RAG project, I’ve seen this play out with Whisper too. Whisper isn’t flashy. It’s a humble little engine that quietly turns audio into text, and the latest version is astonishingly good at it. I didn’t need to fine-tune it or coax it into working. I just pointed it at my podcast archive, and it got to work. And it kept working. There’s a kind of magic in that — a tool that doesn’t require worship or endless maintenance, just quiet trust.

The same feeling hit me again recently when I started using uv for Python package management. For years, Python developers have wrestled with slow installs, dependency conflicts, and the occasional cryptic error that turns a five-minute task into a two-hour rabbit hole. uv doesn’t try to paper over those problems with another layer of complexity — it just fixes them. Installs are blindingly fast. Dependency resolution is smart and sane. Virtual environments are first-class citizens, not an afterthought. Using it feels like someone finally rebuilt the foundation without adding a skyscraper on top. It’s one of those tools that makes you wonder how you ever put up with the old way.

Then there’s Ollama, which has completely changed the way I think about local models. Before Ollama, running large language models yourself meant a tangle of Docker containers, custom scripts, GPU configurations, and crossed fingers. Now? You run ollama run, and you’re talking to a model. It’s almost unsettling how easy they’ve made it — not because they hid the power, but because they made a conscious choice to minimize the friction.

And finally, I can’t talk about this new season of rediscovery without mentioning Ghostty. Since it was released in December, Ghostty has become my daily driver for terminals. It doesn’t try to reinvent what a terminal is; it just fixes all the little things that made older terminals frustrating, and it does it with style. Fast, beautiful, reliable. It feels like someone finally sat down and asked: what if we just made this delightful?

When I step back and look at all of these tools — the tiny Python scripts, the old-school multiplexers, the whisper-quiet transcription engines, the frictionless model runners, the sleek terminals, the rebuilt package managers — I realize they all share something in common. They don’t try to be everything. They aren’t built around a fantasy of replacing me. They’re built around the idea of empowering me.

Maybe that’s the real story happening quietly in the margins of our AI-first world.

It’s not just about building bigger models or smarter systems. It’s about rebuilding the foundations — making the tools that carry us forward simpler, faster, sturdier. Tools that invite us to stay close to the work, instead of drifting away from it.

The future won’t be built by magic.

It will be built by people who still care about the foundations.

How Throwaway AI Experiments Lead to Better Code

Over the past few months, I’ve accidentally discovered a new rhythm when coding with AI—and it has reshaped my approach significantly. It wasn’t something I planned or found in a manual. Instead, it emerged naturally through my experiments as I kept noticing consistent patterns whenever I used AI models to build new features. What started as casual exploration has evolved into a trusted process: vibe, vibe again, and then build. Each step plays a distinct role, and together they’ve transformed how I move from a rough idea to functional software.

I first noticed this pattern while developing new features for Gemini Scribe, my Obsidian plugin. I was exploring ways to visualize the file context tree for an upcoming update. Out of curiosity, I gave Cursor an open brief—virtually no guidance from me at all. I simply wanted to see how the model would respond when left entirely to its own devices. I wasn’t disappointed. The model produced a surprisingly creative user interface and intriguing visualization approaches. The first visualization was a modal dialog showing all files in the tree with a simple hierarchy. It wasn’t ready to ship, but I vividly remember feeling genuine excitement at the unexpected creativity the model demonstrated. The wiring was messy, and there were integration gaps, but it sparked ideas I wouldn’t have reached on my own.

Encouraged by this, I initiated a second round—this time with more structure. I took insights from the initial attempt and guided the model with clearer prompts and a deliberate breakdown of the problem. Again, the model delivered: this time, a new panel on the right-hand side that displayed the hierarchy and allowed users to click directly to any note included in the file context. This feature was genuinely intriguing, closely aligning with the functional design I envisioned. Between these two experiments, I gathered valuable insights on shaping the feature, making it more useful, and improving my future interactions with the model.

These experiences have crystallized into the three phases of my workflow:

  • Max vibe: Completely open-ended exploration to find creative possibilities.
  • Refined vibe: Targeted experimentation guided by learnings from the first round.
  • Build: Structured, focused development leveraging accumulated insights.

The first step—”max vibe and throw away”—is about unleashing the model’s creativity with maximum freedom. No constraints, no polish, just pure experimentation. It’s a discovery phase, surfacing both clever ideas and beautiful disasters. I spend roughly an hour here, take notes, then discard the output entirely. This early output is for exploration, not production.

Next comes “vibe with more detail and throw away again.” Equipped with insights from the initial exploration, I return to the model with a detailed plan, breaking the project into smaller, clearer steps. It’s still exploratory but more refined. This output remains disposable, maintaining fluidity in my thinking and preventing premature attachment to early drafts.

Only after these two rounds do I transition into production mode. At this point, experimentation gives way to deliberate building. Using my notes, I craft precise prompts and break the project into clear, manageable tasks. By now, the route forward is clear and defined. The resulting code, refined and polished, makes it to production, enriched by earlier explorations.

Interestingly, the realization of this workflow was born out of initial frustration. The first time I tried solely prompting for a specific feature set, my codebase became so tangled and problematic that I gave up on fixing it and threw it away entirely. That sense of frustration was pivotal—it highlighted how valuable it is to treat the first two attempts as disposable experiments rather than final products.

Stepping back, this workflow feels more like a structured series of experiments rather than a partnership or conversation. While I appreciate the creative input from AI models, I don’t yet see this approach as the true AI coding partner I envision. Instead, these models currently serve as tools that help me explore possibilities, challenge assumptions, and provide fresh perspectives. I discard many branches along the way, but the journey itself remains immensely valuable.

AI isn’t just writing code—it’s changing how I approach problems and explore solutions. The journey has become as valuable as the destination.

If you’ve been experimenting with AI in your projects, I’d love to hear about your rhythm and discoveries. Have you found your own version of “vibe and build”? Drop me a note—I’d love to learn how others navigate this fascinating new landscape.