Every year, like clockwork, the first assignment back at school was the same: a short essay on what you did over the summer. It was a ritual of sorts, a gentle reentry into the world of homework and deadlines, usually accompanied by a gallery of crayon drawings of camping trips and beach outings.
My summer had all the makings of a classic entry. There was a trip to Australia and Fiji. I could write about the impossible blue of the water in the South Pacific, or the iconic silhouette of the Sydney Opera House against a setting sun. I have the photos to prove it. It was, by all accounts, a proper vacation.
But if I’m being honest, my most memorable trip wasn’t to a beach or a city. It was a two-week detour into the heart of my own code, building something that had been quietly nagging at me for months. While my family slept and the ocean hummed outside our window, I was on a different kind of adventure: one that took place entirely on my laptop, fueled by hotel coffee and a persistent idea I couldn’t shake. I was building an agent for Gemini Scribe.
The Genesis of an Idea
So why spend a vacation hunched over a keyboard? Because an idea was bothering me. The existing chat mode in Gemini Scribe was useful, but it was fundamentally limited. It operated on a simple, one-shot basis: you’d ask a question, and it would give you an answer. It was a powerful tool for quick queries or generating text, but it wasn’t a true partner in the writing process. It was like having a brilliant research assistant who had no short-term memory.
My work on the Gemini CLI was a huge part of this. As we described in our announcement post, we built the CLI to be a powerful, open-source AI agent for developers. It brings a conversational, tool-based experience directly to the terminal, and it’s brilliant at what it does. But its success made me wonder: what would an agent look like if it wasn’t built for a developer’s terminal, but for a writer’s notebook?
I imagined an experience that was less about executing discrete commands and more about engaging in a continuous, creative dialogue. The CLI is perfect for scripting and automation, but I wanted to build an agent that could handle the messy, iterative, and often unpredictable process of thinking and writing. I needed a sandbox to explore these ideas—a place to build and break things without disrupting the focused, developer-centric mission of the Gemini CLI.
Gemini Scribe was the perfect answer. It was my own personal lab. I wanted to be able to give it complex, multi-step tasks that mirrored how I actually work, like saying, “Read these three notes, find the common themes, and then use that to draft an outline in this new file.” With the old system, that was impossible. I was the human glue, copying and pasting, managing the context, and stitching together the outputs from a dozen different prompts. The AI was smart, but it couldn’t act.
It was this friction, this gap between what the tool was and what it could be, that I couldn’t let go of. It wasn’t just about adding a new feature; it was about fundamentally changing my relationship with the software. I didn’t want a tool I could command; I wanted a partner I could collaborate with. And so, with the Pacific as my backdrop, I started to build it.
A Creative Detour in Paradise
This wasn’t a frantic sprint. It was the opposite: a project defined by having the time and space to explore. Looking back at the commit history from July is like re-watching a time-lapse of a building being constructed, but one with very civilized hours. The work began in earnest on July 7th with the foundational architecture, built during the quiet early mornings in our Sydney hotel room while my family was still asleep.

By July 11th, the project had found its rhythm. That was the day the agent got its hands, with the first real tools like google_search and move_file. I remember a focused afternoon of debugging, patiently working through the stubborn formatting requirements of the Google AI SDK’s functionDeclarations. There was no rush, just the satisfying puzzle of getting it right.
Much of the user experience work happened during downtime. From a lounge chair by the beach in Fiji on July 15th, I implemented the @mention system to make adding files to the agent’s context feel more natural. I built a collapsible context panel and polished the session history, all with the freedom to put the laptop down whenever I got tired or frustrated.

Of course, some challenges required deeper focus. On July 16th, I had to build a LoopDetector—a crucial safety net to keep the agent from getting stuck in an infinite execution cycle. I remember wrestling with that logic while looking out over the ocean, a surreal but incredibly motivating environment. The following days were spent calmly adding session-level settings and permissions.
The final phase was about patiently testing and documenting. I wrote dozens of tests, updated the README, and fixed the small bugs that only reveal themselves through use. It was the process of turning a fun exploration into a polished, reliable feature. The first time I gave it a truly complex task—and watched it work, step-by-step, without a single hiccup—was the “aha!” moment. It felt like magic, born not from pressure, but from possibility.
What Agent Mode Really Is
So, what did all that creative exploration actually create? Agent Mode is a persistent, conversational partner for your writing. Instead of a one-off command, you now have a continuous session where the AI remembers what you’ve discussed and what it has done. It’s a research assistant and a writing partner rolled into one.
You can give it high-level goals, and it will figure out the steps to get there. It uses its tools to read your notes, search the web for new information, and even edit your files directly. When you give it a task, you can see its plan, watch it execute each step, and see the results in real-time.
It’s the difference between asking a librarian for a single book and having them join you at your table to help you research and write your entire paper. You can ask it to do things like, “Review my last three posts on AI, find the common threads, and draft an outline for a new post that combines those key themes.” Then you can watch it happen, all within your notes.
The Best Souvenirs
In the end, I came back with a tan and a camera roll full of beautiful photos. But the best souvenir from my trip was the one I built myself. For those of us who love to create, sometimes the most restorative thing you can do on a vacation is to find the time and space to build something you’re truly passionate about. It’s a reminder that the most exciting frontiers aren’t always on a map.
Agent Mode is now available in the latest version of Gemini Scribe. I’m incredibly excited about the new possibilities it opens up, and I can’t wait to see what you do with it. Please give it a try, and come join the conversation on GitHub to share your feedback and ideas. I’d love to hear what you think.
[…] months ago, I wrote about building Agent Mode for Gemini Scribe from a hotel room in Fiji. That post ended with a sense of possibility. The agent could read your notes, search the web, and […]