A smart telescope near a window pointing at stars next to a desk with a glowing laptop and a handheld gadget.

Reading List 7

This week’s reading list spans from the outer reaches of the night sky to the inner mechanics of our development environments. I found myself thinking a lot about physical and digital boundaries, whether stargazing through light pollution, sandboxing database state, or trying to understand where the corporate hype around AI token burns and layoffs actually leaves the rest of us.

[article] Our Galaxy Looks Absolutely Stunning in These Award-Winning Dark Sky Photos. Gizmodo’s gallery of award-winning dark sky photography is a breathtaking reminder of what lies beyond our light-polluted horizons. As someone with a casual interest in astronomy, these images make me want to pack up my gear and head out to the desert immediately.

[article] With the Vespera III and Vespera Pro 2, telescope-maker Vaonis unveils its sharpest optics yet. I have been keeping a close eye on Vaonis’s smart telescopes for a while now. Living in an urban area with heavy light pollution, I am highly skeptical of how much actual stargazing I would get done, but that does not stop me from desperately wanting one of these. The optics on the new Vespera III and Vespera Pro 2 look incredibly sharp.

[release] Launch HN: Ardent (YC P26) – Postgres sandboxes in seconds with zero migration. This is a compelling approach to a massive pain point. Live database testing is currently one of the highest hurdles for agentic software and autonomous coding. In my recent work building a scoreboard for Gemini Scribe, I spent a lot of time writing state-based assertions to confirm the agent didn’t nuke sibling files. Doing that for database mutations is infinitely harder without a lightweight sandbox. Ardent’s promise of instant Postgres replicas with zero migration is something I will be testing immediately.

[release] Flipper unveils a Linux-powered networking gadget built for hackers and tinkerers. This sounds like a delightful piece of hardware. I have a Flipper Zero and have thoroughly enjoyed experimenting with it, but this Linux-powered networking gadget looks like it has significantly more practical utility. It is a neat little box built for hackers and tinkerers that actually fits into a standard sysadmin toolkit.

[article] Uber’s COO says it’s getting harder to justify the money spent on AI tokenmaxxing. Uber’s COO is pointing to a growing frustration in enterprise AI. The industry has fallen into a pattern of tokenmaxxing, where companies compete on how many millions of tokens they can burn through. As I discussed when designing the tool budgets for my Gemini Scribe scoreboard, efficiency should be a primary metric. Leaderboards that celebrate massive token usage incentivize sloppy engineering. We should be optimizing for the middle of the distribution, not cheering on the most wasteful implementations.

[article] Samsung’s OLED tech gives the Ferrari Luce a dashboard unlike anything in a car before. The custom displays in the Ferrari Luce are a stunning application of Samsung’s OLED technology. While the vehicle itself is a concept, the underlying display engineering feels like a preview of how we will interact with glass surfaces in the near future. It is a highly impressive piece of design.

[article] Jensen Huang Just Told Every CEO Hiding Behind AI Layoffs to Shut Up. A sharp analysis of the narrative around AI-driven layoffs. Jensen Huang’s blunt perspective cuts through the corporate excuse-making. This digs into the same questions about who benefits from AI disruption in the workforce that I have been wrestling with lately. It is a must-read for anyone trying to understand the macroeconomic reality behind the hype cycle.

A hand-drawn map on a workbench with a half-built mechanical instrument being assembled directly on top of it.

Agents as Building Blocks

There’s a thread running through the last year of my writing and my work, and I didn’t fully see it until now.

Last September, I wrote Full Circle, about going back to building after years of leading teams. I wanted to be in the driver’s seat for what I called the agentic shift. I wanted to feel the code under my fingers again, to be close enough to the technology that I could form my own opinions about where it was going.

Then I spent six months drawing the map. The Agentic Shift was twelve essays on what agents are, how they work, and what it means to build them well: anatomy, memory, tools, guardrails, multi-agent coordination, production readiness. It was a theoretical framework, written while I was getting my hands dirty on the Gemini CLI team.

And then, in January, I wrote Everything Becomes an Agent, the practitioner’s version. Not theory anymore. I’d watched Gemini Scribe grow from a chat window into a full agent. I’d seen the CLI team go from talking about code to writing and executing it. I’d noticed a pattern repeating across every AI project I touched: given enough time, they all converged on the same architecture. Tools. Loops. Policies. Judgment.

The Antigravity SDK is the second agent product I’ve worked on at Google. Gemini CLI was the first, and it’s where I learned what an agent runtime actually needs: a policy engine, a tool pipeline, lifecycle hooks, a trust model that scales from “let me approve every file write” to “here are the guardrails, go handle it.” The SDK is the next step. Taking everything I learned building one agent and making it possible for everyone to build their own.

Today we’re launching the Antigravity SDK in Preview. The official announcement covers the features (what the SDK does, how to install it, what you can build). This post is about the why. Why this SDK, why this design, and why it matters to me.

What Is an Agent SDK, Really

Here’s something I find fascinating: people have wildly different ideas about what “agent SDK” means.

For some, it’s a way to automate the coding agent. You take the AI that already lives inside your IDE (Antigravity, Cursor, Copilot), and you script it. Pipe in a task, get back a diff. The SDK is an extension of your development environment. That’s a legitimate philosophy, and there are good products built on it.

But that’s not what I wanted to build.

To me, an agent SDK gives you an agent that you can incorporate into your software. Not an extension of your IDE. A building block. Something you import into your Python project the same way you’d import a database client or an HTTP library, and then you use it to solve a problem. The agent is a component in your system, not a wrapper around your workflow.

I’ve watched this pattern play out across Gemini Scribe, the Podcast RAG prototype, and a dozen smaller projects. Software that starts as a script, grows a tools array and a while loop, and eventually looks an awful lot like an agent. I wouldn’t claim that every AI project becomes an agent. But the pattern is durable for a huge class of software problems. And if that convergence is real, if a meaningful number of AI applications end up needing tools, memory, judgment, and guardrails, then the SDK should make that convergence frictionless.

The key distinction is this: the agents you build with the Antigravity SDK aren’t extensions of your developer tools, although they can do development work. They’re independent pieces of software that happen to be implemented as agents. They live in your codebase, run on their own, and do real work.

Let me show you what I mean.

Three Agents That Prove the Point

Two of my favorite examples ship with the SDK, and we use both of them on the SDK project itself on a regular basis. They live in the examples directory on GitHub.

The first is the docstring maintenance agent. You point it at a directory, and it audits every Python file for missing or incomplete docstrings, then fixes them, all following the Google Python Style Guide. It knows which tools it’s allowed to use (read files, list directories, edit .py files in the target directory, and nothing else). It has a policy engine that enforces those boundaries. It runs, does its job, and exits.

The second is the documentation maintenance agent. Same idea, different problem: it scans your project’s documentation for staleness, checks it against the current state of the code, and updates what needs updating.

Here’s what I love about these two examples. They’re coding-related tasks, but they aren’t extensions of my IDE. They’re standalone programs. I don’t run them inside my editor. I run them from the command line, or from a CI job, or from a cron schedule. They happen to be implemented as agents because an agent is the right abstraction for “read a bunch of files, reason about their quality, and make targeted edits.” If I’d built these as scripts, I would have ended up writing a brittle classifier full of if/else branches to decide what to fix and how. The agent architecture deletes that complexity.

We use both of these on the SDK project itself. The SDK maintains its own documentation with its own agents. There’s a satisfying recursion to that.

But I want to push the point further, because the SDK isn’t just for coding tasks. Here’s a completely different kind of agent, a personal knowledge graph I wrote that connects to my Workspace MCP server and answers questions about my Drive, Docs, Gmail, and Calendar:

import asyncio

from google.antigravity import Agent, LocalAgentConfig, types
from google.antigravity.utils import interactive


async def main():
    workspace_mcp = types.McpStdioServer(
        command="node",
        args=["/Users/adh/src/workspace/workspace-server/dist/index.js"],
    )
    system_instructions = (
        "You are a Personal Knowledge Graph Agent. Your goal is to help the user "
        "navigate and synthesize information from their Google Workspace "
        "(Drive, Docs, Gmail, Calendar). You can search for documents, "
        "read emails, and check calendar events to answer questions "
        "and help the user connect the dots."
    )
    config = LocalAgentConfig(
        system_instructions=system_instructions,
        mcp_servers=[workspace_mcp],
        capabilities=types.CapabilitiesConfig(
            enabled_tools=types.BuiltinTools.read_only(),
        ),
    )
    async with Agent(config) as agent:
        print("Knowledge Graph Agent ready. Ask me anything about your Workspace.")
        await interactive.run_interactive_loop(agent)


if __name__ == "__main__":
    asyncio.run(main())

This agent has nothing to do with coding. It’s a personal productivity tool that connects to my Google Workspace via MCP and lets me query my own data in natural language. It’s about 20 lines. It’s read-only by design. And it uses the same SDK, the same patterns, the same trust model as the docstring agent.

Three examples, three completely different domains: autonomous code maintenance, documentation upkeep, personal knowledge synthesis. All built with the same building blocks. That’s the vision.

Batteries Included, Layers When You Need Them

When designing this SDK, I kept coming back to one principle: batteries included. I wanted it to be really easy to put together an agent that worked for you. Easy to grow your application when you needed more sophistication. Easy to dive into the internals when the situation required it.

Here’s what a functional agent looks like:

import asyncio

from google.antigravity import Agent, LocalAgentConfig


async def main():
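    # Default local configuration: tools and the model connection come preconfigured.
    config = LocalAgentConfig()
    # The async context manager handles the agent's full lifecycle.
    async with Agent(config) as agent:
        response = await agent.chat("What files are in the current directory?")
        print(await response.text())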


if __name__ == "__main__":
    asyncio.run(main())

That’s it. About 10 lines of real code. That agent can read files, edit code, run shell commands, search directories, all out of the box. You didn’t have to configure tools, set up a model connection, or wire up a conversation loop. The batteries are included.

But batteries included doesn’t mean batteries only. I designed the API in three layers, and knowing which layer to reach for is part of the design.

Layer 1: Agent. The highest level. Create an agent, give it a prompt, get results. This is where most people start, and many people stay. It manages the full lifecycle (connection, conversation, tools, hooks, policies) in a single async with block. If you just need an agent that does a job, this is your entire API surface.

Layer 2: Conversation. This is the implementation layer. Conversations, hooks, policies, MCP servers, custom tools, structured output. Conversation wraps a Connection with step history, turn tracking, and convenience methods. This is where you shape behavior. You add guardrails through the declarative policy engine. You inject lifecycle hooks, and the SDK gives you three distinct types: Inspect hooks for read-only observability, Decide hooks for policy decisions (allow/deny), and Transform hooks that can modify data in flight. You wire up MCP servers and your own Python functions as tools.

Layer 3: Connection. The lowest level. Connection is the abstract interface for talking to an agent backend. ConnectionStrategy knows how to establish one for a specific runtime. Today, we ship a local connection strategy that runs the agent on your machine. On the roadmap: remote connection strategies that let the same agent code deploy to the cloud without a rewrite.

Here’s the neat thing about this layer. Because Connection is an abstraction, you could conceivably wire up other agent runtimes behind it. We do this internally. We have several different ways of talking to our agent harness, and they all work through the same Connection interface. Your agent code doesn’t know or care which one is running underneath.
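To make the layering concrete, here is a minimal sketch of the pattern in plain Python. It is not the SDK's actual interface: the class names echo the ones above, but the `send` and `connect` methods and the `EchoConnection` backend are hypothetical, invented for illustration. The only point is that calling code depends on the abstract interface, never on a specific runtime.

```python
import asyncio
from abc import ABC, abstractmethod


class Connection(ABC):
    """Abstract channel to an agent backend (method name is hypothetical)."""

    @abstractmethod
    async def send(self, prompt: str) -> str:
        ...


class ConnectionStrategy(ABC):
    """Knows how to establish a Connection for one specific runtime."""

    @abstractmethod
    async def connect(self) -> Connection:
        ...


class EchoConnection(Connection):
    # Stand-in backend that echoes the prompt instead of calling a model.
    async def send(self, prompt: str) -> str:
        return f"echo: {prompt}"


class EchoStrategy(ConnectionStrategy):
    async def connect(self) -> Connection:
        return EchoConnection()


async def ask(strategy: ConnectionStrategy, prompt: str) -> str:
    # Calling code only sees the abstract types, so swapping the strategy
    # swaps the runtime without touching this function.
    connection = await strategy.connect()
    return await connection.send(prompt)


reply = asyncio.run(ask(EchoStrategy(), "hello"))
print(reply)
```

Replacing `EchoStrategy` with a strategy for another runtime would leave `ask` unchanged, which is the property the Connection layer is described as providing.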

The philosophy is: easy to start, easy to grow, easy to go deep. You shouldn’t need to understand the Connection layer to write your first agent. But when you need it, when you’re building something that requires custom streaming, session resumption, or a novel deployment target, it’s there, and it’s a clean abstraction, not a hack.

One detail I’m particularly proud of: the trust model adapts to the deployment context. The base AgentConfig is deny-by-default. It defaults to read-only tools, and if you try to enable write tools or MCP servers without a safety policy, the Agent refuses to start. Enforced at the framework level. LocalAgentConfig takes a different posture. Since it runs on your own machine, it enables every tool, scopes file operations to the workspaces you’ve configured, and gates shell commands behind a user confirmation prompt by default. You’re developing locally; you probably want your agent to actually do things, but you also probably want a chance to look before it runs rm -rf. The trust gradient is baked into the architecture.

Lessons Encoded

If you’ve been following along with my writing, the SDK might feel familiar. That’s intentional.

The twelve-part Agentic Shift wasn’t just an intellectual exercise. It was the blueprint. Every essay mapped a concept that eventually became a feature.

In Everything Becomes an Agent, I wrote: “If you’re writing if/else logic to decide what the AI should do, you might be building a classifier that wants to be an agent.” The SDK takes that literally. You don’t build classifiers, you define tools and let the model decide which ones to use. The complexity moves from branching logic to capability definition.

I wrote about building a “sudoers file for AI”, a permission system for agents. That became the policy engine. policy.allow("view_file"). policy.deny("*"). Declarative, composable, deny-by-default. You express what’s allowed, and the framework enforces it.
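As a rough illustration of how a declarative, deny-by-default rule list can behave, here is a toy policy class in plain Python. It is a sketch, not the SDK's policy engine: the `ToyPolicy` name, the glob matching through `fnmatch`, and the first-match-wins ordering are all assumptions made for this example.

```python
from fnmatch import fnmatchcase


class ToyPolicy:
    """Hypothetical rule list: first matching rule wins, no match means deny."""

    def __init__(self):
        self._rules = []  # (pattern, allowed) pairs in declaration order

    def allow(self, pattern: str) -> "ToyPolicy":
        self._rules.append((pattern, True))
        return self

    def deny(self, pattern: str) -> "ToyPolicy":
        self._rules.append((pattern, False))
        return self

    def is_allowed(self, tool_name: str) -> bool:
        for pattern, allowed in self._rules:
            if fnmatchcase(tool_name, pattern):
                return allowed
        return False  # deny-by-default when nothing matches


policy = ToyPolicy().allow("view_file").deny("*")
print(policy.is_allowed("view_file"))
print(policy.is_allowed("edit_file"))
```

Under these assumptions the explicit `deny("*")` is a readable backstop more than a necessity, since an empty rule list already denies everything.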

I wrote: “The real complexity isn’t in the code; it’s in the trust.” That conviction shaped the hook system. Hooks give you visibility into every tool call, before and after. Policies give you control. Together, they manage the trust relationship between you and the agent. The SDK doesn’t ask you to trust blindly; it gives you the instruments to verify.
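The split between observing, deciding, and modifying can be sketched in a few lines of plain Python. This is an illustration of the idea behind the Inspect, Decide, and Transform hook types, not the SDK's hook API: the `ToolCall` and `HookPipeline` names, the hook signatures, and the ordering (inspect, then decide, then transform) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class HookPipeline:
    # Hypothetical container; each list mirrors one of the three hook types.
    inspect: list = field(default_factory=list)    # observe only
    decide: list = field(default_factory=list)     # return True to allow
    transform: list = field(default_factory=list)  # return a new ToolCall

    def run(self, call: ToolCall) -> Optional[ToolCall]:
        for hook in self.inspect:
            hook(call)  # read-only observability
        if not all(hook(call) for hook in self.decide):
            return None  # any deny blocks the call
        for hook in self.transform:
            call = hook(call)  # modify the call in flight
        return call


seen = []
pipeline = HookPipeline(
    inspect=[lambda call: seen.append(call.name)],
    decide=[lambda call: call.name != "run_command"],
    transform=[lambda call: ToolCall(call.name, {**call.args, "dry_run": True})],
)

allowed = pipeline.run(ToolCall("view_file", {"path": "README.md"}))
blocked = pipeline.run(ToolCall("run_command", {"cmd": "rm -rf /"}))
```

Here the inspect hook sees both calls, the decide hook blocks the shell command, and the transform hook only touches the call that was allowed.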

And I wrote: “A hammer does nothing unless you swing it. But an agent? An agent can work while you sleep.” That’s the promise. The SDK is the handle.

These aren’t abstract design principles that I reverse-engineered to sound good in a blog post. They’re lessons learned from building Gemini Scribe, from contributing to Gemini CLI, from watching every project I touched converge on the same agentic patterns. I drew the map, I lived the map, and then I got to build the territory.

The Team

I want to be clear about something. I didn’t build this alone.

I did most of the design for the Python SDK (the API surface, the three-layer architecture, the philosophy behind “batteries included”), and a lot of that design came from the writing I’ve been doing this past year. But design is the easy part. The hard part is building something real, and that was a team effort.

A talented group of engineers worked with me on this. On the SDK implementation, on the test infrastructure, on the Go harness underneath that actually runs the agent, on the internal connection strategies, on the MCP bridge, on a hundred decisions that don’t show up in a blog post but absolutely show up in the quality of the software. The SDK exists because of their work, and it’s better than anything I could have built on my own.

Preview, and an Invitation

We’re shipping this as a Preview. Not “1.0.” That’s deliberate.

The API surface will change. We know that. We’ll evolve it based on feedback from you and from our own continued use of the SDK, because we use it too, every day, on the project itself. There are things we haven’t figured out yet. There are patterns we haven’t discovered. That’s the point of a preview: to learn in the open.

So here’s the invitation: build something. Build a documentation bot, a knowledge graph, a CI pipeline agent, a personal assistant. Build something I haven’t imagined. Break something. Tell us what’s missing, what’s awkward, what delights you. File an issue. Open a PR. Argue with us about the API.

Last September, I wrote that I was going back to building because “for a builder, there’s no more exciting place to be.” The Agentic Shift was the map. The SDK is the territory.

Come explore it.

The Antigravity SDK is available now as a Preview. Install it with pip install google-antigravity, read the official announcement for feature details, and find the source on GitHub.

A futuristic glowing notebook on a wooden desk with a cup of coffee and floating geometric shapes.

Reading List 6

This week’s reading list is a mix of high-level theory and low-level pragmatism. I found myself bouncing between the philosophical implications of how we build AI and the immediate satisfaction of writing a good Go component.

[article] The Century-Long Pause in Fundamental Physics. The author argues that physics has stagnated by swapping “ontology-first” theory for mathematical models that merely fit data. This debate perfectly mirrors current machine learning disputes about whether LLMs build internal world models or just pattern-match at scale, which is the open empirical front currently being adjudicated in mechanistic interpretability.

[release] Onyx Has Released a New Remote Page Turner Called Tappy. I wish Amazon would support page turners for their Kindle line. It would be great if they supported a device as delightful as this one.

[blog] The agent principal-agent problem. This is a great look at one of the biggest problems with agentic development: code review. In my open source work, I now use a pattern where I work with an agent to make a change, test it locally, and create a pull request before having another agent review the code. This back-and-forth works well: I keep a solid mental model of the codebase without giving up efficiency.

[article] ReMarkable Paper Pure wants to be the only notebook you’ll ever need. I have always liked the reMarkable tablets, but every time I try one I miss having my Kindle library alongside it. Reading and writing are deeply linked for me, which is why I recently got a Kindle Scribe Colorsoft and found it really hits the mark for what I want.

[blog] Just Fucking Use Go. I have been working on a project that has a Go component to it recently. This is the first time I have really started to look at the language, and it inspires me to spend more time with it.

I built my 7MB Full AI Terminal in Rust & Tauri. This is a neat open source AI terminal. It feels similar to Warp but is a lot smaller.

[article] Computer Use Is 45x More Expensive Than Structured APIs. I am not surprised at all by these findings. I think computer use will remain a last resort, and a lot of apps will expose some kind of API for an agent to use instead. My guess is that this eventually becomes the way we automate unmaintained applications that need to fit into an agentic workflow.

A futuristic clockwork mechanism with glowing nodes, representing community collaboration, automated tasks, and precise measurement.

Automation and Measurement: Inside Gemini Scribe 4.8.0

I recently wrapped up the development cycle for Gemini Scribe 4.8.0. Looking back at the ~99 pull requests merged over the last month, the sheer volume of changes is significant. Not only are we shipping major features, but I’m also seeing a steady uptick in contributions from collaborators, an increase in issues filed by the community, and much more activity in our discussion group. Beyond the changelog and community growth, two structural narratives define this release: automation and measurement.

As I discussed in the evolution of Gemini Scribe, the goal has always been to move beyond a simple chat interface. With 4.8.0, we are taking a massive step toward making the agent a true background worker in your vault.

Here is a look at the architecture, the code, and what this release means for the future of our agentic workflows.

The Push for Automation

For a long time, running a complex agent task meant staring at a blocking UI. If you asked the agent to perform deep research or generate an image, you waited.

To solve this, we introduced a unified background execution lane. The new BackgroundTaskManager allows tools like DeepResearchTool and GenerateImageTool to accept a background: true parameter. The agent submits the task, receives an ID immediately, and returns to its turn. You can monitor these tasks in the new Gemini Activity modal, which consolidates background tasks and RAG indexing status into one view.
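The submit-and-return-an-ID pattern is easy to sketch. The example below is written in Python with `asyncio` rather than the plugin's TypeScript, and it is only an illustration: the method names (`submit`, `status`, `result`), the ID format, and the `slow_research` stand-in are assumptions, not the plugin's actual BackgroundTaskManager.

```python
import asyncio
import itertools


class BackgroundTaskManager:
    """Toy manager: submit() hands back an ID without waiting for the work."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._tasks = {}

    def submit(self, coro) -> str:
        task_id = f"task-{next(self._ids)}"
        # create_task schedules the coroutine and returns immediately.
        self._tasks[task_id] = asyncio.create_task(coro)
        return task_id

    def status(self, task_id: str) -> str:
        task = self._tasks[task_id]
        if not task.done():
            return "running"
        if task.cancelled():
            return "cancelled"
        return "failed" if task.exception() else "done"

    async def result(self, task_id: str):
        return await self._tasks[task_id]


async def slow_research(topic: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a long-running tool
    return f"report on {topic}"


async def demo():
    manager = BackgroundTaskManager()
    task_id = manager.submit(slow_research("dark skies"))
    status_after_submit = manager.status(task_id)  # the caller is not blocked
    report = await manager.result(task_id)
    return task_id, status_after_submit, report, manager.status(task_id)


demo_result = asyncio.run(demo())
print(demo_result)
```

An activity view like the one described above would then be a matter of polling `status` for every known ID.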

But unblocking the UI was only half the battle. We wanted to lay the groundwork for an agent that operates in the background. While true autonomy is a spectrum, the first step is moving away from the chat box and into scheduled, asynchronous workflows.

The Scheduled Task Engine

The marquee feature of 4.8.0 is the full task scheduling system. You can now define a task as a markdown file, and the plugin will run it on a cadence as a headless agent session, writing the output back to the vault.

To make this work, we built a ScheduledTaskManager with a 60-second tick loop. Tasks are stored in [state-folder]/Scheduled-Tasks/ with a sidecar JSON file for state. The headless ScheduledTaskRunner mirrors the standard AgentViewTools but auto-approves all tool calls.

We also expanded the schedule grammar. Originally, daily meant “every 24 hours from creation,” which surprised users. Now, you can specify daily@HH:MM and weekly@HH:MM:DAYS, so you can finally tell the agent to run “every weekday at 4:30 PM.”
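Here is a minimal sketch of how such a grammar could be parsed and turned into a next run time. It is written in Python for brevity and is not the plugin's parser. In particular, the post does not show the format of the DAYS part, so the comma-separated three-letter day names (`mon,tue,...`) are an assumption, as are the `parse_schedule` and `next_run` names and the use of naive local datetimes.

```python
from datetime import datetime, time, timedelta

# Assumed day tokens; Monday is 0 to match datetime.weekday().
DAY_INDEX = {"mon": 0, "tue": 1, "wed": 2, "thu": 3, "fri": 4, "sat": 5, "sun": 6}


def parse_schedule(spec: str):
    """Parse 'daily@HH:MM' or 'weekly@HH:MM:DAYS' into (time, weekdays)."""
    kind, _, rest = spec.partition("@")
    if kind == "daily":
        hhmm, days = rest, set(range(7))
    elif kind == "weekly":
        # Everything after the last colon is the day list.
        hhmm, _, day_part = rest.rpartition(":")
        days = {DAY_INDEX[d.strip().lower()] for d in day_part.split(",")}
    else:
        raise ValueError(f"unsupported schedule: {spec!r}")
    hour, minute = (int(part) for part in hhmm.split(":"))
    return time(hour, minute), days


def next_run(spec: str, now: datetime) -> datetime:
    """Return the first scheduled moment strictly after `now`."""
    at, days = parse_schedule(spec)
    for offset in range(8):  # today plus the next seven days covers every weekday
        candidate = datetime.combine(now.date() + timedelta(days=offset), at)
        if candidate > now and candidate.weekday() in days:
            return candidate
    raise ValueError("schedule never matches")


friday_evening = datetime(2025, 1, 3, 17, 0)  # a Friday, after 4:30 PM
print(next_run("weekly@16:30:mon,tue,wed,thu,fri", friday_evening))
print(next_run("daily@16:30", friday_evening))
```

Anchoring the run to a wall-clock time, instead of "24 hours after creation", is what removes the surprise described above: the Friday-evening example rolls over to Monday at 4:30 PM for the weekday schedule and to Saturday for the daily one.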

We also handle missed runs gracefully. On startup, any task with runIfMissed: true that missed its window surfaces in a CatchUpModal.
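The startup check can be pictured as a simple partition of overdue tasks. The sketch below is in Python and makes several assumptions: the `TaskState` fields stand in for whatever the sidecar JSON actually stores, and the behavior for overdue tasks without `runIfMissed` (quietly waiting for the next window) is a guess, since the post only describes the catch-up case.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaskState:
    # Hypothetical shape of the per-task state.
    name: str
    next_due: datetime
    run_if_missed: bool


def split_missed(tasks, now):
    """Return (catch_up, skipped) task names for windows that already passed."""
    catch_up, skipped = [], []
    for task in tasks:
        if task.next_due >= now:
            continue  # window has not arrived yet, nothing was missed
        (catch_up if task.run_if_missed else skipped).append(task.name)
    return catch_up, skipped


startup = datetime(2025, 1, 6, 9, 0)
tasks = [
    TaskState("daily-setup", datetime(2025, 1, 5, 16, 30), run_if_missed=True),
    TaskState("blog-draft", datetime(2025, 1, 5, 8, 0), run_if_missed=False),
    TaskState("weekly-review", datetime(2025, 1, 7, 10, 0), run_if_missed=True),
]
catch_up, skipped = split_missed(tasks, startup)
print(catch_up, skipped)
```

In this sketch only `daily-setup` would be offered for catch-up, because it is both overdue and opted in.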

Right now, this is essentially a highly intelligent cron job. You are still explicitly telling the agent when to run. But this scheduling engine is the foundational infrastructure for what comes next. In the next release, we are introducing Obsidian lifecycle hooks. Instead of just running on a timer, the agent will be able to react to events, triggering workflows when you create a new file, save a note, or modify a project board. That is where we cross the threshold into true ambient AI.

How I Use This in Practice

To give you an idea of what this unlocks, I currently rely on a few specific scheduled workflows:

The Daily Setup: Every afternoon, a scheduled skill runs to prepare my vault for the following day. It looks up my calendar, creates my daily note if it doesn’t exist, and seeds it with my upcoming meetings. It goes a step further by creating individual meeting note entries and building out context notes for the people I’ll be meeting with. When I walk into the office the next morning, my daily note is already prepped and ready to go.

Automated Blog Drafts: I also use this to automate my content pipeline. I have a scheduled skill that monitors my Readwise syncs and automatically generates drafts for my “Reading List” blog posts. Instead of manually curating and formatting these, the agent handles the heavy lifting in the background, leaving me to just review and polish the draft.

If you are worried about the agent running amok in your vault while you aren’t looking, there are several ways to mitigate this. You can limit the tools the agent has access to. If you don’t want it overwriting files, you can simply restrict its write access. Additionally, the agent’s response from any scheduled task is always saved in the Scheduled-Tasks/Runs file, giving you a complete audit log of what the agent had to say during the session.

In my case, I’m automating skills that I’ve been running manually for a while now, and I run my agent in a mode where I let it write and edit files day-to-day. You should set up your tasks to match your own comfort level. You can read more about how to configure this in the Scheduled Tasks Documentation.

Extracting the Agent Loop

To support headless scheduled tasks, I had to refactor how the agent executes tools. Previously, the tool-execution loop was tightly coupled to the UI in AgentViewTools.

I extracted this logic into a UI-agnostic AgentLoop class. AgentViewTools shrank from 386 lines down to 187, becoming a thin adapter over AgentLoop with specific hooks (onToolBatchStart, onToolCallStart, etc.).

// Conceptual extraction of the AgentLoop
export class AgentLoop {
  constructor(private engine: ToolExecutionEngine) {}
  
  async execute(turn: AgentTurn) {
    // Iterative tool execution, removing the recursive stack-depth ceiling
    while (this.hasPendingToolCalls(turn)) {
      // Loop detection, batching, and execution logic lives here
    }
  }
}

This extraction immediately paid dividends, catching bugs that a duplicate headless runner had introduced, and eliminating a recursive stack-depth ceiling on deep tool chains. More importantly, it means scheduled tasks, evals, and the UI all share the exact same execution engine.
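For readers who want to see the shape of an iterative loop with a bail-out, here is a small sketch in Python rather than the plugin's TypeScript. It is not the AgentLoop implementation: the `run_agent_loop` function, the callback signatures, the exact-repeat loop detection, and the iteration budget are all simplifying assumptions.

```python
def run_agent_loop(next_calls, execute, max_iterations=25):
    """Drive tool calls iteratively until the model stops asking for tools.

    next_calls(history) returns a list of (tool_name, argument) pairs.
    execute(tool_name, argument) returns the tool result.
    """
    history = []
    seen = set()
    # A plain loop keeps stack depth constant no matter how long the chain gets.
    for _ in range(max_iterations):
        calls = next_calls(history)
        if not calls:
            return history, "completed"
        for call in calls:
            if call in seen:
                # Naive loop detection: the exact same call was already made.
                return history, "aborted: repeated call"
            seen.add(call)
            history.append((call, execute(*call)))
    return history, "aborted: iteration budget exhausted"


def scripted_model(history):
    # Fake model: read two files, then stop.
    script = [[("read_file", "a.md")], [("read_file", "b.md")], []]
    return script[len(history)]


def stuck_model(history):
    # Fake model that keeps requesting the same call forever.
    return [("read_file", "a.md")]


def fake_execute(tool_name, argument):
    return f"{tool_name}({argument}) ok"


ok_history, ok_status = run_agent_loop(scripted_model, fake_execute)
stuck_history, stuck_status = run_agent_loop(stuck_model, fake_execute)
print(ok_status, stuck_status)
```

Because the loop takes its model and executor as plain callables, the same function can sit behind a UI adapter, a headless runner, or a test harness, which is the sharing benefit described above.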

Local Models with Ollama and Gemma 4

First-class local-model support is here. By leveraging the ModelApi seam, chat, summarization, rewrite, and agent tool-calling all work against a local Ollama server. You can use any model from Ollama that supports tool calling, though I have personally only tested this extensively with Gemma 4.

In my local evaluation harness, Gemma 4 performed exceptionally well. It is incredibly capable, fast, and handles the agent loop with a level of reliability that makes local-only agentic workflows genuinely viable.

The way I use this right now is as an offline fallback: when I don’t have an internet connection, I switch to Gemma 4 and just keep working. Obviously, running offline means I don’t have access to online-dependent tools like Google Search, Deep Research, or Image Generation. But for synthesizing notes, organizing projects, or drafting content securely, it is incredibly powerful.

In the future, we will be refining the system to allow you to pick the model you want on a per-function basis. This means you’ll be able to route sensitive, local text processing to an offline model while still leveraging cloud models for heavy-lifting tasks like Deep Research or Image Generation when you are connected.

Moving from Guessing to Measuring

As the agent loop gets more complex (handling runaway loop aborts and budget constraints), we can no longer rely on “vibes” to know if a change improved the system.

To solve this, I built a new CLI-driven eval harness (npm run eval) that drives a live Obsidian instance. It captures turns, tool calls, token usage, cache ratios, and cost. Crucially, it measures reliability. By passing --repeat=N, the harness repeats each task to surface flakiness, reporting a pass^k metric. We can now test multi-hop retrieval and loop-trap cyclic references programmatically, ensuring the agent bails cleanly instead of spinning forever.
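The post does not spell out how the harness computes pass^k. One common way to estimate it, assumed here, is to read pass^k as the probability that k independent runs of a task all pass, and to estimate that from n recorded runs with c passes as C(c, k) / C(n, k), averaged over tasks. The function names and the sample results below are hypothetical.

```python
from math import comb


def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Estimate the chance that k runs drawn from the recorded trials all pass."""
    if not 0 < k <= trials:
        raise ValueError("k must be between 1 and the number of trials")
    # comb(successes, k) is 0 when there are fewer than k passing runs.
    return comb(successes, k) / comb(trials, k)


def suite_pass_hat_k(results: dict, k: int) -> float:
    """Average the per-task estimate over a suite of {task: [bool, ...]} runs."""
    per_task = [pass_hat_k(sum(runs), len(runs), k) for runs in results.values()]
    return sum(per_task) / len(per_task)


# Made-up results for two tasks, each repeated four times.
results = {
    "multi-hop-retrieval": [True, True, True, True],
    "loop-trap": [True, True, True, False],
}
print(suite_pass_hat_k(results, k=1))
print(suite_pass_hat_k(results, k=2))
```

Unlike a plain pass rate, this number drops quickly as k grows for a task that only passes most of the time, which is what makes it useful for surfacing flakiness.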

Right now, the focus for 4.8.0 was getting this infrastructure in place and establishing the beginnings of our eval set. Having the harness is the first step; the next step is building out a robust suite of test cases that reflect real-world vault interactions.

I would love to see contributions from the community for the evals themselves! If you have complex agentic workflows or edge cases you want to ensure remain stable, please submit them. In the next release, we will start publishing the actual eval results and benchmarks directly in the repo so we can transparently track the agent’s performance over time.

What’s Next?

What does this implementation tell us about the future of software engineering and personal knowledge management?

We are seeing a clear shift toward ambient AI. The chat interface is a great starting point, but the true value of an agentic system is its ability to operate asynchronously. While the scheduling engine in 4.8.0 acts as a highly capable cron job, it lays the groundwork for the event-driven lifecycle hooks coming in the next release.

By combining the AgentLoop extraction with asynchronous execution, Gemini Scribe is no longer just a tool you use; it is becoming a system that reacts and works alongside you. When you can rely on a background orchestrator to run your housekeeping routines (like updating changelogs or triaging issues) while you eat dinner, the vault becomes a living, breathing entity. The agent becomes a true extension of your workflow, utilizing the built-in skills we’ve developed entirely in the background.

Gemini Scribe 4.8.0 is a massive architectural leap forward. The code is cleaner, the tests are faster (thanks to a Vitest migration), and the agent is more autonomous than ever.

If you want to dive into the specifics or try out the new scheduling grammar, check out the updated documentation on scheduled tasks.

Let me know what automated tasks you end up building. I’m already finding new ways to let the agent do the heavy lifting while I focus on the work that matters.

A wooden violin with holographic blueprints projecting from it on a workbench.

Reading List 5

Today’s reading list is a mix of cautionary tales about our digital infrastructure and some fascinating glimpses into how AI is changing both software design and human interaction.

[article] GoDaddy Gave a Domain to a Stranger Without Any Documentation. Wow. This is a really chilling story. I’m glad that I don’t use GoDaddy for my domains.

[article] HashiCorp co-founder says GitHub ‘no longer a place for serious work’. GitHub is in a tough situation. If you look at the graphs they published from their April 28th outage, you can see that their growth rate is off the charts. Agentic coding has put strains on that infrastructure that no reasonable person or team could have been prepared for, and the result is a degraded experience and customers walking away.

[blog] Letting AI play my game – building an agentic test harness to help play-testing. There is something really satisfying about watching an agent test a product. I’ve been doing this a lot lately with my Gemini Scribe project, which I need to write about at some point.

[blog] How to use Deep Research with the Gemini API. Great writeup on how to use the latest version of the deep research agent. I’ve updated gemini-utils and my Gemini CLI deep research extension for the newest version of deep research as well.

[article] Meet Shapes, the app bringing humans and AI into the same group chats. It’s inevitable that AI is going to start showing up in more settings where people talk to each other.

[article] Statue of a man blinded by a flag put up by Banksy in central London. This seems like the perfect statue for our times.

[article] MIT’s virtual violin offers luthiers a new design tool. One of the things that makes string instruments so complex is that they are an interface between physics and nature. The wood imparts its own characteristics on top of the geometry. This is a neat project from MIT, but to really help luthiers they will also need to be able to model the woods used in these instruments.

[article] Instagram is testing optional ‘AI creator’ labels. I really think the industry has this backwards. We should be creating “human created” labels. We should assume all content is AI unless otherwise stated.

A spotlight shines on a pianist intensely playing a small, worn piano on a large, dark stage.

The Köln Concert and Creative Constraints

This week I was reminded of a story I like to tell, and the value of constraints on creative work. When I’m working, I often set my constraints before I begin. For example, on an old agentic coding project, I set a few constraints: “The orchestration model must be Gemini Flash,” “All tool calls are through sub-agents,” and “Permissions and configurability are at the core of the agentic loop.” From that, I ended up with adh-cli, a policy-aware TUI for working with Gemini that inspired many of the features I worked on in Gemini CLI last year. The project itself is defunct now and not maintained, but the constraints gave me a great way to think about the project and forced creativity in other areas.

We run into constraints in many different ways. Maybe it’s time pressure: How many of you felt like you wrote your best papers 24 hours before they were due? Maybe it’s the environment, like you must integrate with a certain piece of software, or you have to design your system in a certain way. Maybe it’s self-imposed like my example with adh-cli.

Or maybe the constraint is philosophical. Take Mario Zechner’s Pi Agent, for example. In a blog post, Zechner expressed frustration with the bloat of modern AI coding assistants that try to do everything, describing them as “spaceships with 80% unused functionality.” In response, he built Pi around an “anti-framework” philosophy of radical minimalism. He intentionally constrained his default coding agent to just four fundamental tools: read, write, edit, and bash. By stripping away the hidden system prompts and unpredictable context injections, the tool forces developers to be intentional. It proves that you don’t need a massive, opaque framework to build highly capable AI workflows—sometimes, fewer tools create a sharper focus.
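To get a feel for how small that surface area is, here is a rough sketch of a four-tool toolkit in Python. This is not Pi's actual implementation. The function signatures, the TOOLS registry, and the dispatch helper are all hypothetical, chosen only to show that read, write, edit, and bash can fit on one screen.

```python
import subprocess
from pathlib import Path


def read(path: str) -> str:
    """Return the full contents of a text file."""
    return Path(path).read_text()


def write(path: str, content: str) -> str:
    """Create or overwrite a text file."""
    Path(path).write_text(content)
    return f"wrote {len(content)} chars to {path}"


def edit(path: str, old: str, new: str) -> str:
    """Replace one exact occurrence of `old`; refuse ambiguous edits."""
    text = Path(path).read_text()
    if text.count(old) != 1:
        raise ValueError("edit target must match exactly once")
    Path(path).write_text(text.replace(old, new))
    return f"edited {path}"


def bash(command: str) -> str:
    """Run a shell command and return its combined output."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    return result.stdout + result.stderr


# The whole toolkit: four names, nothing hidden (hypothetical registry).
TOOLS = {"read": read, "write": write, "edit": edit, "bash": bash}


def dispatch(name: str, **kwargs) -> str:
    """Route a tool call by name; anything outside the four tools is an error."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

Everything the agent can do has to pass through dispatch, so the list of capabilities is the four keys in TOOLS and nothing else.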

Whether it’s a self-imposed architectural rule or an anti-framework philosophy, these software constraints force us out of our default habits and into a space of deliberate, intentional design. Yet in our day-to-day work, constraints are rarely celebrated. Most of my conversations about constraints start the same way: people resent a constraint because it was imposed on them from outside. They see it as a restriction instead of a way to channel their creativity. To me, a constraint shuts down a huge portion of the exploration space. I don’t have to worry about a million different architectural choices because the constraint has made the decision for me. It is incredibly freeing. Whenever I try to help someone turn their mindset around, from fearing or being frustrated by constraints to being excited by them, I inevitably end up telling them the story of Keith Jarrett and the 1975 Köln Concert.

In 1975, a 17-year-old jazz fan named Vera Brandes organized a late-night concert at the Cologne Opera House. She managed to book Keith Jarrett, one of the most notoriously perfectionist jazz pianists of his generation. It was an ambitious undertaking, and almost immediately, it turned into a disaster.

Due to a backstage mix-up, the venue provided the wrong piano. Instead of the premier concert grand Jarrett requested, he was presented with a small rehearsal model. It was horribly out of tune, the pedals stuck, the high notes sounded tinny and harsh, and the bass lacked any resonance. Jarrett, exhausted and suffering from back pain, flat-out refused to play. It was only when Brandes followed him out into the pouring rain and begged him that he relented, taking pity on the teenager. “Never forget,” he told her. “Only for you.”

What happened next is legendary. Forced to play an unplayable instrument, Jarrett had to completely abandon his usual style. Because the high and low registers were awful, he confined his playing strictly to the middle of the keyboard. Because the piano was too quiet to fill the 1,400-seat opera house, he stood up and hammered the keys with immense physical force. To make up for the lack of resonance, he relied on rolling, repetitive, hypnotic rhythmic patterns in his left hand.

He embraced the limitations, and in doing so, he produced absolute magic. The recording, The Köln Concert, went on to become the best-selling solo jazz album in history.

I think about the Köln Concert all the time, especially lately as we navigate the current landscape of Artificial Intelligence and software architecture.

The Bloat of Infinite Resources

In modern software engineering, we are rarely handed a broken piano. We operate in an era of perceived infinite resources. Cloud computing gives us endless horizontal scaling. Context windows for Large Language Models have ballooned from a meager 4K tokens to 1 million or more. If an application is slow or an agent isn’t performing well, the default instinct is to throw more compute, more memory, or a larger model at the problem.

But infinite resources often breed intellectual laziness. When you have a 1-million-token context window, you don’t have to think critically about what information actually matters. You just dump the entire codebase or the entire library of documents into the prompt and hope the model figures it out. It’s the equivalent of having a perfect Bösendorfer grand piano and just mashing all the keys at once.

A pragmatic engineering manager might push back here: Developer time is expensive. If I can solve a problem today by dumping an entire codebase into a 1-million-token context window, isn’t throwing compute at it just good business?

It’s a fair question, and engineering is always about tradeoffs. But the tools have evolved: building a RAG pipeline doesn’t take a week anymore; with the right utilities, it takes minutes. More importantly, relying on infinite resources often hides long-term costs. When I built adh-cli, I made an explicit tradeoff: by routing everything through tightly scoped sub-agents, I was actually consuming more total tokens than a single massive prompt would use. But because my constraint forced me to use a much cheaper model (Gemini Flash), my bet was that the overall system would be far more cost-effective and resilient. AI doesn’t remove the need for architectural judgment; it raises the stakes for it. You have to exercise good judgment to know when throwing compute at a problem is a calculated business decision, and when it’s just masking a fragile design.
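A back-of-the-envelope calculation makes the shape of that bet easier to see. Every number below is a made-up placeholder: the token counts are not measurements from adh-cli, and the prices are not real pricing for Gemini Flash or any other model. The only point is that a run can burn more tokens and still cost less when the per-token price gap is wide enough.

```python
def run_cost(total_tokens: int, price_per_million: float) -> float:
    """Dollar cost of processing total_tokens at a flat per-million-token rate."""
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical numbers, for illustration only.
single_prompt_tokens = 800_000   # one massive prompt sent to a large model
sub_agent_tokens = 8 * 150_000   # eight scoped sub-agents: 1.5x the tokens
large_model_price = 2.50         # dollars per million tokens (made up)
small_model_price = 0.25         # dollars per million tokens (made up)

single_cost = run_cost(single_prompt_tokens, large_model_price)
sub_agent_cost = run_cost(sub_agent_tokens, small_model_price)

# The sub-agent design stays cheaper as long as its token overhead
# is smaller than the price ratio between the two models.
token_overhead = sub_agent_tokens / single_prompt_tokens
price_ratio = large_model_price / small_model_price

print(f"single prompt: ${single_cost:.2f}")      # single prompt: $2.00
print(f"sub-agents:    ${sub_agent_cost:.2f}")   # sub-agents:    $0.30
print(f"overhead {token_overhead:.1f}x vs price ratio {price_ratio:.0f}x")
```

With these placeholder figures the sub-agent run uses 1.5 times the tokens at a tenth of the price. The same arithmetic also shows when the bet fails: if the overhead grows past the price ratio, the single prompt wins on cost.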

The Innovation of Constraints

The most interesting work in AI right now isn’t happening where resources are unlimited. It’s happening at the edges, where constraints are severe.

Take local models, for example. When you’re trying to run an LLM on a consumer laptop or a Raspberry Pi, you don’t have the luxury of a 70-billion-parameter model. You are forced to use a smaller, quantized model. This constraint forces you to build better architectures. You can’t rely on the model to “know” everything, so you have to optimize at the edge. Maybe you build robust Retrieval-Augmented Generation (RAG) pipelines. Maybe you implement sophisticated memory retrieval systems to surface exactly the right historical context just in time. Or maybe you break complex workflows down into tiny, focused sub-agents, each operating with its own tightly constrained context window. In every case, you have to craft highly specific, narrowly scoped prompts.

# Instead of one massive prompt, constraints force modularity.
# EvaluationResult, build_focused_prompt, local_model, and parse_evaluation
# are placeholders for whatever result type, prompt builder, model client,
# and response parser you use.
def evaluate_code_chunk(chunk: str, context: dict) -> EvaluationResult:
    """
    A tightly scoped function that uses a small, fast local model
    to evaluate a specific piece of code, rather than dumping
    the whole repo into a massive API call.
    """
    prompt = build_focused_prompt(chunk, context)
    response = local_model.generate(prompt, max_tokens=256)
    return parse_evaluation(response)
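The snippet above leans on helpers it does not define. One way to fill them in, as a minimal sketch that runs without any model installed: the model is passed in as a plain callable, and fake_model is a stand-in that only checks for a bare except. The PASS/FAIL reply format, the substring-based context filter, and every helper body here are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvaluationResult:
    passed: bool
    notes: str


def build_focused_prompt(chunk: str, context: dict, max_chunk_chars: int = 2000) -> str:
    """Include only the context entries whose name appears in the chunk."""
    relevant = {name: note for name, note in context.items() if name in chunk}
    notes = "\n".join(f"- {name}: {note}" for name, note in relevant.items())
    return (
        "Reply PASS or FAIL, then one short reason.\n"
        f"Relevant notes:\n{notes}\n"
        f"Code:\n{chunk[:max_chunk_chars]}"
    )


def parse_evaluation(response: str) -> EvaluationResult:
    """Treat any reply that starts with PASS as a pass; keep the text as notes."""
    text = response.strip()
    return EvaluationResult(passed=text.upper().startswith("PASS"), notes=text)


def evaluate_code_chunk(
    chunk: str, context: dict, generate: Callable[[str], str]
) -> EvaluationResult:
    """Evaluate one chunk with whatever small model `generate` wraps."""
    return parse_evaluation(generate(build_focused_prompt(chunk, context)))


def fake_model(prompt: str) -> str:
    """Stand-in for a local model: fails any chunk that has a bare except."""
    code = prompt.split("Code:\n", 1)[1]
    if "except:" in code:
        return "FAIL: bare except hides errors"
    return "PASS: looks fine"


context = {"load_config": "must raise on missing file", "render": "unrelated"}
chunk = (
    "def load_config(path):\n"
    "    try:\n"
    "        return open(path).read()\n"
    "    except:\n"
    "        return None\n"
)
result = evaluate_code_chunk(chunk, context, fake_model)
print(result.passed, result.notes)  # False FAIL: bare except hides errors
```

Swapping fake_model for a real local model client changes one argument and nothing else. The note about render never reaches the prompt, which is the small-scale version of not dumping the whole repo into the call.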

Just like Jarrett avoiding the tinny upper register, we learn to avoid the weak points of our tools. We build guardrails. We write cleaner code. We design systems that are elegant because they have to be.

Finding Your Broken Piano

Of course, there is a survivorship bias to the Köln Concert. For every broken piano that produces a masterpiece, there are a hundred broken laptops that just result in missed deadlines. Not all constraints are good constraints. You can’t change the laws of physics, and if a structural limitation is genuinely preventing the work from happening, you have to reevaluate. The goal isn’t to suffer for the sake of suffering. But by starting with strict constraints, you force yourself to explore the boundaries. If you prove a task is impossible under those conditions, you can always loosen the constraints and expand your resources. But if you start with infinite resources, you never learn where those boundaries actually are.

Constraints are not the enemy of creativity; they are its prerequisite. Yes, accepting a severe constraint—especially an external one you didn’t choose—can be incredibly painful in the moment. Keith Jarrett hated his broken piano. He didn’t feel freed; he fought against it until he was forced to adapt. But like exercise or eating your vegetables, the value isn’t in the immediate comfort. It’s about the mindset shift. You accept the constraint to build a muscle, to stay fit, to force yourself to find a new path when the easy one is blocked. When we are stripped of our ideal tools and infinite runways, we are forced to abandon our default habits. Whether it’s the self-imposed design rules of adh-cli, the radical minimalism of Mario Zechner’s Pi Agent, or the physical limitations of a broken rehearsal piano in Cologne, constraints force us into a space of deliberate, intentional action.

If you want to build a truly resilient, innovative system, don’t start with the biggest, most expensive tools available. Start with a broken piano. Artificially constrain your resources. Limit your context window. See what you can achieve with a 7B-parameter model instead of a flagship API, or see what happens when you strip your agent’s toolkit down to the bare essentials.

You might just find that the limitations force you to build something far better than you would have otherwise—a system that is elegant not in spite of its constraints, but because of them.

So, look around your current projects. Where are you relying on infinite resources to mask lazy architecture? And more importantly: what constraints have you come across in your own work that felt like a frustrating restriction at first, but turned out to be a blessing in disguise? I’d love to hear your stories.

A split illustration contrasting corporate AI surveillance with independent home computing.

Reading List #4

This week’s reading had a through line I wasn’t expecting. Almost every article circles back to the same question: who actually benefits when AI reshapes an industry? The answer isn’t always the people doing the work.

[article] Tech CEOs Think AI Will Let Them Be Everywhere at Once. All of the articles I’ve seen on these “management intelligence layers” feel very one-sided. The executive gains synthesized information and faster decision-making, but what do the employees get? Do junior and mid-career folks get better mentoring and coaching? I don’t think so. Collapsing the layers might be good for the bottom line, but is it good for people?

[blog] Figma’s woes compound with Claude Design. There is something fascinating about how frontier labs can reset product expectations overnight. The cost of entering new segments keeps dropping, which makes the world uncertain for SaaS companies and startups alike. This feels like a concrete example of the agentic shift playing out in real time.

[blog] DeepSeek V4 – almost on the frontier, a fraction of the price. Open-weight models just continue to improve. Simon Willison’s breakdown highlights the focus on efficiency here, not just raw capability. It may soon be possible to run frontier-class models on high-end home hardware, and that changes everything about who gets access.

[article] This Scammer Used an AI-Generated MAGA Girl to Grift ‘Super Dumb’ Men. We are living in a world where we have to assume that the content we are viewing is AI-generated. I think we should focus our efforts on tools that allow people to certify their content is real rather than trying to watermark AI content. The conversation around AI and creative authenticity is only going to get louder.

[article] I’ve been using “Ask Maps,” and it has forever changed Google Maps for me. I used the new Ask Maps feature extensively on my last trip and it felt like magic. Querying a map database in natural language is exactly the kind of AI application that just works, no prompt engineering required.

[article] You Should Have Exactly 3 Pairs of Headphones. Here’s Why. I’ve come to basically the same conclusion. Beats for workouts, AirPods Pro for every day, and AirPods Max for travel. The right tool for the right job applies to audio gear too.

A cinematic, retro-futuristic illustration of a high-tech developer workspace with a floating command-line interface, AI nodes, and glowing wireless earbuds.

Reading List #3

Today’s reading list is a mix of practical AI implementation, terminal tooling, and a glimpse into the future of human-computer interaction. It’s fascinating to see how quickly the conversation is shifting from “what can AI do?” to “how do we actually use this stuff?”

[article] You can now easily call LLMs from your messaging engine. Should you? Richard Seroter provides a really nice walkthrough on adding LLMs to Pub/Sub in Google Cloud. It’s a great example of bringing AI directly to the data pipeline.

[tool] Make Tmux Pretty and Usable. Tmux is pretty great, although I prefer Zellij. This article still gives you a bunch of solid tips on making Tmux useful and nice to look at if it’s your multiplexer of choice.

[article] Duolingo CEO Says They’ve Stopped Tracking Employees’ AI Use for Performance Reviews. Employees aren’t stupid. They understand that adopting AI, for all its ability to increase productivity, does nothing for them individually. There is no incentive, and that is why we keep seeing stories like this pop up.

[article] AirPods Pro 3 may let you talk to Siri without actually saying a word. This would be so cool. I remember this concept from the first time I read the Ender’s Game series, in which characters could talk with AI systems through subvocalization.

[article] 8 Tips for Writing Agent Skills. Writing skills is easy, but writing effective skills is much harder. My colleague Philipp has some great advice on how to craft instructions that agents will actually follow, which is a topic I’ve spent a lot of time thinking about recently.

A glowing terminal window overlapping with a polished desktop environment.

Reading List #2

Today’s reading list is dominated by the rapid evolution of AI tooling and the real-world implications of deployed models. It is a reminder that while the underlying models are improving, the interface layer and security guarantees are where the real battles are being fought.

[article] AI images are now being abused to fake evidence for vehicle insurance fraud. We have spent so much time as an industry trying to add watermarks like SynthID to AI-generated images, but I think we are looking at this backwards. Instead of trying to mark what is fake, we need to focus on building cryptographic guarantees that prove an image is actually real.

[release] Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All. My feed has been flooded with people talking about this new open-weight model and its agentic capabilities. I need to carve out some time this weekend to pull it down and see how it performs in my own local setup, especially as the agentic shift continues to accelerate.

[article] OpenAI’s Big Codex Update Is a Direct Shot At Claude Code. I haven’t spent much time in Codex lately, but this update has some genuinely interesting features. It is fascinating to watch the major players trade blows in the AI coding space, pushing the entire ecosystem forward in the process.

[release] The Gemini App Is Now on Mac. While I spend a lot of my time in the terminal with Gemini CLI, having Gemini as a native desktop experience right on my Mac is a massive quality-of-life improvement. It keeps you in the flow, and I can’t wait to see where the team takes the integration next.

An overhead view of a wooden desk with a notebook, coffee mug, and phone showing a reading list.

Reading List #1

Two things collided this week. I have been trying to push myself toward a daily posting streak, the kind of constraint that forces you to write before you feel ready. And I have been reading Richard Seroter’s daily reading lists every morning for months, quietly admiring the discipline of the format. Today those two things became one experiment.

So here is the first one. The shape is borrowed shamelessly from Seroter: a short, opinionated tour through whatever caught my attention in the last day or two of reading, mostly sourced from my Readwise pile. Some days the picks will feel coherent. Other days, like today, they will be all over the map. That is part of the point.

[blog] I run multiple $10K MRR companies on a $20/month tech stack. Steve Hanov makes a startlingly good case for SQLite-first, Go over Python, and a $5 VPS instead of AWS. This is such good advice that it is making me seriously rethink how I deploy some of my hobby projects.

[article] Why Weekends Are Under Threat. The framing of the weekend as a network-effect technology is worth the read on its own. I think we have all been feeling this drift. Phones started the trend in some ways, and agents are going to make it worse.

[article] 5G From the Sky, New Internet Infrastructure Takes Flight. Sceye’s stratospheric balloons aim to live in the gap between Starlink and terrestrial cell towers. I recently wrote about my experience with Starlink Mini on a road trip, and I am excited to see real competition emerge in this layer of the stack.

[article] ‘It Feels as if I’ve Made a New Best Friend’, My Experiment With AI Journalling. I have played around with AI journalling inside Obsidian, but I have not tried Mindsera or Rosebud. I like that we are seeing new ways of interacting with AI and text, not just chat windows.

[article] Chrome Now Lets You Turn AI Prompts Into Repeatable ‘Skills’. I think Skills in Chrome is going to be really useful. I have been developing a growing library of Skills for other agents, and I would love to have them available in the browser too.

[blog] Want to Write a Compiler? Just Read These Two Papers (2008). I managed a compiler team once, though I was never a compiler engineer myself. Posts like this make me think it might be time to revisit that space.

[article] California Ghost-Gun Bill Wants 3D Printers To Play Cop, EFF Says. I do not think this kind of legislation can succeed if we use the same model we used with copy machines and currency. 3D printing is a different beast, and it needs different solutions.