A central, glowing blue polyhedral node suspended in a dark void, connected to several smaller satellite nodes by taut, luminous blue data filaments and orbital arcs, illustrating a network of interconnected AI agents.

When Agents Talk to Each Other

Welcome back to The Agentic Shift. Over the past eight installments, we’ve built our agent from the ground up, giving it a brain to think, memory to learn, a toolkit to act, instructions to follow, guardrails for safety, and a framework to build on. But there’s been an elephant in the room this whole time: our agent is alone.

I was sitting at my desk late last night, staring at three different windows on my monitor, feeling like a digital switchboard operator from the 1950s.

In one window, I had Helix, my text editor, where I was writing a Python script. In the second, I had a terminal running a deep research agent I’d built for Gemini CLI. In the third, I had a browser open to a documentation page.

Here’s the thing: Gemini CLI is brilliant, but it’s blind. It couldn’t see the code I had open in Helix. It couldn’t read the documentation in my browser. When it found a critical library update, I had to manually copy-paste the relevant code into the terminal. When I wanted it to understand an error, I had to copy-paste the stack trace. I was the glue, the slow, error-prone, context-losing glue.

We have spent this entire series building a digital Robinson Crusoe. In Part 1, we gave our agent a brain. In Part 4, we gave it tools. But watching my own workflow fragment into disjointed copy-paste loops, I realized we’ve hit a wall. We have built brilliant, isolated sparks of intelligence, but we haven’t built the wiring to connect them.

This fragmentation is the single biggest bottleneck in the agentic shift. But that is changing. We are witnessing the birth of the protocols that will turn these isolated islands into a network. We are moving from building agents to building the Internet of Agents.

The Struggle Before Standards

I tried to fix this myself, of course. We all have. I wrote brittle Python scripts to wrap my CLI tools. I tried building a mega-agent that had every possible API key hardcoded into its environment variables. I even built my own agentic TUI that explored many interesting ideas, but ultimately wasn’t the right solution.

My lowest moment came when I spent several evenings and weekends building an Electron-based AI research and writing application. The vision was grand: a unified workspace where I could query multiple AI models, organize research into projects, and write drafts with AI assistance, all in one window. I built a beautiful sidebar for project navigation, a markdown editor with live preview, a chat interface that could talk to Gemini, and a “sources” panel for managing references. By the time I stepped back to evaluate what I’d built, I had thousands of lines of TypeScript, a complex state management system, and an app that was slower than just using the terminal. Worse, it didn’t actually solve my problem. I still couldn’t get the AI to see what was in my other tools. I’d built a new silo, not a bridge. The repo still sits on my hard drive, unopened.

Every solution felt like a band-aid. The problem wasn’t that I couldn’t write the code; it was that I was trying to solve an ecosystem problem with a point solution.

The Anatomy of Connection

To solve this, we don’t just need “better agents.” We need a common language. The industry is converging on three distinct protocols, each solving a different layer of the communication stack: MCP for tools, ACP for interfaces, and A2A for collaboration.

Why three protocols instead of one? For the same reason the internet isn’t just “one protocol.” Think of it like the networking stack: TCP/IP handles reliable data transmission, HTTP handles document requests, and SMTP handles email. Each layer solves a distinct problem, and trying to collapse them into one mega-protocol would create an unmaintainable mess. The same logic applies here. MCP solves the “how do I use this tool?” problem. ACP solves the “how do I show this to a human?” problem. A2A solves the “how do I collaborate with another agent?” problem. They’re designed to compose, not compete.

The Internal Wiring of MCP

The Model Context Protocol (MCP), championed by Anthropic, represents the agent’s Internal Wiring. It answers the fundamental question: How does an agent perceive, act upon, and understand the world?

It’s easy to dismiss MCP as just “standardized tool calling,” but that misses the architectural shift. MCP creates a universal substrate for context, built on three distinct pillars. First, there are Resources, the agent’s sensory input that allows it to read data (files, logs, database rows) passively. Crucially, MCP supports subscriptions, meaning an agent can “watch” a log file and wake up the moment an error appears. Next are Tools, the agent’s hands, allowing for action: executing a SQL query, hitting an API, or writing a file. Finally, there are Prompts, perhaps the most overlooked feature, which allow domain experts to bake workflows directly into the server. A “Git Server” doesn’t just expose git commit; it can expose a generate_commit_message prompt that inherently knows your team’s style guide and grabs the current diff automatically.
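The subscription idea in particular is worth making concrete. Here is a minimal sketch of the pattern in plain Python; the class and method names are illustrative inventions, not the MCP SDK:

```python
# Illustrative sketch of MCP-style resource subscriptions (not the real SDK).
# An agent "subscribes" to a resource URI; when the server detects a change,
# every subscriber is woken with the updated contents.

from collections import defaultdict
from typing import Callable

class ResourceHub:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, uri: str, callback: Callable[[str], None]) -> None:
        """Register interest in a resource, e.g. 'file:///var/log/app.log'."""
        self._subscribers[uri].append(callback)

    def notify_changed(self, uri: str, new_contents: str) -> None:
        """Server side: push the update to every subscribed agent."""
        for callback in self._subscribers[uri]:
            callback(new_contents)

# An agent that stays asleep until an error line appears.
seen_errors: list[str] = []
def on_log_update(contents: str) -> None:
    if "ERROR" in contents:
        seen_errors.append(contents)

hub = ResourceHub()
hub.subscribe("file:///var/log/app.log", on_log_update)
hub.notify_changed("file:///var/log/app.log", "INFO boot ok")
hub.notify_changed("file:///var/log/app.log", "ERROR db timeout")
```

The point of the sketch is the inversion of control: the agent doesn’t poll the log file, it declares interest once and is woken by the server.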

Here is what that “handshake” looks like: the response to a tools/list request, per Anthropic’s MCP specification. It’s not magic; it’s a strict contract that turns an opaque binary into a discoverable capability:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "query_database",
        "description": "Execute a SELECT query against the local Postgres instance",
        "inputSchema": {
          "type": "object",
          "properties": {
            "sql": { "type": "string" }
          }
        }
      }
    ]
  }
}

Now, any agent (whether it’s running in Claude Desktop, Cursor, or a custom script) can “plug in” to my Postgres server and immediately know how to use it. It solves the N × M integration problem forever.
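On the client side, consuming that listing is mechanical. Here is a hypothetical sketch (stdlib only, my own helper names) of how an agent might build a tool registry from the response and shallow-check arguments against the advertised inputSchema before dispatching:

```python
import json

# Hypothetical client-side handling of a tools/list response.
TOOLS_LIST_RESPONSE = """
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "query_database",
        "description": "Execute a SELECT query against the local Postgres instance",
        "inputSchema": {
          "type": "object",
          "properties": { "sql": { "type": "string" } }
        }
      }
    ]
  }
}
"""

# Build a name -> schema registry from the advertised capabilities.
tools = {t["name"]: t["inputSchema"]
         for t in json.loads(TOOLS_LIST_RESPONSE)["result"]["tools"]}

JSON_TYPES = {"string": str, "integer": int, "object": dict, "array": list}

def validate_call(tool_name: str, arguments: dict) -> bool:
    """Shallow schema check: every argument must match its declared type."""
    schema = tools.get(tool_name)
    if schema is None:
        return False  # the server never advertised this tool
    props = schema.get("properties", {})
    return all(
        key in props and isinstance(value, JSON_TYPES[props[key]["type"]])
        for key, value in arguments.items()
    )
```

So validate_call("query_database", {"sql": "SELECT 1"}) passes, while a call with a non-string sql, or to a tool the server never listed, is rejected before it ever reaches the wire.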

A skeptical reader might ask: “How is this different from REST or OpenAPI?” It’s a fair question. On the surface, MCP looks like “JSON-RPC with a schema,” and that’s not wrong. But the difference is what gets standardized. OpenAPI describes how to call an endpoint; MCP describes how an agent should understand and use a capability. The schema isn’t just for validation. It’s for reasoning. An MCP tool description is a prompt fragment that teaches the model when and why to use the tool, not just how.

But here’s where I need to offer some nuance, because protocol boosterism can obscure practical reality.

As Simon Willison observed in his year-end review, MCP’s explosive adoption may have been partly a timing accident. It launched right as models got reliable at tool-calling, leading some to confuse “MCP support” with “tool-calling ability.” More pointedly, he notes that for coding agents, “the best possible tool for any situation is Bash.” If your agent can run shell commands, it can use gh for GitHub, curl for APIs, and psql for databases, no MCP server required.

I’ve felt this myself. When I’m working in Gemini CLI, I rarely reach for an MCP server. The GitHub CLI (gh) is faster and more capable than any MCP wrapper I’ve tried. The same goes for git, docker, and most developer tools with good CLIs.

So when does MCP make sense? I see three clear cases. First, when there’s no CLI (for example with my MCP service for Google Workspace), since many SaaS products expose APIs but no command-line interface. An MCP server is the natural wrapper. Second, when you need subscriptions, since MCP’s ability to “watch” a resource and push updates to the agent is something CLIs can’t do cleanly. Third, when you’re crossing network boundaries, since an MCP server can run on a remote machine and expose capabilities securely, which is harder to orchestrate with raw shell access.

The real insight here is about context engineering. MCP servers bring along a lot of context for every tool (descriptions, schemas, the full capability surface). For some workflows, that richness is valuable. But Anthropic themselves acknowledged the overhead with their Skills mechanism, a simpler approach where a Skill is just a Markdown file in a folder, optionally with some executable scripts. Skills are lightweight and only load when needed. MCP and Skills aren’t competing; they’re different tools for different context budgets.

Giving the Agent a Seat at the Keyboard

If MCP is the agent’s internal wiring, the Agent Client Protocol (ACP) is its window to the world.

I like to think of this as the LSP (Language Server Protocol) moment for the agentic age. Before LSP, if you wanted to support a new language in an IDE, you had to write a custom parser for every single editor. It was a nightmare of N × M complexity. ACP solves the same problem for intelligence. It decouples the “brain” from the “UI.”

This is why the collaboration between Zed and Google is so critical. When Zed announced bring your own agent with Google Gemini CLI integration, they weren’t just shipping features. They were standardizing the interface between the client (the editor) and the server (the agent). Intelligence became swappable. I can run a local Gemini instance through the same UI that powers a remote Claude agent.

The core of ACP is Symmetry. It’s not just the editor sending prompts to the agent. Through ACP, an editor like Zed (the reference implementation) can tell the agent exactly where your cursor is, what files you have open, and even feed it the terminal output from a failed build. The agent, in turn, can request to edit a specific line or show you a diff for approval.

I’ve been seriously thinking about building ACP support for Obsidian. I already built Gemini Scribe, an agent that lives inside Obsidian for research and writing assistance, but it’s hardcoded to Gemini. With ACP, I could make Obsidian a universal agent host, letting users bring whatever intelligence they prefer into their knowledge management workflow.

This turns the editor into the ultimate guardrail. Because the agent communicates its intent through a standardized protocol, the editor can pause, show the user exactly what’s about to happen, and wait for that “Approve” click. It’s the infrastructure that makes autonomous coding safe.
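The approval loop itself is protocol-agnostic. Here is a sketch of the pattern in plain Python; the types and field names are made up for illustration and are not ACP’s actual wire format:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative human-in-the-loop gate: the agent proposes, the editor decides.
# Names here are invented for the sketch, not taken from the ACP spec.

@dataclass
class ProposedEdit:
    path: str
    line: int        # zero-based line index in the buffer
    old_text: str    # what the agent believes is there now
    new_text: str    # what it wants to write

def apply_with_approval(edit: ProposedEdit,
                        buffer: list[str],
                        approve: Callable[[ProposedEdit], bool]) -> bool:
    """Show the diff to the user; only touch the buffer on an explicit yes."""
    if not approve(edit):
        return False
    if buffer[edit.line] != edit.old_text:
        return False  # buffer drifted since the proposal; refuse to clobber
    buffer[edit.line] = edit.new_text
    return True

buffer = ["import os", "print('hi')"]
edit = ProposedEdit("main.py", 1, "print('hi')", "print('hello, world')")
applied = apply_with_approval(edit, buffer, approve=lambda e: True)
```

Two safety properties fall out of the structure: nothing changes without the approve callback saying yes, and a stale proposal (one whose old_text no longer matches the buffer) is rejected rather than silently overwriting your work.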

But the real magic isn’t just safety; it’s ubiquity. ACP liberates the agent from the tool. It means you can bring your preferred intelligence to whatever surface helps you flow. We are already seeing the ecosystem explode beyond just Zed.

For the terminal die-hards, there is Toad, a framework dedicated entirely to running ACP agents in a unified CLI. And for the VIM crowd, the CodeCompanion project has brought full ACP support to Neovim. This is the promise of the protocol: write the agent once, and let the user decide if they want to interact with it in a modern GUI, a raw terminal, or a modal editor from the 90s. The intelligence remains the same; only the glass changes.

When Agents Meet Strangers

Finally, we have the “Internet” layer: Agent-to-Agent (A2A).

While MCP connects an agent to a thing, and ACP connects an agent to a person, A2A connects an agent to society. It addresses the “lonely agent” problem by establishing a standard for horizontal, peer-to-peer collaboration.

This protocol, pushed forward by Google and the Linux Foundation, introduces a profound shift in how we think about distributed systems: Opaque Execution.

In traditional software, if Service A talks to Service B, Service A needs to know exactly how to call the API. In A2A, my agent doesn’t care about the how; it cares about the goal. My “Travel Agent” can ask a “Calendar Agent” to “find a slot for a meeting,” without knowing if that Calendar Agent is running a simple SQL query, consulting a complex rules engine, or even asking a human secretary for help.

This negotiation happens through the Agent Card, a machine-readable identity file hosted at a standard /.well-known/agent.json endpoint. It solves the “Theory of Mind” gap, allowing one agent to understand the capabilities of another. Here’s what one looks like:

{
  "name": "Calendar Agent",
  "description": "Manages scheduling, finds available slots, and coordinates meetings across time zones.",
  "url": "https://calendar.example.com",
  "version": "1.0.0",
  "capabilities": {
    "streaming": true,
    "pushNotifications": true
  },
  "skills": [
    {
      "id": "find-meeting-slot",
      "name": "Find Meeting Slot",
      "description": "Given a list of participants and constraints, finds optimal meeting times.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "participants": { "type": "array", "items": { "type": "string" } },
          "duration_minutes": { "type": "integer" },
          "preferred_time_range": { "type": "string" }
        }
      }
    }
  ],
  "authentication": {
    "schemes": ["oauth2", "api_key"]
  }
}


When my Travel Agent encounters a scheduling problem, it doesn’t need to know how the Calendar Agent works internally. It reads this card, understands the agent can “find meeting slots,” and delegates the task. The Calendar Agent might use Google Calendar, Outlook, or a custom database. My agent doesn’t care.

But the real breakthrough is the Task Lifecycle. A2A tasks aren’t just request-response loops; they are stateful, modeled as a finite state machine with well-defined transitions:

  • Submitted: The task has been received but work hasn’t started.
  • Working: The agent is actively processing the request.
  • Input-Required: The agent needs clarification before continuing. This is the key innovation: the agent can pause, ask “Do you prefer aisle or window?”, and wait indefinitely.
  • Completed: The task finished successfully.
  • Failed: Something went wrong. The response includes an error message and optional retry hints.
  • Canceled: The requesting agent (or human) aborted the task.

This state machine brings the asynchronous, messy reality of human collaboration to the machine world. A task might sit in Input-Required for hours while waiting for a human to respond. It might transition from Working to Failed and back to Working after a retry. The protocol handles all of this gracefully.
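The lifecycle above is small enough to encode directly. In this sketch the state names follow the list; the transition table is my reading of it, not a verbatim copy of the spec:

```python
from enum import Enum

class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"

# Allowed transitions, per the lifecycle described above. Note that FAILED
# can return to WORKING: a retry is a legal move, not a protocol violation.
TRANSITIONS = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.CANCELED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED, TaskState.COMPLETED,
                        TaskState.FAILED, TaskState.CANCELED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.CANCELED},
    TaskState.FAILED: {TaskState.WORKING},  # retry
    TaskState.COMPLETED: set(),             # terminal
    TaskState.CANCELED: set(),              # terminal
}

def transition(current: TaskState, target: TaskState) -> TaskState:
    """Move to the target state, or raise if the move is illegal."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Making the table explicit is what lets both sides agree on what “the task failed and we retried” means, instead of each framework improvising its own semantics.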

Finding Agents You Can Trust

But let’s not declare victory just yet. We are seeing the very beginning of this shift, and the “Internet of Agents” brings its own set of dangers.

As we move from tens of agents to millions, we face a massive Discovery Problem. In a global network of opaque execution, how do you find the right agent? And more importantly, how do you trust it?

It’s not enough to just connect. You need safety guarantees. You need to know that the “Travel Agent” you just hired isn’t going to hallucinate a non-refundable booking or, worse, exfiltrate your credit card data to a malicious third party.

This is the focus of recent research on multi-agent security, which highlights that protocol compliance is only the first step. We need mechanisms for Behavioral Verification, ensuring that an agent does what it says it does.

What does verification look like in practice? Today, it’s mostly manual and ad-hoc. You might:

  • Audit the agent’s logs to see what actions it actually took versus what it claimed.
  • Run it in a sandbox with fake data before trusting it with real resources.
  • Require human approval for high-stakes actions (the “Human-in-the-Loop” pattern we explored in Part 6).
  • Check reputation signals: who built this agent? What’s their track record?

But these are stopgaps. The dream is automated verification: cryptographic proofs that an agent behaved according to its advertised policy, or sandboxed execution environments that can mathematically guarantee an agent never accessed unauthorized data. We’re not there yet.

Whether the solution looks like a decentralized “Web of Trust” (where agents vouch for each other, like PGP key signing) or a centralized “App Store for Agents” (where a trusted authority vets and signs off on agents) remains to be seen. My bet is we’ll see both: curated marketplaces for enterprise use cases, and open registries for the long tail. But solving the discovery and safety problem is the only way we move from a toy ecosystem to a production economy.

The Foundation of the Future

What excites me most isn’t just the code. It’s the governance.

We have seen this movie before. In the early days of the web, proprietary browser wars threatened to fracture the internet. We risked a world where “This site only works in Internet Explorer” became the norm. We avoided that fate because of open standards.

The same risk exists for agents. We cannot afford a future where an “Anthropic Agent” refuses to talk to an “OpenAI Agent” that won’t talk to a “Google Agent.”

That is why the formation of the Agentic AI Foundation by the Linux Foundation is the most important news you might have missed. By bringing together AI pioneers like OpenAI and Anthropic alongside infrastructure giants like Google, Microsoft, and AWS under a neutral banner, we are ensuring that the “Internet of Agents” remains open. This foundation will oversee the development of protocols like A2A, ensuring they evolve as shared public utilities rather than walled gardens. It is the guarantee that the intelligence we build today will be able to talk to the intelligence we build tomorrow.

The New Architecture of Work

When we combine these three protocols, the fragmentation dissolves.

Imagine I am back in Zed (connected via ACP). I ask my coding agent to “Add a secure user profile page.” Zed sends my cursor context to the agent. The agent reaches for MCP to query my local database schema and understand the users table. Realizing this touches PII, it autonomously pings a “Security Guardrail Agent” via A2A to review the proposed code. Approval comes back, and my local agent writes the code directly into my buffer.

I didn’t switch windows once.

But what happens when things go wrong? Let’s say the Security Guardrail Agent rejects the code because it detected a SQL injection vulnerability. The A2A task transitions to Failed with a structured error: {"reason": "sql_injection_detected", "line": 42, "suggestion": "Use parameterized queries"}. My local agent receives this, understands the failure, and either fixes the issue automatically or surfaces it to me with context. The rejection isn’t a dead end; it’s a conversation.

Or imagine the MCP server for my database is unreachable. The agent doesn’t just hang. It receives a timeout error and can decide to retry, fall back to cached schema information, or ask me whether to proceed without database context. Robust failure handling is baked into the protocols, not bolted on as an afterthought.
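That failure-handling story can be sketched as a small policy loop. The error shape mirrors the structured rejection shown above; none of the function names or fields here are spec-mandated:

```python
# Illustrative failure-handling policy for a delegated task. Retry on
# transient errors, but stop early when retrying clearly won't help.

def run_with_fallback(task, attempts: int = 2, fallback=None):
    """Try the task a few times; on persistent failure, use the fallback."""
    last_error = None
    for _ in range(attempts):
        result = task()
        if result.get("state") == "completed":
            return result
        last_error = result.get("error")
        # A structured error lets the agent decide whether a retry makes sense.
        if last_error and last_error.get("reason") == "sql_injection_detected":
            break  # retrying the same code won't fix it; surface it instead
    if fallback is not None:
        return fallback(last_error)
    return {"state": "failed", "error": last_error}

# A flaky task: fails once with a timeout, then succeeds on the retry.
calls = {"n": 0}
def flaky_task():
    calls["n"] += 1
    if calls["n"] == 1:
        return {"state": "failed", "error": {"reason": "timeout"}}
    return {"state": "completed", "data": "schema cached"}
```

The structured error is doing the real work: a bare exception forces a blind retry, while a machine-readable reason lets the agent distinguish “try again” from “escalate to the human.”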

Where We Are Today

I want to be honest about maturity. These protocols are real and shipping, but the ecosystem is young.

MCP is the most mature. Just about everything supports it now: coding tools, virtualization environments, editors, even mobile apps. There are hundreds of community MCP servers for everything from Notion to Kubernetes. If you want to try this today, MCP is the on-ramp.

ACP is newer but moving fast. Zed is the reference implementation, with Neovim (via CodeCompanion) and terminal clients (via Toad) close behind. There are also robust client APIs for many languages, making ACP an interesting interface for controlling local agentic applications. If your editor doesn’t support ACP yet, you’ll likely be using proprietary plugin APIs for now.

A2A is the most nascent. Google and partners announced it in mid-2025, and the specification is still evolving. There aren’t many production A2A deployments yet. Most multi-agent systems today use custom protocols or framework-specific solutions like CrewAI or LangGraph. But the spec is public, the governance is in place, and early adopters are building.

If you’re starting a project today, my advice is: use MCP for tool integration, use whatever your editor supports for the UI layer, and keep an eye on A2A for future multi-agent workflows. The pieces are coming together, but we’re still early.

And yet, this isn’t science fiction. The protocols are here today. The “Internet of Agents” is booting up, and for the first time, our digital Robinson Crusoes are finally getting a radio.

But a radio is only as good as the conversations it enables. In our next post, we’ll move from protocols to practice and explore what happens when agents don’t just connect, but actually collaborate: forming teams, delegating tasks, and solving problems no single agent could tackle alone.
