Welcome back to The Agentic Shift, our shared journey mapping the evolution of AI from passive creator to active partner. So far, we’ve carefully assembled the core components of our agent. We’ve given it senses to perceive its digital world, a brain to think and reason, and a memory to learn and recall.
Our agent is now a brilliant observer, but an observer is all it is. It can understand its environment, formulate complex plans, and remember every detail, but there’s a crucial piece missing. It’s like a master chef who has conceptualized the perfect dish but has no knives to chop or stove to cook. An agent that can only perceive, think, and remember is still trapped in its own mind. To be useful, it must be able to act.
This is where tools come in. Tools are the agent’s hands, allowing it to bridge the gap between its internal reasoning and the external digital world. In this post, we’ll finally step into the workshop and give our agent the ability to interact with its environment. We’ll explore the fundamental loop that governs its actions, the art of crafting a tool it can understand, and the common implements that empower agents to help with everything from coding to scheduling your next meeting.
The Suggestion, Not the Command
Before we break down the loop that governs tool use, we need to internalize the single most important concept in building safe agents: the AI model never executes code directly. This is a bright red line, a fundamental safety principle. When a model “uses a tool,” it isn’t running a program; it’s generating a highly structured piece of text—a suggestion—that our application code can choose to act upon.
Let’s return to our analogy of the master chef. The chef (the LLM) decides it’s time to sear the scallops. They don’t walk over to the stove and turn it on themselves. Instead, they call out to a trusted kitchen assistant (our application code), “Set the front burner to high heat.”
That verbal command is the tool call. It contains a clear intent (set_burner_heat) and specific parameters (burner: 'front', setting: 'high').
It’s the kitchen assistant’s job to interpret this command, walk over to the physical stove, and turn the knob. The assistant then reports back, “The burner is on and heating up.” With this new observation from the outside world, the chef can proceed to the next step in the recipe. The power lies in this clean separation of duties: the chef has the creative intelligence, but the assistant has the hands-on ability to interact with the world. In AI agents, this separation is how we maintain control, security, and reliability. The LLM suggests, and our application executes.
The Four-Step Recipe
At its heart, an agent’s ability to use a tool follows a simple, elegant recipe. It’s a dance between the AI’s brain (the LLM) and the application code that hosts it, a programmatic loop that follows a “Think-Act-Observe” cycle. Because our chef only suggests the next step, the kitchen assistant is always in control of the execution, making the entire process safe and reliable.
This recipe has four key steps:
- Provide Tools and a Prompt: The application gives the LLM the user’s request, but it also provides a “menu” of available tools, complete with detailed descriptions of what each one does.
- Get a Tool Call Suggestion: The LLM analyzes the request and the menu. If it decides a tool is needed, it fills out a structured "order form" (a FunctionCall) specifying which tool to use and what arguments to provide.
- Execute the Tool: Our application receives this order form, validates it, and then—in its own secure workshop—executes the actual function.
- Return the Result: The application takes the result from the tool and hands it back to the LLM, allowing it to synthesize a final, factually grounded answer for the user.
This loop transforms the agent from a pure conversationalist into a system that can take concrete, observable actions to achieve a goal.
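The four steps above can be sketched as a tiny loop. Everything in this sketch is a stand-in: fake_model scripts what a real LLM API call would return, and get_current_temperature is a hypothetical tool with a canned answer.

```python
# A minimal sketch of the four-step recipe. The model never runs code;
# it only returns a structured suggestion that our application acts on.

def get_current_temperature(location: str) -> dict:
    # Hypothetical tool: a real version would call a weather API.
    return {"location": location, "temperature_c": 18}

# Step 1: the "menu" of tools the application makes available.
TOOLS = {"get_current_temperature": get_current_temperature}

def fake_model(user_request: str) -> dict:
    # Step 2: a stand-in for the LLM, returning a structured suggestion.
    return {"name": "get_current_temperature", "args": {"location": "London"}}

def run_agent(user_request: str) -> dict:
    suggestion = fake_model(user_request)
    tool = TOOLS[suggestion["name"]]     # Step 3: validate and look up the tool...
    result = tool(**suggestion["args"])  # ...then execute it in our own code
    return result                        # Step 4: hand the result back to the model

print(run_agent("What's the temperature in London?"))
```

Note that the lookup in step 3 is itself a guardrail: if the model suggests a tool name that isn't on the menu, the dictionary lookup fails in our code rather than executing anything.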
When the Recipe Goes Wrong
The four-step recipe describes the ideal path, but in the real world, kitchens are messy. What happens when the kitchen assistant tries to light the stove and the gas is out? A good assistant doesn’t just stop; they report the problem back to the chef.
This is the essence of error handling in AI agents. If our application tries to execute a tool and it fails—perhaps an external API is down or a file isn’t found—it’s crucial that it doesn’t crash. Instead, it should catch the error and pass a clear, descriptive error message back to the model as the “observation” in the next step of the loop.
When the LLM receives an error message (e.g., “Error: API timed out”), it can use its reasoning ability to decide what to do next. It might suggest retrying the tool, trying a different one, or simply informing the user that it can’t complete the request. This is what transforms an agent from a fragile automaton into a resilient problem-solver.
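A small sketch of this pattern: the executor wraps every tool call, and a failure becomes an observation string instead of a crash. The flaky_tool here is a stand-in for any tool whose backing API is down.

```python
def flaky_tool(**kwargs):
    # Stand-in for a tool whose external API is unavailable.
    raise TimeoutError("API timed out")

def execute_tool(tool, args: dict) -> str:
    # Never let a tool failure crash the loop: catch the error and
    # report it back to the model as the observation.
    try:
        return str(tool(**args))
    except Exception as exc:
        return f"Error: {exc}"

observation = execute_tool(flaky_tool, {})
print(observation)  # the model sees this message and can decide to retry
```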
The Multi-Tasking Chef
As agents become more sophisticated, so does their ability to multitask. Modern LLMs can suggest calling multiple tools at the same time, like a master chef telling an assistant, “Start searing the scallops and begin chopping the parsley.”
If a user asks a complex question like, “What’s the weather in London and what’s the top news story in Paris?” a capable agent can recognize these are two separate, independent tasks. In its “order form,” it can list two tool calls: one for the weather API and one for a news API. Our application can then execute these two calls concurrently. This parallel execution is far more efficient than a one-at-a-time approach, leading to faster responses and a more fluid user experience.
Following the Whole Recipe
The true power of an agent is revealed when it moves beyond single commands and starts executing a whole recipe, one step at a time. This is called tool chaining, and it’s how agents tackle complex, multi-step tasks. The core idea is simple: the output from one tool becomes the input for the next.
This is achieved by running the “Think-Act-Observe” loop multiple times. Consider a request like, “Find the latest project update email from Jane, summarize it, and send a ‘Got it, thanks!’ reply.” An agent would tackle this by chaining tools together:
- Loop 1: The agent first calls the search_email tool with the query "latest project update from Jane." The tool returns the full text of the email.
- Loop 2: With the email content now in its context, the agent's next thought is to summarize it. It calls a summarize_text tool, passing in the email's content. The tool returns a concise summary.
- Loop 3: Now, holding the summary, the agent knows it just needs to confirm receipt. It calls the send_email tool, with the recipient set to Jane and the body as "Got it, thanks!"
This ability to chain actions—where each step informs the next—is what elevates an agent from a simple command-executor to a true problem-solver.
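The chaining pattern can be sketched as the same loop run repeatedly until the model stops calling tools. In this sketch the model's turns are scripted in advance, all three tools are toy stubs, and the address jane@example.com is invented for illustration.

```python
# Sketch of tool chaining: each iteration is one Think-Act-Observe turn,
# and each tool's output flows into the next turn's input.

def search_email(query: str) -> str:
    return "Email body: the project is on track for Friday."

def summarize_text(text: str) -> str:
    return "Summary: on track for Friday."

def send_email(to: str, body: str) -> str:
    return f"Sent to {to}: {body}"

TOOLS = {"search_email": search_email,
         "summarize_text": summarize_text,
         "send_email": send_email}

# A scripted stand-in for the model's successive suggestions.
# An empty args dict means "use the previous observation as input".
SCRIPT = [
    {"name": "search_email", "args": {"query": "latest project update from Jane"}},
    {"name": "summarize_text", "args": {}},
    {"name": "send_email", "args": {"to": "jane@example.com", "body": "Got it, thanks!"}},
    None,  # no further tool call: the chain is done
]

def run_chain() -> str:
    observation = None
    for step in SCRIPT:
        if step is None:
            return observation
        args = step["args"] or {"text": observation}
        observation = TOOLS[step["name"]](**args)

print(run_chain())
```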
Giving the Agent a Menu
An agent can only use the tools it understands. This is why defining a tool correctly is one of the most critical aspects of building a reliable agent. A well-defined tool is a contract between our code and the model’s reasoning, and it has three parts:
- Name: A unique, simple identifier, like get_current_weather.
- Description: A clear, natural language explanation of what the tool does. This is the most important part, as the description is effectively the prompt for the tool; a vague description will lead to a confused agent.
- Schema: A rigid, machine-readable contract that defines the exact parameters the tool needs, their data types, and which ones are required. This schema is a powerful guardrail against the model “hallucinating” or inventing parameters that don’t exist.
Let’s look at a practical example from the Google Gemini API, defining a tool to get the weather.
# Based on official Google AI for Developers documentation
from google import genai
from google.genai import types

# Define the function declaration for the model
weather_function = {
    "name": "get_current_temperature",
    "description": "Gets the current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city name, e.g. San Francisco",
            },
        },
        "required": ["location"],
    },
}

# Configure the client and tools
# (the client reads its API key from the GEMINI_API_KEY environment variable)
client = genai.Client()
tools = types.Tool(function_declarations=[weather_function])
config = types.GenerateContentConfig(tools=[tools])

# Send the request along with the function declarations
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the temperature in London?",
    config=config,
)

# Check for a function call
if response.candidates[0].content.parts[0].function_call:
    function_call = response.candidates[0].content.parts[0].function_call
    print(f"Function to call: {function_call.name}")
    print(f"Arguments: {function_call.args}")
    # In a real app, you would call your function here:
    # result = get_current_temperature(**function_call.args)
else:
    print("No function call found in the response.")
    print(response.text)
You can find this complete example in the official documentation. Here, we define the tool’s contract—its name, description, and schema—using a standard Python dictionary. This is then passed to the model, which can now intelligently decide when and how to ask for the temperature. The final part of the example shows how our application can inspect the model’s response to see if it suggested a function call. If it did, our code can then take that suggestion and execute the real-world function, completing the loop.
The Mechanic’s Workbench
So, how many tools can an agent have? While modern models like Gemini 2.5 Flash have enormous context windows that can technically accommodate hundreds of tool definitions, that raw capacity is misleading. The true bottleneck isn't fitting the tools into the context; it's the model's ability to reason effectively when faced with an overwhelming number of choices.
Think of it like a mechanic’s workbench. A massive bench can hold every tool imaginable, but if it’s cluttered with dozens of nearly identical wrenches, the mechanic will waste time and is more likely to grab the wrong one. The problem isn’t a lack of space; it’s the cognitive load of making the right choice. A smaller, well-organized bench with only the tools needed for the job is far more efficient.
Giving a model too many tools creates a similar “paradox of choice,” leading to ambiguity and reduced accuracy. The signal of the correct tool can get lost in the noise of all the possible tools. The best practice is to be judicious. Equip your agent with a focused set of high-quality, distinct tools relevant to its core purpose.
Tools of the Trade
While you can create a tool for almost any API, a few common patterns have emerged, often tailored to the type of work the agent is designed for.
For the Coder
Agents designed to help with software development need tools to interact with their environment just like a human developer would. These include File System Tools (read_file, write_file) to read and modify code, and the powerful Shell Command (execute_shell_command), which allows an agent to run terminal commands to do things like install dependencies or run tests.
For the Knowledge Worker
Agents for productivity and office work focus on automating communication, scheduling, and information retrieval. Common tools include an Email tool to send messages, a Calendar tool to parse fuzzy requests and create events, and a Document Search tool to perform semantic search over a private knowledge base.
Building Trust
As we grant agents more powerful tools—from sending emails to executing code—building in layers of trust and safety becomes paramount. Giving an agent the key to a powerful tool without oversight is like hiring a new kitchen assistant and not checking their work. Three critical patterns for building this trust are:
- Sandboxing: This ensures that powerful tools run in a secure, isolated environment where they can’t affect the broader system.
- The Human-in-the-Loop: This pattern requires user confirmation for sensitive actions. The agent formulates a plan and waits for a “yes” before executing it.
- Policy Engines: This is a more automated form of oversight—a set of programmable rules the application checks before executing a tool, like a policy that an agent is never allowed to delete more than five files at a time.
These concepts are so foundational to building responsible agents that we’ll be dedicating a future post in this series, Part 6: Putting Up the Guardrails, to a much deeper exploration.
From Queries to Workflows
Taking a step back, it’s worth reflecting on the profound shift that tools represent. For years, our primary interaction with large language models was conversational—we would ask a question, and the model would answer using the vast but static knowledge it was trained on. It was a dialogue.
Tools are the mechanism that transforms this dialogue into a workflow. They are what allow us to move from asking “How do I book a flight from SFO to JFK?” to simply saying, “Book me a flight from SFO to JFK.” The focus shifts from information retrieval to task completion. This is the essence of the agentic shift: moving from a model that knows to an agent that does.
Conclusion
Tools are the bridge between the agent’s mind and the world. They are what allow it to move from reasoning to acting, transforming it from a passive information source into an active partner. The simple, secure loop of “Think-Act-Observe” is the engine that drives this process, and the clarity with which we define an agent’s tools is what determines its reliability.
Now that our agent has a body, a brain, a memory, and hands, the next question is: how do we guide its behavior? How do we write the high-level instructions that shape its personality, its goals, and the constraints under which it operates? That’s the art of the system prompt, and it’s exactly where we’re heading in Part 5: Guiding the Agent’s Behavior. The foundation is laid; now we can start to bring our agent to life.

