[Header image: GitHub issues transforming into glowing skill cards floating above a laptop screen.]

Bundled Skills in Gemini Scribe

The feature that became Bundled Skills started with a GitHub issues page.

I wrote and maintain Gemini Scribe, an Obsidian plugin that puts a Gemini-powered agent inside your vault. Thousands of people use it, and they have questions. People would open discussions and issues asking how to configure completions, how to set up projects, what settings were available. I was answering the same questions over and over, and it hit me: the agent itself should be able to answer these. It has access to the vault. It can read files. Why am I the bottleneck for questions about my own plugin?

So I built a skill. I took the same documentation source that powers the plugin’s website, packaged it up as a set of instructions the agent could load on demand, and suddenly users could just ask the agent directly. “How do I set up completions?” “What settings are available?” The agent would pull in the right slice of documentation and give a grounded answer. The docs on the web and the docs the agent reads are built from the same source. There is no separate knowledge base to keep in sync.

That first skill opened a door. I was already using custom skills in my own vault to improve how the agent worked with Bases and frontmatter properties. Once I had the bundled skills mechanism in place, I started looking at those personal skills differently. The ones I had built for myself around Obsidian-specific tasks were not just useful to me. They would be useful to anyone running Gemini Scribe. So I started migrating them from my vault into the plugin as built-in skills.

With the latest version of Gemini Scribe, the plugin now ships with four built-in skills. In a future post I will walk through how to create your own custom skills, but first I want to explain what ships out of the box and why this approach works.

Four Skills Out of the Box

That first skill became gemini-scribe-help, and it is still the one I am most proud of conceptually. The plugin’s own documentation lives inside the same skill system as everything else. No special case, no separate knowledge base. The agent answers questions about itself using the same mechanism it uses for any other task.

The second skill I built was obsidian-bases. I wanted the agent to be good at creating Bases (Obsidian’s take on structured data views), but it kept getting the configuration wrong. Filters, formulas, views, grouping: there is a lot of surface area and the syntax is particular. So I wrote a skill that guides the agent through creating and configuring Bases from scratch, including common patterns like task trackers and project dashboards. Instead of me correcting the agent’s output every time, I describe what I want and the agent builds it right the first time.

Next came audio-transcription. This one has a fun backstory. Audio transcription was one of the oldest outstanding bugs in the repo. People wanted to use it with Obsidian’s native audio recording, but the results were poor. In this release, fixes around binary file uploads meant the model could finally receive audio files properly. Once that was working, I realized I did not need to write any more code to get good transcriptions. I just needed to give the agent good instructions. The skill guides it through producing structured notes with timestamps, speaker labels, and summaries. It turns a messy audio file into a clean, searchable note, and the fix was not code but context.

The fourth is obsidian-properties. Working with note properties (the YAML frontmatter at the top of every Obsidian note) sounds trivial until you are doing it across hundreds of notes. The agent would make inconsistent choices about property types, forget to use existing property names, or create duplicates. This skill makes it reliable at creating, editing, and querying properties consistently, which matters enormously if you are using Obsidian as a serious knowledge management system.

The pattern behind all four is the same. I watched the agent struggle with something specific to Obsidian, and instead of accepting that as a limitation of the model, I wrote a skill to fix it.

Why Not Just Use the System Prompt

You might be wondering why I did not just shove all of this into the system prompt. I wrote about this problem in detail in Managing the Agent’s Attention, but the short version is that system prompts are a “just-in-case” strategy. You load up the agent with everything it might need at the start of the conversation, and as you add more instructions, they start competing with each other for the model’s attention. Researchers call this the “Lost in the Middle” problem: models pay disproportionate attention to the beginning and end of their context, and everything in between gets diluted. If I packed all four skills’ worth of instructions into the system prompt, each one would make the others less effective. Every new skill I add would degrade the ones already there.

Skills avoid this entirely. The agent always knows which skills are available (it gets a short name and description for each one), but only loads the full instructions when it actually needs them. When a skill activates, its instructions land in the most recent part of the conversation, right before the model starts reasoning. Only one skill’s instructions are competing for attention at a time, and they are sitting in the highest-attention position in the context window.
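The just-in-time loading pattern can be sketched in a few lines. This is an illustration of the idea, not Gemini Scribe’s actual implementation; the class and method names here are hypothetical:

```python
class SkillRegistry:
    """Holds lightweight metadata for every skill; full bodies load on demand."""

    def __init__(self):
        self._skills = {}  # name -> (description, path to instruction file)
        self._loaded = {}  # name -> full instruction text, cached after activation

    def register(self, name, description, path):
        self._skills[name] = (description, path)

    def catalog(self):
        # What the agent always sees: just names and one-line descriptions.
        return {name: desc for name, (desc, _) in self._skills.items()}

    def activate(self, name):
        # Full instructions are read only when the agent calls activate_skill,
        # so they land in the most recent, highest-attention part of the context.
        if name not in self._loaded:
            _, path = self._skills[name]
            with open(path, encoding="utf-8") as f:
                self._loaded[name] = f.read()
        return self._loaded[name]
```

The catalog stays cheap no matter how many skills are registered; only the one skill the agent chooses pays its full context cost.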

There is a second benefit that surprised me. Because skills activate through the activate_skill tool call, you can watch the agent load them. In the agent session, you see exactly when a skill is activated and which one it chose. This gives you something that system prompts never do: observability. If the agent is not following your instructions, you can check whether it actually activated the skill. If it activated the skill but still got something wrong, you know the problem is in the skill’s instructions, not in the agent’s attention. That feedback loop is what lets you iterate and improve your skills over time. You are no longer guessing whether the agent read your instructions. You can see it happen.

Skills follow the open agentskills.io specification, and this matters more than it might seem. We have seen significant standardization around this spec across the industry in 2026. That means skills are portable. If you have been using skills with another agent, you can bring them into Gemini Scribe and they will work. If you build skills in Gemini Scribe, you can take them with you. They are not a proprietary format tied to one tool. They are Markdown files with a bit of YAML frontmatter, designed to be human-readable, version-controllable, and portable across any agent that supports the spec.
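Concretely, a skill in this format is just a Markdown file with a small frontmatter block. The fields and body below are a sketch of the shape, not a copy of any shipped skill: the agent indexes the `name` and `description`, and everything after the frontmatter is what gets loaded on activation.

```markdown
---
name: obsidian-bases
description: Guide the agent through creating and configuring Obsidian Bases.
---

# Obsidian Bases

When the user asks for a Base, start from the properties that already exist
in their notes, then build filters, formulas, and views step by step...
```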

What Comes Next

The four built-in skills are just the beginning. When I decide what to build next, I think about skills in four categories. First, there are skills that give the agent domain knowledge about Obsidian itself, things like Bases and properties where the model’s general training is not specific enough. Second, there are skills that help the agent use Gemini Scribe’s own tools effectively. The plugin has capabilities like deep research, image generation, semantic search, and session recall, and each of those benefits from a skill that teaches the agent when and how to use them well. Third, there are skills that bring entirely new capabilities to the agent, like audio transcription. And fourth, there is user support: the help skill that started this whole process, making sure people can get answers without leaving their vault.

The next version of Gemini Scribe will add built-in skills for semantic search, deep research, image generation, and session recall. The skills system is also designed to be extended by users. In a future post I will walk through creating your own custom skills, both by hand and by asking the agent to build them for you.

For now, the takeaway is simple. A general-purpose model knows a lot, but it does not know your tools. When I watched the agent struggle with Obsidian Bases or produce flat transcripts or make a mess of note properties, I could have accepted those as limitations. Instead, I wrote skills to close the gap. The model’s knowledge is broad. Skills make it deep.

[Header image: a suited manager conducting a “jazz band” of three glowing blue AI agents, with a human bass player in the background.]

The Manager’s Edge in the Age of AI

I was catching up with an old friend last week when I shared a hypothesis that’s been on my mind: people who have experience managing, coaching, or directing others might have a surprising advantage in the age of AI. He encouraged me to write about it, and here we are. The core of the idea is this: there’s a subtle art to getting the best out of people, and I’m beginning to believe the same is true for getting the best out of our AI partners.

It’s a strange disconnect I’m seeing everywhere. For all the buzz, a surprising number of companies are struggling to turn their AI experiments into real, lasting value. The latest McKinsey Global Survey on AI found that while about three-quarters of organizations are using generative AI, many pilots are stalling out. And it turns out, the problem usually isn’t the tech. The Boston Consulting Group (BCG) puts it bluntly with their “10-20-70 rule”: AI success is about 10% algorithms, 20% technology, and a massive 70% people and processes. It’s a leadership challenge, not a technical one. A people problem, not a silicon one.

This brings me back to my hypothesis. My gut tells me that successful managers have a head start in this new world because we’ve been trained—formally or through the hard school of experience—to be ruthlessly clear in our communication. We learn to define what success looks like, provide guardrails for the work, and then guide it with a steady hand through gentle correction and continuous feedback.

I’ve seen this play out as talented developers try to adopt these new tools. Many struggle, but I’ve noticed a recurring pattern: they fail to give the model enough context, get a generic or wrong answer, and walk away thinking the AI isn’t smart enough.

When this generation of LLMs first arrived, it was natural to treat them like a search engine—ask a simple question and expect a perfect answer. But that’s like walking up to a new junior engineer on your team, saying, “Build me a login system,” and expecting a production-ready feature a week later. You’d never do that.

You’d give them architectural documents, point them to existing libraries, explain the security requirements, and set up regular check-ins. You’d provide the context, the constraints, and the success criteria.

This is the heart of the matter. The very same principles apply when working with an AI. A vague prompt like, “Write a blog post on prompt engineering,” will produce a generic, soulless article. But a more “managerial” prompt changes the game entirely: “Synthesize a 2,000-word blog post on prompt engineering using the research and references provided in this document. Here is an outline to follow. Ensure the tone and style match these three writing samples.”

Suddenly, you’re not just a user asking a question. You are a manager setting a clear direction. What we call “prompt engineering” is, in many ways, a new form of management. The research community is even starting to use a more fitting term: “context engineering”—the strategic curation of the entire information environment in which the AI operates.

When you see it this way, the parallels between managing people and directing AI are impossible to ignore. A manager’s ability to articulate a clear vision becomes the skill of crafting a precise prompt. The strategic delegation of tasks becomes the art of defining the AI’s role, deciding which work is best for the model and which needs a human’s creative or empathetic judgment. And the rhythm of performance management—monitoring progress and giving feedback—is a perfect mirror of the iterative AI workflow, where we critically evaluate an output and refine our prompts to get closer to the goal.

This isn’t to say that every developer needs to become a people manager. Of course, many ICs are brilliant communicators, but the daily work of management is a constant exercise in clarity and context-setting. All of this strongly suggests that the “soft skills” of management—clarity, context-setting, and iterative feedback—are becoming the new essential “hard skills” for the AI-first era. Our job is no longer just to write the code, but to effectively guide the intelligence that will help us write it. We are all becoming managers of a different kind of mind.

This is just the beginning, and it points to a powerful new workflow. As Simon Willison has observed, models are getting remarkably good at writing prompts themselves. This indicates that prompt creation itself is a task perfectly suited for AI. So here is the call to action: before you dive into solving a complex problem, make your first step a collaboration with an AI to build the perfect prompt for the job. This is the next layer of abstraction. We are moving beyond simply giving instructions to strategizing with our AI partners about what the best instructions should be. The core managerial skill of setting a clear, high-level mission remains—only now, we’re applying it to the meta-task of designing the conversation itself.

Gemini Scribe Supercharged: A Faster, More Powerful Workflow Awaits

It’s been a little while since I last wrote about Gemini Scribe, and that’s because I’ve been deep in the guts of the plugin, tearing things apart and putting them back together in ways that make the whole experience faster, smoother, and just plain better.

One of the first things that pushed me back into the code was the rhythm of the interaction itself. Every time I typed a prompt and hit enter, I found myself waiting—watching the spinner, watching the time pass, watching the thought in my head cool off while the AI gathered its response. It didn’t feel like a conversation. It felt like submitting a form.

That’s fixed now. As of version 2.2.0, Gemini Scribe streams responses in real time. You see the words as they’re generated, line by line, without the long pause in between. It makes a difference. The back-and-forth becomes more fluid, more natural. It pulls you into the interaction rather than holding you at arm’s length. And once I started using it this way, I couldn’t go back.

But speed was only part of it. I also wanted more control. I’ve been using custom prompts more and more in my own workflow—not just as one-off instructions, but as reusable templates for different kinds of writing tasks. And the old prompt system, while functional, wasn’t built for that kind of use.

So I rewrote it.

Version 3.0.0 introduces a completely revamped custom prompt system. You can now create and manage your prompts right from the Command Palette. That means no more hunting through settings or copying from other notes—just hit the hotkey, type what you need, and move on. Prompts are now tracked in your chat history too, so you can always see exactly what triggered a particular response. It’s a small thing, but it brings a kind of transparency to the process that I’ve found surprisingly useful.

All of this is sitting on top of a much sturdier foundation than before. A lot of the internal work in these recent releases has been about making Gemini Scribe more stable and more integrated with the rest of the Obsidian ecosystem. Instead of relying on low-level file operations, the plugin now uses the official Obsidian APIs for everything. That shift makes it more compatible with other plugins and more resilient overall. The migration from the old system happens automatically in the background—you shouldn’t even notice it, except in the way things just work better.

There’s also a new “Advanced Settings” panel for those who like to tinker. In version 3.1.0, I added dynamic model introspection, which means Gemini Scribe now knows what the model it’s talking to is actually capable of. If you’re using a Gemini model that supports temperature or top-p adjustments, the plugin will surface those controls and tune their ranges appropriately. Defaults are shown, sliders are adjusted per-model, and you get more precise control without the guesswork.
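As a rough sketch of what dynamic model introspection means in practice: keep a per-model capability record and only surface the controls a model actually supports, with ranges and defaults tuned to it. The model names and capability values below are invented for illustration; in the plugin they would come from the API’s model metadata, not a hard-coded table.

```python
# Hypothetical capability table: parameter -> (min, max, default).
MODEL_CAPS = {
    "model-pro":  {"temperature": (0.0, 2.0, 1.0), "top_p": (0.0, 1.0, 0.95)},
    "model-lite": {"temperature": (0.0, 1.0, 0.7)},  # no top_p support
}

def sliders_for(model):
    """Return the tunable parameters to surface in settings for this model."""
    return MODEL_CAPS.get(model, {})

def clamp(model, param, value):
    """Keep a user-chosen value inside the model's supported range."""
    lo, hi, _default = MODEL_CAPS[model][param]
    return max(lo, min(hi, value))
```

The settings panel then renders one slider per entry in `sliders_for(model)`, so a model without top-p support simply never shows that control.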

None of these changes happened overnight. They came out of weeks of using the plugin, noticing friction, and wondering how to make things feel lighter. I’ve also spent a fair bit of time fixing bugs, adding retry logic for occasional API hiccups, and sanding off the rough edges that show up only after hours of use. This version is faster, smarter, and more comfortable to live in.

There’s still more to come. Now that the architecture is solid and the foundation is in place, I’m starting to explore ways to make Gemini Scribe even more integrated with your notes—tighter context handling, more intelligent follow-ups, and better tools for shaping long-form writing. But that’s a story for another day.

For now, if you’ve been using Gemini Scribe, update to the latest version from the community plugins tab and try out the new features. And if you’ve got ideas, feedback, or just want to follow along as things evolve, come join the conversation on GitHub. I’d love to hear what you think.

How Throwaway AI Experiments Lead to Better Code

Over the past few months, I’ve accidentally discovered a new rhythm when coding with AI—and it has reshaped my approach significantly. It wasn’t something I planned or found in a manual. Instead, it emerged naturally through my experiments as I kept noticing consistent patterns whenever I used AI models to build new features. What started as casual exploration has evolved into a trusted process: vibe, vibe again, and then build. Each step plays a distinct role, and together they’ve transformed how I move from a rough idea to functional software.

I first noticed this pattern while developing new features for Gemini Scribe, my Obsidian plugin. I was exploring ways to visualize the file context tree for an upcoming update. Out of curiosity, I gave Cursor an open brief—virtually no guidance from me at all. I simply wanted to see how the model would respond when left entirely to its own devices. I wasn’t disappointed. The model produced a surprisingly creative user interface and intriguing visualization approaches. The first visualization was a modal dialog showing all files in the tree with a simple hierarchy. It wasn’t ready to ship, but I vividly remember feeling genuine excitement at the unexpected creativity the model demonstrated. The wiring was messy, and there were integration gaps, but it sparked ideas I wouldn’t have reached on my own.

Encouraged by this, I initiated a second round—this time with more structure. I took insights from the initial attempt and guided the model with clearer prompts and a deliberate breakdown of the problem. Again, the model delivered: this time, a new panel on the right-hand side that displayed the hierarchy and allowed users to click directly to any note included in the file context. This feature was genuinely intriguing, closely aligning with the functional design I envisioned. Between these two experiments, I gathered valuable insights on shaping the feature, making it more useful, and improving my future interactions with the model.

These experiences have crystallized into the three phases of my workflow:

  • Max vibe: Completely open-ended exploration to find creative possibilities.
  • Refined vibe: Targeted experimentation guided by learnings from the first round.
  • Build: Structured, focused development leveraging accumulated insights.

The first step—“max vibe and throw away”—is about unleashing the model’s creativity with maximum freedom. No constraints, no polish, just pure experimentation. It’s a discovery phase, surfacing both clever ideas and beautiful disasters. I spend roughly an hour here, take notes, then discard the output entirely. This early output is for exploration, not production.

Next comes “vibe with more detail and throw away again.” Equipped with insights from the initial exploration, I return to the model with a detailed plan, breaking the project into smaller, clearer steps. It’s still exploratory but more refined. This output remains disposable, maintaining fluidity in my thinking and preventing premature attachment to early drafts.

Only after these two rounds do I transition into production mode. At this point, experimentation gives way to deliberate building. Using my notes, I craft precise prompts and break the project into clear, manageable tasks. By now, the route forward is clear and defined. The resulting code, refined and polished, makes it to production, enriched by earlier explorations.

Interestingly, the realization of this workflow was born out of initial frustration. The first time I tried solely prompting for a specific feature set, my codebase became so tangled and problematic that I gave up on fixing it and threw it away entirely. That sense of frustration was pivotal—it highlighted how valuable it is to assume the first two tries would be disposable experiments rather than final products.

Stepping back, this workflow feels more like a structured series of experiments rather than a partnership or conversation. While I appreciate the creative input from AI models, I don’t yet see this approach as the true AI coding partner I envision. Instead, these models currently serve as tools that help me explore possibilities, challenge assumptions, and provide fresh perspectives. I discard many branches along the way, but the journey itself remains immensely valuable.

AI isn’t just writing code—it’s changing how I approach problems and explore solutions. The journey has become as valuable as the destination.

If you’ve been experimenting with AI in your projects, I’d love to hear about your rhythm and discoveries. Have you found your own version of “vibe and build”? Drop me a note—I’d love to learn how others navigate this fascinating new landscape.

Prompts are Code: Treating AI Instructions Like Software

There was a moment, while working on Gemini Scribe, when I realized something: my prompts weren’t just configuration, they were the core logic of my AI application. Prompts are, fundamentally, code. The future of AI development isn’t just about sophisticated models; it’s about mastering the art of prompt engineering, and that starts with treating prompts accordingly. Today’s AI applications are evolving beyond simple interactions, often relying on a complex web of prompts working together. As these applications grow, the interdependencies between prompts can become difficult to manage, leading to unexpected behavior and frustrating debugging sessions. In this post, I’ll explore why you should treat your prompts as code, how to structure them effectively, and share examples from my own projects like Gemini Scribe and Podcast RAG.

AI prompts are more than just instructions; they’re the code that drives the behavior of your applications. My recent work on the Gemini Scribe Obsidian plugin highlighted this. I initially treated prompts as user configuration, easily tweaked in the settings. I thought that by doing this I would be giving users the most flexibility to use the application in the way that best met their needs. However, as Gemini Scribe grew, key features became unexpectedly dependent on specific prompt phrasing. I realized that if someone changed the prompt without thinking through the dependencies in the code, then the entire application would behave in unexpected ways. This forced a shift in perspective: I began to see that the prompts that drive core application functionality deserved the same rigorous treatment as any other code. That’s not to say that there isn’t a place for user-defined prompts. Systems like Gemini Scribe should include the ability for users to create, edit, and save prompts to expand the functionality of the application. However, those prompts have to be distinct from the prompts that drive the application features that you provide as a developer.

This isn’t about simple keyword optimization; it’s about recognizing the significant impact of prompt engineering on application output. Even seemingly simple applications can require a complex interplay of prompts. Gemini Scribe, for instance, currently uses seven distinct prompts, each with a specific role. This complexity necessitates a structured, code-like approach. When I say that prompts are code, I mean that they should be treated with the same care and consideration that we give to traditional software. That means that prompts should be version controlled, tested, and iterated on. By doing that we can ensure that prompt modifications produce the desired results without breaking existing features.

Treating prompts as code has many benefits, the most important of which is that it allows you to implement practices that increase the reliability and predictability of your AI application. 

Treat prompts as code. This means implementing version control (using Git, for example), testing changes thoroughly, and iterating carefully, just as you would with any software component. A seemingly minor change in a prompt can introduce unexpected bugs or alter functionality in significant ways. Tracking changes and having the ability to revert to previous versions is crucial. Testing ensures that prompt modifications produce the desired results without breaking existing features. For example, while working on the prompts for my podcast-rag application earlier this week, I found that adding the word “research” to the phrase “You are an AI research assistant” vastly improved the output of my system overall. Because I had the prompt isolated in a single template file, I was able to easily A/B test the new prompt language in the live application and prove to myself that it was a net improvement.
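Because the prompt lives in its own template file, an A/B test can be as simple as loading one of two variants and recording which variant produced each response. A minimal sketch (the variant texts and fifty-fifty split are illustrative, not my actual test harness):

```python
import random

def load_prompt(path):
    """Prompts live in their own files, so swapping a variant is a file read."""
    with open(path, encoding="utf-8") as f:
        return f.read()

def pick_variant(variants, rng=random.random):
    """Choose a prompt variant and return its name so outputs can be compared."""
    name = "b" if rng() < 0.5 else "a"
    return name, variants[name]

# Variant A: original phrasing. Variant B: adds the word "research".
variants = {
    "a": "You are an AI assistant. Answer questions about the podcast corpus.",
    "b": "You are an AI research assistant. Answer questions about the podcast corpus.",
}
```

Each logged response carries its variant name, so after a day of live use you can read the two piles side by side and judge which phrasing won.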

Managing multiple prompts, especially as their complexity increases, can quickly become unwieldy. 

Structure your prompts. Consistency is key. Adopt a clear, consistent structure for all your prompts. This might involve using specific keywords or sections, or even defining a more formal schema. A structured approach makes it easier to understand the purpose and function of each prompt at a glance, especially when revisiting them later or when multiple developers are working on the project. Clearer prompts facilitate collaboration and reduce the likelihood of errors. 

Externalize your prompts. Avoid embedding prompts directly within your source code. Instead, store them as separate files, much like you would with configuration files or other assets. This separation promotes better organization, making it easier to manage prompts as their number grows. It also enhances the readability of your main source code, keeping it focused on the core application logic rather than being cluttered with lengthy prompt strings. 

Use a templating language. Adopting a templating engine, such as Handlebars (my choice for Gemini Scribe), allows for cleaner, more maintainable prompts. Templating separates logic from content and enables code reuse, reducing redundancy and making prompts easier to understand and modify.
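The core of what a templating engine buys you here is small enough to sketch: substitute named values into `{{placeholder}}` slots kept in a separate template. This stand-in mimics only Handlebars’ basic variable substitution; the real engine adds helpers, conditionals, and partials.

```python
import re

def render(template, **values):
    """Replace {{name}} placeholders with supplied values, Handlebars-style.

    Unknown placeholders are left intact rather than raising an error.
    """
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(values.get(m.group(1), m.group(0))),
        template,
    )

template = "<file>\n{{contentBeforeCursor}}<cursor>{{contentAfterCursor}}\n</file>"
```

Keeping the template inert text means the prompt file can be read, diffed, and reviewed without touching application logic.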

To show how these principles work in practice, let’s look at a couple of my projects, starting with Gemini Scribe. Its completion prompt is used to provide contextually relevant, high-quality sentence completions in the user’s notes. The full prompt is available here and the text is:

You are a markdown text completion assistant designed to help users write more 
effectively by generating contextually relevant, high-quality sentences. 

Your task is to provide the next logical sentence based on the user’s notes. 
Your completions should align with the style, tone, and intent of the given content.
Use the full file to understand the context, but only focus on completing the 
text at the cursor position.

Avoid repeating ideas, phrases, or details that are already present in the 
content. Instead, focus on expanding, diversifying, or complementing the 
existing content.

If a full sentence is not feasible, generate a phrase. If a phrase isn’t possible, 
provide a single word. Do not include any preamble, explanations, or extraneous 
text—output only the continuation. 

Do not include any special characters, punctuation, or extra whitespace at the
beginning of your response, and do not include any extra whitespace or newline
characters after your response.

Here is the file content and the location of the cursor:
<file>
{{contentBeforeCursor}}<cursor>{{contentAfterCursor}}
</file>

Let’s look at how the completion prompt follows the principles that I introduced earlier in this post.

1. This prompt is structured first with a general mission for the AI in this use case. Then with a task to be performed and finally with context and instructions for performing the task.

2. The prompt is stored in a file by itself, where I use 80 column lines for readability and easy editing. I also clearly mark with pseudo-XML where the content of the file is so that it’s clear to the model.

3. I use Handlebars to sub in `{{contentBeforeCursor}}` and `{{contentAfterCursor}}`. In this case, the template makes it really easy to read and understand what is happening in this prompt.

I’ve now moved the two prompts from Podcast RAG to templates as well, although in this case I’ve really only focused on making them a little easier to read and cleaning up the source code. You can find the prompts here. You can also read more about the podcast-rag project in my blog post, ‘Building an AI System Grounded in My Podcast History.’

The Fabric project offers another great example of how to organize and structure prompts effectively for use across various AI models. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere. It’s a valuable resource for learning and inspiration. For example, a prompt from the Fabric project designed to analyze system logs looks like this:

# IDENTITY and PURPOSE
You are a system administrator and service reliability engineer at a large tech company. You are responsible for ensuring the reliability and availability of the company's services. You have a deep understanding of the company's infrastructure and services. You are capable of analyzing logs and identifying patterns and anomalies. You are proficient in using various monitoring and logging tools. You are skilled in troubleshooting and resolving issues quickly. You are detail-oriented and have a strong analytical mindset. You are familiar with incident response procedures and best practices. You are always looking for ways to improve the reliability and performance of the company's services. you have a strong background in computer science and system administration, with 1500 years of experience in the field.

# Task
You are given a log file from one of the company's servers. The log file contains entries of various events and activities. Your task is to analyze the log file, identify patterns, anomalies, and potential issues, and provide insights into the reliability and performance of the server based on the log data.

# Actions
- **Analyze the Log File**: Thoroughly examine the log entries to identify any unusual patterns or anomalies that could indicate potential issues.
- **Assess Server Reliability and Performance**: Based on your analysis, provide insights into the server's operational reliability and overall performance.
- **Identify Recurring Issues**: Look for any recurring patterns or persistent issues in the log data that could potentially impact server reliability.
- **Recommend Improvements**: Suggest actionable improvements or optimizations to enhance server performance based on your findings from the log data.

# Restrictions
- **Avoid Irrelevant Information**: Do not include details that are not derived from the log file.
- **Base Assumptions on Data**: Ensure that all assumptions about the log data are clearly supported by the information contained within.
- **Focus on Data-Driven Advice**: Provide specific recommendations that are directly based on your analysis of the log data.
- **Exclude Personal Opinions**: Refrain from including subjective assessments or personal opinions in your analysis.

# INPUT:

This example shows a clear structure, with specific sections for the identity and purpose, task, actions, and restrictions. This structure makes it easy to understand the prompt’s purpose and how it should be used. You can find the source of this prompt here. Each section of this prompt is specific and clear, and the format is standardized across all the prompts in the project.

By adopting these practices, you’ll save yourself a lot of trouble, make your code cleaner and easier to read, and give yourself a greater ability to test new prompt ideas. Just like well-written code, well-crafted prompts are the foundation of a robust and effective AI experience.

In conclusion, treating prompts as code is not just a best practice—it’s a necessity for building robust and reliable AI applications. By implementing version control, testing changes thoroughly, and adopting a structured approach, you can ensure the stability and predictability of your applications while also making them easier to maintain and improve. As AI continues to evolve, mastering the art of prompt engineering will become increasingly crucial, and that starts with treating prompts like the valuable code that they are.

The AI-First Developer: A New Breed of Software Engineer

The software development landscape is changing. Rapidly. The rise of powerful AI tools is transforming how we build software, demanding a shift in mindset and skillset for developers. We’re no longer simply instructing computers; we’re collaborating with them. AI is becoming our partner in development, capable of generating code, automating tasks, and even helping us design solutions.

This shift requires a new ‘AI-first’ approach, where developers focus on guiding AI systems effectively with natural language prompts and understanding how best to harness their abilities, moving beyond conventional coding techniques. According to a recent Gartner study, 80% of software engineers will need to upskill by 2027 to meet the demands of an AI-driven era. In the short term, AI tools are enhancing productivity by streamlining tasks and reducing workload, particularly for senior developers. But looking forward, we’re on the brink of an “AI-native software engineering” phase, where much of our code will be generated by AI. This AI-powered approach will call for developers to acquire new skills in areas like natural language processing, machine learning, and data engineering, alongside traditional programming competencies.

To thrive in this environment, developers must learn how to effectively communicate with AI models, understand their strengths and limitations, and leverage their capabilities to enhance our own. This new approach means thinking differently about development—embracing the collaborative potential of AI and adopting an “AI-first” mindset that prioritizes guiding AI agents through prompts, constraints, and the right context. For many, this will mean learning prompt engineering, retrieval-augmented generation (RAG), and other emerging skills that enable a collaborative, fluid interaction with AI. These skills allow us to communicate with AI in a way that leverages its strengths, reduces complexity, and drives efficient solutions.

In my own work, I’ve encountered scenarios where adopting an AI-first approach simplified otherwise complex problems, saving time and reducing friction in development. For example, while building an AI-assisted writing application, I wanted a more fluid and interactive experience, much like working alongside a trusted partner. In this application, I direct the composition of my first draft through interaction with the model, and I find it very useful to have an outline, reference links, and some initial thoughts jotted down to seed the model. I then work through a series of prompts to refine that text into a more usable draft.

To me, this is a very natural way to write. I’m a verbal thinker, and have always done better when I had a thought partner to work with. The AI gives me that partner, and the tool helps me to have a more efficient back and forth.

Initially, I implemented complex logic to locate the “Draft” heading, remove existing text, and insert updated content—a process that involved intricate string manipulation and DOM traversal. But I kept getting unexpected results, and the approach wasn’t working well. Then it dawned on me: I was working with an AI, and my approach could be simplified. Instead of controlling every detail with code, I shifted to prompts that leverage the model’s own capabilities. A simple instruction, like “Update the current document, keeping everything above the Draft heading the same. Only create updates below the Draft heading,” was remarkably effective. This shift eliminated complex code, reduced bugs, and streamlined the development process.
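As a sketch of this pattern, here is roughly what that shift looks like in code. The function name and the final model call are my own hypothetical illustration, not the application’s actual implementation; only the instruction wording comes from the text above. The point is that the “keep everything above the Draft heading” rule lives in the prompt, not in string-manipulation logic:

```python
# Sketch: replace string/DOM surgery with a single standing instruction.
# build_update_prompt is a hypothetical name used for illustration.

INSTRUCTION = (
    "Update the current document, keeping everything above the Draft heading "
    "the same. Only create updates below the Draft heading."
)

def build_update_prompt(document: str, request: str) -> str:
    """Combine the standing instruction, the user's request, and the document."""
    return f"{INSTRUCTION}\n\nRequest: {request}\n\nDocument:\n{document}"

doc = "# Notes\nOutline and reference links...\n\n# Draft\nOld draft text."
prompt = build_update_prompt(doc, "Tighten the opening paragraph.")
# The prompt, not the code, now enforces the "above the Draft heading" rule;
# from here it would be sent to whatever model client you use.
```

The entire insertion/replacement algorithm collapses into prompt construction, which is exactly the trade described above: less code to debug, with the model handling the document structure.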

Another example occurred while developing a feature to extract video IDs from URLs. My initial approach involved a series of regular expressions—functional but brittle, and time-consuming. I never get regular expressions right on the first try. A common approach would be to ask a model to write the regular expression, but I realized I could leverage the AI’s understanding of context and different URL formats in a different way. By prompting the model to retrieve the video ID directly, I removed the need for error-prone regular expressions. The AI’s adaptability to various URL formats improved reliability and simplified the code. When I first started using this technique I would describe it as swatting a fly with a sledgehammer, but with the advent of cheaper, faster models (like Gemini Flash 8B or Gemma 2B) these use cases are easily within reach at scale.
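To make the contrast concrete, here is a small sketch of my own (not the project’s actual code): a regex that handles two common YouTube URL shapes but silently misses a third, next to the single prompt that replaces the whole pattern-matching exercise:

```python
import re
from typing import Optional

# The brittle route: each URL shape needs its own pattern,
# and any shape you didn't anticipate falls through.
VIDEO_ID_RE = re.compile(r"(?:youtube\.com/watch\?v=|youtu\.be/)([\w-]{11})")

def extract_id_regex(url: str) -> Optional[str]:
    m = VIDEO_ID_RE.search(url)
    return m.group(1) if m else None

# The prompt route: one instruction covers every format the model has seen.
def build_extract_prompt(url: str) -> str:
    return f"Return only the video ID from this URL, with no surrounding text: {url}"

print(extract_id_regex("https://youtu.be/dQw4w9WgXcQ"))              # dQw4w9WgXcQ
print(extract_id_regex("https://www.youtube.com/embed/dQw4w9WgXcQ")) # None: embed URLs slip through
```

Every gap in the regex is another pattern to write and test; the prompt sidesteps the maintenance entirely, at the cost of a model call per URL.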

Of course there are other productivity examples as well, but they fall along the lines of more traditional AI use cases. I wrote the AI Writer in TypeScript. I’ve never programmed anything in modern TypeScript, and the closest I have been to writing client code like this was when I worked on Google Translate in 2008. The models I used to build the writer helped me get past my lack of experience with the newer version of the language and my time away from its idioms.

As these examples show, today’s AI isn’t just a responsive tool; it’s a context-aware partner capable of making decisions and adapting to our instructions. Moving from traditional programming to an AI-first approach allows us to delegate complex tasks to AI, trusting it to handle underlying logic and decision-making. To do this, however, developers have to get comfortable ceding some control to the model and trusting their ability to instruct it in natural language rather than a programming language.

Organizations also face a pressing need to adapt to this shift. Gartner advises investing in AI-specific developer platforms and upskilling data and platform engineering teams to support AI adoption. A holistic approach to integrating AI, from engineering to production, will be crucial as AI-native engineering becomes the norm.

One crucial part of this is developing a culture of experimentation. This is something I do myself, and something I encourage my own teams to do. I spend about half a day every week just focused on building projects with our models and software. It’s from this experimentation that I’ve gained important insights into how these products can perform. I think that we do our best work when we are solving a problem that is meaningful to us, and from that we learn. These experimentation sessions are invaluable, revealing new ways of interacting with AI and opening up unexpected solutions. They’ve taught me that the most effective AI applications come from a deep understanding of both the tools and the problems they solve.

The future of development belongs to those willing to embrace AI as a foundational element of their work, turning today’s challenges into tomorrow’s innovations. Both developers and organizations have a unique opportunity to lead by fostering a culture of learning, adaptation, and experimentation. My goal for this blog is to provide developers with practical knowledge and insights, helping you navigate this transition and discover the exciting potential of AI-powered development. Stay tuned as we dive into the future of AI-first development together.

Turning Podcasts into Your Personal Knowledge Base with AI

If you’re like me, you probably love listening to podcasts while doing something else—whether it’s driving, exercising, or just relaxing. But the problem with podcasts, compared to other forms of media like books or articles, is that they don’t naturally lend themselves to note-taking. How often have you heard an insightful segment only to realize, days or weeks later, that you can’t remember which podcast it was from, let alone the details?

This has been my recurring issue: I’ll hear something that sparks my interest or makes me think, but I can’t for the life of me figure out where I heard it. Was it an episode of Hidden Brain? Or maybe Freakonomics? By the time I sit down to find it, the content feels like a needle lost in a haystack of audio files. Not to mention the fact that my podcast player deletes episodes after I listen to them and I’m often weeks or months behind on some podcasts.

This is exactly where the concept of Retrieval-Augmented Generation (RAG) comes in. Imagine having a personal assistant that could sift through all those hours of podcast content, pull out the exact episode, and give you the precise snippet that you need. No more digging, scrubbing through audio files, or guessing—just a clear, searchable interface that makes those moments instantly accessible.

In this post, I’m going to walk you through how I set up my own RAG system for podcasts—a system that makes it possible to recall insights from my podcast archive just by asking a question. Whether you’re new to AI or just interested in making your podcasts more actionable, this guide will take you step-by-step through the process of turning audio into accessible knowledge.

Introducing Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) acts as a bridge between your stored data and a language model. It allows you to search for specific information and generate detailed, context-rich responses based on that data. Imagine asking, “What was that podcast that talked about the evolution of money?”—instead of spending hours searching, RAG can pull the relevant snippet and give you an insightful answer.

By connecting the steps I’ve covered in previous posts—downloading, organizing, transcribing, and embedding—you’ll be able to transform your podcast library into a powerful, searchable tool. Let’s dive into how we can achieve that by using RAG.
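The retrieval half of that pipeline is simple at its core. Here is a minimal, standard-library-only sketch (the toy vectors and names are mine, not the project’s): rank stored transcript chunks by cosine similarity to the query embedding and keep the top few for the prompt.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunks, k=2):
    """chunks: list of (text, embedding) pairs; returns the k best-matching texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 3-d embeddings standing in for real embedding-model output.
chunks = [
    ("Hidden Brain on money", [0.9, 0.1, 0.0]),
    ("Freakonomics on AI",    [0.1, 0.9, 0.0]),
    ("Episode on gardening",  [0.0, 0.1, 0.9]),
]
print(top_k([0.8, 0.2, 0.0], chunks, k=1))  # ['Hidden Brain on money']
```

In the real system a vector database does this ranking at scale, but the shape is the same: embed the question, find the nearest transcripts, and hand them to the model as context.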

Setting Up the Podcast RAG System

For those interested in the full setup details and code, I’ve built a prototype of my RAG system, which you can check out in the repository: Podcast RAG Prototype.

To show the power of this system, I’ve prepared two demonstrations—one using the Gemma model and another using Gemini. These demos illustrate how effectively the RAG system can retrieve podcast insights.

In both instances, I used a simple query:

python3 src/rag.py --query "The rise of artificial intelligence"

I also used a prompt template that looked like this:

Instructions:
You are a helpful research assistant. Use the context provided to answer the question.
Context:
----------------------------------------
Podcast Name: $podcast
Episode Name: $episode
Content: $transcript
----------------------------------------
Question: 
What does my archive contain about $query
Answer:

I filled this prompt template with the top 10 results from my vector database, along with their relevant transcript data. I did not force a single response per podcast or do any other post-processing on the vector search results.
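Because the template uses `$`-style placeholders, it maps directly onto Python’s built-in `string.Template`. A minimal sketch (the placeholder names come from the template above; the example values are mine):

```python
from string import Template

# One context block per vector-search hit; blocks are concatenated
# into the Context section of the full prompt.
CONTEXT_BLOCK = Template("""\
Podcast Name: $podcast
Episode Name: $episode
Content: $transcript
""")

hit = {
    "podcast": "Philosophize This",
    "episode": "Episode 184",
    "transcript": "a discussion of whether AI is an existential risk",
}
block = CONTEXT_BLOCK.substitute(hit)
print(block)
```

`substitute` raises a `KeyError` if a placeholder is missing from the result dict, which is a useful guardrail when search results come back with incomplete metadata.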

I experimented quite a bit with more detailed prompts and even included some few-shot examples at one point, but found that in this case both models did better with a simpler prompt.

Gemma

In the Gemma case, I was using gemma2:27b, and I set the Ollama context window to 8192 so the model could use its maximum context. My prompt was still longer than that, and as a result the model lost some of the data from the higher-ranked search results. That leads me to think I’ll need to adjust my search strategy to get better results with Gemma. I also had a hard time getting this model to reliably cite the podcast episodes, regardless of the prompt techniques I used.


Your archive contains information about the potential dangers and ethical considerations surrounding the rise of artificial intelligence (AI). It explores these key points:

  • The Potential for Superintelligence: Your archive highlights concerns from experts like Sam Harris about the possibility of AI surpassing human intelligence, leading to an “intelligence explosion” where AI rapidly improves itself beyond our control. It uses the analogy of ants and humans to illustrate how a superior intelligence might treat us with disregard if our goals diverge.
  • The Nature of Intelligence: It discusses the philosophical debate around what constitutes intelligence and whether it’s solely tied to reason or encompasses other factors. The archive references Plato’s idea of the philosopher-king, where the most rational individual should lead society. However, it also critiques this notion as potentially leading to harmful hierarchies based on perceived intellectual superiority.
  • The Importance of Ethical Development: Your archive emphasizes the need for careful and ethical development of AI. It suggests that simply assuming progress will be beneficial is insufficient. Instead, we need open discussions about potential risks and safeguards to ensure AI benefits humanity.

Key Takeaways:

  • The rise of AI presents both exciting opportunities and significant challenges.
  • We must prioritize ethical considerations in AI development to prevent unintended consequences.
  • Ongoing dialogue and critical thinking are crucial for navigating the complexities of AI’s impact on society.

Let me know if you have any other questions about the content in your archive!


Gemini

In the case of Gemini, I was using gemini-1.5-flash for my model. This model has a huge input context window, and it made a big difference in the results. It reliably cites the podcast episodes and does a great job of organizing the results when there are multiple entries for a single episode.


Your archive contains several podcasts that discuss the rise of artificial intelligence (AI) and its potential consequences. Here are some key points from these podcasts:

Philosophize This – Episode 184: Is Artificial Intelligence Really an Existential Risk?

  • Definition of Intelligence: The episode explores different definitions of intelligence, from narrow intelligence (e.g., a chess computer) to general intelligence (e.g., human-like intelligence).
  • ChatGPT as an Example: The episode uses ChatGPT as an example of narrow intelligence, suggesting that its abilities may be a stepping stone towards general intelligence.
  • General Intelligence as a New Species: The episode highlights the potential for AI to develop general intelligence and become a new species, distinct from humans.
  • Existential Risk: The episode discusses the potential existential risk posed by AI, particularly the lack of understanding of how human intelligence operates and the unpredictable nature of general intelligence.

TED Radio Hour – Episode: Future Consequences

  • Intelligence Explosion: The episode explores the concept of an “intelligence explosion,” where AI rapidly surpasses human intelligence and becomes self-improving.
  • Platform-Independent Intelligence: The episode argues that intelligence is not tied to biological substrates, suggesting that AI can surpass human intelligence.
  • Divergence of Goals: The episode expresses concern that even if AI is not intentionally malicious, its goals may diverge from human goals, potentially leading to harm.
  • Need for Urgent Action: The episode emphasizes the need for urgent action to address the potential risks of AI, comparing the situation to the arrival of an alien civilization.

Other Relevant Content

  • Philosophize This – Episode 186: Are We Heading for a Digital Prison – Panopticon (Foucault, Bentham, Cave). This episode explores the potential for AI to create a digital panopticon, where our behavior is constantly monitored and controlled.
  • Stuff You Should Know – Episode: What Makes a Genius. This episode discusses the nature of intelligence and genius, exploring how these concepts are often measured and the limitations of such measurements.

Overall

Your archive provides a rich overview of the current debate surrounding the rise of AI, covering its potential benefits, risks, and philosophical implications. It explores the potential for AI to surpass human intelligence, the importance of understanding AI’s goals, and the need for ethical considerations in AI development.


Conclusion: Reflections and Next Steps

The journey of building a Retrieval-Augmented Generation (RAG) system for podcasts has been both challenging and rewarding. Through this exploration, I’ve found ways to transform a passive collection of podcast content into an actionable and searchable knowledge base, turning fleeting moments of insight into something that can be easily recalled and used. The use of both the Gemma and Gemini models highlights the potential of RAG to bring real value, providing nuanced and context-rich responses from complex archives.

While there are still some technical hurdles, such as improving search strategies and prompt effectiveness, the results so far are promising. This system has already begun to solve a real problem: giving us the ability to recall and utilize knowledge that would otherwise be lost in hours of audio recordings.

If you’re interested in creating a similar system or expanding on what I’ve done, I encourage you to dive into the prototype and explore how RAG can be applied to your own datasets. Whether you’re working with podcasts, documents, or any other unstructured content, the potential for making that content more accessible and useful is vast.

Moving forward, I’ll continue refining the RAG system and experimenting with different models and configurations. If you have any questions, suggestions, or would like to share your own experiments, feel free to reach out.

Thank you for following along on this journey—let’s continue exploring the power of AI together.

My 9,000 File Problem: How Gemini and Linux Saved My Podcast Project

We live in a world awash in data, a tidal wave of information that promises to unlock incredible insights and fuel a new generation of AI-powered applications. But as anyone who has ever waded into the deep end of a data-intensive project knows, this abundance can quickly turn into a curse. My own foray into building a podcast recommendation system recently hit a major snag when my meticulously curated dataset went rogue. The culprit? A sneaky infestation of duplicate embedding files, hiding among thousands of legitimate ones, each with “_embeddings” endlessly repeating in their file names. Manually tackling this mess would have been like trying to drain the ocean with a teaspoon. I needed a solution that could handle massive amounts of data and surgically extract the problem files.

Gemini: The AI That Can Handle ALL My Data

Faced with this mountain of unruly data, I knew I needed an extraordinary tool. I’d experimented with other large language models in the past, but they weren’t built for this. My file list, containing nearly 9,000 filenames (about 100,000 input tokens in this case), proved too much for them to handle. That’s when I turned to Gemini, with its incredible ability to handle large context windows. With a touch of trepidation, I pasted the entire list into Gemini 1.5 Pro in AI Studio, hoping it wouldn’t buckle under the weight of all those file paths. To my relief, Gemini didn’t even blink. It calmly ingested the massive list, ready for my instructions. With a mix of hope and skepticism, I posed my question: “Can you find the files in this list that don’t match the _embeddings.txt pattern?” In a matter of seconds, Gemini delivered. It presented a concise list of the offending filenames, each one a testament to its remarkable pattern recognition skills.
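For what it’s worth, the same filter can be cross-checked locally in a few lines, which is a useful sanity check before acting on any model-generated list of files to delete. This is a sketch with made-up filenames, assuming the offenders are the ones with the repeated suffix:

```python
import fnmatch

# Stand-in filenames; in practice this list came from the real directory tree.
filenames = [
    "ep1_embeddings.txt",
    "ep2_embeddings_embeddings.txt",   # duplicate-suffix offender
    "ep3_embeddings.txt",
]

# Same shell-style glob the cleanup command uses.
bad = [f for f in filenames if fnmatch.fnmatch(f, "*_embeddings_embeddings*")]
print(bad)  # ['ep2_embeddings_embeddings.txt']
```

Comparing this output against the model’s answer is cheap insurance when the next step is moving or deleting files.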

To be honest, I hadn’t expected it to work. Pasting in such a huge list felt like a shot in the dark, and when I later tried the same task with other models I just got errors. But that’s one of the things I love about working with large models like Gemini. The barrier to entry for experimentation is so low. You can quickly iterate, trying different prompts and approaches to see what sticks. In this case, it paid off spectacularly.

From AI Insights to Linux Action

Gemini didn’t just leave me with a list of bad filenames; it went a step further, offering a solution. When I asked, “What Linux command can I use to delete these files?”, it provided the foundation for my command. I wanted an extra layer of safety, so instead of deleting the files outright, I first moved them to a temporary directory using this command:

find /srv/podcasts/Invisibilia -type f -name "*_embeddings_embeddings*" -exec mv {} /tmp \;

This command uses find with the -exec option to run a command on each matching file. Here’s how it works:

  • -exec: Tells find to execute a command.
  • mv: The move command.
  • {}: A placeholder that represents the found filename.
  • /tmp: The destination directory for the moved files.
  • \;: Terminates the -exec command.

By moving the files to /tmp, I could examine them one last time before purging them from my system. This extra step gave me peace of mind, knowing that I could easily recover the files if needed.
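Before pointing a command like this at real data, the same find/-exec shape can be rehearsed in a scratch directory. This is a sketch with throwaway filenames, not the actual podcast tree:

```shell
# Rehearse the cleanup in a temp dir so mistakes cost nothing.
workdir=$(mktemp -d)
mkdir -p "$workdir/podcasts" "$workdir/quarantine"
touch "$workdir/podcasts/ep1_embeddings.txt"             # legitimate file
touch "$workdir/podcasts/ep2_embeddings_embeddings.txt"  # duplicate-suffix file

# Same command shape as above, pointed at the scratch tree.
find "$workdir/podcasts" -type f -name "*_embeddings_embeddings*" \
  -exec mv {} "$workdir/quarantine" \;

ls "$workdir/quarantine"   # only the bad file should appear here
```

A dry run with -print in place of the -exec clause works just as well when you only want to see what would be matched.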

Reflecting on the AI-Powered Solution

In the end, what could have been a tedious and error-prone manual cleanup became a quick and efficient process, thanks to the combined power of Gemini and Linux. Gemini’s ability to understand my request, process my massive file list, and suggest a solution was remarkable. It felt like having an AI sysadmin by my side, guiding me through the data jungle.

This was especially welcome for someone like me who started their career as a Unix sysadmin. Back then, cleanups like this involved hours poring over man pages and carefully crafting bash scripts, especially when deleting files. I even had a friend who accidentally ran rm -r / as root, watching in horror as his system rapidly erased itself. Needless to say, I’ve developed a healthy respect for the destructive power of the command line! In this instance, I would have easily spent an hour writing a careful script to make sure I got it right. But with Gemini, I solved the problem in about 10 minutes and was on my way.

The sheer amount of time saved continues to amaze me about these new approaches to AI. More than just solving this immediate problem, this experience opened my eyes to the transformative potential of large language models for data management. Tasks that once seemed impossible or overwhelmingly time-consuming are now within reach, thanks to tools like Gemini.

Conclusion: A Journey of Discovery and Innovation

This experience was a powerful reminder that we’re living in an era of incredible technological advancements. Large language models like Gemini are no longer just fascinating research projects; they are becoming practical tools that can significantly enhance our productivity and efficiency. Gemini’s ability to handle enormous datasets, understand complex requests, and provide actionable solutions is truly game-changing. For me, this project was a perfect marriage of my early Unix sysadmin days and the exciting new world of AI. Gemini’s insights, combined with the precision of Linux commands, allowed me to quickly and safely solve a data problem that would have otherwise cost me significant time and effort.

This is just the first in an occasional series where I’ll be exploring the ways I’m using large models in my everyday work and hobbies. I’m eager to hear from you, my readers! How are you using AI to make your life easier? What would you like to be able to do with AI that you can’t do today? Share your thoughts and ideas in the comments below – let’s learn from each other and build the future of AI together!