There was a moment, while working on Gemini Scribe, when I realized my prompts weren’t just configuration: they were the core logic of my AI application. Prompts are, fundamentally, code. The future of AI development isn’t just about sophisticated models; it’s about mastering the art of prompt engineering, and that starts with this one crucial fact. Today’s AI applications are evolving beyond simple interactions, often relying on a complex web of prompts working together. As these applications grow, the interdependencies between prompts become difficult to manage, leading to unexpected behavior and frustrating debugging sessions. In this post, I’ll explore why you should treat your prompts as code, how to structure them effectively, and share examples from my own projects, Gemini Scribe and Podcast RAG.
AI prompts are more than just instructions; they’re the code that drives the behavior of your applications. My recent work on the Gemini Scribe Obsidian plugin highlighted this. I initially treated prompts as user configuration, easily tweaked in the settings. I thought that by doing this I would be giving users the most flexibility to use the application in the way that best met their needs. However, as Gemini Scribe grew, key features became unexpectedly dependent on specific prompt phrasing. I realized that if someone changed the prompt without thinking through the dependencies in the code, then the entire application would behave in unexpected ways. This forced a shift in perspective: I began to see that the prompts that drive core application functionality deserved the same rigorous treatment as any other code. That’s not to say that there isn’t a place for user-defined prompts. Systems like Gemini Scribe should include the ability for users to create, edit, and save prompts to expand the functionality of the application. However, those prompts have to be distinct from the prompts that drive the application features that you provide as a developer.
This isn’t about simple keyword optimization; it’s about recognizing the significant impact of prompt engineering on application output. Even seemingly simple applications can require a complex interplay of prompts. Gemini Scribe, for instance, currently uses seven distinct prompts, each with a specific role. This complexity necessitates a structured, code-like approach. When I say that prompts are code, I mean that they deserve the same care and consideration that we give to traditional software: they should be version controlled, tested, and iterated on deliberately.
Treating prompts as code has many benefits, the most important of which is that it allows you to implement practices that increase the reliability and predictability of your AI application.
Treat prompts as code. This means implementing version control (using Git, for example), testing changes thoroughly, and iterating carefully, just as you would with any software component. A seemingly minor change in a prompt can introduce unexpected bugs or alter functionality in significant ways. Tracking changes and having the ability to revert to previous versions is crucial. Testing ensures that prompt modifications produce the desired results without breaking existing features. For example, while working on the prompts for my podcast-rag application earlier this week, I found that adding the word “research” to the phrase “You are an AI research assistant” vastly improved the output of my system overall. Because I had the prompt isolated in a single template file, I was able to easily A/B test the new prompt language in the live application and prove to myself that it was a net improvement.
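That kind of A/B test can be sketched in a few lines. The variant texts and the `pickVariant` helper below are illustrative, not code from the real podcast-rag project; the idea is simply to assign each session a variant deterministically so outputs from the two prompts can be compared cleanly.

```typescript
// A minimal sketch of A/B testing two prompt variants in a live application.
// Variant texts and helper names are hypothetical.

type PromptVariant = { id: string; text: string };

const variants: PromptVariant[] = [
  { id: "control", text: "You are an AI assistant." },
  { id: "research", text: "You are an AI research assistant." },
];

// Deterministic assignment: hash the session id so a given session always
// gets the same variant, keeping the comparison clean across requests.
function pickVariant(sessionId: string): PromptVariant {
  let hash = 0;
  for (const ch of sessionId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return variants[hash % variants.length];
}

const chosen = pickVariant("session-42");
console.log(`${chosen.id}: ${chosen.text}`);
```

Logging which variant served each request is what turns a hunch ("research" reads better) into evidence you can act on.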
Managing multiple prompts, especially as their complexity increases, can quickly become unwieldy.
Structure your prompts. Consistency is key. Adopt a clear, consistent structure for all your prompts. This might involve using specific keywords or sections, or even defining a more formal schema. A structured approach makes it easier to understand the purpose and function of each prompt at a glance, especially when revisiting them later or when multiple developers are working on the project. Clearer prompts facilitate collaboration and reduce the likelihood of errors.
Externalize your prompts. Avoid embedding prompts directly within your source code. Instead, store them as separate files, much like you would with configuration files or other assets. This separation promotes better organization, making it easier to manage prompts as their number grows. It also enhances the readability of your main source code, keeping it focused on the core application logic rather than being cluttered with lengthy prompt strings.
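Loading an externalized prompt is a one-liner with Node’s `fs` module. In a real project the file would live in your repo (say, a `prompts/` directory) and be tracked by Git; this sketch creates a temporary file on the fly so it runs anywhere, and the file name is illustrative.

```typescript
// Sketch of keeping a prompt in its own file and loading it at runtime.
import { mkdtempSync, writeFileSync, readFileSync } from "fs";
import { join } from "path";
import { tmpdir } from "os";

// Stand-in for a prompt file that would normally live in your repository.
const promptDir = mkdtempSync(join(tmpdir(), "prompts-"));
const promptPath = join(promptDir, "completion.txt");
writeFileSync(promptPath, "You are a markdown text completion assistant.\n");

function loadPrompt(path: string): string {
  return readFileSync(path, "utf8").trim();
}

console.log(loadPrompt(promptPath));
```

Because the prompt is just a file, every change to it shows up in your Git history like any other diff.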
Use a templating language. Adopting a templating engine, such as Handlebars (my choice for Gemini Scribe), allows for cleaner, more maintainable prompts. Templating separates logic from content and enables code reuse, reducing redundancy and making prompts easier to understand and modify.
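To show the idea without pulling in a dependency, here is a minimal stand-in for what an engine like Handlebars does: substitute `{{name}}` placeholders with values. In Gemini Scribe the real work is done by Handlebars itself; this `render` function is only a sketch of the core mechanic.

```typescript
// Minimal placeholder substitution, illustrating what a templating
// engine does. Not a replacement for Handlebars, which also handles
// escaping, helpers, partials, and more.
function render(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) => vars[name] ?? "");
}

const template =
  "<file>\n{{contentBeforeCursor}}<cursor>{{contentAfterCursor}}\n</file>";

const prompt = render(template, {
  contentBeforeCursor: "The quick brown ",
  contentAfterCursor: " over the lazy dog.",
});

console.log(prompt);
```

The template text stays readable on its own, and the code that fills it in stays free of long prompt strings.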
To show how these principles work in practice, let’s look at a couple of my projects. In Gemini Scribe, each of the seven prompts has a specific role. The completion prompt, for example, provides contextually relevant, high-quality sentence completions in the user’s notes. The full prompt is available here, and the text is:
You are a markdown text completion assistant designed to help users write more
effectively by generating contextually relevant, high-quality sentence completions.
Your task is to provide the next logical sentence based on the user’s notes.
Your completions should align with the style, tone, and intent of the given content.
Use the full file to understand the context, but only focus on completing the
text at the cursor position.
Avoid repeating ideas, phrases, or details that are already present in the
content. Instead, focus on expanding, diversifying, or complementing the
existing content.
If a full sentence is not feasible, generate a phrase. If a phrase isn’t possible,
provide a single word. Do not include any preamble, explanations, or extraneous
text—output only the continuation.
Do not include any special characters, punctuation, or extra whitespace at the
beginning of your response, and do not include any extra whitespace or newline
characters after your response.
Here is the file content and the location of the cursor:
<file>
{{contentBeforeCursor}}<cursor>{{contentAfterCursor}}
</file>
Let’s look at how the completion prompt follows the principles that I introduced earlier in this post.
1. The prompt is structured: it opens with a general mission for the AI in this use case, then states the task to be performed, and finishes with the context and instructions for performing it.
2. The prompt is stored in its own file, where I use 80-column lines for readability and easy editing. I also mark the file content with pseudo-XML tags so that its boundaries are clear to the model.
3. I use Handlebars to substitute `{{contentBeforeCursor}}` and `{{contentAfterCursor}}`; the template makes it easy to read the prompt and understand exactly what it is doing.
I’ve now moved the two prompts from Podcast RAG to templates as well, although in this case I’ve really only focused on making them a little easier to read and cleaning up the source code. You can find the prompts here. You can also read more about the podcast-rag project in my blog post, ‘Building an AI System Grounded in My Podcast History.’
For another example, the Fabric project shows how to organize and structure prompts effectively for use across various AI models. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere, and it’s a valuable resource for learning and inspiration. For instance, a Fabric prompt designed to analyze system logs looks like this:
# IDENTITY and PURPOSE
You are a system administrator and service reliability engineer at a large tech company. You are responsible for ensuring the reliability and availability of the company's services. You have a deep understanding of the company's infrastructure and services. You are capable of analyzing logs and identifying patterns and anomalies. You are proficient in using various monitoring and logging tools. You are skilled in troubleshooting and resolving issues quickly. You are detail-oriented and have a strong analytical mindset. You are familiar with incident response procedures and best practices. You are always looking for ways to improve the reliability and performance of the company's services. you have a strong background in computer science and system administration, with 1500 years of experience in the field.
# Task
You are given a log file from one of the company's servers. The log file contains entries of various events and activities. Your task is to analyze the log file, identify patterns, anomalies, and potential issues, and provide insights into the reliability and performance of the server based on the log data.
# Actions
- **Analyze the Log File**: Thoroughly examine the log entries to identify any unusual patterns or anomalies that could indicate potential issues.
- **Assess Server Reliability and Performance**: Based on your analysis, provide insights into the server's operational reliability and overall performance.
- **Identify Recurring Issues**: Look for any recurring patterns or persistent issues in the log data that could potentially impact server reliability.
- **Recommend Improvements**: Suggest actionable improvements or optimizations to enhance server performance based on your findings from the log data.
# Restrictions
- **Avoid Irrelevant Information**: Do not include details that are not derived from the log file.
- **Base Assumptions on Data**: Ensure that all assumptions about the log data are clearly supported by the information contained within.
- **Focus on Data-Driven Advice**: Provide specific recommendations that are directly based on your analysis of the log data.
- **Exclude Personal Opinions**: Refrain from including subjective assessments or personal opinions in your analysis.
# INPUT:
This example shows a clear structure, with specific sections for identity and purpose, task, actions, and restrictions. Each section is specific and clear, the format is standardized across all the prompts in the project, and the result is a prompt whose purpose and usage are obvious at a glance. You can find the source of this prompt here.
By adopting these practices, you’ll save yourself a lot of trouble, make your code cleaner and easier to read, and give yourself a greater ability to test new prompt ideas. Just like well-written code, well-crafted prompts are the foundation of a robust and effective AI experience.
In conclusion, treating prompts as code is not just a best practice; it’s a necessity for building robust and reliable AI applications. Version control, thorough testing, and a structured approach ensure the stability and predictability of your applications while making them easier to maintain and improve. As AI continues to evolve, mastering prompt engineering will only become more important, and that starts with treating prompts like the valuable code that they are.