A futuristic clockwork mechanism with glowing nodes, representing community collaboration, automated tasks, and precise measurement.

Automation and Measurement: Inside Gemini Scribe 4.8.0

May 9, 2026 Allen HutchisonLeave a comment

I recently wrapped up the development cycle for Gemini Scribe 4.8.0. Looking back at the ~99 pull requests merged over the last month, the sheer volume of changes is significant. Not only are we shipping major features, but I’m also seeing a steady uptick in contributions from collaborators, an increase in issues filed by the community, and much more activity in our discussion group. Beyond the changelog and community growth, two structural narratives define this release: automation and measurement.

As I discussed in the evolution of Gemini Scribe, the goal has always been to move beyond a simple chat interface. With 4.8.0, we are taking a massive step toward making the agent a true background worker in your vault.

Here is a look at the architecture, the code, and what this release means for the future of our agentic workflows.

The Push for Automation

For a long time, running a complex agent task meant staring at a blocking UI. If you asked the agent to perform deep research or generate an image, you waited.

To solve this, we introduced a unified background execution lane. The new BackgroundTaskManager allows tools like DeepResearchTool and GenerateImageTool to accept a background: true parameter. The agent submits the task, receives an ID immediately, and returns to its turn. You can monitor these tasks in the new Gemini Activity modal, which consolidates background tasks and RAG indexing status into one view.

But unblocking the UI was only half the battle. We wanted to lay the groundwork for an agent that operates in the background. While true autonomy is a spectrum, the first step is moving away from the chat box and into scheduled, asynchronous workflows.

The Scheduled Task Engine

The marquee feature of 4.8.0 is the full task scheduling system. You can now define a task as a markdown file, and the plugin will run it on a cadence as a headless agent session, writing the output back to the vault.

To make this work, we built a ScheduledTaskManager with a 60-second tick loop. Tasks are stored in [state-folder]/Scheduled-Tasks/ with a sidecar JSON file for state. The headless ScheduledTaskRunner mirrors the standard AgentViewTools but auto-approves all tool calls.

We also expanded the schedule grammar. Originally, daily meant “every 24 hours from creation,” which surprised users. Now, you can specify daily@HH:MM and weekly@HH:MM:DAYS, so you can finally tell the agent to run “every weekday at 4:30 PM.”

We also handle missed runs gracefully. On startup, any task with runIfMissed: true that missed its window surfaces in a CatchUpModal.

Right now, this is essentially a highly intelligent cron job. You are still explicitly telling the agent when to run. But this scheduling engine is the foundational infrastructure for what comes next. In the next release, we are introducing Obsidian lifecycle hooks. Instead of just running on a timer, the agent will be able to react to events, triggering workflows when you create a new file, save a note, or modify a project board. That is where we cross the threshold into true ambient AI.

How I Use This in Practice

To give you an idea of what this unlocks, I currently rely on a few specific scheduled workflows:

The Daily Setup: Every afternoon, a scheduled skill runs to prepare my vault for the following day. It looks up my calendar, creates my daily note if it doesn’t exist, and seeds it with my upcoming meetings. It goes a step further by creating individual meeting note entries and building out context notes for the people I’ll be meeting with. When I walk into the office the next morning, my daily note is already prepped and ready to go.

Automated Blog Drafts: I also use this to automate my content pipeline. I have a scheduled skill that monitors my Readwise syncs and automatically generates drafts for my “Reading List” blog posts. Instead of manually curating and formatting these, the agent handles the heavy lifting in the background, leaving me to just review and polish the draft.

If you are worried about the agent running amok in your vault while you aren’t looking, there are several ways to mitigate this. You can limit the tools the agent has access to. If you don’t want it overwriting files, you can simply restrict its write access. Additionally, the agent’s response from any scheduled task is always saved in the Scheduled-Tasks/Runs file, giving you a complete audit log of what the agent had to say during the session.

In my case, I’m automating skills that I’ve been running manually for a while now, and I run my agent in a mode where I let it write and edit files day-to-day. You should set up your tasks to match your own comfort level. You can read more about how to configure this in the Scheduled Tasks Documentation.

Extracting the Agent Loop

To support headless scheduled tasks, I had to refactor how the agent executes tools. Previously, the tool-execution loop was tightly coupled to the UI in AgentViewTools.

I extracted this logic into a UI-agnostic AgentLoop class. AgentViewTools shrank from 386 lines down to 187, becoming a thin adapter over AgentLoop with specific hooks (onToolBatchStart, onToolCallStart, etc.).

// Conceptual extraction of the AgentLoop
export class AgentLoop {
  constructor(private engine: ToolExecutionEngine) {}
  
  async execute(turn: AgentTurn) {
    // Iterative tool execution, removing the recursive stack-depth ceiling
    while (this.hasPendingToolCalls(turn)) {
       // Loop detection, batching, and execution logic lives here
    }
  }
}

This extraction immediately paid dividends, catching bugs that a duplicate headless runner had introduced, and eliminating a recursive stack-depth ceiling on deep tool chains. More importantly, it means scheduled tasks, evals, and the UI all share the exact same execution engine.

Local Models with Ollama and Gemma 4

First-class local-model support is here. By leveraging the ModelApi seam, chat, summarization, rewrite, and agent tool-calling all work against a local Ollama server. You can use any model from Ollama that supports tool calling, though I have personally only tested this extensively with Gemma 4.

In my local evaluation harness, Gemma 4 performed exceptionally well. It is incredibly capable, fast, and handles the agent loop with a level of reliability that makes local-only agentic workflows genuinely viable.

The way I use this right now is as an offline fallback: when I don’t have an internet connection, I switch to Gemma 4 and just keep working. Obviously, running offline means I don’t have access to online-dependent tools like Google Search, Deep Research, or Image Generation. But for synthesizing notes, organizing projects, or drafting content securely, it is incredibly powerful.

In the future, we will be refining the system to allow you to pick the model you want on a per-function basis. This means you’ll be able to route sensitive, local text processing to an offline model while still leveraging cloud models for heavy-lifting tasks like Deep Research or Image Generation when you are connected.

Moving from Guessing to Measuring

As the agent loop gets more complex (handling runaway loop aborts and budget constraints) we can no longer rely on “vibes” to know if a change improved the system.

To solve this, I built a new CLI-driven eval harness (npm run eval) that drives a live Obsidian instance. It captures turns, tool calls, token usage, cache ratios, and cost. Crucially, it measures reliability. By passing --repeat=N, the harness repeats each task to surface flakiness, reporting a pass^k metric. We can now test multi-hop retrieval and loop-trap cyclic references programmatically, ensuring the agent bails cleanly instead of spinning forever.

Right now, the focus for 4.8.0 was getting this infrastructure in place and establishing the beginnings of our eval set. Having the harness is the first step; the next step is building out a robust suite of test cases that reflect real-world vault interactions.

I would love to see contributions from the community for the evals themselves! If you have complex agentic workflows or edge cases you want to ensure remain stable, please submit them. In the next release, we will start publishing the actual eval results and benchmarks directly in the repo so we can transparently track the agent’s performance over time.

What’s Next?

What does this implementation tell us about the future of software engineering and personal knowledge management?

We are seeing a clear shift toward ambient AI. The chat interface is a great starting point, but the true value of an agentic system is its ability to operate asynchronously. While the scheduling engine in 4.8.0 acts as a highly capable cron job, it lays the groundwork for the event-driven lifecycle hooks coming in the next release.

By combining the AgentLoop extraction with asynchronous execution, Gemini Scribe is no longer just a tool you use; it is becoming a system that reacts and works alongside you. When you can rely on a background orchestrator to run your housekeeping routines (like updating changelogs or triaging issues) while you eat dinner, the vault becomes a living, breathing entity. The agent becomes a true extension of your workflow, utilizing the built-in skills we’ve developed entirely in the background.

Gemini Scribe 4.8.0 is a massive architectural leap forward. The code is cleaner, the tests are faster (thanks to a Vitest migration), and the agent is more autonomous than ever.

If you want to dive into the specifics or try out the new scheduling grammar, check out the updated documentation on scheduled tasks.

Let me know what automated tasks you end up building. I’m already finding new ways to let the agent do the heavy lifting while I focus on the work that matters.

The AI-First Developer: A New Breed of Software Engineer

November 10, 2024 Allen Hutchison1 Comment

The software development landscape is changing. Rapidly. The rise of powerful AI tools is transforming how we build software, demanding a shift in mindset and skillset for developers. We’re no longer simply instructing computers; we’re collaborating with them. AI is becoming our partner in development, capable of generating code, automating tasks, and even helping us design solutions.

This shift requires a new ‘AI-first’ approach, where developers focus on guiding AI systems effectively with natural language prompts and understanding how best to harness their abilities, moving beyond conventional coding techniques. According to a recent Gartner study, 80% of software engineers will need to upskill by 2027 to meet the demands of an AI-driven era. In the short term, AI tools are enhancing productivity by streamlining tasks and reducing workload, particularly for senior developers. But looking forward, we’re on the brink of an “AI-native software engineering” phase, where much of our code will be generated by AI. This AI-powered approach will call for developers to acquire new skills in areas like natural language processing, machine learning, and data engineering, alongside traditional programming competencies.

To thrive in this environment, developers must learn how to effectively communicate with AI models, understand their strengths and limitations, and leverage their capabilities to enhance our own. This new approach means thinking differently about development—embracing the collaborative potential of AI and adopting an “AI-first” mindset that prioritizes guiding AI agents through prompts, constraints, and the right context. For many, this will mean learning prompt engineering, retrieval-augmented generation (RAG), and other emerging skills that enable a collaborative, fluid interaction with AI. These skills allow us to communicate with AI in a way that leverages its strengths, reduces complexity, and drives efficient solutions.

In my own work, I’ve encountered scenarios where adapting an AI-first approach simplified otherwise complex problems, saving time and reducing friction in development. For example, while building an AI-assisted writing application, I wanted a more fluid and interactive experience, much like working alongside a trusted partner. In this application, I want to direct the composition of my first draft through interaction with the model, but I find it’s very useful to have an outline, reference links, and some initial thoughts jotted down to seed the model. Then I work through a series of prompts to refine that text into a more usable draft.

To me, this is a very natural way to write. I’m a verbal thinker, and have always done better when I had a thought partner to work with. The AI gives me that partner, and the tool helps me to have a more efficient back and forth.

Initially, I implemented complex logic to locate the “Draft” heading, remove existing text, and insert updated content—a process that involved intricate string manipulation and DOM traversal. However, I kept getting unexpected results and surprises. The approach wasn’t working very well. Then it dawned on me: I was working with an AI, and my approach could be simplified. Instead of controlling every detail with code, I shifted to prompts that could leverage the model’s own capabilities. A simple instruction, like “Update the current document, keeping everything above the draft heading the same. Only create updates below the Draft heading,” was remarkably effective. This shift eliminated complex code, reduced bugs, and streamlined the development process.

Another example occurred while developing a feature to extract video IDs from URLs. My initial approach involved a series of regular expressions—functional but brittle, and time consuming. I never get regular expressions right on the first try. In this case, a common approach would be to ask a model to create the regular expression, but I realized I could leverage the AI’s understanding of context and different URL formats in a different way. By prompting the model to retrieve the video ID, I removed the need for error-prone regular expressions. The AI’s adaptability to various URL formats improved reliability and simplified the code. When I first started using this technique I would describe it as swatting a fly with a sledge hammer, but with the advent of cheaper faster models (like Gemini Flash 8B, or Gemma 2B) these use cases are easily within reach at scale.

Of course there are other productivity examples as well, but they fall along the lines of more traditional AI cases. I wrote the AI Writer in Typescript. I’ve never programmed anything in modern typescript and the closest I have been to writing client code like this was when I worked on Google Translate in 2008. The models I used to build the writer were able to help me get past my lack of experience with the newer version of the language and my time away from it’s idioms.

As these examples show, today’s AI isn’t just a responsive tool; it’s a context-aware partner capable of making decisions and adapting to our instructions. Moving from traditional programming to an AI-first approach allows us to delegate complex tasks to AI, trusting it to handle underlying logic and decision-making. In order to do this, however, developers have to get comfortable with ceding some control over to the model and trust in their ability to instruct it in a natural language instead of a programming language.

Organizations also face a pressing need to adapt to this shift. Gartner advises investing in AI-specific developer platforms and upskilling data and platform engineering teams to support AI adoption. A holistic approach to integrating AI, from engineering to production, will be crucial as AI-native engineering becomes the norm. One crucial part of this is developing a culture of experimentation. This is something I do myself, and something I encourage my own teams to do. I spend about half a day every week just focused on building projects with our models and software. It’s from this experimentation that I’ve gained important insights into how these products can perform. I think that we do our best work when we are solving a problem that is meaningful to us, and from that we learn. These experimentation sessions are invaluable, revealing new ways of interacting with AI and opening up unexpected solutions. They’ve taught me that the most effective AI applications come from a deep understanding of both the tools and the problems they solve.

The future of development belongs to those willing to embrace AI as a foundational element of their work, turning today’s challenges into tomorrow’s innovations. Both developers and organizations have a unique opportunity to lead by fostering a culture of learning, adaptation, and experimentation. My goal for this blog is to provide developers with practical knowledge and insights, helping you navigate this transition and discover the exciting potential of AI-powered development. Stay tuned as we dive into the future of AI-first development together.

Turning Podcasts into Your Personal Knowledge Base with AI

October 27, 2024 Allen Hutchison1 Comment

If you’re like me, you probably love listening to podcasts while doing something else—whether it’s driving, exercising, or just relaxing. But the problem with podcasts, compared to other forms of media like books or articles, is that they don’t naturally lend themselves to note-taking. How often have you heard an insightful segment only to realize, days or weeks later, that you can’t remember which podcast it was from, let alone the details?

This has been my recurring issue: I’ll hear something that sparks my interest or makes me think, but I can’t for the life of me figure out where I heard it. Was it an episode of Hidden Brain? Or maybe Freakonomics? By the time I sit down to find it, the content feels like a needle lost in a haystack of audio files. Not to mention the fact that my podcast player deletes episodes after I listen to them and I’m often weeks or months behind on some podcasts.

This is exactly where the concept of Retrieval-Augmented Generation (RAG) comes in. Imagine having a personal assistant that could sift through all those hours of podcast content, pull out the exact episode, and give you the precise snippet that you need. No more digging, scrubbing through audio files, or guessing—just a clear, searchable interface that makes those moments instantly accessible.

In this post, I’m going to walk you through how I set up my own RAG system for podcasts—a system that makes it possible to recall insights from my podcast archive just by asking a question. Whether you’re new to AI or just interested in making your podcasts more actionable, this guide will take you step-by-step through the process of turning audio into accessible knowledge.

Introducing Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) acts as a bridge between the stored data and a language model. It allows you to search for specific information and generates detailed, context-rich responses based on that data. Imagine asking, “What was that podcast that talked about the evolution of money?”—instead of spending hours searching, RAG can pull the relevant snippet and give you an insightful answer.

By connecting the steps I’ve covered in previous posts—downloading, organizing, transcribing, and embedding—you’ll be able to transform your podcast library into a powerful, searchable tool. Let’s dive into how we can achieve that by using RAG.

Setting Up the Podcast RAG System

For those interested in the full setup details and code, I’ve built a prototype of my RAG system, which you can check out in the repository: Podcast RAG Prototype.

To show the power of this system, I’ve prepared two demonstrations—one using the Gemma model and another using Gemini. These demos illustrate how effectively the RAG system can retrieve podcast insights.

In both instances, I used a simple query:

python3 src/rag.py --query "The rise of artificial intelligence"

I also used a prompt template that looked like this:

Instructions:
You are a helpful research assistant. Use the context provided to answer the question.
Context:
----------------------------------------
Podcast Name: $podcast
Episode Name: $episode
Content: $transcript
----------------------------------------
Question: 
What does my archive contain about $query
Answer:

This prompt template contained the top 10 responses from my vector database, with their relevant transcript data. I did not force a single response per podcast, or do any other post-processing on the vector search results.

I experimented quite a bit with more detailed prompts and even included some few-shot examples at one point, but found that in this case both models did better with a simpler prompt.

Gemma

In the Gemma case, I was using gemma2:27b, and I set the Ollama context window to 8192 to allow the model to consider its maximum context window. My prompt was still longer than that, and as a result, the model loses some of the data from the higher-ranked search results. That leads me to think that I’ll need to adjust my search strategy to get better results with Gemma. I also had a hard time getting this model to reliably cite the podcast episodes regardless of the prompt techniques I used.

Your archive contains information about the potential dangers and ethical considerations surrounding the rise of artificial intelligence (AI). It explores these key points:

The Potential for Superintelligence: Your archive highlights concerns from experts like Sam Harris about the possibility of AI surpassing human intelligence, leading to an “intelligence explosion” where AI rapidly improves itself beyond our control. It uses the analogy of ants and humans to illustrate how a superior intelligence might treat us with disregard if our goals diverge.
The Nature of Intelligence: It discusses the philosophical debate around what constitutes intelligence and whether it’s solely tied to reason or encompasses other factors. The archive references Plato’s idea of the philosopher-king, where the most rational individual should lead society. However, it also critiques this notion as potentially leading to harmful hierarchies based on perceived intellectual superiority.
The Importance of Ethical Development: Your archive emphasizes the need for careful and ethical development of AI. It suggests that simply assuming progress will be beneficial is insufficient. Instead, we need open discussions about potential risks and safeguards to ensure AI benefits humanity.

Key Takeaways:

The rise of AI presents both exciting opportunities and significant challenges.
We must prioritize ethical considerations in AI development to prevent unintended consequences.
Ongoing dialogue and critical thinking are crucial for navigating the complexities of AI’s impact on society.

Let me know if you have any other questions about the content in your archive!

Gemini

In the case of Gemini, I was using gemini-1.5-flash for my model. This system has a huge input context length, and it made a big difference in the results. The model reliably cites the podcast episodes and does a great job at organizing the results when there were multiple entries for a single episode.

Your archive contains several podcasts that discuss the rise of artificial intelligence (AI) and its potential consequences. Here are some key points from these podcasts:

Philosophize This – Episode 184: Is Artificial Intelligence Really an Existential Risk?

Definition of Intelligence: The episode explores different definitions of intelligence, from narrow intelligence (e.g., a chess computer) to general intelligence (e.g., human-like intelligence).
ChatGPT as an Example: The episode uses ChatGPT as an example of narrow intelligence, suggesting that its abilities may be a stepping stone towards general intelligence.
General Intelligence as a New Species: The episode highlights the potential for AI to develop general intelligence and become a new species, distinct from humans.
Existential Risk: The episode discusses the potential existential risk posed by AI, particularly the lack of understanding of how human intelligence operates and the unpredictable nature of general intelligence.

TED Radio Hour – Episode: Future Consequences

Intelligence Explosion: The episode explores the concept of an “intelligence explosion,” where AI rapidly surpasses human intelligence and becomes self-improving.
Platform-Independent Intelligence: The episode argues that intelligence is not tied to biological substrates, suggesting that AI can surpass human intelligence.
Divergence of Goals: The episode expresses concern that even if AI is not intentionally malicious, its goals may diverge from human goals, potentially leading to harm.
Need for Urgent Action: The episode emphasizes the need for urgent action to address the potential risks of AI, comparing the situation to the arrival of an alien civilization.

Other Relevant Content

Philosophize This – Episode 186: Are We Heading for a Digital Prison – Panopticon (Foucault, Bentham, Cave). This episode explores the potential for AI to create a digital panopticon, where our behavior is constantly monitored and controlled.
Stuff You Should Know – Episode: What Makes a Genius. This episode discusses the nature of intelligence and genius, exploring how these concepts are often measured and the limitations of such measurements.

Overall

Your archive provides a rich overview of the current debate surrounding the rise of AI, covering its potential benefits, risks, and philosophical implications. It explores the potential for AI to surpass human intelligence, the importance of understanding AI’s goals, and the need for ethical considerations in AI development.

Conclusion: Reflections and Next Steps

The journey of building a Retrieval-Augmented Generation (RAG) system for podcasts has been both challenging and rewarding. Through this exploration, I’ve found ways to transform a passive collection of podcast content into an actionable and searchable knowledge base, turning fleeting moments of insight into something that can be easily recalled and used. The use of both the Gemma and Gemini models highlights the potential of RAG to bring real value, providing nuanced and context-rich responses from complex archives.

While there are still some technical hurdles, such as improving search strategies and prompt effectiveness, the results so far are promising. This system has already begun to solve a real problem: giving us the ability to recall and utilize knowledge that would otherwise be lost in hours of audio recordings.

If you’re interested in creating a similar system or expanding on what I’ve done, I encourage you to dive into the prototype and explore how RAG can be applied to your own datasets. Whether you’re working with podcasts, documents, or any other unstructured content, the potential for making that content more accessible and useful is vast.

Moving forward, I’ll continue refining the RAG system and experimenting with different models and configurations. If you have any questions, suggestions, or would like to share your own experiments, feel free to reach out.

Thank you for following along on this journey—let’s continue exploring the power of AI together.

Building My Homelab: The Journey from Gemma on a Laptop to a Rack Mounted Powerhouse

September 1, 2024September 8, 2024 Allen Hutchison4 Comments

In the ever-evolving landscape of AI, there are moments when new technologies capture your imagination and set you on a path of exploration and innovation. For me, one of those moments came with the release of the Gemma models. These models, with their promise of enhanced capabilities and local deployment, ignited my curiosity and pushed me to take a significant step in my homelab journey—building a system powerful enough to run these AI models locally.

The Allure of Local AI

I’ve spent the better part of 30 years immersed in the world of machine learning and artificial intelligence. My journey began in the 90s when I was an AI major in the cognitive science program at Indiana University. Back then, AI was a field full of promise, but the tools and technologies we take for granted today were still in their infancy. Fast forward a few decades, and I found myself at Google Maps, leading teams that used machine learning to transform raw imagery into structured data, laying the groundwork for many of the services we rely on daily.

By 2021, I had transitioned to the Core ML group at Google, where my focus shifted to the nuts and bolts of AI—low-level ML infrastructure like XLA, ML runtimes, and performance optimization. The challenges were immense, but so were the opportunities to push the boundaries of what AI could do. Today, as the leader of the AI Developer team at Google, I work with some of the brightest minds in the industry, building systems and technologies that empower developers to use AI in solving meaningful, real-world problems.

Despite all these experiences, the release of the Gemma models reignited a spark in me—a reminder of the excitement I felt as a student, eager to experiment and explore the limits of AI. These models offered something unique: the ability to run sophisticated AI directly on local hardware. For someone like me, who has always believed in the power of experimentation, this was an opportunity too good to pass up.

However, I quickly realized that while I could run these models on my Mac at home, I wanted something more—something that could serve as a shared resource for my family, a system that would be plugged in and available all the time. I envisioned a platform that not only supported these AI models but also provided the flexibility to build and explore other projects. To fully engage with this new wave of AI and create a hub for ongoing experimentation, I needed a machine that could handle the load and grow with our ambitions.

That’s when I decided to take the plunge and build a powerful homelab. I started by carefully spec’ing out the components, aiming to create a system that wasn’t just about raw power but also about versatility and future-proofing. Eventually, I turned to Steiger Dynamics to bring my vision to life. Their expertise in crafting high-performance, custom-built systems made them the perfect partner for this project. But before diving into the specifics of the build, let me share why the concept of local AI holds such a special allure for someone who has been in this field for as long as I have.

Spec’ing Out the Perfect Homelab

Building a homelab is both a science and an art. It’s about balancing performance with practicality, ensuring that every component serves a purpose while also leaving room for future expansion. With the goal of creating a platform capable of handling advanced AI models like Gemma, as well as other projects that might come along, I began the process of selecting the right hardware.

The Heart of the System: CPU and GPU

At the core of any powerful AI system are the CPU and GPU. After researching various options, I decided to go with the AMD Ryzen 9 7900X3D, a 12-core, 24-thread processor that offers the multithreaded performance necessary for AI workloads while still being efficient enough for a range of homelab tasks. But the real workhorse of this system would be the NVIDIA GeForce RTX 4090. This GPU, with its 24 GB of VRAM and immense processing power, was selected to handle the computational demands of AI training, simulations, and real-time applications.

The RTX 4090 wasn’t just about raw power; it was about flexibility. This GPU allows me to experiment with larger datasets, more complex models, and even real-time AI applications. Whether I’m working on image recognition, natural language processing, or generative AI, the RTX 4090 is more than capable of handling the task.

Memory and Storage: Speed and Capacity

To complement the CPU and GPU, I knew I needed ample memory and fast storage. I opted for 128GB of DDR5 5600 MT/s RAM to ensure that the system could handle multiple tasks simultaneously without bottlenecks. This is particularly important when working with large datasets or running several virtual machines at once—a common scenario in a versatile homelab environment.

For storage, I selected two 4 TB Samsung 990 PRO Gen4 NVMe SSDs. These drives provide the speed needed for active projects, with read and write speeds of 7,450 and 6,900 MB/s, respectively, ensuring quick access to data and fast boot times. The choice of separate drives rather than a RAID configuration allows me to manage my data more flexibly, adapting to different projects as needed.

Cooling and Power: Reliability and Efficiency

Given the power-hungry components, proper cooling and a reliable power supply were non-negotiable. I chose a Quiet 360mm AIO CPU Liquid Cooling system, equipped with six temperature-controlled, pressure-optimized 120mm fans in a push/pull configuration. This setup ensures that temperatures remain in check, even during prolonged AI training sessions that can generate significant heat.

The power supply is a 1600 Watt Platinum unit with a semi-passive fan that remains silent during idle periods and stays quiet under load. This ensures stable power delivery to all components, providing the reliability needed for a system that will be running almost constantly.

Building for the Future

Finally, I wanted to ensure that this homelab wasn’t just a short-term solution but a platform that could grow with my needs. The ASUS ProArt X670E-Creator Wifi motherboard I selected provides ample expansion slots, including dual PCIe 8x lanes, which are perfect for future upgrades, whether that means adding more GPUs or expanding storage. With 10G Ethernet and Wi-Fi 6E, this system is also well-equipped for high-speed networking, both wired and wireless.

Throughout this process, my choices were heavily influenced by this Network Chuck video. His insights into building a system for local AI, particularly the importance of choosing the right balance of power and flexibility, resonated with my own goals. Watching his approach to hosting AI models locally helped solidify my decisions around components and made me confident that I was on the right track.

With all these components selected, I turned to Steiger Dynamics to assemble the system. Their expertise in custom builds meant that I didn’t have to worry about the finer details of putting everything together; I could focus on what mattered most—getting the system up and running so I could start experimenting.

Bringing the System to Life: Initial Setup and First Experiments

Once the system arrived, I was eager to get everything up and running. Unboxing the hardware was an exciting moment—seeing all the components I had carefully selected come together in a beautifully engineered machine was incredibly satisfying. But as any tech enthusiast knows, the real magic happens when you power on the system for the first time.

Setting Up Proxmox and Virtualized Environments

For this build, I chose to run Proxmox as the primary operating system. Proxmox is a powerful open-source virtualization platform that allows me to create and manage multiple virtual machines (VMs) on a single physical server. This choice provided the flexibility to run different operating systems side by side, making the most of the system’s powerful hardware.

To streamline the setup process, I utilized some excellent Proxmox helper scripts available on GitHub. These scripts made it easier to configure and manage my virtual environments, saving me time and ensuring that everything was optimized for performance right from the start.

The first VM I set up was Ubuntu 22.04 LTS, which would serve as the main environment for AI development. Ubuntu’s long-term support and robust package management make it an ideal choice for a homelab focused on AI and development. The installation process within Proxmox was smooth, and soon I had a fully functional virtual environment ready for configuration.

I started by installing the necessary drivers and updates, ensuring that the NVIDIA RTX 4090 and other components were operating at peak performance. The combination of the AMD Ryzen 9 7900X3D CPU and the RTX 4090 GPU provided a seamless experience, handling everything I threw at it with ease. With the virtualized Ubuntu environment fully updated and configured, it was time to dive into my first experiments.

Running the First AI Models

With the system ready, I turned my attention to running AI models locally using Ollama as the model management system. Ollama provided an intuitive way to manage and deploy models on my new setup, ensuring that I could easily switch between different models and configurations depending on the project at hand.

The first model I downloaded was the 24B Gemma model. The process was straightforward, thanks to the ample power and storage provided by the new setup. The RTX 4090 handled the model with impressive speed, allowing me to explore its capabilities in real-time. I could experiment with different parameters, tweak the model, and see the results almost instantaneously.

Exploring Practical Applications: Unlocking the Potential of My Homelab

With the system fully operational and the Gemma model successfully deployed, I began exploring the practical applications of my new homelab. The flexibility and power of this setup meant that the possibilities were virtually endless, and I was eager to dive into projects that could take full advantage of the capabilities I now had at my disposal.

Podcast Archive Project

One of the key projects I’ve been focusing on is my podcast archive project. With the large Gemma model running locally, I’ve been able to experiment with using AI to transcribe, analyze, and categorize vast amounts of podcast content. The speed and efficiency of the RTX 4090 have transformed what used to be a time-consuming process into something I can manage seamlessly within my homelab environment.

The ability to run complex models locally has also allowed me to iterate rapidly on how I approach the organization and retrieval of podcast data. I’ve been experimenting with different methods for tagging and indexing content, making it easier to search and interact with large archives. This project has been particularly rewarding, as it combines my love of podcasts with the cutting-edge capabilities of AI.

General Conversational Interfaces

I’ve been exploring is setting up general conversational interfaces. With the Gemma model’s conversational abilities, I’ve been able to create clients that facilitate rich, interactive dialogues. Whether for casual conversation, answering questions, or exploring specific topics, these interfaces have proven to be incredibly versatile.

Getting the models up and running with these clients was a straightforward process, and I’ve been experimenting with different use cases—everything from personal assistants to educational tools. The flexibility of the Gemma model allows for a wide range of conversational applications, making this an area ripe for further exploration.

Expanding the Homelab’s Capabilities

While I’m already taking full advantage of the system’s current capabilities, I’m constantly thinking about ways to expand and optimize the homelab further. Whether it’s adding more storage, integrating additional GPUs for even greater computational power, or exploring new software platforms that can leverage the hardware, the possibilities are exciting.

The Journey Continues

This is just the beginning of my exploration into what this powerful homelab can do. With the hardware now in place, I’m eager to dive into a myriad of projects, from refining my podcast archive system to pushing the boundaries of conversational AI. The possibilities are endless, and the excitement of discovering new applications and optimizing workflows keeps me motivated.

As I continue to explore and experiment, I’ll be sharing my experiences, insights, and challenges along the way. There’s a lot more to come, and I’m excited to see where this journey takes me. I invite you, my readers, to come along for the ride—whether you’re building your own homelab, curious about AI, or just interested in technology. Together, we’ll see just how far we can push the boundaries of what’s possible with this incredible setup.

Letters from Silicon Valley

In “Letters from Silicon Valley,” I write about the convergence of technology, life, and creativity, sharing insights from my extensive experience in the tech industry along with my personal adventures in woodworking, music, and beyond.

Tag: Gemma

Automation and Measurement: Inside Gemini Scribe 4.8.0

The Push for Automation

The Scheduled Task Engine

How I Use This in Practice

Extracting the Agent Loop

Local Models with Ollama and Gemma 4

Moving from Guessing to Measuring

What’s Next?

Like this:

The AI-First Developer: A New Breed of Software Engineer

Like this:

Turning Podcasts into Your Personal Knowledge Base with AI

Gemma

Gemini

Conclusion: Reflections and Next Steps

Like this:

Building My Homelab: The Journey from Gemma on a Laptop to a Rack Mounted Powerhouse

The Allure of Local AI

Spec’ing Out the Perfect Homelab

The Heart of the System: CPU and GPU

Memory and Storage: Speed and Capacity

Cooling and Power: Reliability and Efficiency

Building for the Future

Bringing the System to Life: Initial Setup and First Experiments

Setting Up Proxmox and Virtualized Environments

Running the First AI Models

Exploring Practical Applications: Unlocking the Potential of My Homelab

Podcast Archive Project

General Conversational Interfaces

Expanding the Homelab’s Capabilities

Like this:

The Push for Automation

The Scheduled Task Engine

How I Use This in Practice

Extracting the Agent Loop

Local Models with Ollama and Gemma 4

Moving from Guessing to Measuring

What’s Next?

Share this:

Like this:

Share this:

Like this:

Gemma

Gemini

Conclusion: Reflections and Next Steps

Share this:

Like this:

The Allure of Local AI

Spec’ing Out the Perfect Homelab

The Heart of the System: CPU and GPU

Memory and Storage: Speed and Capacity

Cooling and Power: Reliability and Efficiency

Building for the Future

Bringing the System to Life: Initial Setup and First Experiments

Setting Up Proxmox and Virtualized Environments

Running the First AI Models

Exploring Practical Applications: Unlocking the Potential of My Homelab

Podcast Archive Project

General Conversational Interfaces

Expanding the Homelab’s Capabilities

Share this:

Like this: