A Starlink Mini antenna suction-mounted to the interior glass roof of a Tesla Model X, face pointing up through the glass at a twilight sky, with a Big Sur coastal vista visible through the windows.

Starlink Mini Field Review

April 12, 2026April 12, 2026 Allen Hutchison1 Comment

I almost didn’t bring it.

We were packing for a week-long road trip, the kind where you’re loading a car to the roof and every extra item has to justify its existence against the scrutiny of a finite trunk. The Starlink Mini was sitting on my desk, and for a moment I thought: it’s a vacation. Just use your phone. Let it go.

Then I remembered Death Valley.

A few years ago, we drove out to Death Valley for a long weekend. Somewhere past the park entrance the cell signal dropped and didn’t come back for several hours. No maps. No music. No way to look up whether the hotel had our reservation or whether the road ahead was clear. It wasn’t a crisis. We found our way, the trip was great. But it lodged in my brain as one of those small, avoidable frustrations that you file away and think about later. I started thinking about alternatives.

Fast-forward to earlier this year: I was looking to replace an old Verizon 5G hotspot that I’d been using as a backup internet connection. The Starlink Mini caught my attention for a pretty simple reason. The mobile plan is $50 per month for 100GB, and critically, you can pause it for $5 per month when you’re not using it. For something that might sit idle most of the year but be indispensable when you need it, that pricing model changes the math entirely. A hardware incentive made the upfront cost easier to swallow. I bought it as an emergency backup, not as a travel device.

The road trip reframed it.

We were heading down Highway 1 from San Jose to Santa Barbara, with a few stops along the way. Anyone who’s driven that stretch knows the Big Sur section the way sailors know certain straits: beautiful, unforgiving, and reliably hostile to cell service. I drive it often enough that the failure pattern is familiar: the car’s maps stop rendering new tiles, the streaming music starts cutting in and out, whatever question the kid in the back seat just asked becomes a question for later. Inconveniences, not crises, but they happen every single time. Both my 5G phones and the car’s own LTE connection consistently lose the signal in the same places. Standing in my driveway with the car half-packed, I thought: this is exactly what the thing is for. I threw it in the back, along with a suction cup mount I’d picked up — a 2-in-1 case and mount combo designed specifically for the Mini.

The mount was an experiment in itself, and it’s worth pausing on for a second. The Starlink Mini is designed to be mounted outside the vehicle. That’s what Starlink recommends, that’s what the hardware is built for, and that’s what every piece of documentation assumes. I stuck it to the inside of the glass roof of my Model X and let the antenna hang below it, face pointing straight up through the glass, directly over the rear passenger seat. It just fit. What I wasn’t sure about was whether any of this would actually work, because I was running the device in a configuration it wasn’t designed for. Would a satellite signal come through automotive glass well enough to matter? Would the antenna need to be precisely pointed, or would straight-up-ish be good enough? Would I end up pulling it out at every stop and doing the whole orientation dance I’d been expecting? Those felt like real unknowns. But it was a road trip, not a lab test, and sometimes you just ship the thing and see what happens.

If your vehicle doesn’t have a large flat glass roof to exploit, the outdoor-mount path is the one to take. You can attach the Mini to a roof rack or crossbars without any permanent modifications to the car, and you’ll almost certainly get better performance than I did running it through glass. The interior trick I used is a happy accident of driving a Model X, not a recommendation for everyone.

I brought it anyway. And by the end of the first day on the road, I was deeply glad I did.

This isn’t a post about working remotely. We took this trip for the right reasons: my son is finishing his sophomore year of high school and starting to think seriously about college, so we used the week to visit Cal Poly SLO and UC Santa Barbara. The kind of trip that reminds you why you work as hard as you do. But connectivity isn’t just about work. It’s about maps that don’t freeze, music that doesn’t stutter, the ability to pull up directions to the next stop without pulling over first. Most of the value of a good connection on a road trip is in the miles between places, not in the places themselves.

So let me tell you what actually happened.

Packing the Antenna

The Starlink Mini’s whole pitch is portability, and it delivers on that in a way the full-sized antenna simply can’t. The unit itself is roughly the size of a large laptop — thinner than a pizza box, lighter than a bag of dog food. It goes flat in a bag, doesn’t demand a dedicated case, and doesn’t feel like you’ve brought a piece of infrastructure on vacation. You’ve just brought a gadget.

For power, I ran it off a 12V adapter wired into the car. This is, I think, the right way to do it for road travel. Watching the draw in the Starlink app through the week, I almost never saw it exceed 20W, and most of the time it was lower, averaging right around 20W. A standard car outlet handles that without complaint. I didn’t need a power station, an inverter, or any custom rigging. You plug it in. It works.

Here’s the part that actually surprised me: there was no setup. Not “fast setup.” No setup. The antenna stayed exactly where I’d stuck it at the start of the trip, and I left it there for the whole week. Every stop, every hotel, every stretch of highway: it was already in place, already connected, already doing its job the moment I plugged in the 12V cable. I had gone into this expecting to be that guy in the parking lot pulling the antenna out, orienting it, consulting the sky map app, worrying about elevation angles. Instead I just drove.

What It Actually Did Out There

The strangest thing about this whole trip, and the thing I keep thinking about, is the moment that didn’t happen.

I was ready for Big Sur. I know that stretch of Highway 1, I know where the dead zones are, and I had a working theory that this would be the dramatic reveal: we’d hit the no-signal gap, I’d point at the Mini, and we’d be the family who still had maps and music when no one else did. That story never materialized. Not because it didn’t work, but because I never noticed.

The car was on the Starlink Wi-Fi the whole time. Maps kept routing. Music kept playing. Everyone’s phone stayed connected through the in-car network. Somewhere along that drive we passed through miles of cell dead zones without a single hiccup in anything we were doing, and the only reason I know that is because I thought to check my phone later and saw the zero-bar gaps that should have caused Death Valley-style pain. They just weren’t there. The infrastructure had been quietly carrying us the entire time.

That’s the moment I keep coming back to. The Death Valley experience was jarring because the loss was obvious and immediate. No maps, no music, a sudden reminder that your conveniences live on infrastructure you don’t own. The Starlink Mini didn’t fix that problem by giving me a workaround to pull out in an emergency. It fixed it by making the problem invisible. I can’t think of a better test for a piece of infrastructure than whether you stop noticing it. We did lose the connection once, in a tunnel, and everything came right back when we cleared the other side. The failure mode was the same as losing cell signal: obvious, brief, and boring.

The vista stops were the next best thing. Every time we pulled off at an overlook, I’d set the car to keep accessory power running so the Mini stayed up, which meant we stepped out into a place with zero cell service and our own private Wi-Fi hotspot parked a few feet away. This turned out to be where some of the best moments of the trip happened. Somebody would spot a rock formation and wonder what it was called. A few birds would wheel overhead and we’d want to know what species they were. Under normal travel conditions, those questions would just evaporate — filed away as “look it up later,” which almost always means never. With the antenna overhead, we could just ask. Curiosity became frictionless. And because we were stopped in the middle of nowhere with no other digital pull on our attention, those answers actually turned into conversations instead of the usual phone-zombie drift.

The moment that most caught me off guard, though, was a lunch stop. We parked the car at a spot with genuinely terrible cell service, all our luggage and gear still inside, and walked away to eat. I left the Mini powered up through the car’s accessory power, which meant the car itself stayed on the internet while we weren’t in it. Tesla’s Sentry mode uses the car’s connection to stream alerts and camera feeds — normally tied to whatever cell signal the car can grab on its own, which in that spot was nothing. But the car was happily connected to the Starlink Wi-Fi, so Sentry just kept working. I could check on the car from my phone during lunch and actually see what was happening around it. That was a use case I hadn’t planned for at all. It was the first time in the trip that the Mini stopped feeling like a travel gadget and started feeling like persistent infrastructure for the vehicle itself — a comms link that existed independent of whether any human was sitting inside the car. Peace of mind, delivered by an antenna looking straight up through a glass roof.

It’s worth being clear about where the Mini did and didn’t earn its keep on this trip. The hotels were all fine. Cell service and hotel Wi-Fi were perfectly adequate at every place we stayed, and I never once fired up the Mini as a destination device. The entire value proposition showed up in the driving, the vistas, and the parked-car moments in between — the stretches and stops where cell coverage got thin and where, historically, I would have just quietly lost the ability to navigate, stream, ask questions, or keep an eye on my car. The Mini made those gaps disappear so completely that I forgot they were even there.

What It Doesn’t Do

In the interest of intellectual honesty: it’s not magic.

Tree cover is the enemy. Dense canopy will interrupt the connection or degrade it significantly, and the obstruction map in the Starlink app is reasonably good at predicting this. The through-the-glass approach worked well on this trip because we were mostly on highways and in parking lots with open sky. If we’d been parked under heavy forest canopy I’d have had to think harder about placement, and an antenna mounted inside the car wouldn’t have been the right answer.

Weather, surprisingly, was not on this list. We drove through several stretches of rain and a lot of overcast, and the connection held through all of it without any obvious degradation. I went into the trip half-expecting to see throughput sag under bad weather and it just didn’t happen. That’s not a universal claim (you can imagine worse conditions), but for what a California spring throws at you, it’s a non-issue.

Satellite latency is also real, though I didn’t notice it much. For anything where you feel the round-trip time (voice calls, some gaming), it’s not the same as fiber. For everything else — browsing, streaming, music, mapping — it held up fine.

An Engineer on Vacation

Here’s the thing I kept thinking about on this trip, the thought that felt worth writing down.

I’ve spent a lot of the last few years building out my homelab into something I’m genuinely proud of. Rack-mounted servers, local AI models, a network I understand end-to-end. That infrastructure has a fixed address. It’s built for depth, not mobility.

What the Starlink Mini represents is a different layer of the stack — the part that moves. The connectivity substrate that you can carry with you without sacrifice. And what surprised me, genuinely, is how much that changes the feel of being on the road. I wasn’t tethered to the patchy mercy of cell towers between towns. I had my own infrastructure, and it came with me.

For most of human history, “infrastructure” meant something you built in a place and then stayed near. Railroads, power grids, phone lines. The thing that’s happening now, slowly and then all at once, is that infrastructure is becoming portable. The Starlink Mini isn’t the endpoint of that story, but it’s a clear data point in it.

I’m not arguing that you should work on every vacation. I didn’t. But I am arguing that having reliable connectivity transforms the texture of a trip in ways that have nothing to do with work. It means you can find the good restaurant instead of defaulting to what’s nearest. It means your kids can call their friends. It means you get the weather report that saves you an hour of driving into rain. Small things. Real things.

Would I Recommend It

This is the part where I’m supposed to tell you whether to buy it, and the honest answer depends almost entirely on what else you’re using it for. Let me actually walk through the math, because I think the pricing model is the thing that makes this post worth writing at all.

The Mini’s mobile plan is $50 per month for 100GB when active, but you can pause it for $5 per month when you’re not using it. So if you buy the hardware and leave the service paused most of the year, your standing cost is $60 annually. Each month you flip it on for a trip, you add $50 on top. For someone who only takes one connectivity-hostile trip per year, that means $110 all-in for a year with one active month.

If that’s your situation — one trip a year, I honestly don’t think you should buy this. The math doesn’t work and the rest of your use cases probably don’t justify the hardware. A better phone plan or a cellular hotspot will serve you better for less money.

Where it starts making sense is when you have a standing use case at home that earns back the $60 standby cost on its own. For me, that’s backup internet during the storms that reliably knock out my home connection every year. I’d already decided I wanted this for the house before I ever thought about taking it on a trip — the $5-a-month pause price justifies itself on the strength of the backup case alone, and everything else is gravy. For someone living out of an RV or a van full-time, the case is even easier: the Mini just becomes infrastructure, always on, always moving with you. You’d never pause it.

Once the at-home or full-time use case carries the standby cost, the travel capability becomes a structural bonus. I paid $50 to flip the service on for the week of this trip, and knowing what I know about Highway 1 — the failure pattern I’ve hit on every previous drive down, that was easy to justify. It’s $50 to make a known, recurring inconvenience quietly disappear on a trip where I was already spending ten times that on charging, lodging, and meals. It didn’t feel like a hard call.

So the short version of the recommendation is: don’t buy this as a travel device. Buy it for a standing use case you already have, and let the travel capability be the bonus you discover later. That’s the frame that made my decision easy, and I think it’s the frame that will hold up for most readers whose situations look anything like mine.

One more thing worth flagging before I wrap up: the 12V power setup worked flawlessly the entire week. I expected to be solving a power problem at some point and never had to. That’s the detail that surprised me most, actually: not the satellite performance, but how completely boring and reliable the day-to-day operation was.

I’ll keep it in the travel kit. The homelab stays home. But now part of it gets to come along.

Abstract digital visualization of glowing lines and nodes converging on a central geometric shape labeled 'AGENTS.md', symbolizing interconnected AI systems and a unifying standard.

On Context, Agents, and a Path to a Standard

October 31, 2025October 31, 2025 Allen HutchisonLeave a comment

When we were first designing the Gemini CLI, one of the foundational ideas was the importance of context. For an AI to be a true partner in a software project, it can’t just be a stateless chatbot; it needs a “worldview” of the codebase it’s operating in. It needs to understand the project’s goals, its constraints, and its key files. This philosophy isn’t unique; many agentic tools use similar mechanisms. In our case, it led to the GEMINI.md context system (which was first introduced in this commit) a simple Markdown file that acts as a charter, guiding the AI’s behavior within a specific repository.

At its core, GEMINI.md is designed for clarity and flexibility. It gives developers a straightforward way to provide durable instructions and file context to the model. We also recognized that not every project is the same, so we made the system adaptable. For instance, if you prefer a different convention, you can easily change the name of your context file with a simple setting.

This approach has worked well, but I’ve always been mindful that bespoke solutions, however effective, can lead to fragmentation. In the open, collaborative world of software development, standards are the bridges that connect disparate tools into a cohesive ecosystem.

That’s why I’ve been following the emergence of the Agents.md specification with great interest. We have several open issues in the Gemini CLI repo (like #406 and #12345) from users asking for Agents.md support, so there’s clear community interest. The idea of a universal standard for defining an AI’s context is incredibly appealing. A shared format would mean that a context file written for one tool could work seamlessly in another, allowing developers to move between tools without friction. I would love for Gemini CLI to become a first-class citizen in that ecosystem.

However, as I’ve considered a full integration, I’ve run into a few hurdles—not just technical limitations, but patterns of use that a standard would need to address. This has led me to a more concrete set of proposals for what an effective standard would need.

So, what would it take to bridge this gap? I believe with a few key additions, Agents.md could become the robust standard we need. Here’s a more detailed breakdown of what I believe is required:

A Standard for @file Includes: From my perspective, this is mandatory. In any large project, you need the ability to break down a monolithic context file into smaller, logical, and more manageable parts—much like a C/C++ #include. A simple @file directive, which GEMINI.md and some other systems support, would provide the modularity needed for real-world use.
A Pragma System for Model-Specific Instructions: Developers will always want to optimize prompts for specific models. To accommodate this without sacrificing portability, the standard could introduce a pragma system. This could leverage standard Markdown callouts to tag instructions that only certain models should pay attention to, while others ignore them. For example:

> [!gemini]
> Gemini only instructions here

> [!claude]
> Claude only instructions here

> [!codex]
> Codex only instructions here
Clear Direction on Context Hierarchy: We need clear rules for how an agentic application should discover and apply context. Based on my own work, I’d propose a hierarchical strategy. When an agent is invoked, it should read the context in its current directory and all parent directories. Then, when it’s asked to read a specific file, it should first apply the context from that file’s local directory before applying the broader, inherited context. This ensures that the most specific instructions are always considered first, creating a predictable and powerful system.

If the Agents.md standard were to incorporate these three features, I believe it would unlock a new level of interoperability for AI developer tools. It would create a truly portable and powerful way to define AI context, and I would be thrilled to move Gemini CLI to a model of first-class support.

The future of AI-assisted development is collaborative, and shared standards are the bedrock of that collaboration. I’ve begun outreach to the Agents.md maintainers to discuss these proposals, and I’m optimistic that with community feedback, we can get there. If you have your own opinions on this, I’d love to hear them in the discussion on our repo.

Unlocking the Future of Coding: Introducing the Gemini CLI

June 25, 2025August 10, 2025 Allen Hutchison2 Comments

Back in April, I wrote about waiting for the true AI coding partner. I articulated a vision for an AI that transcends mere code generation, one that truly understands context, acts autonomously within our development environments, and collaborates with us iteratively. Today, I’m thrilled to announce a significant step towards that vision: the launch of the Gemini CLI.

For too long, AI coding assistance has often felt like a disconnected assistant. While dedicated AI-powered IDEs like Cursor have made great strides, the common experience still involves copy-pasting code into a separate interface or breaking flow to get suggestions. This breaks flow, loses context, and frankly, isn’t how truly collaborative partners work. We need an AI that lives where we live—in the terminal, within our projects, and deeply integrated into our workflow.

This is precisely what the Gemini CLI sets out to achieve. It’s not just a fancy chatbot for your command line; it’s an experimental interface designed to bring the power of Gemini directly into your development loop, enabling intelligent, contextual, and actionable AI assistance.

It’s for this very reason that I’ve been quite heads-down over the last few months, working with a super talented team to bring this application to life. It has genuinely been one of my most fun experiences at Google in the 20+ years that I’ve been here, and I feel incredibly fortunate to have had the chance to collaborate with such brilliant people across the company.

The Power of Small Tools, Amplified by AI

In May, I explored the concept of small tools, big ideas. The premise was simple: complex problems are often best tackled by composing many small, powerful, and specialized tools. This philosophy is at the very heart of the Gemini CLI’s design.

Instead of a monolithic AI trying to do everything at once, the Gemini CLI empowers Gemini with a suite of familiar command-line tools. Imagine an AI that can:

Read and Write Files: Using read_file and write_file, it can inspect your codebase, understand existing logic, and propose modifications directly to your files.
Navigate Your Project: With list_directory and grep, it can explore your project structure, locate relevant files, or find specific patterns across your repository, just like you would.
Execute Shell Commands: The run_shell_command tool allows Gemini to execute commands, build your project, run tests, or even interact with external services, providing real-time feedback.
Search the Web: Need to look up an API, debug an error message, or find best practices? The google_web_search tool lets Gemini leverage the vastness of the internet to inform its responses and actions.
Edit with Precision: Beyond simple file writes, the edit_file tool allows for granular, diff-based modifications, ensuring changes are precise and reviewable.

This approach means Gemini isn’t guessing; it’s acting. It’s using the same building blocks you use every day, but with its powerful reasoning capabilities to orchestrate them towards your goals.

A Truly Contextual and Collaborative Partner

The Gemini CLI maintains a persistent session, remembering your conversation history, the files it has examined, and the results of previous tool executions. This “conversational memory” and contextual understanding are critical. It allows for a natural, iterative back-and-forth, where the AI builds on prior interactions and its understanding of your project state.

You can ask Gemini to:

“Find all JavaScript files in this directory that import React.” (Leveraging list_directory and grep)
“Refactor this component to use hooks.” (Involving read_file, edit_file, and potentially run_shell_command to run tests).
“What’s the best way to implement X in Python given these files?” (Using read_file to understand your existing code and google_web_search for best practices).

The workflow is truly interactive. Gemini proposes actions, and you have the power to approve them or guide it further. This human-in-the-loop design ensures you’re always in control, fostering a collaborative partnership rather than a black-box operation.

Built by Gemini CLI, For Everyone

It’s particularly exciting to share that this project was started by a small and scrappy team, and we leveraged Gemini CLI itself to help write Gemini CLI. Many of us now work almost exclusively within Gemini CLI, often using our IDEs only for viewing diffs.

And while its origins are in coding, Gemini CLI is incredibly versatile for many tasks outside of traditional development. Personally, I love using it to manage my home lab, to bulk rename and reformat files for my podcast project, and to generally act as a seamless go-between for anything complicated in GitHub. Increasingly, I’ve also been using Gemini CLI with Obsidian to understand and extract insights from my vault. With over 9000 files in my work vault alone, Gemini CLI lets me ask questions of the entire vault and even make large refactoring-style changes across the entire thing.

Beyond Today: Extensibility

One of the most exciting aspects of the Gemini CLI, and a direct nod to the “small tools, big ideas” philosophy, is its extensibility. The underlying architecture allows developers to define custom tools. This means you can teach Gemini to interact with your specific internal systems, proprietary APIs, or niche development tools. The possibilities are endless, transforming Gemini into an AI assistant perfectly tailored to your unique development environment.

Get Started Today

The Gemini CLI represents a significant leap forward in bringing intelligent AI assistance directly to where developers work most effectively: the command line. It’s a practical realization of the “true AI coding partner” vision, built on the principle that small, well-designed tools can achieve big ideas when orchestrated by a powerful intelligence.

Ready to try it out? Head over to the Gemini CLI GitHub repository to get started. Explore the commands, experiment with its capabilities, and let’s shape the future of AI-powered development together.

I’m incredibly excited about what this means for developer productivity and the evolving role of AI in our daily coding lives. Let me know what you build with it!

Docker Did Nothing Wrong (But I’m Trying Podman Anyway)

April 26, 2025 Allen Hutchison3 Comments

Hey everyone, welcome back to the homelab series! One of the constant themes in managing a growing homelab is figuring out the best way to run and orchestrate all the different services we rely on. For me, this has meant evolving my setup over time into distinct systems to keep things scalable and maintainable.

My current homelab nerve center is spread across a few key machines: ns1 and ns2 handle critical DNS redundancy, beluga is the fortress for storage and archives, and bubba acts as the powerhouse for all my AI experiments and compute-heavy tasks.

Up until now, Docker has been the backbone for deploying and managing services across these systems. Whether it’s containerizing AI models on bubba or managing my core network services, it’s been indispensable for packaging applications and keeping dependencies tidy. It’s served me well, allowing for rapid deployment and relatively easy management.

However, the tech landscape is always shifting, and exploring new tools is part of the homelab fun, right? Lately, I’ve been hearing more about Podman as a powerful, open-source alternative to Docker. Recent changes in the container world and simple curiosity led me to check out this excellent video overview (which I highly recommend watching!):

This video really illuminated what Podman brings to the table and sparked a ton of ideas about how it could potentially fit into, and even improve, my homelab workflow. So, in this post, I want to walk through my current Docker-based setup in a bit more detail, share the specific Podman features from the video that caught my eye, and outline some experiments I’m planning for the future. Let’s dive in!

My Current Homelab: A Multi-Server Approach

As I mentioned, to keep things organized as my homelab grew, I settled on dedicating specific roles to my main servers using Proxmox VE as the foundation for virtualization:

ns1 and ns2 — The Backbone of Service Discovery: These identical servers run my critical internal DNS, ensuring all my services can find each other reliably. Redundancy here is key – if one fails, the other keeps everything connected.
bubba — The AI Workhorse: This is my compute powerhouse, equipped with a GPU and plenty of RAM. It’s dedicated to running local AI models like LLMs via Ollama and interacting with them through tools like Open WebUI. It handles tasks like podcast transcription, embeddings, and inference workloads.
beluga — The Keeper of the Archives: With its focus on storage, beluga houses my media library, data archives, and backups. It’s the long-term home for files and feeds data to bubba when needed.

This separation of duties has been crucial for keeping things maintainable and scalable.

Docker’s Role in My Current Setup

So, how do I actually run the services on these different machines? Docker and Docker Compose are absolutely central to making this multi-server setup manageable. Here’s a glimpse into how it’s wired together:

Base Services Everywhere: I have a base Docker Compose file that runs on most, if not all, of these servers. It includes essential plumbing:
- Traefik: My go-to reverse proxy, handling incoming traffic and routing it to the correct service container, plus managing SSL certificates.
- Portainer Agent: Allows me to manage the Docker environment on each host from a central Portainer instance (the agent itself is part of the Portainer ecosystem).
- Watchtower: Automatically updates containers. (I use this cautiously – often pinning major versions in my compose files while letting Watchtower handle minor updates, though for rapidly evolving things like Ollama, I sometimes let it pull latest.)
- Dozzle Agent: Feeds container logs to a central Dozzle instance for easy viewing (the agent enables the main Dozzle UI).
DNS Servers (ns1/ns2): On top of the base services, the DNS servers have a dedicated compose file that adds CoreDNS, specifically using the coredns_omada project which cleverly mirrors DHCP hostnames from my TP-Link Omada network gear into DNS – super handy! ns1 also runs the central dozzle instance (the log viewer UI) and Heimdall as my main homelab dashboard, providing a single pane of glass overview. Docker makes running these critical but relatively lightweight infrastructure services incredibly straightforward.
AI Workloads (bubba): On the AI workhorse bubba, Docker is essential for managing the AI stack. I run ollama to serve LLMs and open-webui as a frontend, all containerized. This simplifies deployment, dependency management, and allows me to easily experiment with different models and tools without polluting the host system.
Storage Server Utilities (beluga): Even the storage server beluga runs containers. I have PostgreSQL running here, which primarily backs the Speedtest-Tracker service but also serves as my go-to relational database for any other containers or services that need one. Again, Docker neatly packages these distinct applications.

Essentially, Docker Compose defines the what and how for each service stack on each server, and Docker provides the runtime environment. This containerization strategy is what allows me to easily deploy, update, and manage this diverse set of applications across my specialized hardware.

Video Insights: How Podman Could Fit into This Picture

Watching the video overview of Podman did more than just introduce another tool; it sparked concrete ideas about how its specific features could integrate with, and perhaps even improve, my current homelab operations distributed across ns1/ns2, bubba, and beluga.

Perhaps the most compelling concept showcased was Podman’s native support for Pods. While Docker Compose helps manage multiple containers, the idea of grouping tightly coupled containers – like my ollama and open-webui stack on bubba, potentially along with a future vector database – into a single, network-integrated unit feels intrinsically cleaner. Managing this AI application suite as one atomic Pod could simplify networking and lifecycle management significantly. I could even see potential benefits in treating the base services running on each host (traefik, portainer-agent, etc.) as a coherent Pod.

Another significant architectural difference highlighted is Podman’s daemonless nature. Running without a central, privileged daemon is interesting for a couple of reasons. While bubba has resources to spare, my leaner DNS servers (ns1/ns2) might benefit from even slight resource savings, though that needs practical testing. More importantly, this architecture often makes running containers as non-root (rootless) more straightforward. This has direct security appeal, especially for the complex AI applications processing data on bubba or the critical DNS infrastructure on ns1/ns2, potentially reducing the attack surface compared to running everything through a root daemon.

Furthermore, the video demonstrated Podman’s ability to generate Kubernetes YAML manifests directly from running containers or pods. This feature is particularly exciting for a homelabber keen on learning! It presents a practical pathway to experimenting with Kubernetes distributions like K3s or Minikube. I could define my AI stack on bubba using Podman Pods and then export it to a Kubernetes-native format, greatly lowering the barrier to entry for learning K8s concepts with my existing workloads. Even outside of a full K8s deployment, having standardized YAML definitions could make my application deployments more portable and consistent.

Of course, for those who prefer a graphical interface, the video also touched upon Podman Desktop. While I currently use Portainer, exploring Podman Desktop could offer a different management perspective, perhaps one more focused on visualizing and managing these Pods. And crucially, knowing that Podman aims for Docker CLI compatibility for many common commands makes the idea of experimenting much less daunting – it suggests I wouldn’t have to relearn everything from scratch.

So, rather than just being ‘another container tool’, the video positioned Podman as offering specific solutions – particularly around multi-container application management via Pods, security posture through its daemonless design, and bridging towards Kubernetes – that seem highly relevant to the challenges and opportunities in my own homelab setup.

Future Homelab Goals: Experimenting with Podman

So, all this reflection on my current setup and the potential benefits highlighted in the video leads to the obvious next question: what am I actually going to do about it? While I’m not planning a wholesale migration away from Docker immediately – it’s deeply integrated and works well – the possibilities offered by Podman are too compelling not to explore.

My plan is to dip my toes into the Podman waters with a few specific, manageable experiments, leveraging the flexibility of my Proxmox setup:

Dedicated Test Environment: Instead of installing Podman directly onto one of my existing servers like bubba initially, I’ll spin up a fresh virtual machine using Proxmox dedicated solely to Podman testing. This is one of the huge advantages of using Proxmox – I can create an isolated sandbox environment easily. This clean slate will be perfect for getting Podman installed, getting comfortable with the basic CLI commands (leveraging that Docker compatibility mentioned at), and working out any kinks without impacting my operational services.
Migrating a Stack to a Pod: Once the test VM is set up, the real test will be taking my current ollama and open-webui Docker Compose stack (conceptually, at least) and recreating it as a Podman Pod within that VM. This will directly evaluate the Pod concept for managing related services and let me see how the networking and management feel compared to Compose in a controlled environment.
Testing a Simple Service: To get a feel for basic container management and the daemonless architecture in this new VM, I’ll deploy a simpler, standalone service using Podman. Perhaps I’ll containerize a small utility or pull down a common image like postgres or speedtest-tracker just to compare the basic workflow.
Generating Kubernetes Manifests: Once I (hopefully!) have the AI stack running in a Podman Pod in the test VM, I definitely want to try the Kubernetes YAML generation feature. Even if I don’t deploy it immediately, I want to see how Podman translates the Pod definition into Kubernetes resources within this testbed. This feels like a practical homework assignment for my K8s learning goals.
Exploring Podman Desktop: Finally, I’ll likely install and explore Podman Desktop within the test VM. I’m curious to see what its visualization and management capabilities look like, especially for Pods, compared to my usual tools.

This isn’t about finding a ‘winner’ between Docker and Podman right now, but rather about hands-on learning in a safe, isolated environment thanks to Proxmox. It’s about understanding the practical advantages and disadvantages of Podman’s approach before considering if or how I might integrate it into my primary homelab systems (ns1/ns2, bubba, beluga) later on. I’m looking forward to experimenting and, of course, I’ll be sure to share my findings and experiences here in future posts!

That’s the plan for now! Docker continues to be a vital part of my homelab, but exploring tools like Podman is essential for learning and potentially improving how things run. The video provided some great insights, and I’m excited to see how these experiments turn out.

What about you? Are you using Docker, Podman, or something else in your homelab? Have you experimented with Pods or rootless containers? Let me know your thoughts and experiences in the comments below!

Cracking the Code: Exploring Transcription Methods for My Podcast Project

September 21, 2024September 21, 2024 Allen Hutchison3 Comments

In previous posts, I outlined the process of downloading and organizing thousands of podcast episodes for my AI-driven project. After addressing the chaos of managing and cleaning up nearly 7,000 files, the next hurdle became clear: transcription. Converting all of these audio files into readable, searchable text would unlock the real potential of my dataset, allowing me to analyze, tag, and connect ideas across episodes. Since then, I’ve expanded my collection to over 10,000 episodes, further increasing the importance of finding a scalable transcription solution.

Why is transcription so critical? Most AI tools available today aren’t optimized to handle audio data natively. They need input in a format they can process—typically text. Without transcription, it would be nearly impossible for my models to work with the podcast content, limiting their ability to understand the material, extract insights, or generate meaningful connections. Converting audio into text not only makes the data usable by AI models but also allows for deeper analysis, such as searching across episodes, generating summaries, and identifying recurring themes.

In this post, I’ll explore the various transcription methods I considered, from cloud services to local AI solutions, and how I ultimately arrived at the right balance of speed, accuracy, and cost.

What Makes a Good Transcription?

Before diving into the transcription options I explored, it’s important to outline what I consider to be the key elements of a good transcription. When working with large amounts of audio data—like podcasts—the quality of the transcription can make or break the usability of the resulting text. Here are the main criteria I looked for:

Accuracy: The most obvious requirement is that the transcription needs to be accurate. It should capture what is said without altering the meaning. Misinterpretations, skipped words, or incorrect phrasing can lead to significant misunderstandings, especially when trying to analyze data from hours of dialogue.
Speaker Diarization: Diarization is the process of distinguishing and labeling different speakers in an audio recording. Many of the podcasts in my dataset feature multiple speakers, and a good transcription should clearly indicate who is speaking at any given time. This makes the conversation easier to follow and is essential for both readability and for further processing, like analyzing individual speaker contributions or summarizing conversations.
Punctuation and Formatting: Transcriptions need to be more than a raw dump of words. Proper punctuation and sentence structure make the resulting text more readable and usable for downstream tasks like summarization or natural language processing.
Identifying Music and Sound Effects: Many podcasts feature music, sound effects, or background ambiance that are integral to the listening experience. A good transcription should be able to note when these elements occur, providing context about their role in the episode. This is especially important for audio that is heavily produced, as these non-verbal elements often contribute to the overall meaning or mood.
Scalability: Finally, when dealing with tens of thousands of podcast episodes, scalability becomes critical. A transcription tool should not only work well for a single episode but also maintain performance when scaled to thousands of hours of audio. The ability to process large volumes of data efficiently without sacrificing quality is a key factor for a project of this scale.

These criteria shaped my approach to evaluating different transcription tools, helping me determine what worked—and what didn’t—for my specific needs.

Using Gemini for Transcription: A First Attempt

Since I work with Gemini and its APIs professionally (about me), I saw this transcription project as an opportunity to deepen my understanding of the system’s capabilities. My early experiments with Gemini were promising; the model produced highly accurate, diarized transcriptions for the first few podcast episodes I tested. I was excited by the results and the prospect of integrating Gemini into my workflow for this project. It seemed like a perfect fit—Gemini was delivering exactly what I needed in terms of transcription accuracy, making me optimistic about scaling this approach.

Early Success and Optimism

In those initial tests, Gemini excelled in several areas. The transcriptions were accurate, the diarization was clear, and the output was well-formatted. Given Gemini’s strength in understanding context and language, the transcripts felt polished, even in conversations with overlapping speech or complex dialogue. This early success gave me confidence that I had found a tool that could handle my vast dataset of podcasts while maintaining high quality.

The Challenges of Scaling

As I continued to test Gemini on a larger scale, I encountered two key issues that ultimately made the tool unsuitable for this project.

The biggest challenge was recitation errors. The Gemini API includes a mechanism that prevents it from returning text if it detects that it might be reciting copyrighted information. While this is an understandable safeguard, it became a major roadblock for my use case. Given that my project is dependent on converting copyrighted audio content into text, it wasn’t surprising that Gemini flagged some of this content during its recitation checks. However, when this error occurred, Gemini didn’t return any transcription, making the tool unreliable for my needs. I required a solution that could consistently transcribe all the audio I was working with, not just portions of it.

That said, when Gemini did return transcriptions, the quality was excellent. For instance, here’s a sample from one of the podcasts I processed using Gemini:

Where Does All The TSA Stuff Go?
0:00 - Intro music playing.
1:00 - [SOUND] Transition to podcast
1:01 - Kimberly: Hi, this is Kimberly, and we're at New York airport, and we just had our snow globe 
confiscated.
1:08 - Kimberly: Yeah, we're so pissed, and we want to know who gets all of the confiscated stuff, 
where does it go, and will we ever be able to even get our snow globe back?

In addition to the recitation issue, I didn’t want to rely on Gemini for some transcriptions and another tool for the rest. For this project, it was important to have a consistent output format across all my transcriptions. Switching between tools would introduce inconsistencies in the formatting and potentially complicate the next stages of analysis. I needed a single solution that could handle the entire podcast archive.

Using Whisper for High-Quality AI Transcription

After experiencing challenges with Gemini, I turned to OpenAI’s Whisper, a model specifically designed for speech recognition and transcription. Whisper is an open-source tool known for its accuracy in handling complex audio environments. Given that my podcast collection spans a variety of formats and sound qualities, Whisper quickly emerged as a viable solution.

Why Whisper?

Accuracy: Whisper consistently delivered highly accurate transcriptions, even in cases with challenging audio quality, background noise, or overlapping speakers. It also performed well with speakers of different accents and speech patterns, which is critical for the diversity of content I’m working with.
Diarization: While Whisper doesn’t have diarization built-in, its accuracy with speech segmentation allowed for easy integration with additional tools to identify and separate speakers. This flexibility allowed me to maintain clear, speaker-specific transcripts.
Open Source Flexibility: Whisper’s open-source nature allowed me to deploy it locally on my Proxmox setup, leveraging the full power of my NVIDIA RTX 4090 GPU. This setup made it possible to transcribe podcasts in near real-time, which was crucial for processing a large dataset efficiently.

Performance on My Homelab Setup

By running Whisper locally with GPU acceleration, I saw significant improvements in processing time. For shorter podcasts, Whisper was able to transcribe episodes in a matter of minutes, while longer episodes could be transcribed in near real-time. This speed, combined with its accuracy, made Whisper a strong contender for handling my entire collection of over 10,000 episodes.

For instance, here’s the same podcast episode that was transcribed with Whisper:

Hi, this is Kimberly.
And we're at Newark Airport.
And we just had our snow globe confiscated.
Yeah, we're so pissed.
And we want to know who gets all of the confiscated stuff.
Where does it go?
And will we ever be able to even get our snow globe back?

Challenges and Considerations

While Whisper excelled in many areas, one consideration is its resource demand. Running Whisper locally with GPU acceleration requires substantial computational resources. For users without access to powerful hardware, this could be a limitation. Whisper also lacks built-in diarization, which means it cannot automatically differentiate between speakers. This requires additional post-processing or integration with other tools to achieve the same level of speaker clarity. However, for my setup, the performance trade-off was worth it, as it allowed me to maintain full control over the transcription process without relying on external services.

Comparing Transcription Methods and Moving Forward

After testing both Gemini and Whisper, it became clear that each tool has its strengths, but Whisper ultimately emerged as the best option for my project’s needs. While Gemini delivered higher-quality transcriptions overall, the recitation errors and lack of reliability when dealing with copyrighted material made it unsuitable for handling my entire dataset. Whisper, on the other hand, provided consistent, highly accurate transcriptions across the board and scaled well to the volume of audio I needed to process.

Gemini’s Strengths and Limitations

Strengths: Gemini produced extremely polished and accurate transcriptions, outperforming Whisper in many cases. The diarization was clear, and the formatting made the transcripts easy to read and analyze.
Limitations: Despite its transcription quality, Gemini’s API recitation checks became a major roadblock, which made it unreliable for my use case. Additionally, I needed a single solution that could provide consistent output across all episodes, which Gemini couldn’t guarantee due to these errors.

Whisper’s Strengths and Limitations

Strengths: Whisper stood out for its high accuracy, scalability, and open-source flexibility. Running Whisper locally allowed me to transcribe thousands of episodes efficiently, while its robust handling of varied audio content—from background noise to multiple speakers—was a major advantage.
Limitations: Whisper lacks built-in diarization, which means it cannot automatically differentiate between speakers. This requires additional post-processing or integration with other tools to achieve the same level of speaker clarity. Additionally, Whisper demands significant computational resources, which could be a barrier for users without access to powerful hardware.

Final Thoughts

As I move forward with this project, Whisper will be my go-to tool for transcribing the remaining episodes. Its ability to process large amounts of audio data reliably and consistently has made it the clear winner. While there may still be room for further exploration—particularly around post-processing clean-up or integrating diarization tools—Whisper has given me the foundation I need to turn my podcast archive into a fully searchable, AI-powered dataset.

In my next post, I’ll outline how I built my transcription system using Whisper to handle all of these episodes. It was a unique experience, as I used a model to write the entire application for this project. Stay tuned for a deep dive into the system’s architecture and the steps I took to automate the transcription process at scale.

My 9,000 File Problem: How Gemini and Linux Saved My Podcast Project

September 11, 2024 Allen Hutchison1 Comment

We live in a world awash in data, a tidal wave of information that promises to unlock incredible insights and fuel a new generation of AI-powered applications. But as anyone who has ever waded into the deep end of a data-intensive project knows, this abundance can quickly turn into a curse. My own foray into building a podcast recommendation system recently hit a major snag when my meticulously curated dataset went rogue. The culprit? A sneaky infestation of duplicate embedding files, hiding among thousands of legitimate ones, each with “_embeddings” endlessly repeating in their file names. Manually tackling this mess would have been like trying to drain the ocean with a teaspoon. I needed a solution that could handle massive amounts of data and surgically extract the problem files.

Gemini: The AI That Can Handle ALL My Data

Faced with this mountain of unruly data, I knew I needed an extraordinary tool. I’d experimented with other large language models in the past, but they weren’t built for this. My file list, containing nearly 9,000 filenames (about 100,000 input tokens in this case), and proved too much for them to handle. That’s when I turned to Gemini, with it’s incredible ability to handle large context windows. With a touch of trepidation, I pasted the entire list into Gemini 1.5 Pro in AI Studio, hoping it wouldn’t buckle under the weight of all those file paths. To my relief, Gemini didn’t even blink. It calmly ingested the massive list, ready for my instructions. With a mix of hope and skepticism, I posed my question: “Can you find the files in this list that don’t match the _embeddings.txt pattern?” In a matter of seconds, Gemini delivered. It presented a concise list of the offending filenames, each one a testament to its remarkable pattern recognition skills.

To be honest, I hadn’t expected it to work. Pasting in such a huge list felt like a shot in the dark, and when I later tried the same task with other models I just got errors. But that’s one of the things I love about working with large models like Gemini. The barrier to entry for experimentation is so low. You can quickly iterate, trying different prompts and approaches to see what sticks. In this case, it paid off spectacularly.

From AI Insights to Linux Action

Gemini didn’t just leave me with a list of bad filenames; it went a step further, offering a solution. When I asked, “What Linux command can I use to delete these files?”, it provided the foundation for my command. I wanted an extra layer of safety, so instead of deleting the files outright, I first moved them to a temporary directory using this command:

find /srv/podcasts/Invisibilia -type f -name "*_embeddings_embeddings*" -exec mv {} /tmp \;

This command uses the find command, and it uses -exec to execute a command on each found file. Here’s how it works:

-exec: Tells find to execute a command.
mv: The move command.
{}: A placeholder that represents the found filename.
/tmp: The destination directory for the moved files.
\;: Terminates the -exec command.

By moving the files to /tmp, I could examine them one last time before purging them from my system. This extra step gave me peace of mind, knowing that I could easily recover the files if needed.

Reflecting on the AI-Powered Solution

In the end, what could have been a tedious and error-prone manual cleanup became a quick and efficient process, thanks to the combined power of Gemini and Linux. Gemini’s ability to understand my request, process my massive file list, and suggest a solution was remarkable. It felt like having an AI sysadmin by my side, guiding me through the data jungle. This was especially welcome for someone like me who started their career as a Unix sysadmin. Back then, cleanups like this involved hours poring over man pages and carefully crafting bash scripts, especially when deleting files. I even had a friend who accidentally ran rm -r / as root, watching in horror as his system rapidly erased itself. Needless to say, I’ve developed a healthy respect for the destructive power of the command line! In this instance, I would have easily spent an hour writing my careful script to make sure I got it right. But with Gemini, I solved the problem in about 10 minutes and was on my way. This sheer amount of time saved continues to amaze me about these new approaches to AI. More than just solving this immediate problem, this experience opened my eyes to the transformative potential of large language models for data management. Tasks that once seemed impossible or overwhelmingly time-consuming are now within reach, thanks to tools like Gemini.

Conclusion: A Journey of Discovery and Innovation

This experience was a powerful reminder that we’re living in an era of incredible technological advancements. Large language models like Gemini are no longer just fascinating research projects; they are becoming practical tools that can significantly enhance our productivity and efficiency. Gemini’s ability to handle enormous datasets, understand complex requests, and provide actionable solutions is truly game-changing. For me, this project was a perfect marriage of my early Unix sysadmin days and the exciting new world of AI. Gemini’s insights, combined with the precision of Linux commands, allowed me to quickly and safely solve a data problem that would have otherwise cost me significant time and effort.

This is just the first in an occasional series where I’ll be exploring the ways I’m using large models in my everyday work and hobbies. I’m eager to hear from you, my readers! How are you using AI to make your life easier? What would you like to be able to do with AI that you can’t do today? Share your thoughts and ideas in the comments below – let’s learn from each other and build the future of AI together!

Building My Homelab: The Journey from Gemma on a Laptop to a Rack Mounted Powerhouse

September 1, 2024September 8, 2024 Allen Hutchison4 Comments

In the ever-evolving landscape of AI, there are moments when new technologies capture your imagination and set you on a path of exploration and innovation. For me, one of those moments came with the release of the Gemma models. These models, with their promise of enhanced capabilities and local deployment, ignited my curiosity and pushed me to take a significant step in my homelab journey—building a system powerful enough to run these AI models locally.

The Allure of Local AI

I’ve spent the better part of 30 years immersed in the world of machine learning and artificial intelligence. My journey began in the 90s when I was an AI major in the cognitive science program at Indiana University. Back then, AI was a field full of promise, but the tools and technologies we take for granted today were still in their infancy. Fast forward a few decades, and I found myself at Google Maps, leading teams that used machine learning to transform raw imagery into structured data, laying the groundwork for many of the services we rely on daily.

By 2021, I had transitioned to the Core ML group at Google, where my focus shifted to the nuts and bolts of AI—low-level ML infrastructure like XLA, ML runtimes, and performance optimization. The challenges were immense, but so were the opportunities to push the boundaries of what AI could do. Today, as the leader of the AI Developer team at Google, I work with some of the brightest minds in the industry, building systems and technologies that empower developers to use AI in solving meaningful, real-world problems.

Despite all these experiences, the release of the Gemma models reignited a spark in me—a reminder of the excitement I felt as a student, eager to experiment and explore the limits of AI. These models offered something unique: the ability to run sophisticated AI directly on local hardware. For someone like me, who has always believed in the power of experimentation, this was an opportunity too good to pass up.

However, I quickly realized that while I could run these models on my Mac at home, I wanted something more—something that could serve as a shared resource for my family, a system that would be plugged in and available all the time. I envisioned a platform that not only supported these AI models but also provided the flexibility to build and explore other projects. To fully engage with this new wave of AI and create a hub for ongoing experimentation, I needed a machine that could handle the load and grow with our ambitions.

That’s when I decided to take the plunge and build a powerful homelab. I started by carefully spec’ing out the components, aiming to create a system that wasn’t just about raw power but also about versatility and future-proofing. Eventually, I turned to Steiger Dynamics to bring my vision to life. Their expertise in crafting high-performance, custom-built systems made them the perfect partner for this project. But before diving into the specifics of the build, let me share why the concept of local AI holds such a special allure for someone who has been in this field for as long as I have.

Spec’ing Out the Perfect Homelab

Building a homelab is both a science and an art. It’s about balancing performance with practicality, ensuring that every component serves a purpose while also leaving room for future expansion. With the goal of creating a platform capable of handling advanced AI models like Gemma, as well as other projects that might come along, I began the process of selecting the right hardware.

The Heart of the System: CPU and GPU

At the core of any powerful AI system are the CPU and GPU. After researching various options, I decided to go with the AMD Ryzen 9 7900X3D, a 12-core, 24-thread processor that offers the multithreaded performance necessary for AI workloads while still being efficient enough for a range of homelab tasks. But the real workhorse of this system would be the NVIDIA GeForce RTX 4090. This GPU, with its 24 GB of VRAM and immense processing power, was selected to handle the computational demands of AI training, simulations, and real-time applications.

The RTX 4090 wasn’t just about raw power; it was about flexibility. This GPU allows me to experiment with larger datasets, more complex models, and even real-time AI applications. Whether I’m working on image recognition, natural language processing, or generative AI, the RTX 4090 is more than capable of handling the task.

Memory and Storage: Speed and Capacity

To complement the CPU and GPU, I knew I needed ample memory and fast storage. I opted for 128GB of DDR5 5600 MT/s RAM to ensure that the system could handle multiple tasks simultaneously without bottlenecks. This is particularly important when working with large datasets or running several virtual machines at once—a common scenario in a versatile homelab environment.

For storage, I selected two 4 TB Samsung 990 PRO Gen4 NVMe SSDs. These drives provide the speed needed for active projects, with read and write speeds of 7,450 and 6,900 MB/s, respectively, ensuring quick access to data and fast boot times. The choice of separate drives rather than a RAID configuration allows me to manage my data more flexibly, adapting to different projects as needed.

Cooling and Power: Reliability and Efficiency

Given the power-hungry components, proper cooling and a reliable power supply were non-negotiable. I chose a Quiet 360mm AIO CPU Liquid Cooling system, equipped with six temperature-controlled, pressure-optimized 120mm fans in a push/pull configuration. This setup ensures that temperatures remain in check, even during prolonged AI training sessions that can generate significant heat.

The power supply is a 1600 Watt Platinum unit with a semi-passive fan that remains silent during idle periods and stays quiet under load. This ensures stable power delivery to all components, providing the reliability needed for a system that will be running almost constantly.

Building for the Future

Finally, I wanted to ensure that this homelab wasn’t just a short-term solution but a platform that could grow with my needs. The ASUS ProArt X670E-Creator Wifi motherboard I selected provides ample expansion slots, including dual PCIe 8x lanes, which are perfect for future upgrades, whether that means adding more GPUs or expanding storage. With 10G Ethernet and Wi-Fi 6E, this system is also well-equipped for high-speed networking, both wired and wireless.

Throughout this process, my choices were heavily influenced by this Network Chuck video. His insights into building a system for local AI, particularly the importance of choosing the right balance of power and flexibility, resonated with my own goals. Watching his approach to hosting AI models locally helped solidify my decisions around components and made me confident that I was on the right track.

With all these components selected, I turned to Steiger Dynamics to assemble the system. Their expertise in custom builds meant that I didn’t have to worry about the finer details of putting everything together; I could focus on what mattered most—getting the system up and running so I could start experimenting.

Bringing the System to Life: Initial Setup and First Experiments

Once the system arrived, I was eager to get everything up and running. Unboxing the hardware was an exciting moment—seeing all the components I had carefully selected come together in a beautifully engineered machine was incredibly satisfying. But as any tech enthusiast knows, the real magic happens when you power on the system for the first time.

Setting Up Proxmox and Virtualized Environments

For this build, I chose to run Proxmox as the primary operating system. Proxmox is a powerful open-source virtualization platform that allows me to create and manage multiple virtual machines (VMs) on a single physical server. This choice provided the flexibility to run different operating systems side by side, making the most of the system’s powerful hardware.

To streamline the setup process, I utilized some excellent Proxmox helper scripts available on GitHub. These scripts made it easier to configure and manage my virtual environments, saving me time and ensuring that everything was optimized for performance right from the start.

The first VM I set up was Ubuntu 22.04 LTS, which would serve as the main environment for AI development. Ubuntu’s long-term support and robust package management make it an ideal choice for a homelab focused on AI and development. The installation process within Proxmox was smooth, and soon I had a fully functional virtual environment ready for configuration.

I started by installing the necessary drivers and updates, ensuring that the NVIDIA RTX 4090 and other components were operating at peak performance. The combination of the AMD Ryzen 9 7900X3D CPU and the RTX 4090 GPU provided a seamless experience, handling everything I threw at it with ease. With the virtualized Ubuntu environment fully updated and configured, it was time to dive into my first experiments.

Running the First AI Models

With the system ready, I turned my attention to running AI models locally using Ollama as the model management system. Ollama provided an intuitive way to manage and deploy models on my new setup, ensuring that I could easily switch between different models and configurations depending on the project at hand.

The first model I downloaded was the 24B Gemma model. The process was straightforward, thanks to the ample power and storage provided by the new setup. The RTX 4090 handled the model with impressive speed, allowing me to explore its capabilities in real-time. I could experiment with different parameters, tweak the model, and see the results almost instantaneously.

Exploring Practical Applications: Unlocking the Potential of My Homelab

With the system fully operational and the Gemma model successfully deployed, I began exploring the practical applications of my new homelab. The flexibility and power of this setup meant that the possibilities were virtually endless, and I was eager to dive into projects that could take full advantage of the capabilities I now had at my disposal.

Podcast Archive Project

One of the key projects I’ve been focusing on is my podcast archive project. With the large Gemma model running locally, I’ve been able to experiment with using AI to transcribe, analyze, and categorize vast amounts of podcast content. The speed and efficiency of the RTX 4090 have transformed what used to be a time-consuming process into something I can manage seamlessly within my homelab environment.

The ability to run complex models locally has also allowed me to iterate rapidly on how I approach the organization and retrieval of podcast data. I’ve been experimenting with different methods for tagging and indexing content, making it easier to search and interact with large archives. This project has been particularly rewarding, as it combines my love of podcasts with the cutting-edge capabilities of AI.

General Conversational Interfaces

I’ve been exploring is setting up general conversational interfaces. With the Gemma model’s conversational abilities, I’ve been able to create clients that facilitate rich, interactive dialogues. Whether for casual conversation, answering questions, or exploring specific topics, these interfaces have proven to be incredibly versatile.

Getting the models up and running with these clients was a straightforward process, and I’ve been experimenting with different use cases—everything from personal assistants to educational tools. The flexibility of the Gemma model allows for a wide range of conversational applications, making this an area ripe for further exploration.

Expanding the Homelab’s Capabilities

While I’m already taking full advantage of the system’s current capabilities, I’m constantly thinking about ways to expand and optimize the homelab further. Whether it’s adding more storage, integrating additional GPUs for even greater computational power, or exploring new software platforms that can leverage the hardware, the possibilities are exciting.

The Journey Continues

This is just the beginning of my exploration into what this powerful homelab can do. With the hardware now in place, I’m eager to dive into a myriad of projects, from refining my podcast archive system to pushing the boundaries of conversational AI. The possibilities are endless, and the excitement of discovering new applications and optimizing workflows keeps me motivated.

As I continue to explore and experiment, I’ll be sharing my experiences, insights, and challenges along the way. There’s a lot more to come, and I’m excited to see where this journey takes me. I invite you, my readers, to come along for the ride—whether you’re building your own homelab, curious about AI, or just interested in technology. Together, we’ll see just how far we can push the boundaries of what’s possible with this incredible setup.

Packing the Antenna

What It Actually Did Out There

What It Doesn’t Do

An Engineer on Vacation

Would I Recommend It

Share this:

Like this:

Share this:

Like this:

The Power of Small Tools, Amplified by AI

A Truly Contextual and Collaborative Partner

Built by Gemini CLI, For Everyone

Beyond Today: Extensibility

Get Started Today

Share this:

Like this:

Share this:

Like this:

What Makes a Good Transcription?

Using Gemini for Transcription: A First Attempt

Using Whisper for High-Quality AI Transcription

Comparing Transcription Methods and Moving Forward

Final Thoughts

Share this:

Like this:

Gemini: The AI That Can Handle ALL My Data

From AI Insights to Linux Action

Reflecting on the AI-Powered Solution

Conclusion: A Journey of Discovery and Innovation

Share this:

Like this:

The Allure of Local AI

Spec’ing Out the Perfect Homelab

The Heart of the System: CPU and GPU

Memory and Storage: Speed and Capacity

Cooling and Power: Reliability and Efficiency

Building for the Future

Bringing the System to Life: Initial Setup and First Experiments

Setting Up Proxmox and Virtualized Environments

Running the First AI Models

Exploring Practical Applications: Unlocking the Potential of My Homelab

Podcast Archive Project

General Conversational Interfaces

Expanding the Homelab’s Capabilities

Share this:

Like this: