In the ever-evolving landscape of AI, there are moments when new technologies capture your imagination and set you on a path of exploration and innovation. For me, one of those moments came with the release of the Gemma models. These models, with their promise of enhanced capabilities and local deployment, ignited my curiosity and pushed me to take a significant step in my homelab journey—building a system powerful enough to run these AI models locally.
The Allure of Local AI
I’ve spent the better part of 30 years immersed in the world of machine learning and artificial intelligence. My journey began in the 90s when I was an AI major in the cognitive science program at Indiana University. Back then, AI was a field full of promise, but the tools and technologies we take for granted today were still in their infancy. Fast forward a few decades, and I found myself at Google Maps, leading teams that used machine learning to transform raw imagery into structured data, laying the groundwork for many of the services we rely on daily.
By 2021, I had transitioned to the Core ML group at Google, where my focus shifted to the nuts and bolts of AI—low-level ML infrastructure like XLA, ML runtimes, and performance optimization. The challenges were immense, but so were the opportunities to push the boundaries of what AI could do. Today, as the leader of the AI Developer team at Google, I work with some of the brightest minds in the industry, building systems and technologies that empower developers to use AI in solving meaningful, real-world problems.
Despite all these experiences, the release of the Gemma models reignited a spark in me—a reminder of the excitement I felt as a student, eager to experiment and explore the limits of AI. These models offered something unique: the ability to run sophisticated AI directly on local hardware. For someone like me, who has always believed in the power of experimentation, this was an opportunity too good to pass up.
However, I quickly realized that while I could run these models on my Mac at home, I wanted something more—something that could serve as a shared resource for my family, a system that would be plugged in and available all the time. I envisioned a platform that not only supported these AI models but also provided the flexibility to build and explore other projects. To fully engage with this new wave of AI and create a hub for ongoing experimentation, I needed a machine that could handle the load and grow with our ambitions.
That’s when I decided to take the plunge and build a powerful homelab. I started by carefully spec’ing out the components, aiming to create a system that wasn’t just about raw power but also about versatility and future-proofing. Eventually, I turned to Steiger Dynamics to bring my vision to life. Their expertise in crafting high-performance, custom-built systems made them the perfect partner for this project. Having covered why local AI holds such a special allure for someone who has been in this field as long as I have, let me get into the specifics of the build.
Spec’ing Out the Perfect Homelab
Building a homelab is both a science and an art. It’s about balancing performance with practicality, ensuring that every component serves a purpose while also leaving room for future expansion. With the goal of creating a platform capable of handling advanced AI models like Gemma, as well as other projects that might come along, I began the process of selecting the right hardware.
The Heart of the System: CPU and GPU
At the core of any powerful AI system are the CPU and GPU. After researching various options, I decided to go with the AMD Ryzen 9 7900X3D, a 12-core, 24-thread processor that offers the multithreaded performance necessary for AI workloads while still being efficient enough for a range of homelab tasks. But the real workhorse of this system would be the NVIDIA GeForce RTX 4090. This GPU, with its 24 GB of VRAM and immense processing power, was selected to handle the computational demands of AI training, simulations, and real-time applications.
The RTX 4090 wasn’t just about raw power; it was about flexibility. This GPU allows me to experiment with larger datasets, more complex models, and even real-time AI applications. Whether I’m working on image recognition, natural language processing, or generative AI, the RTX 4090 is more than capable of handling the task.
Memory and Storage: Speed and Capacity
To complement the CPU and GPU, I knew I needed ample memory and fast storage. I opted for 128GB of DDR5 5600 MT/s RAM to ensure that the system could handle multiple tasks simultaneously without bottlenecks. This is particularly important when working with large datasets or running several virtual machines at once—a common scenario in a versatile homelab environment.
For storage, I selected two 4 TB Samsung 990 PRO Gen4 NVMe SSDs. These drives provide the speed needed for active projects, with read and write speeds of 7,450 and 6,900 MB/s, respectively, ensuring quick access to data and fast boot times. The choice of separate drives rather than a RAID configuration allows me to manage my data more flexibly, adapting to different projects as needed.
Cooling and Power: Reliability and Efficiency
Given the power-hungry components, proper cooling and a reliable power supply were non-negotiable. I chose a quiet 360mm AIO liquid CPU cooler, equipped with six temperature-controlled, pressure-optimized 120mm fans in a push/pull configuration. This setup ensures that temperatures remain in check, even during prolonged AI training sessions that can generate significant heat.
The power supply is a 1600 Watt Platinum unit with a semi-passive fan that remains silent during idle periods and stays quiet under load. This ensures stable power delivery to all components, providing the reliability needed for a system that will be running almost constantly.
Building for the Future
Finally, I wanted to ensure that this homelab wasn’t just a short-term solution but a platform that could grow with my needs. The ASUS ProArt X670E-Creator WiFi motherboard I selected provides ample expansion slots, including two PCIe 5.0 x16 slots that can run at x8/x8, which is perfect for future upgrades, whether that means adding more GPUs or expanding storage. With 10G Ethernet and Wi-Fi 6E, this system is also well-equipped for high-speed networking, both wired and wireless.
Throughout this process, my choices were heavily influenced by this NetworkChuck video. His insights into building a system for local AI, particularly the importance of choosing the right balance of power and flexibility, resonated with my own goals. Watching his approach to hosting AI models locally helped solidify my decisions around components and made me confident that I was on the right track.
With all these components selected, I turned to Steiger Dynamics to assemble the system. Their expertise in custom builds meant that I didn’t have to worry about the finer details of putting everything together; I could focus on what mattered most—getting the system up and running so I could start experimenting.
Bringing the System to Life: Initial Setup and First Experiments
Once the system arrived, I was eager to get everything up and running. Unboxing the hardware was an exciting moment—seeing all the components I had carefully selected come together in a beautifully engineered machine was incredibly satisfying. But as any tech enthusiast knows, the real magic happens when you power on the system for the first time.
Setting Up Proxmox and Virtualized Environments
For this build, I chose to run Proxmox as the primary operating system. Proxmox is a powerful open-source virtualization platform that allows me to create and manage multiple virtual machines (VMs) on a single physical server. This choice provided the flexibility to run different operating systems side by side, making the most of the system’s powerful hardware.
To streamline the setup process, I used some excellent Proxmox helper scripts available on GitHub. These scripts made it easier to configure and manage my virtual environments, saving me time and ensuring that everything was optimized for performance right from the start.
The first VM I set up was Ubuntu 22.04 LTS, which would serve as the main environment for AI development. Ubuntu’s long-term support and robust package management make it an ideal choice for a homelab focused on AI and development. The installation process within Proxmox was smooth, and soon I had a fully functional virtual environment ready for configuration.
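For anyone who would rather script this step than click through the web UI, the Proxmox API makes it straightforward. Here’s a minimal sketch using the proxmoxer Python client; I’m not claiming this is what the helper scripts do under the hood, and the host, credentials, and VM parameters are all placeholders:

```python
# A minimal sketch of creating a VM through the Proxmox API with the
# proxmoxer client (pip install proxmoxer requests). Host, credentials,
# and VM parameters are placeholders, not my actual setup.
from proxmoxer import ProxmoxAPI

proxmox = ProxmoxAPI(
    "proxmox.local",        # hypothetical Proxmox host
    user="root@pam",
    password="changeme",
    verify_ssl=False,
)

# Create the VM shell on node "pve"; the Ubuntu ISO still has to be
# attached and installed through the console afterwards.
proxmox.nodes("pve").qemu.create(
    vmid=100,
    name="ubuntu-ai",
    memory=65536,           # 64 GB of the 128 GB pool
    cores=8,
    net0="virtio,bridge=vmbr0",
    scsi0="local-lvm:200",  # 200 GB disk on the default LVM store
    ostype="l26",           # Linux 2.6+ kernel
)
```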
I started by installing the necessary drivers and updates, ensuring that the NVIDIA RTX 4090 and other components were operating at peak performance. The combination of the AMD Ryzen 9 7900X3D CPU and the RTX 4090 GPU provided a seamless experience, handling everything I threw at it with ease. With the virtualized Ubuntu environment fully updated and configured, it was time to dive into my first experiments.
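Once the drivers were installed, a quick way to confirm that the GPU is actually visible from inside the VM is to query NVML. Here’s a small sketch using the pynvml bindings, purely as one option; plain nvidia-smi on the command line tells you the same thing:

```python
# Quick check that the GPU is visible inside the VM via NVML
# (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    # Older bindings return bytes, newer ones return str.
    if isinstance(name, bytes):
        name = name.decode()
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {i}: {name}, {mem.total / 1024**3:.0f} GiB VRAM")
pynvml.nvmlShutdown()
```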
Running the First AI Models
With the system ready, I turned my attention to running AI models locally using Ollama as the model management system. Ollama provided an intuitive way to manage and deploy models on my new setup, ensuring that I could easily switch between different models and configurations depending on the project at hand.
The first model I downloaded was the 27B Gemma 2 model. The process was straightforward, thanks to the ample power and storage provided by the new setup. The RTX 4090 handled the model with impressive speed, allowing me to explore its capabilities in real time. I could experiment with different parameters, tweak the model, and see the results almost instantaneously.
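Because Ollama exposes a simple HTTP API on localhost (port 11434 by default), scripting against the model takes only a few lines. Here’s a minimal sketch that pulls a Gemma model and runs a one-shot prompt; the model tag and prompt are illustrative:

```python
# Minimal sketch against Ollama's local HTTP API. The model tag and
# prompt are illustrative.
import requests

OLLAMA = "http://localhost:11434"

# Pull the model if it isn't already downloaded.
requests.post(f"{OLLAMA}/api/pull",
              json={"name": "gemma2:27b", "stream": False})

# One-shot generation; stream=False returns a single JSON object.
resp = requests.post(f"{OLLAMA}/api/generate",
                     json={"model": "gemma2:27b",
                           "prompt": "Summarize why local AI matters.",
                           "stream": False})
print(resp.json()["response"])
```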
Exploring Practical Applications: Unlocking the Potential of My Homelab
With the system fully operational and the Gemma model successfully deployed, I began exploring the practical applications of my new homelab. The flexibility and power of this setup meant that the possibilities were virtually endless, and I was eager to dive into projects that could take full advantage of the capabilities I now had at my disposal.
Podcast Archive Project
One of the key projects I’ve been focusing on is my podcast archive project. With the large Gemma model running locally, I’ve been able to experiment with using AI to transcribe, analyze, and categorize vast amounts of podcast content. The speed and efficiency of the RTX 4090 have transformed what used to be a time-consuming process into something I can manage seamlessly within my homelab environment.
The ability to run complex models locally has also allowed me to iterate rapidly on how I approach the organization and retrieval of podcast data. I’ve been experimenting with different methods for tagging and indexing content, making it easier to search and interact with large archives. This project has been particularly rewarding, as it combines my love of podcasts with the cutting-edge capabilities of AI.
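I won’t pin down the whole pipeline here, but to give a flavor of the approach: pairing a local Whisper model for transcription with Gemma (via Ollama) for tagging looks roughly like the sketch below. The file path, model size, and prompt are placeholders rather than my production settings:

```python
# One possible shape for the pipeline: local Whisper for transcription,
# then Gemma via Ollama for tagging (pip install openai-whisper requests).
# The file path and prompt are placeholders.
import requests
import whisper

# Transcribe an episode locally; "base" trades accuracy for speed.
stt = whisper.load_model("base")
transcript = stt.transcribe("episodes/episode_001.mp3")["text"]

# Ask the local Gemma model for topic tags.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma2:27b",
        "prompt": "List five topic tags for this podcast transcript:\n\n"
                  + transcript[:8000],  # keep the prompt a manageable size
        "stream": False,
    },
)
print(resp.json()["response"])
```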
General Conversational Interfaces
Another area I’ve been exploring is setting up general conversational interfaces. With the Gemma model’s conversational abilities, I’ve been able to create clients that facilitate rich, interactive dialogues. Whether for casual conversation, answering questions, or exploring specific topics, these interfaces have proven to be incredibly versatile.
Getting the models up and running with these clients was a straightforward process, and I’ve been experimenting with different use cases—everything from personal assistants to educational tools. The flexibility of the Gemma model allows for a wide range of conversational applications, making this an area ripe for further exploration.
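A basic interactive client needs little more than Ollama’s chat endpoint and a running message history. Here’s a minimal sketch of the kind of loop I mean; the model tag is illustrative:

```python
# A bare-bones conversational client against Ollama's chat endpoint.
# The full conversation history is resent each turn so the model
# keeps context.
import requests

history = []
while True:
    user = input("you> ")
    if user.strip().lower() in {"quit", "exit"}:
        break
    history.append({"role": "user", "content": user})
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "gemma2:27b", "messages": history, "stream": False},
    )
    reply = resp.json()["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    print(f"gemma> {reply}")
```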
Expanding the Homelab’s Capabilities
While I’m already taking full advantage of the system’s current capabilities, I’m constantly thinking about ways to expand and optimize the homelab further. Whether it’s adding more storage, integrating additional GPUs for even greater computational power, or exploring new software platforms that can leverage the hardware, the possibilities are exciting.
The Journey Continues
This is just the beginning of my exploration into what this powerful homelab can do. With the hardware now in place, I’m eager to dive into a myriad of projects, from refining my podcast archive system to pushing the boundaries of conversational AI. The possibilities are endless, and the excitement of discovering new applications and optimizing workflows keeps me motivated.
As I continue to explore and experiment, I’ll be sharing my experiences, insights, and challenges along the way. There’s a lot more to come, and I’m excited to see where this journey takes me. I invite you, my readers, to come along for the ride—whether you’re building your own homelab, curious about AI, or just interested in technology. Together, we’ll see just how far we can push the boundaries of what’s possible with this incredible setup.
Awesome project! I’ve decided to build the same hardware setup for my own AI homelab. Just a couple of questions:
– Proxmox and the RTX 4090: was that complicated? Did you follow a specific guide to get it working? I’ve seen several ways to do this, but they all seem complicated.
– ASUS ProArt X670E-Creator Wifi motherboard: after looking into this motherboard, I’ve learned that it is not meant to run as a headless server, and unless you plug in a monitor, the white VGA status LED stays on. Did this cause any issues, and/or is there a way around it?
I spent a lot of time going through various tutorials and old forum posts on PCI passthrough with Proxmox. It turned out the best guide was the Proxmox wiki (https://pve.proxmox.com/wiki/PCI_Passthrough). I used those instructions and they worked great.
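One sanity check worth doing early is confirming that IOMMU is enabled and seeing how your PCI devices are grouped, since the GPU needs to sit in a cleanly separable group. The wiki shows a shell one-liner for this; here’s a small Python equivalent (run on the Proxmox host itself), purely as an illustration:

```python
# List IOMMU groups and the PCI devices in each, a handy sanity check
# before attempting passthrough. Run on the Proxmox host; if the
# directory is missing or empty, IOMMU isn't enabled yet.
from pathlib import Path

for group in sorted(Path("/sys/kernel/iommu_groups").iterdir(),
                    key=lambda p: int(p.name)):
    devices = [d.name for d in (group / "devices").iterdir()]
    print(f"IOMMU group {group.name}: {', '.join(devices)}")
```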
On the motherboard, I haven’t had any issues running it in a headless configuration. In my case it’s in a rackmount case in a closet, and it has been fine.