Home Lab Rebuild
Some changes to my home lab are long overdue. The latest full outage, on September 4, 2017, caused by a power brown-out, made me realize there was room for improvement. There have been no major changes to the lab since 2015. In 2016 I upgraded the storage in NAS1, added memory to VMH02, added Ubiquiti UAP-AC-LITE access points, and installed a security camera.
Now I’m going back to the drawing board and doing a fresh rebuild. The goal this time around is to be simple and redundant.
- Hardware firewall: I have custom built a 1U Supermicro server that will be used as the new firewall. It has an Intel Xeon X3470 CPU, 8GB RAM, quad gigabit LAN ports, and a 200W low-power supply. I've also replaced the stock passive CPU heat-sink with the Thermaltake Engine 27 low-profile heat-sink. It's a well-balanced combination of performance, power, and noise. In the old lab design the virtualized firewall introduced too many dependencies and greatly increased the complexity of the network. A power-outage scenario also required a VM host and its storage to stay online, which does not last long on UPS batteries. Having a low-power hardware firewall gives me more flexibility and faster recovery from a total lab black-out.
- Additional UPS backup power: There will now be a third UPS battery for the home lab. I will dedicate one UPS for the core networking equipment and try to keep the load on it under 25% to maximize the battery life. The rest of the gear will be balanced over the other two UPS batteries.
- Standard Virtual Switches: I will be removing the Virtual Distributed Switch and LACP on the ESXi hosts. This is a tough call, but I have weighed the options. The VDS in my environment is overkill. I have two hosts, with only one of them on at a time, so in my scenario the VDS's only purpose is configuration sync. I don't use traffic shaping, private VLANs, LLDP, etc! The only loss in moving down to a VSS is giving up LACP and having to manually keep the port groups identical on each host. That doesn't concern me, because they hardly ever change.
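Keeping VSS port groups in sync by hand is easy to script. A minimal sketch using the standard `esxcli` commands, run against each host; the vSwitch name, port group names, and VLAN IDs below are examples, not my actual configuration:

```shell
# Recreate the same port groups on a standard vSwitch (repeat per host).
# vSwitch0, names, and VLAN IDs are placeholders.
esxcli network vswitch standard portgroup add \
    --portgroup-name="Servers" --vswitch-name=vSwitch0
esxcli network vswitch standard portgroup set \
    --portgroup-name="Servers" --vlan-id=10

esxcli network vswitch standard portgroup add \
    --portgroup-name="Lab" --vswitch-name=vSwitch0
esxcli network vswitch standard portgroup set \
    --portgroup-name="Lab" --vlan-id=20
```

Running one script against both hosts keeps the port group names and VLAN tags identical, which is all the VDS was really doing for me.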
The lab was torn down to the bones on Sunday, September 17, 2017. It took me about 4 hours to get everything back online. I took my time and did some cleaning along the way. I also took the opportunity while everything was offline to color code the power cables & devices depending on which UPS they would be plugged into. This way I can quickly & easily identify which equipment is plugged into which UPS.
I had just enough room for the additional UPS on the bottom row of my rack, alongside the NAS devices. For the newly added UPS I included a little emergency planning: its power feed comes from a heavy-duty 50ft extension cord. The bundle of excess cable is nicely tucked away, but should I ever have a power issue in the server room, I can re-route the power feed as needed. After I had the power & cabling situation under control, I brought everything back online.
The firewall was the next step in the process. I installed the OS and re-imported the configuration from my virtual firewall. Very painless… except for the part where it wasn't detecting the RAID controller and I accidentally overwrote the install media. Whoops. The firewall, core switch, WiFi, and modem all run on the one UPS at just under 100 watts.
I've set up triple redundancy for DHCP by leveraging a Windows DHCP hot-standby failover cluster plus the firewall running dnsmasq with a "dhcp-reply-delay" configured, so the firewall will only offer DHCP leases when my VMs (Windows DHCP) are down. On the Windows DHCP side I have added the firewall as a tertiary DNS server in the DHCP options for added DNS stability, since I use 30-day leases. All NAS servers have been reconfigured to shut down gracefully at 60% UPS battery instead of 15%. Overall I should be in a much better position than before all this work.
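The delayed-offer trick can be expressed in a few lines of dnsmasq configuration. A sketch, assuming an interface name, address range, and delay value that are placeholders rather than my actual settings (`dhcp-reply-delay` requires dnsmasq 2.79 or newer):

```conf
# /etc/dnsmasq.conf — sketch; interface, range, and delay are assumptions
interface=lan0
dhcp-range=192.168.1.100,192.168.1.200,720h   # 30-day leases

# Wait a few seconds before answering DHCPDISCOVER. While the Windows
# hot-standby pair is healthy it answers first and clients take its
# offer; dnsmasq's delayed offer only matters when both are down.
dhcp-reply-delay=5
```

Because clients normally accept the first offer they receive, the delay alone is enough to keep the firewall out of the picture until the primary servers fail.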
Have any questions or suggestions? Let me know in the comments below!