Scaling down the home lab, slightly

Since my last post about my new Zion 4U storage server, I have been more and more impressed by Unraid and have started using Docker and KVM virtualization on it.

Since then I have moved some VMs over to KVM virtualization on Unraid, which left my Domain Controllers, IPAM, and VCSA on my VMware environment. My secondary DC was also broken, so I was running with just a single DC. This made me re-think how the home lab was configured, and I decided to simplify some of the many moving parts.

The main hassle is recovering from a power outage; typically I have to intervene manually to get most things back online. Consolidating systems will allow a faster and more hands-off recovery.

Continue reading…

vSphere 6.7 U1 now released

On October 17, 2018, VMware announced the availability of vSphere 6.7 Update 1. The new HTML5 client is now ‘Fully Featured’, which means you can use it for all administration and configuration of vSphere, including Auto Deploy, Host Profiles, VMware vSphere Update Manager (VUM), vCenter High Availability (VCHA), network topology diagrams, overview performance charts, and more.

I am personally excited to see the HTML5 client become the primary client, as I much prefer it over the Flash client. One of the more interesting features in this release is the vCenter External to Embedded Convergence Tool. Since an embedded PSC is the recommended deployment model for vCenter Server, this tool lets you migrate to an embedded PSC without having to nuke-and-pave your entire vCenter installation.

The Content Library also got some much-needed love from the VMware development team: it now supports two new item types, VM templates and OVA files, which makes it much more functional. The lack of VM template support was a major shortcoming, to the point of making the Content Library practically useless for some VMware customers, so this change is a welcome one to say the least.

New Features

  • vCenter High Availability (VCHA)
    • We redesigned VCHA workflows to combine the Basic and Advanced configuration workflows. This streamlines the user experience and eliminates the need for manual intervention in some deployments.
  • Search Experience
    • We revamped the search experience. In this version of the vSphere Client, you can search for objects with a string and filter the search results based on Tags/Custom attributes. You can filter the object lists in the results even further; for instance, you can filter on the power state of the VMs. You can also save your searches and revisit them later.
  • Performance Charts
    • You can pop the performance charts into a separate tab and zoom in on a specific time in the chart. We also added overview performance charts for datacenters and clusters.
  • Dark Theme
    • Dark theme has been one of the most requested features for the vSphere Client so we’re introducing a Dark mode setting. Support for the Dark theme is available for all core vSphere Client functionality and implementation for vSphere Client plugins is in progress.
  • Alarm Definitions
    • We greatly simplified the way you define new alarms, particularly in how you create rules for trigger conditions.

VMUG Advantage is 10% off

Today VMware announced that VMUG Advantage is 10% off until December 31st, 2017.

This is the best way to get VMware licenses for your home lab environment. VMUG Advantage membership includes EVALExperience, which gives you 365-day evaluation licenses for personal use in a non-production environment. Continue reading…

Disaster strikes as NAS3 crashes

This past weekend we had a power brownout for about 4 hours, which caused my servers to fail over to battery power. The batteries don’t last long with servers running, and something apparently went sour with the automatic shutdown of NAS3, which I use only for my VMware virtual machines: it shut down improperly and the RAID crashed.

I don’t have anyone to blame but myself, and I knew this day would eventually come. NAS3 was RAID-0: striping with no redundancy, where a failed array typically means total data loss. I take nightly backups of the entire NAS, so I was aware of and prepared for the risk of striping. That does not make recovering from it a fun time.
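To put rough numbers on that risk (hypothetical figures, not my actual disks), a stripe set only survives if every member disk survives, so the per-disk failure odds compound:

```python
# Back-of-the-envelope RAID-0 risk: the stripe survives only if every
# member disk survives, so failure probability compounds per disk.

def raid0_failure_probability(disks: int, per_disk_afr: float) -> float:
    """Chance of losing the array in a year, given each disk's
    annualized failure rate (AFR)."""
    return 1 - (1 - per_disk_afr) ** disks

# Hypothetical 4-disk stripe with a 3% per-disk AFR:
print(f"{raid0_failure_probability(4, 0.03):.1%}")  # -> 11.5%
```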

Adding additional redundancy for blackouts

One of the hardest things to recover from in my home-lab environment is a total power blackout. Everything is planned and designed around losing certain components: one disk, one switch or network cable, and so on. But when everything is off and I need to bring it all back online, it’s a painstaking and very manual process. Over time my environment has also become more and more complex, and this latest outage has me scratching my head over how to recover faster and more simply from a blackout.
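One idea I keep coming back to is scripting the cold start so each tier (storage, then hypervisors, then core VMs) is confirmed reachable before the next one powers on. A minimal sketch of the idea, with hypothetical hostnames and an open SSH port standing in for ‘booted’:

```python
# Sketch of an ordered cold start: wait for each tier to answer on the
# network before moving on to the next. Hostnames are placeholders, and
# the actual power-on (IPMI, Wake-on-LAN, vCenter API) is left out.
import socket
import time

BOOT_ORDER = [
    ["nas3.lab.local"],                      # storage first
    ["esxi1.lab.local", "esxi2.lab.local"],  # then hypervisors
    ["dc1.lab.local", "vcsa.lab.local"],     # then core services
]

def is_up(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Treat an open SSH port as 'booted'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for tier in BOOT_ORDER:
    for host in tier:
        while not is_up(host):
            print(f"waiting for {host}...")
            time.sleep(10)
    print(f"tier {tier} is up, starting the next one")
```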

Continue reading…

vSAN all hosts down scenario

The worst-case scenario in a VMware vSAN cluster is all hosts down: a situation no sysadmin wants to find themselves in, and one where panic and frustration quickly follow. For all the safety features built into vSAN, it is designed to tolerate failures within its failure domains, not an entire vSAN cluster outage.

Scenario

An unnamed client was in the process of setting up a VDS on an existing vSAN cluster and mistakenly selected the vSAN vmkernel adapters on all hosts for migration to the VDS while the cluster was in operation. Deploying this change instantly took down the entire 4-node, 14TB vSAN cluster: all VMs down, vSAN datastore showing 0KB. To add to the mix, the customer’s VCSA was down too, because it was hosted on the same vSAN, which made it even more difficult to view the overall health of the environment.

  • vSphere 6.5 environment
  • vSAN total failure, non-stretched, single host failure domains
  • All vSAN VMs down including vCenter VCSA
  • 4-node cluster vSAN
  • Hybrid disk groups (1 flash, 2 HDD per host)
  • NumberOfFailuresToTolerate=1
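For what it’s worth, a pre-flight check would have flagged the mistake: the vmkernel adapters tagged for vSAN traffic are exactly the ones a migration must not disturb while the cluster is live. A rough pyVmomi sketch of that check (connection boilerplate omitted; `cluster` is assumed to be an already-retrieved vim.ClusterComputeResource):

```python
# Sketch: list the vmkernel adapters tagged for vSAN traffic on each
# host before touching networking. Assumes `cluster` was already looked
# up via pyVmomi as a vim.ClusterComputeResource.
for host in cluster.host:
    cfg = host.configManager.virtualNicManager.QueryNetConfig("vsan")
    vsan_vmks = [vnic.device for vnic in cfg.candidateVnic
                 if vnic.key in cfg.selectedVnic]
    print(f"{host.name}: vSAN vmkernel adapters = {vsan_vmks}")
```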

Disaster Recovery

This is a total cluster network failure, which results in a complete network partition of vSAN where each host resides in its own partition. To each isolated host, it looks as though all the other hosts have failed. Since no quorum can be achieved for any object, no rebuilding takes place. Once the network issue is resolved, vSAN will re-form the cluster and components will start to resync, each synchronized against the latest, most up-to-date copy.
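To make the quorum rule concrete, here is a toy model (illustrative arithmetic only, not vSAN code): with NumberOfFailuresToTolerate=1 an object has two data replicas plus a witness, one vote each, and it stays accessible only while strictly more than half of its votes are reachable.

```python
# Toy model of vSAN object quorum with NumberOfFailuresToTolerate=1:
# two data replicas plus a witness component, one vote each.

def object_accessible(votes_reachable: int, votes_total: int = 3) -> bool:
    """Quorum rule: strictly more than 50% of votes must be reachable."""
    return 2 * votes_reachable > votes_total

# Full partition: each host is alone, so any object has at most one of
# its three components (one vote) reachable -- no quorum, no access.
print(object_accessible(1))  # False -> datastore shows 0KB
# Network restored: components rejoin, quorum returns, resync begins.
print(object_accessible(3))  # True
```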

Continue reading…

Migration from Cisco 1000v to VMware Virtual Distributed Switch (Part 2)

This is part 2 of a series. Click here to see Part 1. I apologise for taking so long to get Part 2 posted; sometimes I just don’t have the time or energy for the blog that I would like.

This portion of the guide focuses on the second half of the VSS-to-VDS migration. We moved the VMs to a VSS so that both VMs and hosts could be migrated to the new vCenter cleanly; now we will move the VMs from their VSS configuration back to a VDS.
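For the VM side of that move, the change boils down to re-pointing each VM NIC’s backing at a distributed portgroup. Here is a rough pyVmomi sketch of that step; the hostnames, credentials, and VM/portgroup names are placeholders, so test against a lab before trying anything like this in production:

```python
# Rough pyVmomi sketch: re-point a VM's NICs from a VSS portgroup to a
# VDS portgroup. All names and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; validate certs in prod
si = SmartConnect(host="vcsa.lab.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

def find_by_name(vimtype, name):
    """Return the first inventory object of the given type with this name."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vimtype], True)
    try:
        return next(o for o in view.view if o.name == name)
    finally:
        view.Destroy()

vm = find_by_name(vim.VirtualMachine, "app01")
pg = find_by_name(vim.dvs.DistributedVirtualPortgroup, "VDS-VM-Network")

changes = []
for dev in vm.config.hardware.device:
    if isinstance(dev, vim.vm.device.VirtualEthernetCard):
        # Swap the NIC backing from the standard portgroup to a VDS port.
        dev.backing = vim.vm.device.VirtualEthernetCard.DistributedVirtualPortBackingInfo(
            port=vim.dvs.PortConnection(
                portgroupKey=pg.key,
                switchUuid=pg.config.distributedVirtualSwitch.uuid))
        changes.append(vim.vm.device.VirtualDeviceSpec(
            operation=vim.vm.device.VirtualDeviceSpec.Operation.edit,
            device=dev))

vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=changes))
Disconnect(si)
```

Doing this one VM at a time, and confirming connectivity after each, keeps the blast radius small during a live migration.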

Keep in mind this migration is being done LIVE with production virtual machines running on the hosts. Obviously, it must be executed carefully or you will have a lot of explaining to do. Do not make these changes without understanding the full impact on your environment. Continue reading…