Monitoring vSAN is an important task for any operations team running HCI in its environment. Multiple tools provide detailed metrics, graphs, and reports, and which one is right for you is a question you’ll need to answer based on your technical and business requirements. Today I want to highlight one tool you could consider: the vSAN Performance Monitor Fling.

The Flings website describes it as follows:

The vSAN performance monitor is a monitoring and visualization tool based on vSAN Performance metrics. It will collect vSAN Performance and other metrics periodically from the clusters configured. The data collected is visualized in a more efficient and user-friendly way.

It is the visualization of the data that I have found most useful within vSAN environments. When it becomes necessary to troubleshoot a vSAN performance problem or a potential bottleneck, the visualization can quickly point to the segment of the vSAN stack that is the root cause.

The vSAN Performance Monitor Fling is a single virtual appliance that can be deployed anywhere within the VMware SDDC environment. It leverages a combination of Telegraf for data collection, InfluxDB for storing the metrics and Grafana for visualization of the metrics.

Installation Steps

Note: I found the User Guide on the Flings site to be rather rubbish and had to make several adjustments to fill in the missing information.

After downloading the OVA file from the Flings website, deploy it through the standard methods. The VM itself has a rather small footprint, requiring only 1 vCPU, 2 GB of memory, and 16 GB of disk space.

When I first powered on the VM, I got the error "The guest operating system 'vmwarePhoton64Guest' is not supported." To resolve this issue, edit the VM settings, go to VM Options --> General Options --> Guest OS, and select Linux and Other Linux (64-bit).

The current version of the Fling (v1.2) does not correctly set the password to the one specified during the OVA deployment. To log into the VM, it is necessary to reset the Photon OS root password. There is a good blog article that explains the process.

1. At the Photon OS screen, press 'e' to edit GRUB.
2. Append 'rw init=/bin/bash' to the linux boot line.
3. Press CTRL-x or F10 to resume the boot process.

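At the Bash prompt, remount the root filesystem read-write, set a new root password, clear the failed-login counter for root, and then force a reboot: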

# mount -o remount,rw /
# passwd
# /sbin/pam_tally2 -r -u root
# umount /
# reboot -f

The other annoyance with the current OVA is that the deployment does not allow you to configure the NIC; it relies on a DHCP server to assign an IP address to the VM. If you need a static IP address, follow the guide "Setting a Static IP Address."
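
For reference, here is a minimal sketch of what a static address looks like on Photon OS, assuming the appliance manages its network with systemd-networkd as Photon OS normally does. The helper name render_static_network, the eth0 interface name, and every address shown are placeholders of mine, not values from the Fling or its User Guide.

```shell
# Sketch: print a systemd-networkd unit that assigns a static address.
# The function name, interface name, and all addresses are placeholder assumptions.
render_static_network() (
    iface="$1"; address="$2"; gateway="$3"; dns="$4"
    cat <<EOF
[Match]
Name=${iface}

[Network]
Address=${address}
Gateway=${gateway}
DNS=${dns}
EOF
)

# Example: preview the unit for a hypothetical 192.168.10.0/24 network.
render_static_network eth0 192.168.10.50/24 192.168.10.1 192.168.10.10
```

To apply it, redirect the output into a file under /etc/systemd/network/ whose name sorts ahead of the appliance's existing DHCP unit (for example 10-static-eth0.network), make it world-readable with chmod 644, and run systemctl restart systemd-networkd. Check the existing files in that directory first, since I am assuming the defaults here.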

The VM should be ready for configuration now.

Configuration Steps

After logging into the appliance via SSH, edit the /root/telegraf.conf file. I modified lines 5-7 and line 10 to match my environment, following the information in the User Guide.

From there, the VM is ready for the services to be started via Docker.

Once the Docker containers are running, the Grafana UI is accessible at http://<IP_ADDRESS>:3000
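
If the page does not load right away, a small polling loop saves some browser refreshing. The sketch below assumes the Grafana build inside the appliance exposes the standard /api/health endpoint and that curl is available wherever you run it. The function names and the default of 30 attempts, 10 seconds apart, are my own choices.

```shell
# Hypothetical helper: build the Grafana health-check URL for the appliance.
grafana_health_url() {
    # $1 = appliance address, $2 = port (defaults to 3000, as in the text above)
    printf 'http://%s:%s/api/health\n' "$1" "${2:-3000}"
}

# Poll the health endpoint until it answers or the attempts run out.
# $1 = appliance address, $2 = attempts (default 30), $3 = delay in seconds (default 10)
wait_for_grafana() (
    url="$(grafana_health_url "$1")"
    attempts="${2:-30}"
    delay="${3:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        if curl --silent --fail --max-time 5 "$url" >/dev/null; then
            echo "Grafana is answering at $url"
            exit 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "Grafana did not answer at $url" >&2
    exit 1
)
```

Call it with the appliance address, for example wait_for_grafana 192.168.10.50 (a placeholder address), and it returns success once Grafana responds.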

From there, the pre-built dashboards are accessible via the Dashboards -> Manage submenu.

Data collection begins as soon as the Docker containers start.

vSAN Performance Insight

After the appliance has gathered the metrics for some period of time, you can log into Grafana and begin to leverage the insight provided.

Each mini-graph can be selected and viewed in a larger format to better understand the data. As you can see in the following image, the metrics allow us to drill down into the cache-disk device latency values.

This appliance is a great tool when you need to troubleshoot an issue inside a vSAN environment. Feel free to deploy it and use it within your environment to see how it works and if it will be useful to you (or your organization).