Category: Hadoop Ecosystem


You’ve read Brian Graf’s blog before, right? No? Well then, don’t bother reading the rest of this post and head over there right now! Seriously, go now.

I met Brian while living in Utah and working for Adobe a few years ago. When it comes to PowerCLI, he is one of the best out there and now he is a Senior Product Manager for VMware DRS. He talks at VMUG conferences and VMworld each year sharing his considerable knowledge, and I’ve been fortunate to be able to rely on him when needing help working on PowerCLI scripts.

Oh and he co-wrote a PowerCLI book, VMware vSphere PowerCLI Reference: Automating vSphere Administration 2nd Edition.

I highly recommend that any vSphere administrator make his blog a frequent stop and follow him on Twitter too.




Let’s be honest, debugging error messages generated by VMware Big Data Extensions (BDE) can be painstaking, tedious and tiresome. Having recently begun to rely on VMware Log Insight more and more, I determined the deployment of BDE v2.3.1 would leverage the Log Insight agent. During the actual vApp deployment of BDE, you may have noticed an option to specify a remote syslog server. I chose to leave the option blank during the deployment, instead choosing to install and configure the Log Insight agent post-installation.

The Linux Log Insight agent can be downloaded from the very bottom of the Administration->Agents screen.


The download link will include the IP address of the Log Insight server and is used during the RPM deployment of the agent.

Management Server Installation

The BDE management server is currently running CentOS 6.7. After copying the agent to the management server, the following commands can be executed to install and perform the service configuration.

# rpm -ihv VMware-Log-Insight-Agent-3.0.0-2985111.noarch_192.168.1.2.rpm
# chkconfig liagentd --list
# chkconfig liagentd on
# service liagentd restart

After the installation is complete, the BDE management server should appear in the list of servers with installed agents.
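
Since the server address travels in the RPM file name, it can be handy to confirm which Log Insight server a given agent package points at before installing it. The helper below is a minimal sketch, not part of the original procedure: the function name li_server_from_rpm is hypothetical, and it assumes the file name keeps the <name>.noarch_<server>.rpm form shown in the command above.

```shell
# Hypothetical helper: print the Log Insight server address embedded in an
# agent RPM file name of the form <name>.noarch_<server>.rpm.
li_server_from_rpm() {
    file=${1##*/}                 # drop any leading directory
    case "$file" in
        *.noarch_*.rpm) ;;        # expected naming pattern
        *) return 1 ;;            # anything else: refuse to guess
    esac
    server=${file##*.noarch_}     # keep everything after ".noarch_"
    server=${server%.rpm}         # drop the ".rpm" suffix
    [ -n "$server" ] || return 1
    printf '%s\n' "$server"
}

# Example:
#   li_server_from_rpm VMware-Log-Insight-Agent-3.0.0-2985111.noarch_192.168.1.2.rpm
#   prints 192.168.1.2
```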


The next thing to do is begin editing the /etc/liagent.ini configuration file to send the Serengeti log files to the Log Insight server. Mr. Steve Flanders has an article from 2014 that describes the process of adding custom log files for the agent to parse.

[filelog|syslog]
directory=/opt/serengeti/logs
include=serengeti.log;ironfan.log
event_marker=\[\d{4}-\d{2}-\d{2}T
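
If the same section has to be added on more than one management server, appending it from a script avoids duplicate entries. This is a minimal sketch under stated assumptions: the function name add_serengeti_filelog is hypothetical, and the check only looks for an uncommented line that is exactly the section header used above.

```shell
# Hypothetical helper: append the Serengeti filelog section to an agent
# config file unless an uncommented section with the same header exists.
add_serengeti_filelog() {
    ini=$1
    # -x matches whole lines only, so a commented-out ";[filelog|syslog]"
    # example does not count as an existing section.
    if grep -qxF '[filelog|syslog]' "$ini" 2>/dev/null; then
        return 0
    fi
    # Quoted 'EOF' keeps the backslashes in event_marker intact.
    cat >> "$ini" <<'EOF'

[filelog|syslog]
directory=/opt/serengeti/logs
include=serengeti.log;ironfan.log
event_marker=\[\d{4}-\d{2}-\d{2}T
EOF
}

# Example (as root on the management server):
#   add_serengeti_filelog /etc/liagent.ini
```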

BDE Template Installation

Having the Log Insight agent installed on the BDE management server is helpful, but I determined that having the agent installed natively during all deployments of Hadoop and Apache Mesos clusters would be even more helpful. There were two options for installing the agent during deployments:

  1. Create a Chef cookbook and include the Log Insight agent RPM file on the management server repo.
  2. Install and configure the agent on the template VM itself.

I opted for option #2, merely because it would be the quickest way initially. Admittedly, using a Chef recipe would have been the better long-term and more “DevOps-y” way to perform the installation. I may reconsider my choice in the future.

Just like the management server installation process, copy the Log Insight agent RPM file onto the template and install it using the same steps. No modifications to the /etc/liagent.ini file are necessary for the VM template node because the Serengeti logs don’t exist there.

Be sure to delete the snapshot on the VM template node so the changes take effect.

Now all of the VMs deployed through VMware Big Data Extensions will immediately send log updates to Log Insight, and the logs for all of the deployments are captured as well. Having the logs accessible through Log Insight lets you search and filter them from the Log Insight UI.




Version 2.3.1 of VMware Big Data Extensions was released on March 29, 2016. The latest version includes the fix for the glibc vulnerability disclosed in February. The current branch saw many new features included back in December when v2.3 was released, including an updated CentOS 6.7 template and support for multiple VM templates within the BDE vApp. The full release notes for the 2.3 branch can be viewed on the VMware site.

I’ve been anxious to upgrade my lab environment to 2.3 for the past several months; however, time has been extremely limited due to a heavy workload and family life. Fortunately, the Bay Area experienced a rather rainy weekend, and with all the little league baseball games getting cancelled, I was able to sit down and deploy the latest version into my vSphere 6.0 lab.

One of the major improvements made to VMware Big Data Extensions (BDE) is the administrative HTTPS interface running on port 5480. Once the vApp is powered on and you have changed the default random password, point your browser to the interface and log in. From there, you will be greeted with a summary screen where you can see the status of the running services. While the BDE management server is initializing, you can monitor the status of the initialization and see any error messages that occur.


Clicking the ‘Details…’ link to the right of Initialization Status will load the following pop-up that allows you to watch the progress of the management server.


Once all of the initialization steps complete successfully, the Summary screen can be refreshed and it should show all of the services operational.


At this point, log out and back into the vSphere Web Client to see the Big Data Extensions icon and begin managing the vApp.

The BDE management server is missing two key packages which will prevent a deployment from being successful — mailx and wsdl4j. The BDE documentation includes the following instructions for adding these packages to the management server:

The wsdl4j and mailx RPM packages are not embedded within Big Data Extensions due to licensing agreements. For this reason you must install them within the internal Yum repository of the Serengeti Management Server.

In order to install these packages properly, you will need to execute the following commands on the BDE management server.

# su - serengeti
$ umask 022 
$ cd /opt/serengeti/www/yum/repos/centos/6/base/RPMS/
$ wget
$ wget
$ createrepo ..
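
Before running createrepo, it is worth confirming that both RPM files actually landed in the repo directory. The check below is a minimal sketch; the function name missing_rpms is hypothetical, and it assumes the downloaded files follow the usual <package>-<version>.rpm naming.

```shell
# Hypothetical helper: print each requested package name that has no
# matching RPM file in the given directory; non-zero exit if any is missing.
missing_rpms() {
    dir=$1
    shift
    rc=0
    for pkg in "$@"; do
        found=no
        for f in "$dir/$pkg"-*.rpm; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -e "$f" ]; then
                found=yes
                break
            fi
        done
        if [ "$found" = no ]; then
            printf '%s\n' "$pkg"
            rc=1
        fi
    done
    return $rc
}

# Example:
#   missing_rpms /opt/serengeti/www/yum/repos/centos/6/base/RPMS mailx wsdl4j
```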

After verifying the proper VMFS datastores and Networks are configured within the BDE application, I always like to perform a test deployment of a basic Hadoop cluster. Doing so allows me to be sure everything is working as expected before I begin modifying the BDE management server. A test deployment is also a good way to see if anything in the BDE workflow has changed — it just so happens there is now a nifty new drop-down menu for selecting the VM template that should be used for the deployment.


A successful installation of a basic Hadoop cluster means the VMware Big Data Extensions application is ready for consumption and modification to support the Cloud Native Applications (Marathon, Mesos, Kubernetes, etc) I require in my lab environment.



The VMware BDE template uses a snapshot to perform the cloning operation as it deploys a cluster. The ability to create a cloned VM from a snapshot is exposed in the vSphere API with the CloneVM_Task. As part of regular template maintenance, I run a yum update command to make sure the OS gets regular updates and security patches. It helps when installing packages like Docker to make sure I’m as close to the stable CentOS 7 branch as possible. However, if you were to simply power on the template and run an OS update, those changes would not be realized in new cluster deployments, because the clones are still made from the older snapshot.

If you look at your BDE template, you can see the snapshot the management server uses.


By deleting the snapshot, any changes you have made to the template will be used during future cluster deployments. It is not necessary to do anything else. On the next cluster deployment, if the snapshot is missing, the BDE framework will create a new one and proceed to use it.
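
The snapshot can be deleted from the vSphere client, but the step can also be scripted. The sketch below uses govc, the govmomi command-line client, which is an assumption on my part and not something the original procedure relies on. The function name remove_template_snapshots is hypothetical, and passing '*' removes every snapshot on the VM, which assumes the template carries only the BDE snapshot. By default the function only prints the commands it would run.

```shell
# Assumption: govc is installed and GOVC_URL plus credentials are exported.
# Hypothetical helper: show, or with "run" execute, the govc commands that
# list and then remove all snapshots on the named template VM.
remove_template_snapshots() {
    vm=$1
    mode=${2:-dry-run}
    if [ "$mode" = "run" ]; then
        govc snapshot.tree -vm "$vm"          # list what is about to go
        govc snapshot.remove -vm "$vm" '*'    # '*' removes all snapshots
    else
        printf 'govc snapshot.tree -vm %s\n' "$vm"
        printf "govc snapshot.remove -vm %s '*'\n" "$vm"
    fi
}

# Example (the VM name is a placeholder):
#   remove_template_snapshots bde-template        # print only
#   remove_template_snapshots bde-template run    # actually remove
```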

The ability to update the BDE template will assist you in the lifecycle management of your Hadoop, Apache Mesos and all other cluster deployments you are using the VMware Big Data Extensions framework for. Enjoy!



The CentOS 6 template within VMware Big Data Extensions was becoming a bit long in the tooth and needed to be updated to CentOS 7 for a variety of reasons. As I began to look at using some of the newer features in Docker, it became apparent CentOS 6 was no longer going to be a useful template VM. I tried using the Debian 7 template included as a part of the VMware Fling for BDE released last year, however it had several problems with the Chef recipes. Getting a CentOS 7 template built and working with BDE took a bit of trial-and-error, so this post lays out the steps to get others going in a more timely manner.

The documentation for building an alternate VM template for BDE is a bit outdated, specifically referring to building a CentOS 6 template when it was still using the CentOS 5 branch. I started with those directions and pieced them together with some of the work I had done previously getting Photon to support Apache Mesos. Let’s get started with a base CentOS 7 VM.

CentOS 7 VM Installation

Start by downloading the CentOS 7 minimal ISO file from a local mirror. Once you have the ISO, go ahead and create a new VM in your vCenter environment. Make sure you provide a 40GB disk drive; I allocated a single vCPU and 1GB of memory to my template VM. Mount the ISO file and power on the VM. Once you’ve gone through the installation, reboot the VM and SSH into it.

The first thing I like to do is make sure the OS is up-to-date, followed by installing VMware Tools.

# yum -y update
# yum -y install open-vm-tools
# cat <<-EOF >> /etc/yum.repos.d/vmware-tools.repo
> [vmware-tools]
> name = VMware Tools
> baseurl =
> enabled = 1
> gpgcheck = 1
> EOF
# curl -o
# curl -o
# rpm --import
# rpm --import
# yum -y install open-vm-tools-deploypkg
# systemctl restart vmtoolsd
# echo localhost.localdomain > /etc/hostname
# shutdown -h now
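
If the template gets rebuilt more than once, the repo stanza can be written from a small function instead of being typed into a heredoc each time. This is a minimal sketch: the function name write_vmware_tools_repo is hypothetical, and the base URL is passed in as an argument because it is not reproduced in the commands above. Unlike the append used above, it overwrites the file so that re-running it does not duplicate the stanza.

```shell
# Hypothetical helper: write the vmware-tools yum repo definition to the
# given file, using a caller-supplied base URL.
write_vmware_tools_repo() {
    repo_file=$1
    base_url=$2
    if [ -z "$base_url" ]; then
        echo "baseurl is required" >&2
        return 1
    fi
    # Overwrite rather than append so the stanza appears exactly once.
    cat > "$repo_file" <<EOF
[vmware-tools]
name = VMware Tools
baseurl = $base_url
enabled = 1
gpgcheck = 1
EOF
}

# Example (supply your own repository URL as the second argument):
#   write_vmware_tools_repo /etc/yum.repos.d/vmware-tools.repo "$TOOLS_REPO_URL"
```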

The above steps will create a basic CentOS 7 VM that can be cloned to a template for use within your environment going forward. Once you are satisfied with the VM, the next step is to install the BDE specific bits for it to function properly.

Configure CentOS 7 Template for BDE

Power on the VM and SSH back into it. The next things to do are to install the JDK and install several customization packages that are provided on the BDE management server. There are a few minor modifications that have to be made in order to get it working.

# systemctl disable firewalld
# systemctl stop firewalld
# mkdir os && cd os
# curl --insecure -o custos.tar.gz https://BDE_MGMT_SERVER/custos/custos.tar.gz
# tar zxf custos.tar.gz
# vim
    #reduce grub boot waiting time
    #sed -i 's|^timeout=.*$|timeout=0|' /boot/grub/grub.conf
    #stop firewall
    #service iptables stop
    #chkconfig iptables off
# ./
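
The lines commented out in the editor above are the ones tied to CentOS 6: CentOS 7 boots with GRUB 2 rather than /boot/grub/grub.conf and ships firewalld in place of the iptables service. The same edit can be made with sed instead of by hand. This is a minimal sketch; the function name disable_centos6_steps is hypothetical, the script path is passed in as an argument, and the patterns assume the three commands appear uncommented, one per line.

```shell
# Hypothetical helper: comment out the CentOS 6 specific steps in the
# given customization script, keeping a .bak copy of the original.
disable_centos6_steps() {
    script=$1
    # Each pattern only matches lines that are not already commented out,
    # so running the function twice does not stack "#" characters.
    sed -i.bak \
        -e '/^[[:space:]]*sed -i .*\/boot\/grub\/grub\.conf/ s/^/#/' \
        -e '/^[[:space:]]*service iptables stop/ s/^/#/' \
        -e '/^[[:space:]]*chkconfig iptables off/ s/^/#/' \
        "$script"
}
```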

When the installer completes, the terminal screen should look like the following:



One of the packages installed is chef-client. I prefer to run a quick check to make sure the binary is installed properly.

# which chef-client
# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# shutdown -h now

Notice that I disabled SELinux, as it was preventing the latest version of Docker from starting its service.
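
The change to /etc/selinux/config only takes effect at the next boot, so it is worth confirming what the file says before shutting the template down. The helper below is a minimal sketch; the function name selinux_config_mode is hypothetical.

```shell
# Hypothetical helper: print the mode configured on the SELINUX= line of
# an SELinux config file (defaults to /etc/selinux/config).
selinux_config_mode() {
    # "^SELINUX=" does not match SELINUXTYPE= or commented-out lines.
    sed -n 's/^SELINUX=//p' "${1:-/etc/selinux/config}"
}

# Example:
#   selinux_config_mode    # prints the configured mode, e.g. disabled
```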

The template can now be placed in the vApp for BDE and Tomcat on the management server can be restarted to see the new template.

Chef Recipe Modifications

Simply building a new CentOS 7 template VM and throwing it into the vApp is not all that is required. The next steps took me through quite a bit of trial-and-error before I had a cluster deploying properly again. Many of the Chef recipes need to be modified to account for newer package versions and for configuration changes those packages introduce. I had to step through each service role one-by-one and make sure they were all working properly.

Rather than go through each and every recipe included on the BDE Management server, I will merely say there were a myriad of changes and you can download my updated Chef recipes from the GitHub repo for Virtual Elephant.

I would strongly encourage you to take a backup of the entire BDE Management VM, snapshot the VM or create an off-site copy of the /opt/serengeti/chef/cookbooks directory before pulling my changes into your environment.
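
For the cookbook copy, a timestamped tarball is a quick option. This is a minimal sketch rather than a full backup strategy: the function name backup_cookbooks and the default destination of /tmp are hypothetical, and the archive should be moved off the management server afterwards.

```shell
# Hypothetical helper: archive the cookbooks directory into a timestamped
# tarball and print the path of the archive that was created.
backup_cookbooks() {
    src=${1:-/opt/serengeti/chef/cookbooks}
    dest_dir=${2:-/tmp}
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/cookbooks-$stamp.tar.gz"
    # -C keeps paths in the archive relative to the parent directory.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    printf '%s\n' "$archive"
}

# Example:
#   backup_cookbooks /opt/serengeti/chef/cookbooks /root
```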

Once all of the recipes are updated, be sure to run the ‘knife cookbook upload -a’ command on the BDE Management server. Then the template will be fully ready to be utilized within your environment.

Getting a CentOS 7 VM template working was a necessity for me with some of the work I am doing in my lab environments. The next few posts on the site will be focused around these efforts and they would not have been possible if I had not done this work up-front. When my wife asked what I had been working on for the past few nights, I had to explain to her that I had gotten into a bit of a rabbit-hole and I’ve finally come back out…just to start on the work I wanted to begin several days ago.
