CentOS 7 Template for VMware Big Data Extensions


The CentOS 6 template within the VMware Big Data Extensions was becoming  bit long in the tooth and needed to be updated to CentOS 7 for a variety of reasons. As I began to look at using some of the newer features in Docker, it became apparent CentOS 6 was no longer going to be a useful template VM. I tried using the Debian 7 template included as a part of the VMware Fling for BDE released last year, however it had several problems with the Chef recipes. The effort required to get a CentOS 7 template built and working with BDE took a bit of trial-and-error, this post will simplify it for others to get them going in a more timely manner.

The documentation for building an alternate VM template for BDE are a bit outdated, specifically referring to building a CentOS 6 template when it was still using the CentOS 5 branch. I started with those directions and pieced them together with some of the work I had done previously getting Photon to support Apache Mesos. Let’s get started with a base CentOS 7 VM.

CentOS 7 VM Installation

Start by downloading the CentOS 7 minimal ISO file from a local mirror. Once you have ISO go ahead and create a new VM in your vCenter environment. Make sure you provide a 40GB disk drive — I allocated a single vCPU and 1GB of memory to my template VM. Mount the ISO file and power on the VM. Once you’ve gone through the installation, reboot the VM and SSH into it.

The first thing I like to do is make sure the OS is up-to-date, followed by installing VMware Tools.

# yum -y update
# yum -y install open-vm-tools
# cat <<-EOF >> /etc/yum.repos.d/vmware-tools.repo
> [vmware-tools]
> name = VMware Tools
> baseurl = http://packages.vmware.com/packages/rhel7/x86_64/
> enabled = 1
> gpgcheck = 1
# curl -o vmware-dsa.pub http://packages.vmware.com/tools/keys/VMWARE-PACKAGING-GPG-DSA-KEY.pub
# curl -o vmware-rsa.pub http://packages.vmware.com/tools/keys/VMWARE-PACKAGING-GPG-RSA-KEY.pub
# rpm --import vmware-dsa.pub
# rpm --import vmware-rsa.pub
# yum -y install open-vm-tools-debloypkg
# systemctl restart vmtoolsd
# ECHO localhost.localdomain > /etc/hostname
# shutdown -h now

The above steps will create a basic CentOS 7 VM that can be cloned to a template for use within your environment going forward. Once you are satisfied with the VM, the next step is to install the BDE specific bits for it to function properly.

Configure CentOS 7 Template for BDE

Power on the VM and SSH back into it. The next things to do are to install the JDK and install several customization packages that are provided on the BDE management server. There are a few minor modifications that have to be made in order to get it working.

# systemctl disable firewalld
# systemctl stop firewalld
# mkdir os && cd os
# curl --insecure -o custos.tar.gz https://BDE_MGMT_SERVER/custos/custos.tar.gz
# tar zxf custos.tar.gz
# vim installer.sh
 48 #reduce grub boot waiting time
 49 #sed -i 's|^timeout=.*$|timeout=0|' /boot/grub/grub.conf
143 #stop firewall
144 #service iptables stop
145 #chkconfig iptables off
# ./installer.sh

When the installer completes, the terminal screen should look like the following:



One of the packages installed is chef-client. I prefer to run a quick check to make sure the binary is installed properly.

# which chef-client
# sed -i 's/enforcing/disabled/g' /etc/selinux/config /etc/selinux/config
# shutdown -h now

Notice that I turned off selinux as it was interfering with the latest version of Docker when the service was trying to be started.

The template can now be placed in the vApp for BDE and Tomcat on the management server can be restarted to see the new template.

Chef Recipe Modifications

Simply building a new CentOS 7 template VM and throwing it into the vApp is not all that is required. The next steps took me through quite a bit of trial-and-error before I had a cluster deploying properly again. Many of the Chef recipes need to be modified to account for newer package version configuration changes and other configurations performed within the recipes. I had to step through each service role one-by-one and make sure they were all working properly.

Rather than go through each and every recipe included on the BDE Management server, I will merely say there were a myriad of changes and you can download my updated Chef recipes from the GitHub repo for Virtual Elephant.

I would strongly encourage you to take a backup of the entire BDE Management VM, snapshot the VM or create an off-site copy of the /opt/serengeti/chef/cookbooks directory before pulling my changes into your environment.

Once all of the recipes are updated, be sure to run the ‘knife cookbook upload -a’ command on the BDE Management server. Then the template will be fully ready to be utilized within your environment.

Getting a CentOS7 VM template was a necessity for me with some of the work I am doing in my lab environments. The next few posts on the site will be focused around these efforts and they would not have been possible if I had not done this work up-front. When my wife asked what I had been working on for the past few nights, I had to explain to her that I had gotten into a bit of a rabbit-hole and I’ve finally come back out…just to start on the work I wanted to begin several days ago.

Using VMware Instant Clone with Big Data Extensions


Over the past weekend, I wiped my home lab environment that was running vSphere 5.5 and installed a fresh set of vSphere 6.0U1 bits. The decision to begin using vSphere 6.0 in the home lab was largely due to now longer needing vSphere 5.5 for my VCAP-DCA studying and wanting to finally begin using the Instant Clone technology for deploying Hadoop clusters with Big Data Extensions. As I had mentioned in an earlier post, the release of version 2.2 for Big Data Extensions exposed the ability to use VMware Instant Clone technology. It did not however enable the setting by default, leaving it to the vSphere administrator to determine whether or not to enable it.

Turning on the feature is simple enough. Edit the /opt/serengeti/conf/serengeti.properties file on the BDE management server, changing line 104 to state instant for the cluster.clone.service variable. The exact syntax is highlighted in the screenshot below.


After saving the file and restarting the Tomcat service, the environment is ready to go!

For those unfamiliar with the VMware Instant Clone technology, a brief description from the VMware Blog states,

The Instant Clone capability allows admins to ‘fork’ a running virtual machine, meaning, it is not a full clone. The parent virtual machine is brought to a state where the admin will Instant Clone it, at which time the parent virtual machine is quiesced and placed in a state of a ‘parent VM’. This allows the admins to create as many “child VMs” as they please. These child VMs are created in mere seconds (or less) depending on the environment (I’ve seen child VMs created in .6 seconds). The reason these child VMs can be created so quickly is because they leverage the memory and disk of the parent VM. Once the child VM is created, any writes are placed in delta disks.

When the parent virtual machine is quiesced, a prequiesce script cleans up certain aspects of the parent VM while placing it in its parent state, allowing the child VMs to receive unique MAC addresses, UUID, and other information when they are instantiated. When spinning up the child VMs a post clone script can be used to set properties such as the network information of the VM, and/or kick off additional scripts or actions within the child VM.

The ability to deploy new Hadoop VMs in an extremely quick manner through Big Data Extensions is amazing! In addition, because of the extensibility of BDE, the VMware Instant Clone feature is used when any type of cluster deployment is initiated — Apache Spark, Apache Mesos, Apache Hadoop, etc.

VMware Instant Clone VMs

The new cloned VMs launched in less than 1 minute — for my Intel NUC home lab it was really impressive! I’ve read all the press stating Instant Clones would launch in single-digit seconds, but you never know how something is going to work in an environment you control. Seeing is believing!

The interesting bit I did not anticipate was the fact that a single parent VM was spun up on each ESXi host inside my home lab when the first Hadoop cluster was deployed. You can see the parent VMs in a new resource pool created during the deployment that is separate from the Hadoop cluster resource pool.


As a result of the additional VMs that did not exist when the original cluster was deployed without VMware Instant Clone, the total resource utilization across the cluster actually increased.

Cluster Utilization Before VMware Instant Clone



Cluster Utilization After VMware Instant Clone



Increased Hadoop Cluster Utilization with VMware Instant Clone

The next step was to increase the cluster size to see what sort of savings could be realized through the Instant Clone technology. I increased the number of worker nodes through BDE using the new interface.


Note: The new adjustment field and showing the new node count is outstanding!

After the new nodes were added, the total cluster utilization looked like there were some savings.



I am looking forward to using the new VMware Instant Clone technology further in the lab — including Photon nodes — to see what further savings I can get out of my home lab.

Custom Photon TP2 + Mesos 0.23.0 ISO Download


As noted in the previous post, I had to custom build an ISO file that included Mesos 0.23.0 in order for the nodes to join the Mesos cluster I had deployed using Big Data Extensions. There was no forking of the Photon OS code off of GitHub required, it was just a matter of building the ISO.

In order to save others time, I have made my build of Photon available on GitHub for downloading. You can download the ISO file here.

The updated SPEC file for Mesos on the official Photon GitHub repo:



Exposing Apache Mesos on VMware Big Data Extensions v2.2

The VMware Big Data Extensions v2.2 release included the cookbooks for Apache Mesos and Kubernetes from the Fling released this past spring. However, those cookbooks are not exposed when you deploy the new version. Fortunately, unlocking them only takes a few minutes before they can be made available! I will cover exactly what is needed in order to begin using these Cloud Native App cluster deployments below.

If you jump onto your v2.2 management server and look in the /opt/serengeti/chef/cookbooks directory, you will see all of the Cloud Native App additions.


A quick look to be sure the Chef roles are still defined tells us that they are.



They even did us the favor of including the JSON spec files in the /opt/serengeti/www/specs/Ironfan directory.


The missing pieces are the entries in the /opt/serengeti/www/specs/map and /opt/serengeti/www/distros/manifest files. Those are rather easy to copy out of the VMware Fling itself or re-create manually. If you want to edit the files yourself, here is what needs to be added to the files.


  "vendor" : "Kubernetes",
  "version" : "^(\\d)+(\\.\\w+)*",
  "type" : "Basic Kubernetes Cluster",
  "appManager" : "Default",
  "path" : "Ironfan/kubernetes/basic/spec.json"
  "vendor" : "Mesos",
  "version" : "^(\\d)+(\\.\\w+)*",
  "type" : "Basic Mesos Cluster",
  "appManager" : "Default",
  "path" : "Ironfan/mesos/basic/spec.json"


  "name" : "kubernetes",
  "vendor" : "KUBERNETES",
  "version" : "0.5.4",
  "packages" : [
      "tarball": "kubernetes/kubernetes-0.5.4.tar.gz",
      "roles": [
  "name" : "mesos",
  "vendor" : "MESOS",
  "version" : "0.21.0",
  "packages" : [
      "package_repos": [ ""],
      "roles" : [

The repos built into the Fling are not present (unfortunately) on the management server. This was the only tedious portion of the entire process. The easiest method is to grab the files out of an existing BDE Fling management server and copy them into the new one. The other option is find the latest RPMs on the Internet and add them to the management server manually. In either case, you’ll need to run the CentOS syntax for creating the repository.

Create local repo for Apache Mesos

# su - serengeti
$ cd /opt/serengeti/www/yum
$ vim mesos.repo
name=Apache Mesos

$ mkdir -p repos/mesos/current/RPMS
$ cd repos/mesos/current

The Fling included the following files:
- bigtop-utils-
- chronos-2.3.0-0.1.20141121000021.x86_64.rpm
- docker-io-1.3.1-2.el6.x86_64.rpm
- marathon-0.7.5-1.0.x86_64.rpm
- mesos-0.21.0-1.0.centos65.x86_64.rpm
- subversion-1.6.11-10.el6_5.x86_64.rpm
- zookeeper-
- zookeeper-server-

$ createrepo .

A restart of Tomcat is all that is needed and then you will be able to start deploying Apache Mesos and Kubernetes clusters through BDE v2.2.

If you want to take advantage of the Instant Clone functionality, you will need to be running vSphere 6.0 and BDE v2.2. There are also a couple adjustments to the /opt/serengeti/conf/serengeti.properties files that will be need to be made. I will be going over those in a future post discussing how to use the Photon OS as the template for BDE to deploy.

Big Data Extensions v2.2 Released

VMware released the latest version of Big Data Extensions during Hadoop Summit on June 4, 2015. Included in the release notes, there are two features that have me really excited for this version.

Resize Hadoop Clusters on Demand. You can reduce the number of virtual machines in a running Hadoop cluster, allowing you to manage resources in your VMware ESXi and vCenter Server environments. The virtual machines are deleted, releasing all resources such as memory, CPU, and I/O.

Increase Cloning Performance and Resource Usage of Virtual Machines. You can rapidly clone and deploy virtual machines using Instant Clone, a feature of vSphere 6.0. Using Instant Clone, a parent virtual machine is forked, and then a child virtual machine (or instant clone) is created. The child virtual machine leverages the storage and memory of the parent, reducing resource usage.

These are really outstanding features, especially the ability to use the Instant Clone (aka VMFork) functionality introduced in vSphere 6.0. The Instant Clone technology has very interesting implications when considered with the work in the Cloud Native Application space. Deploying a very small Photon VM to immediately launch a Docker container workload in a matter of seconds will add a huge benefit to running virtualized Apache Mesos clusters (with Marathon) that have been deployed using the BDE framework.

I had hoped to see the functionality from the BDE Fling that included recipes for Mesos and Kubernetes to be incorporated into the official VMware release. On a positive note, the cookbooks for Mesos, Marathon and Kubernetes are present on the management server. It will just take a little effort to unlock those features.

06.27.2015 UPDATE: I have confirmed all of the cookbooks for Mesos, Docker and Kubernetes are present on the BDE v2.2 management server. I will have a post shortly describing how to unlock them for use.

Overall, a big release from the VMware team and it appears they are on the right track to increase the functionality of the BDE framework!