HDFS-Only Cluster through Big Data Extensions

After working with EMC over the summer to evaluate Isilon storage as an HDFS layer (including the NameNode), I started thinking about how VMware Big Data Extensions could be used to create the exact same functionality. If you’ve read any of the other posts about extending BDE beyond what it ships with, you’ll know the framework allows an administrator to do nearly anything they can imagine.

As with creating a Zookeeper-only cluster, all of the functionality for an HDFS-only cluster is already built into BDE; it is just a matter of unlocking it. It took less than 10 minutes to set up all the pieces.

I needed to add Cloudera 5.2.1 support to my new BDE 2.1 lab environment, so the first command sets that up. After that, the rest of the commands are all that is needed:

# config-distro.rb --name cdh5 --vendor CDH --version 5.2.1 --repos http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo
# cd /opt/serengeti/www/specs/Ironfan/hadoop2/
# mkdir -p donly
# cp conly/spec.json donly/spec.json
# vim donly/spec.json

I configured the donly/spec.json file to include the following:

{
  "nodeGroups":[
    {
      "name": "DataMaster",
      "description": "It is the VM running the Hadoop NameNode service. It manages HDFS data and assigns tasks to workers. The number of VM can only be one. User can specify size of VM.",
      "roles": [
        "hadoop_namenode"
      ],
      "groupType": "master",
      "instanceNum": "[1,1,1]",
      "instanceType": "[MEDIUM,SMALL,LARGE,EXTRA_LARGE]",
      "cpuNum": "[2,1,64]",
      "memCapacityMB": "[7500,3748,max]",
      "storage": {
        "type": "[SHARED,LOCAL]",
        "sizeGB": "[50,10,max]"
      },
      "haFlag": "on"
    },
    {
      "name": "DataWorker",
      "description": "They are VMs running the Hadoop DataNode services. They store HDFS data. User can specify number and size of VMs in this group.",
      "roles": [
        "hadoop_datanode"
      ],
      "instanceType": "[SMALL,MEDIUM,LARGE,EXTRA_LARGE]",
      "groupType": "worker",
      "instanceNum": "[3,1,max]",
      "cpuNum": "[1,1,64]",
      "memCapacityMB": "[3748,3748,max]",
      "storage": {
        "type": "[LOCAL,SHARED]",
        "sizeGB": "[100,20,max]"
      },
      "haFlag": "off"
    }
  ]
}
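The bracketed strings in the spec appear to encode a default followed by bounds, e.g. "[3,1,max]" for the DataWorker instance count. The sketch below reads numeric fields as [default, min, max], with a literal "max" meaning no upper bound. That reading is an assumption inferred from the values above, not taken from BDE documentation. It embeds a trimmed copy of the DataMaster/DataWorker numbers so it runs on its own and flags any group whose default falls outside its bounds.

```python
import json

# Trimmed copy of the numeric fields from donly/spec.json above.
SPEC = """
{
  "nodeGroups": [
    {"name": "DataMaster", "instanceNum": "[1,1,1]", "cpuNum": "[2,1,64]",
     "memCapacityMB": "[7500,3748,max]", "storage": {"sizeGB": "[50,10,max]"}},
    {"name": "DataWorker", "instanceNum": "[3,1,max]", "cpuNum": "[1,1,64]",
     "memCapacityMB": "[3748,3748,max]", "storage": {"sizeGB": "[100,20,max]"}}
  ]
}
"""

def parse_range(text):
    """Parse "[default,min,max]"; a literal "max" becomes None (no upper bound).

    Assumption: the three positions mean default, minimum, maximum.
    """
    parts = [p.strip() for p in text.strip("[]").split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three values, got {text!r}")
    return tuple(None if p == "max" else int(p) for p in parts)

def check_group(group):
    """Return a list of problems found in one node group (empty if none)."""
    problems = []
    fields = {
        "instanceNum": group["instanceNum"],
        "cpuNum": group["cpuNum"],
        "memCapacityMB": group["memCapacityMB"],
        "sizeGB": group["storage"]["sizeGB"],
    }
    for field, raw in fields.items():
        default, low, high = parse_range(raw)
        if default is None or low is None:
            problems.append(f"{group['name']}.{field}: default/min must be numeric")
            continue
        if default < low or (high is not None and default > high):
            problems.append(
                f"{group['name']}.{field}: default {default} outside [{low}, {high}]"
            )
    return problems

spec = json.loads(SPEC)
for group in spec["nodeGroups"]:
    print(group["name"], check_group(group) or "OK")
```

Both groups in this spec pass; a check like this is mainly useful after hand-editing the file with vim.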

The final part is to add an entry for an HDFS-only cluster to the /opt/serengeti/www/specs/map file:

  {
    "vendor" : "CDH",
    "version" : "^\\w+(\\.\\w+)*",
    "type" : "HDFS Only Cluster",
    "appManager" : "Default",
    "path" : "Ironfan/hadoop2/donly/spec.json"
  },
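The "version" value is a regular expression written with JSON escaping, so "\\w" in the file becomes \w once parsed. As a sketch, assuming the server applies it as an ordinary regular expression against the distro version string (how BDE evaluates it internally is not covered here), the pattern accepts dotted versions such as the 5.2.1 registered earlier:

```python
import json
import re

# The map entry from above, wrapped in [ ] so it parses as standalone JSON
# (in the real file the trailing comma separates it from the next entry).
ENTRY = r'''
[
  {
    "vendor" : "CDH",
    "version" : "^\\w+(\\.\\w+)*",
    "type" : "HDFS Only Cluster",
    "appManager" : "Default",
    "path" : "Ironfan/hadoop2/donly/spec.json"
  }
]
'''

entry = json.loads(ENTRY)[0]
pattern = entry["version"]  # JSON "\\w" decodes to the regex token \w
print(pattern)              # ^\w+(\.\w+)*

for version in ["5.2.1", "5", "5.2.1-beta"]:
    full = re.fullmatch(pattern, version) is not None
    print(version, "matches entirely" if full else "does not match entirely")
```

The pattern has no trailing $, so whether a suffixed version like "5.2.1-beta" is accepted depends on whether the server requires a full match or only a prefix match.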

Restart the Tomcat service on the management server and the option becomes available. Once deployed, the cluster looked like this in the VMware vCenter Web Client:


You can view the status of the HDFS layer through the standard interface:


Having this functionality within VMware Big Data Extensions lets you provide a dedicated HDFS data warehouse layer to your applications and other application cells.

VMware releases new Big Data Extensions fling!

Hot on the heels of my recent posts, and those from Andrew Nelson, the VMware Big Data Extensions team has released an official fling that extends the functionality to include Mesos, Marathon, Chronos, Docker and Kubernetes!

From the site:

“Big Data Extensions can be easily extended to deploy and manage all kinds of distributed or non-distributed applications. This release of the BDE-SE Fling adds support for deploying Mesos (with Chronos and Marathon) as well as Kubernetes clusters in addition to the Hadoop and HBase clusters.

Big Data Extensions simplifies the cluster deployment and provisioning process, gives you a real time view of the running services and the status of their virtual hosts, provides a central place from which to manage and monitor your clusters, and incorporates a broad range of tools to help you optimize cluster performance and utilization.

Big Data Extensions provides the following features:

  • Fast deployment, management, and scaling of Hadoop, Mesos and Kubernetes clusters. Big Data Extensions enable rapid deployment of Hadoop, Mesos and Kubernetes clusters on VMware vSphere. You can also quickly manage, scale out clusters, and scale up/down nodes subsequently.

  • Support for Docker. The Big Data Extensions for vSphere Standard Edition Fling includes support for Docker with Mesos, Marathon, Chronos, and Kubernetes.

  • Graphical User Interface Simplifies Management Tasks. The Big Data Extensions plug-in for vSphere, a graphical user interface integrated with vSphere Web Client, lets you easily perform common infrastructure and cluster management administrative tasks.

  • All-in-one solution. Big Data Extensions ships with installation package and configuration scripts for Apache Bigtop 0.8.0, Kubernetes 0.5.4, Mesos 0.21.0, Chronos 2.3.0 and Marathon 0.7.5. You can create and manage Hadoop, Mesos, and Kubernetes clusters out of box. You can also add new versions of these softwares into Big Data Extensions Server and create the cluster.”

Head over to the Flings page at VMware and download the latest to see how it all works! Great job by the BDE engineering team!

VMware BDE + Zookeeper: The unknown cluster option

If you have taken a look underneath the covers of VMware Big Data Extensions, then you have probably seen the Zookeeper Chef cookbooks that are part of every default cluster deployment. The Zookeeper role is built in and was a critical part of being able to develop the option for a Mesosphere cluster so quickly using BDE; there was no need to reinvent the wheel. The only piece missing for deploying a Zookeeper-only cluster is the JSON specification file.

I took a few minutes and put together a quick JSON specification file that can be used to deploy just Zookeeper as a cluster that could then be utilized by any application as a service layer. As with the Mesosphere cluster, I started with the basic cluster JSON file found in the /opt/serengeti/samples directory.

// This is a cluster spec for creating a Zookeeper cluster without installing any hadoop stuff.
{
  "nodeGroups":[
    {
      "name": "master",
      "roles": [
        "zookeeper"
      ],
      "instanceNum": 5,
      "cpuNum": 2,
      "memCapacityMB": 3768,
      "storage": {
        "type": "SHARED",
        "sizeGB": 50
      },
      "haFlag": "on"
    }
  ]
}
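The spec above asks for five Zookeeper instances, and the post does not say why. One relevant consideration is general ZooKeeper behavior rather than anything BDE computes: an ensemble stays available only while a strict majority of its servers is up, so it tolerates floor((n - 1) / 2) failures. A short sketch of that arithmetic:

```python
def quorum_size(servers):
    """Smallest strict majority of the ensemble."""
    return servers // 2 + 1

def tolerated_failures(servers):
    """How many servers can fail while a majority is still up."""
    return servers - quorum_size(servers)

for n in (3, 4, 5):
    print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Four servers tolerate no more failures than three, which is why odd ensemble sizes such as five are the usual choice.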

Adding a quick definition to the /opt/serengeti/conf/serengeti.properties file, the /opt/serengeti/www/specs/map file and the /opt/serengeti/www/manifest file is all that is needed. Restart Tomcat on the management server and you are off to the races!

The unknown cluster option is now available with very little modification to your BDE environment.


BDE + Mesosphere cluster code on GitHub

I have uploaded the necessary files to begin including the option for deploying a Mesosphere cluster with VMware Big Data Extensions v2.1. You can download the tarball or clone the repo via the following link:


As I begin work and provide further extensions for other clustering technologies, I will make them available via GitHub as well. To include this in your deployment, extract it directly into the /opt/serengeti folder — although be aware it will replace the default map and default manifest files as well. After the files are extracted (as user serengeti), simply run two commands on the BDE management server:

# knife cookbook upload -a
# service tomcat restart
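Because extracting the tarball replaces the default map and manifest files, it can be worth copying them aside first. A minimal sketch follows. The two paths are the ones named in these posts (/opt/serengeti/www/specs/map and /opt/serengeti/www/manifest); the backup_files helper and the .orig suffix are illustrative choices, not part of the GitHub repo. It would be run as the serengeti user before unpacking.

```python
import shutil
from pathlib import Path

# Files the tarball replaces, relative to the BDE install root.
REPLACED = ["www/specs/map", "www/manifest"]

def backup_files(root, rel_paths, suffix=".orig"):
    """Copy each existing file to <name><suffix>, never overwriting an older backup.

    Hypothetical helper; returns the list of backup paths it actually wrote.
    """
    written = []
    for rel in rel_paths:
        src = Path(root) / rel
        dst = src.with_name(src.name + suffix)
        if src.is_file() and not dst.exists():
            shutil.copy2(src, dst)  # copies contents plus timestamps and permission bits
            written.append(dst)
    return written

if __name__ == "__main__":
    for path in backup_files("/opt/serengeti", REPLACED):
        print("backed up to", path)
```

Skipping existing backups means a second run will not clobber the pristine copies with already-replaced files.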

If you have any questions, feel free to reach out to me over Twitter.

Apache Mesos Clusters – Part 3

This post includes the final pieces necessary to get a Mesosphere stack deployed through Big Data Extensions within a VMware environment. I’ve included the Chef cookbooks and commands required to tie all of the pieces together for a cluster deployment. The wonderful thing about the framework is its extensibility: once I had Mesos deploying, it became very clear how simple it is to extend the framework even further (look for future posts).

The idea that you can now turn a large cluster of VMs into a single Mesos cluster for use by a product, engineering team or operations team opens up an entirely new world within our environments. This is a very exciting place to be investing time.

Chef Roles

Big Data Extensions uses role definitions within the framework, so the first step was to create a new role for Mesos. If you remember from Part 2, we defined the role in the JSON file and called it ‘mesos’.
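For readers who haven’t worked with one, a Chef role is a small document with a name, a run list and optional attributes. Chef accepts roles in either a Ruby DSL or JSON; the sketch below builds mesos_master and mesos_worker roles in Chef’s JSON role layout. The recipe names in the run lists (mesos::master, mesos::slave) are hypothetical placeholders, not the cookbook used in this series. Chef’s knife tool can upload a JSON role file with knife role from file.

```python
import json

def chef_role(name, description, recipes):
    """Build a role dict in Chef's JSON role format.

    The recipe names passed in are placeholders; substitute your cookbook's recipes.
    """
    return {
        "name": name,
        "description": description,
        "json_class": "Chef::Role",
        "chef_type": "role",
        "default_attributes": {},
        "override_attributes": {},
        "run_list": [f"recipe[{r}]" for r in recipes],
    }

roles = [
    chef_role("mesos_master", "Mesos master node", ["mesos::master"]),  # hypothetical recipe
    chef_role("mesos_worker", "Mesos worker node", ["mesos::slave"]),   # hypothetical recipe
]

for role in roles:
    # Print each role as JSON; redirect to <name>.json to keep it as a file.
    print(json.dumps(role, indent=2))
```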

The role files can be found in /opt/serengeti/chef/roles. I created the roles for both mesos_master and mesos_worker through the command line interface:
