OpenStack Resource Plugin for VMware Big Data Extensions

The final challenge in offering a full-featured Platform 3 private cloud utilizing OpenStack and VMware technologies has been the Platform-as-a-Service layer. VMware has many tools for helping individuals and companies offer a Platform-as-a-Service layer — vRealize Automation, vRealize Orchestrator and VMware Big Data Extensions. However, with the release of VMware Integrated OpenStack — and OpenStack in general — there is a disconnect between the Infrastructure-as-a-Service and Platform-as-a-Service layers. OpenStack has a few projects — like Sahara — to bridge the gap, but they are immature when compared to VMware Big Data Extensions. The challenge for me became figuring out a method for integrating the two so that an OpenStack offering built on top of VMware vSphere technologies could offer up a robust Platform-as-a-Service offering.

It has taken a fair bit of time, effort and testing, but I am pleased to announce the alpha release of the Virtual Elephant Big Data Extensions resource plugin for OpenStack Heat. The resource plugin enables an OpenStack deployment to utilize and deploy any cluster application the VMware Big Data Extensions management server is configured to deploy. The resource plugin accomplishes this by making REST API calls to the VMware Big Data Extensions management server to deploy the pre-defined JSON cluster specification files. The addition of the resource plugin to an OpenStack environment greatly expands the capabilities of the environment, without requiring an infrastructure engineer or architect to start from scratch.
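The REST flow the plugin performs can be sketched roughly as follows. This is a minimal illustration only, not the plugin's actual code: the endpoint path, port and request field names are assumptions on my part, and authentication is left to a pre-built session.

```python
import json

# Illustrative sketch only -- the real resource plugin lives in the
# Virtual Elephant GitHub repo. The endpoint path and JSON field names
# below are assumptions, not the documented BDE API.
BDE_CLUSTER_API = "https://{endpoint}:8443/serengeti/api/clusters"

def build_cluster_request(cluster_name, cluster_type, network_name):
    """Build the minimal JSON body the plugin might POST to BDE.

    BDE matches cluster_type against a pre-defined JSON cluster
    specification file on the management server.
    """
    return {
        "name": cluster_name,
        "distro": cluster_type,
        "networkConfig": {"MGT_NETWORK": [network_name]},
    }

def create_cluster(session, endpoint, **kwargs):
    """POST the cluster request to the BDE management server.

    `session` is an HTTP session (e.g. requests.Session) already
    authenticated against the management server.
    """
    url = BDE_CLUSTER_API.format(endpoint=endpoint)
    body = build_cluster_request(**kwargs)
    return session.post(url, data=json.dumps(body),
                        headers={"Content-Type": "application/json"})
```

The Heat properties shown later in this post (cluster_name, cluster_type, cluster_net) map directly onto a request body of this shape.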

The resource plugin itself requires several modifications to be made to the VMware Big Data Extensions management server. One challenge I encountered initially was a lack of functionality built into the REST API. I received assistance from one of the VMware Big Data Extensions engineers — Jesse — who modified several of the Java JAR files to add the features necessary. Writing the resource plugin would have been much more difficult were it not for several people at VMware — including Jesse, Andy and several of the VIO engineering team — who assisted me in my efforts. A big THANK YOU to each of them!

Disclaimer: As stated, the resource plugin is considered in an alpha state. It can be rather temperamental, but I wanted to get the code out there and (hopefully) get others excited for the possibilities.

Environment Notes

There may be some differences between my OpenStack environment and your own. The work I have done has been focused around VMware technologies, including Big Data Extensions. The other technologies the resource plugin relies upon are VMware Integrated OpenStack and VMware NSX. That is not to say the resource plugin will not work if you are using other OpenStack technologies; I mention it so that there is no misunderstanding of the environment for which the plugin has been written.

Fortunately, the entire environment I designed the resource plugin for can be referenced on the OpenStack website, as it is the official reference architecture for VMware Integrated OpenStack that the OpenStack Foundation has adopted.

One final note before I begin discussing the installation and configuration required for the resource plugin — this level of modification to VMware Big Data Extensions will most likely put it into an unsupported state should you run into issues and contact VMware support.

Installation Guide

In order to begin using the resource plugin, the VMware Big Data Extensions management server will need to be modified. Depending on how many of the additional cluster deployments you have integrated from the Virtual Elephant site, additional steps may be required to enable deployments of every cluster type. The resource plugin, REST API test scripts and the updated Java files can be downloaded from the Virtual Elephant GitHub site. Once you have checked out the repository, perform the following steps within your environment.

Note: I am using VMware Integrated OpenStack and the paths reflect that environment. You may need to adjust the commands for your implementation.

Copy the resource plugin to the OpenStack controller(s):

$ scp plugin/ user@controller1.localdomain:/usr/lib/heat
$ ssh user@controller1.localdomain "service heat-engine restart"
$ ssh user@controller1.localdomain "grep VirtualElephant /var/log/heat/heat-engine.log"
$ scp plugin/ user@controller2.localdomain:/usr/lib/heat
$ ssh user@controller2.localdomain "service heat-engine restart"
$ ssh user@controller2.localdomain "grep VirtualElephant /var/log/heat/heat-engine.log"

Copy the VIO config file to the OpenStack controller(s) and modify it for your environment:
$ scp plugin/vio.config user@controller1.localdomain:/usr/local/etc/
$ scp plugin/vio.config user@controller2.localdomain:/usr/local/etc/

Copy the updated Java files to the Big Data Extensions management server:

$ scp java/cluster-mgmt-2.1.1.jar user@bde.localdomain:/opt/serengeti/tomcat6/webapps/serengeti/WEB-INF/lib/
$ scp java/commons-serengeti-2.1.1.jar user@bde.localdomain:/opt/serengeti/tomcat6/webapps/serengeti/WEB-INF/lib/
$ scp java/commons-serengeti-2.1.1.jar user@bde.localdomain:/opt/serengeti/cli/conf/
$ ssh user@bde.localdomain "service tomcat restart"

If using VMware Integrated OpenStack, the curl package is required:

$ ssh user@controller1.localdomain "apt-get -y install curl"
$ ssh user@controller2.localdomain "apt-get -y install curl"

If you do not have a resource pool definition on the Big Data Extensions management server for the OpenStack compute cluster, you will need to create it now.

$ ssh root@bde.localdomain
# java -jar /opt/serengeti/cli/serengeti-cli-2.1.1.jar
serengeti> connect --host bde.localdomain:8443
serengeti> resourcepool list
serengeti> resourcepool add --name openstackRP --vccluster VIO-CLUSTER-1

Note: If you use the resource pool name ‘openstackRP’, no further modifications to the JSON file are required. That value is the default for the resource plugin variable CLUSTER_RP, but it can be overridden in the JSON file.

At this point, the OpenStack controller(s) where Heat is running have the resource plugin installed, and you should have seen an entry stating it was registered when you restarted the heat-engine service. In addition, the Big Data Extensions management server has the required updates that allow the REST API to support the resource plugin. The next step before the plugin can be consumed is to copy or create JSON files for the cluster types you intend to support within the environment. The GitHub repository includes an example JSON file that can be used. One of the updates to the management server included logic to look in the /opt/serengeti/conf directory for these JSON files.

Copy example mesos-default-template-spec.json file:
$ scp json/mesos-default-template-spec.json user@bde.localdomain:/opt/serengeti/conf/
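The management server's new lookup logic presumably resolves a cluster type to one of these spec files by name. Here is a minimal Python sketch of that lookup, assuming the <cluster_type>-default-template-spec.json naming convention seen above; the actual server-side logic is in the modified Java code.

```python
import json
import os

# Default location the modified management server scans for specs.
SPEC_DIR = "/opt/serengeti/conf"

def load_cluster_spec(cluster_type, spec_dir=SPEC_DIR):
    """Return the parsed JSON cluster specification for a cluster type.

    Assumes the naming convention <type>-default-template-spec.json,
    inferred from the mesos example file.
    """
    path = os.path.join(
        spec_dir, "%s-default-template-spec.json" % cluster_type)
    if not os.path.isfile(path):
        raise ValueError(
            "No cluster specification found for type %r" % cluster_type)
    with open(path) as handle:
        return json.load(handle)
```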

Heat Template Configuration

When creating the JSON file (or YAML) for OpenStack Heat to consume, there are several key parameters that will be required. As this is the initial release of the resource plugin, there are additional changes planned for the future, including a text configuration file you will place on the controllers to hide several of these parameters.

Sample JSON entry with required parameters:

        "Mesosphere-Tenant-0" : {
            "Type" : "VirtualElephant::VMware::BDE",
            "Properties" : {
                "bde_endpoint" : "bde.localdomain",
                "vcm_server" : "vcenter.localdomain",
                "username" : "administrator@vsphere.local",
                "password" : "password",
                "cluster_name" : "mesosphere_tenant_01",
                "cluster_type" : "mesos",
                "cluster_net" : { "Ref" : "mesos_network_01" }
            }
        }

You can see from the example above why parameters like ‘bde_endpoint’, ‘vcm_server’, ‘username’ and ‘password’ should be hidden from the consumers of the OpenStack Heat orchestration.
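One way to hide those values is to read them from the vio.config file copied to the controllers earlier. The sketch below assumes an INI-style format with a [bde] section and these key names; the plugin's actual file format may differ.

```python
import configparser

def load_bde_credentials(path="/usr/local/etc/vio.config"):
    """Read BDE connection details from the config file on a controller.

    Assumed INI layout (illustrative only):
        [bde]
        bde_endpoint = bde.localdomain
        vcm_server = vcenter.localdomain
        username = administrator@vsphere.local
        password = password
    """
    parser = configparser.ConfigParser()
    if not parser.read(path):
        raise IOError("Cannot read BDE config at %s" % path)
    section = parser["bde"]
    return {
        "bde_endpoint": section["bde_endpoint"],
        "vcm_server": section["vcm_server"],
        "username": section["username"],
        "password": section["password"],
    }
```

With the credentials sourced this way, the Heat template only needs to carry the per-cluster properties (cluster_name, cluster_type, cluster_net).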

Once you have a JSON file defined, it can be deployed using OpenStack Heat — either through the user interface or the API. The deployment will then proceed and you can view the topology of the stack within your environment. If you use the JSON provided in GitHub (mesosphere_stack.json), it will look like the graphic below.

[Screenshot: OpenStack Heat topology view of the Mesosphere stack]

Congratulations — you have now extended your OpenStack environment to be able to support robust cluster deployments using the VMware Big Data Extensions framework!

Future Enhancements

The resource plugin is not yet fully baked and there are several features I would still like to implement in the future. Currently, the resource plugin has the necessary code to deploy and delete clusters when initiated through OpenStack Heat. Features I will be working on in the future include:

  1. Report cluster deployment status back to OpenStack – currently the plugin is fire-and-forget.
  2. Assign a floating IP to the cluster.
  3. Ability to scale out clusters deployed with OpenStack Heat.
  4. Enhance the Big Data Extensions REST API to utilize the JSON specification files in /opt/serengeti/www/specs/ versus the segregated JSON files it uses today.
  5. Support for prescribed VM template sizes (e.g. m1.small, m1.medium, m1.large).
  6. Enhanced error detection for REST API calls made within the resource plugin.
  7. Cluster password support.
  8. Check and abide by the Big Data Extensions naming schema for cluster names.
  9. Incorporate OpenStack key pairs with cluster nodes.

Closing Notes

There is always more work to be done on a project such as this, but I am excited to have the offering available at this time — even in its limited alpha state. Being able to bridge the gap between the Infrastructure-as-a-Service and Platform-as-a-Service layers is a key requirement for private cloud providers. The challenges I have faced (along with my coworkers) supporting our current environment and designing/implementing our next-generation private cloud have brought this reality to the forefront. In order to provide an AWS-like service offering, bridging the gap between the layers was an absolute necessity and I am extremely grateful for the support I have received from my peers in helping to solve this problem.

Look for an upcoming post going through the resource plugin code, highlighting the integration that was necessary between Big Data Extensions and NSX-v. In the meantime, reach out to me on Twitter (@chrismutchler) if you have questions or comments on the resource plugin, the OpenStack implementation or VMware Big Data Extensions.

Enabling VMware Big Data Extensions REST API

The more I have interfaced with the VMware Big Data Extensions application, the more important accessibility to the REST API became. Documentation for the REST API is normally not exposed, but you can enable it with a single command on the management server.

# /opt/serengeti/sbin/cfgrestschema on

You can then access the documentation for the REST API at the following URL: http://bde.localdomain:8080/restschema/service?APPLICATION=BigDataExtensions

Important Note: There is a rather large disclaimer on the main page that the REST API is not supported by VMware.

Sneak-Peek: OpenStack Deployments of Apache Mesos Cluster with VMware Big Data Extensions

The work is not yet complete, but I have made a significant amount of progress integrating VMware Big Data Extensions with OpenStack by allowing deployments to occur through the Heat API. The primary objective is to allow a developer (end-user) to deploy any cluster BDE supports through a Heat template within a micro-segmented network. The resource plugin for Heat then hides the fact that the deployment itself is being handled by VMware Big Data Extensions.

The integration allows a small piece of JSON to be inserted into the OpenStack Heat template that looks like this:

        "Mesosphere-Cell-0" : {
            "Type" : "VirtualElephant::VMware::BDE",
            "Properties" : {
                "bde_endpoint" : "bde.localdomain",
                "username" : "administrator@vsphere.local",
                "password" : "password",
                "cluster_name" : "mesos_heat_api_11",
                "cluster_type" : "mesos",
                "cluster_net" : "mgmtNetwork"
            }
        }

A topology view of the stack then looks like this:

[Screenshot: stack topology view, with the Mesos cell detached from the micro-segmented network]
As I said, it is not fully complete right now — you’ll notice the Mesos cell is off by itself on the right side of the screen capture. The code to attach it to the micro-segmented network created in the JSON has not been written yet. But after struggling with Python the last few days (Perl is my preferred language) and working through issues with Heat itself, I made significant progress and wanted to share it with everyone.

As soon as it is ready, I’ll be posting all the code in my GitHub repo and sharing all the pieces that went into writing the resource plugin for Heat.

Upgrading Apache Mesos & Marathon on VMware Big Data Extensions


There are new versions of both Apache Mesos and Mesosphere Marathon. One of the great things about using VMware Big Data Extensions to handle the deployment and management of clusters is the ability to control the packages and versions within the SDLC, so a few changes were needed within Chef on the Big Data Extensions management server to support Mesos v0.22.0 and Marathon 0.8.0*.

First, grab the new RPM files and place them on your management server under the /opt/serengeti/www/yum/repos/mesos/0.21.0 directory. For my own installation, I created a new directory called mesos/0.22.0 and copied the packages that had not been updated into it — but that decision is up to you. Once the files are there, you can choose whether to increment the version number in the /opt/serengeti/www/distros/manifest file:

  {
    "name": "mesos",
    "vendor": "MESOS",
    "version": "0.22.0",
    "packages": [
      {
        "package_repos": [
          "https://bde.localdomain/yum/mesos.repo"
        ],
        "roles": [
          "zookeeper",
          "mesos_master",
          "mesos_slave",
          "mesos_docker",
          "mesos_chronos",
          "mesos_marathon"
        ]
      }
    ]
  },

If you choose to update the manifest file, make sure you restart tomcat on the BDE management server. You normally would be done at this point, but there were changes in where Mesos and Marathon install two files that we’ll need to account for. The old files were in subdirectories under /usr/local and are now placed in subdirectories of /usr.


  exec /usr/bin/marathon


  template '/usr/etc/mesos/' do
    source ''
    variables(
      zookeeper_server_list: zk_server_list,
      zookeeper_port: zk_port,
      zookeeper_path: zk_path,
      logs_dir: node['mesos']['common']['logs'],
      work_dir: node['mesos']['slave']['work_dir'],
      isolation: node['mesos']['isolation'],
    )
    notifies :run, 'bash[restart-mesos-slave]', :delayed
  end
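As an aside, the /usr/local to /usr move could also be handled by probing both locations and using whichever exists. This is an illustrative Python helper, not part of the BDE cookbooks; the real fix is editing the Chef files as shown above.

```python
import os

def first_existing_path(candidates):
    """Return the first path in `candidates` that exists on disk.

    Useful when a package (e.g. Marathon) moves its install location
    between releases, such as /usr/local/bin -> /usr/bin.
    """
    for path in candidates:
        if os.path.exists(path):
            return path
    raise ValueError("None of the candidate paths exist: %r" % (candidates,))

# e.g. first_existing_path(["/usr/bin/marathon", "/usr/local/bin/marathon"])
```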

Update the files on the Chef server by running 'knife cookbook upload -a' and you are all set.

Go forth and deploy your updated Apache Mesos 0.22.0 and Marathon 0.8.0 clusters!

*Marathon 0.8.1 has been released, but there has not been an RPM file created by Mesosphere yet.

VMware Big Data Extensions Python REST API Scripts

As part of the work I am doing to get VMware Big Data Extensions to integrate with OpenStack Heat, I had to write several Python scripts to test the REST API that BDE offers. I currently have scripts to create clusters and also start|stop|delete clusters within a vSphere environment.

They are functional and were a good introduction to Python — I’ve been a Perl coder since the mid-1990s — though I am sure there are improvements that could still be made. I have put the scripts up in the Virtual Elephant GitHub repository.
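For the start|stop|delete scripts, each action maps onto a single REST call against the management server. Here is a sketch of that mapping; the /serengeti/api/cluster/<name> path and the `state` query parameter are assumptions based on my testing, not documented API.

```python
# Illustrative only: endpoint path, port and query parameter are
# assumptions, not the official BDE REST API documentation.
BASE = "https://{host}:8443/serengeti/api/cluster/{name}"

def cluster_action_request(host, name, action):
    """Return (method, url, params) for a start/stop/delete call."""
    url = BASE.format(host=host, name=name)
    if action in ("start", "stop"):
        # Start/stop are modeled as a state change on the cluster.
        return ("PUT", url, {"state": action})
    if action == "delete":
        return ("DELETE", url, None)
    raise ValueError("Unsupported action: %r" % action)
```

A script can then hand the returned tuple to an authenticated HTTP session to perform the actual call.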

I will make the disclaimer that the scripts will not work out of the box for non-Hadoop cluster types unless you modify several of the JAR files BDE uses. I was able to work with another engineer who assisted me in adding some of the functionality that was missing in order to create Mesos clusters.

An upcoming post will explain how to expand the current REST API so that all cluster types set up within BDE can be deployed through the API. Adding this functionality has been an important step in my integration efforts.

The testing scripts can be downloaded here.
