HAProxy support for Mesos in vSphere Big Data Extensions

I realized late last night that the current vSphere Big Data Extensions fling does not have HAProxy built into its Mesos cluster deployments. After a bit of reading and testing new pieces inside the Chef recipes, I have added support so that HAProxy runs on all of the Mesos nodes. The first step is to add the HAProxy package to the /opt/serengeti/chef/cookbooks/mesos/recipes/install.rb file:

 72   %w( unzip libcurl haproxy ).each do |pkg|
 73     yum_package pkg do
 74       action :install
 75     end
 76   end

There is also a script that Mesosphere provides to modify the HAProxy configuration file and reload the rules when changes occur. Instructions for the script and how to incorporate it can be found on the Mesosphere page.

Note: I had to edit ‘sudo’ out of the lines inside the script in order for Chef to execute it properly.
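The edit boils down to stripping the ‘sudo ’ prefix from each command inside the script. Shown here against a sample line for illustration; the same sed expression with -i can be run against your local copy of haproxy-marathon-bridge before uploading it to the Chef server:

```shell
# Chef already runs the script as root, so the 'sudo' prefixes only get
# in the way; strip them out.
echo 'sudo service haproxy reload' | sed 's/sudo //g'
# service haproxy reload
```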

After copying the haproxy-marathon-bridge file into my Chef server, I added the following code to the same install.rb file to get everything set up and configured properly:

 82   directory "/etc/haproxy-marathon-bridge" do
 83     owner 'root'
 84     group 'root'
 85     mode '0755'
 86     action :create
 87   end
 89   template '/usr/local/bin/haproxy-marathon-bridge' do
 90     source 'haproxy-marathon-bridge.erb'
 91     action :create
 92   end
 94   master_ips = mesos_masters_ip
 95   slave_ips = mesos_slaves_ip
 97   all_ips = master_ips
 98   all_ips += slave_ips
100   template '/etc/haproxy-marathon-bridge/marathons' do
101     source 'marathons.erb'
102     variables(
103       haproxy_server_list: all_ips
104     )
105     action :create
106   end
108   execute 'configure haproxy' do
109     command 'chkconfig haproxy on; service haproxy start'
110   end
112   execute 'setup haproxy-marathon-bridge' do
113     command 'chmod 755 /usr/local/bin/haproxy-marathon-bridge; /usr/local/bin/haproxy-marathon-bridge install_cronjob'
114   end

There is also a bit of supporting code needed for lines 94-98 above, which I added to /opt/serengeti/chef/cookbooks/mesos/libraries/default.rb:

module Mesosphere

  def mesos_masters_ip
    servers = all_providers_fqdn_for_role("mesos_master")
    Chef::Log.info("Mesos master nodes in cluster #{node[:cluster_name]} are: #{servers.inspect}")
    servers
  end

  def mesos_slaves_ip
    servers = all_providers_fqdn_for_role("mesos_slave")
    Chef::Log.info("Mesos slave nodes in cluster #{node[:cluster_name]} are: #{servers.inspect}")
    servers
  end

end

class Chef::Recipe; include Mesosphere; end
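To see how these helpers feed the all_ips list built on lines 94-98 of install.rb, here is a minimal standalone sketch. The all_providers_fqdn_for_role method is supplied by BDE's cluster service discovery library at runtime, so it is stubbed below with hypothetical FQDNs:

```ruby
# Hedged sketch: all_providers_fqdn_for_role normally comes from BDE's
# cluster service discovery library; stubbed here with hypothetical hosts.
module Mesosphere
  def all_providers_fqdn_for_role(role)
    { "mesos_master" => ["master0.localdomain"],
      "mesos_slave"  => ["worker0.localdomain", "worker1.localdomain"] }[role]
  end

  def mesos_masters_ip
    all_providers_fqdn_for_role("mesos_master")
  end

  def mesos_slaves_ip
    all_providers_fqdn_for_role("mesos_slave")
  end
end

include Mesosphere

# Mirrors lines 94-98 of install.rb: master and slave lists concatenated.
all_ips = mesos_masters_ip + mesos_slaves_ip
p all_ips
# ["master0.localdomain", "worker0.localdomain", "worker1.localdomain"]
```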

The last thing needed is a template for the /etc/haproxy-marathon-bridge/marathons file that the Mesosphere script relies on. I created the file /opt/serengeti/chef/cookbooks/mesos/templates/default/marathons.erb:

# Configuration file for haproxy-marathon-bridge script
<%
  ha_url_list = []
  @haproxy_server_list.each do |ha_server|
    ha_url_list << "#{ha_server}"
  end
%>
<%= ha_url_list.join(":8080\n") + ":8080" %>
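The join on the template's last line is easy to miss: it inserts ":8080\n" between entries, and the trailing concatenation handles the final host, so every node ends up on its own line with the Marathon port appended. A minimal sketch in plain Ruby, using hypothetical FQDNs in place of the list Chef passes in as :haproxy_server_list:

```ruby
# Hypothetical node list standing in for the :haproxy_server_list variable.
haproxy_server_list = %w[master0.localdomain worker0.localdomain worker1.localdomain]

# Same expression as the ERB template: one "host:8080" entry per line.
marathons = haproxy_server_list.join(":8080\n") + ":8080"
puts marathons
# master0.localdomain:8080
# worker0.localdomain:8080
# worker1.localdomain:8080
```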

At this point, all of the modifications can be uploaded to the Chef server with the command knife cookbook upload -a, and a new cluster can be deployed with HAProxy support.

After deploying an nginx workload and scaling it out, check the /etc/haproxy/haproxy.cfg file on a master node and you will see entries like:

[root@hadoopvm388 haproxy]# cat haproxy.cfg
global
  log local0
  log local1 notice
  maxconn 4096
defaults
  log            global
  retries             3
  maxconn          2000
  timeout connect  5000
  timeout client  50000
  timeout server  50000
listen stats
  mode http
  stats enable
  stats auth admin:admin
listen nginx-80
  mode tcp
  option tcplog
  balance leastconn
  server nginx-10 hadoopvm382.localdomain:31000 check
  server nginx-9 hadoopvm390.localdomain:31000 check
  server nginx-8 hadoopvm387.localdomain:31000 check
  server nginx-7 hadoopvm389.localdomain:31000 check
  server nginx-6 hadoopvm386.localdomain:31000 check
  server nginx-5 hadoopvm383.localdomain:31000 check
  server nginx-4 hadoopvm378.localdomain:31001 check
  server nginx-3 hadoopvm381.localdomain:31000 check
  server nginx-2 hadoopvm385.localdomain:31000 check
  server nginx-1 hadoopvm378.localdomain:31000 check


Apache Storm cluster deployment through vSphere Big Data Extensions

Last year at VMworld, Andy and I spoke about the data pipeline, all of the different pieces involved, and how their interactions lead to congestion. How an organization deals with data congestion affects how much data its applications can process. Fortunately, much like what Apache Hadoop has done for batch processing, Apache Storm has entered the real-time processing arena to help applications process big data more efficiently.

In case you are unfamiliar with Apache Storm, a basic explanation of its purpose and design can be seen on the Apache Storm site.

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, is used by many companies, and is a lot of fun to use!

Hortonworks has provided some great insight into how Apache Storm can be utilized alongside Hadoop to allow organizations to become even more agile and efficient in their data pipeline processing.

VMware releases new Big Data Extensions fling!

Hot on the heels of my recent posts, and those from Andrew Nelson, the VMware Big Data Extensions team has released an official fling that extends the functionality to include Mesos, Marathon, Chronos, Docker and Kubernetes!

From the site:

“Big Data Extensions can be easily extended to deploy and manage all kinds of distributed or non-distributed applications. This release of the BDE-SE Fling adds support for deploying Mesos (with Chronos and Marathon) as well as Kubernetes clusters in addition to the Hadoop and HBase clusters.

Big Data Extensions simplifies the cluster deployment and provisioning process, gives you a real time view of the running services and the status of their virtual hosts, provides a central place from which to manage and monitor your clusters, and incorporates a broad range of tools to help you optimize cluster performance and utilization.

Big Data Extensions provides the following features:

  • Fast deployment, management, and scaling of Hadoop, Mesos and Kubernetes clusters. Big Data Extensions enable rapid deployment of Hadoop, Mesos and Kubernetes clusters on VMware vSphere. You can also quickly manage, scale out clusters, and scale up/down nodes subsequently.

  • Support for Docker. The Big Data Extensions for vSphere Standard Edition Fling includes support for Docker with Mesos, Marathon, Chronos, and Kubernetes.

  • Graphical User Interface Simplifies Management Tasks. The Big Data Extensions plug-in for vSphere, a graphical user interface integrated with vSphere Web Client, lets you easily perform common infrastructure and cluster management administrative tasks.

  • All-in-one solution. Big Data Extensions ships with installation package and configuration scripts for Apache Bigtop 0.8.0, Kubernetes 0.5.4, Mesos 0.21.0, Chronos 2.3.0 and Marathon 0.7.5. You can create and manage Hadoop, Mesos, and Kubernetes clusters out of box. You can also add new versions of these softwares into Big Data Extensions Server and create the cluster.”

Head over to the Flings page at VMware and download the latest to see how it all works! Great job by the BDE engineering team!

VMware BDE + Zookeeper: The unknown cluster option

If you have taken a look underneath the covers of VMware Big Data Extensions, then you have probably seen the Zookeeper Chef cookbooks that are part of every default cluster deployment. The Zookeeper role is built-in and was a critical part of being able to develop the option for a Mesosphere cluster so quickly using BDE — no need to reinvent the wheel. The only part missing for deploying a Zookeeper-only cluster is the JSON specification file.

I took a few minutes and put together a quick JSON specification file that can be used to deploy just Zookeeper as a cluster that could then be utilized by any application as a service layer. As with the Mesosphere cluster, I started with the basic cluster JSON file found in the /opt/serengeti/samples directory.

// This is a cluster spec for creating a Zookeeper cluster without installing any hadoop stuff.
{
  "nodeGroups":[
    {
      "name": "master",
      "roles": [
        "zookeeper"
      ],
      "instanceNum": 5,
      "cpuNum": 2,
      "memCapacityMB": 3768,
      "storage": {
        "type": "SHARED",
        "sizeGB": 50
      },
      "haFlag": "on"
    }
  ]
}
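One thing worth noting: the spec begins with a //-style comment line, which the BDE samples allow but a strict JSON parser rejects. A quick hedged sanity check in Ruby, with a heredoc standing in for the spec file on disk, strips such comment lines before parsing:

```ruby
require 'json'

# Heredoc stands in for reading the spec file from disk.
raw = <<~'SPEC'
  // This is a cluster spec for creating a Zookeeper cluster without installing any hadoop stuff.
  {
    "nodeGroups":[
      {
        "name": "master",
        "roles": [ "zookeeper" ],
        "instanceNum": 5,
        "cpuNum": 2,
        "memCapacityMB": 3768,
        "storage": { "type": "SHARED", "sizeGB": 50 },
        "haFlag": "on"
      }
    ]
  }
SPEC

# Drop //-comment lines so a strict JSON parser accepts the spec.
json = raw.lines.reject { |line| line.strip.start_with?('//') }.join
spec = JSON.parse(json)
puts "#{spec['nodeGroups'].first['name']}: #{spec['nodeGroups'].first['instanceNum']} nodes"
# master: 5 nodes
```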

A quick definition in the /opt/serengeti/conf/serengeti.properties file, the /opt/serengeti/www/specs/map file, and the /opt/serengeti/www/manifest file is all that is needed. Restart Tomcat on the management server and you are off to the races!

The unknown cluster option is now available with very little modification to your BDE environment.


BDE + Mesosphere cluster code on GitHub

I have uploaded the necessary files to begin including the option for deploying a Mesosphere cluster with VMware Big Data Extensions v2.1. You can download the tarball or clone the repo via the following link:


As I work on further extensions for other clustering technologies, I will make them available via GitHub as well. To include this in your deployment, extract it directly into the /opt/serengeti folder, but be aware that it will replace the default map and manifest files. After the files are extracted (as user serengeti), simply run two commands on the BDE management server:

# knife cookbook upload -a
# service tomcat restart

If you have any questions, feel free to reach out to me over Twitter.