Building on Project Photon & Project Lightwave

The opportunities for VMware with Project Photon and Project Lightwave are significant. The press release stated:

Designed to help enterprise developers securely build, deploy and manage cloud-native applications, these new open source projects will integrate into VMware’s unified platform for the hybrid cloud — creating a consistent environment across the private and public cloud to support cloud-native and traditional applications. By open sourcing these projects, VMware will work with a broad ecosystem of partners and the developer community to drive common standards, security and interoperability within the cloud-native application market — leading to improved technology and greater customer choice.

What I always find interesting is the lack of discussion around the orchestration and automation of the supporting applications. The orchestration layer does not miraculously appear within a private cloud environment for the developers to consume. The pieces have to be in place in order for developers to consume the services a Mesos cluster offers them. For me, the choice is pretty obvious — expand what the Big Data Extensions framework is capable of providing. I alluded to this thought on Monday when the announcement was made.

Building on that thought, and after seeing a diagram of VMware’s vision for how all the pieces tie together, I worked on a logical diagram of what the entire architecture could look like. I believe it looks something like this:



In this environment, Project Photon and Project Lightwave are able to be leveraged beyond just ESXi. By enhancing the deployment options for BDE to include ESXi on vCloud Air (not shown above), KVM and physical (through Ironic), the story is slightly changed. The story now sounds something like this:

For a developer, you choose which Cloud Native application orchestration layer (Mesos, Marathon, Chronos, CloudFoundry, etc.) you would like and communicate with it over the provided API. For operations, the tenants within the private cloud environment can be deployed using the OpenStack API (with Heat templates). For both sides, SDLC consistency is maintained from development through to production.

Simplicity is achieved by only interacting with two APIs: one for operations and one for development. There is a large amount of work to do here. First, I need to continue to improve the OpenStack resource plugin until it is production-ready. Second, testing of Project Photon inside BDE needs to take place; I imagine there will be some work to have it integrated correctly with the Chef server. Third, the deployment mechanism inside BDE needs to be enhanced to support other options. If the first two are a heavy lift, the last one is going to take a small army, but it is a challenge I am ready to take on!

Ultimately, I feel the gaps in OpenStack around Platform-as-a-Service orchestration can be solved through integrating Big Data Extensions. The framework is more robust and mature when compared to the Sahara offering. The potential is there; it just needs to be executed on.

Thoughts on the VMware Project Photon Announcement


VMware announced a new open source project called Project Photon today. The full announcement can be seen here. Essentially, Project Photon is a lightweight Linux operating system built to support Docker and rkt (formerly Rocket) containers. The footprint is less than 400MB and it can run containers immediately upon instantiation. I had heard rumors the announcement today was going to include some sort of OS, but I was not very excited about it until I started reading the material being released prior to the launch event in a few hours.

Having seen the demo and read the material, my mind went into overdrive for the possibilities both open source projects offer organizations who are venturing down the Cloud Native Apps (or Platform 3) road. I believe VMware has a huge opportunity here to cement themselves as the foundation for running robust, secure and enterprise-ready Cloud Native Applications. If you think about the performance gains vSphere 6.0 has provided, and then look at how they are playing in the OpenStack space with both VIO and NSX, the choice becomes obvious.

The area of focus now needs to be on tying all of the pieces together to offer organizations an enterprise-class end-to-end Platform-as-a-Service solution. This is where, I believe, the VMware Big Data Extensions framework should play an enormous part. The framework already allows deployment of Hadoop, Mesos and Kubernetes clusters. Partner the framework with Project Photon and you now have a minimal installation VM that can be launched within seconds with VMFork. From there, the resource plugin Virtual Elephant launched today could be mainstreamed (and improved) to allow for the entire deployment of a Mesos stack, backed by Project Photon, through the open source API OpenStack offers with Heat.

Epic win!

There is still work VMware could do with the Big Data Extensions framework to improve its capabilities, especially with newcomers like SequenceIQ and their Cloudbreak offering stiff competition. Expanding BDE to deploy clusters not only to an internal vSphere environment but also to the major public cloud environments, including their own vCloud Air, will be key going forward. The code for BDE is already an open source project, and by launching these two new open source projects they are showing the open source community they are serious.

This is a really exciting time in virtualization and I just got even more excited today!

SequenceIQ Cloudbreak Hadoop Cluster Deployments

I saw activity on Twitter talking about Cloudbreak from SequenceIQ as a method for deploying Hadoop clusters into several public cloud providers. My interest was piqued and I started reading through the Cloudbreak documentation on Saturday night. About 20 minutes later, I had an account set up, had tied it to my AWS account and started deploying a cluster. Approximately 15 minutes later, the cluster was deployed, Ambari was set up and running, and I could log into the UI.

The interface is truly brilliant! I am pretty impressed with the ease of setup and the blueprinting functionality it provides. The project is only in a public beta state right now, but I plan to keep my eyes on it and see how it develops. If you are interested in reading more about my experience with the very first cluster and seeing some screenshots of the interface, keep reading!

Cloudbreak UI & Cluster Deployment

Upon deployment of the cluster, the Dashboard UI immediately began showing me status updates on the ribbon and a small square widget. When the widget was clicked, it expanded to show further details of the cluster and the status.




I logged into Ambari after the cluster reported that it was deployed and Ambari was accessible. The Cloudbreak dashboard had a link sending me to the Ambari UI. Once logged in, I was pleased to see some basic status items displayed, along with a list of services available in the cluster.

Note: The login credentials for the Ambari UI are the defaults (admin/admin). The Cloudbreak UI did not tell me this anywhere that I saw, so I had to Google what the defaults were.

Poking around the Ambari interface, it is pretty easy to see what nodes the Cloudbreak service had deployed inside AWS and their relevant information.


Noticing the metric widgets were reporting they were unable to collect data because the Ganglia service was not deployed, I started through the process of adding services to the running cluster. The Ambari interface went through a multi-step process, allowing me to select the services I wanted and checking prerequisites for those services. In my case, the cluster also needed the Tez service in order to deploy the services I selected.




Here is where things went a little sideways — the deployment of the new services stalled at 2/10.


After about 30 minutes and several attempts to refresh the page, I closed the page and went back into Ambari through the Cloudbreak dashboard. When Ambari loaded, I was presented with the following screen and no way to interact with Ambari — it appeared wedged.


This allowed me to test the termination functionality of Cloudbreak and see how it removes all of the objects from my AWS account.



All-in-all a great first experience with Cloudbreak. I am going to continue to play with it — especially since I am interested in how they are using Docker containers in the service.


Upgrading Apache Mesos & Marathon on VMware Big Data Extensions


There are new versions of both Apache Mesos and Mesosphere Marathon. One of the great things about using VMware Big Data Extensions to handle the deployment and management of clusters is the ability to control the packages and versions within the SDLC. A few changes had to be made within Chef to support Mesos v0.22.0 and Marathon 0.8.0* within the Big Data Extensions management server.

First, you need to grab the new RPM files and place them on your management server under the /opt/serengeti/www/yum/repos/mesos/0.21.0 directory. For my own installation, I created a new directory called mesos/0.22.0 and copied the packages that had not been updated into it, but that decision is up to you. Once the files are there, you can choose whether to increment the version number in the /opt/serengeti/www/distros/manifest file:

  {
    "name": "mesos",
    "vendor": "MESOS",
    "version": "0.22.0",
    "packages": [
      {
        "package_repos": [
          "https://bde.localdomain/yum/mesos.repo"
        ],
        "roles": [
          "zookeeper",
          "mesos_master",
          "mesos_slave",
          "mesos_docker",
          "mesos_chronos",
          "mesos_marathon"
        ]
      }
    ]
  },
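A quick way to confirm the edited entry reports the version you expect can be sketched in plain Ruby. This is only an illustration: the helper name distro_version is hypothetical, and it assumes the manifest is a JSON array of entries shaped like the fragment above. The sample is inlined here; a real check would read /opt/serengeti/www/distros/manifest instead.

```ruby
require 'json'

# Hypothetical helper (not part of BDE): return the "version" of the
# named distro entry, or nil when no entry with that name exists.
# Assumes the manifest is a JSON array of entries like the one above.
def distro_version(manifest_json, name)
  entries = JSON.parse(manifest_json)
  entry = entries.find { |e| e['name'] == name }
  entry && entry['version']
end

# Inline sample mirroring the mesos entry shown above (roles shortened).
SAMPLE_MANIFEST = <<~MANIFEST
  [
    {
      "name": "mesos",
      "vendor": "MESOS",
      "version": "0.22.0",
      "packages": [
        {
          "package_repos": ["https://bde.localdomain/yum/mesos.repo"],
          "roles": ["zookeeper", "mesos_master", "mesos_slave"]
        }
      ]
    }
  ]
MANIFEST

puts distro_version(SAMPLE_MANIFEST, 'mesos')
```

JSON.parse also raises on a malformed file, so the same call doubles as a syntax check after hand-editing the manifest.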

If you choose to update the manifest file, make sure you restart tomcat on the BDE management server. You normally would be done at this point, but there were changes in where Mesos and Marathon install two files that we’ll need to account for. The old files were in subdirectories under /usr/local and are now placed in subdirectories of /usr.


exec /usr/bin/marathon


template '/usr/etc/mesos/' do
  source ''
  variables(
    zookeeper_server_list: zk_server_list,
    zookeeper_port: zk_port,
    zookeeper_path: zk_path,
    logs_dir: node['mesos']['common']['logs'],
    work_dir: node['mesos']['slave']['work_dir'],
    isolation: node['mesos']['isolation'],
  )
  notifies :run, 'bash[restart-mesos-slave]', :delayed
end

Update the files in the Chef server by running 'knife cookbook upload -a' and you are all set.

Go forth and deploy your updated Apache Mesos 0.22.0 and Marathon 0.8.0 clusters!

*Marathon 0.8.1 has been released but there has not been an RPM file created by Mesosphere yet.

HAProxy support for Mesos in vSphere Big Data Extensions

I realized late last night the current vSphere Big Data Extensions fling does not have HAProxy built into it for the Mesos cluster deployments. After a bit of reading and testing new pieces inside the Chef recipes, I have added support so that HAProxy is running on all of the Mesos nodes. The first thing is to add the HAProxy package to the /opt/serengeti/chef/cookbooks/mesos/recipes/install.rb file:

  %w( unzip libcurl haproxy ).each do |pkg|
    yum_package pkg do
      action :install
    end
  end

There is also a script that Mesosphere provides to modify the HAProxy configuration file and reload the rules when changes occur. You can find instructions on the file and how to incorporate it on the Mesosphere page.

Note: I had to edit ‘sudo’ out of the lines inside the script in order for Chef to execute it properly.

After copying the file haproxy-marathon-bridge into my Chef server, I added the following code to the same install.rb file to get everything set up and configured properly:

  # Directory that holds the bridge script's configuration
  directory "/etc/haproxy-marathon-bridge" do
    owner 'root'
    group 'root'
    mode '0755'
    action :create
  end

  # Install the bridge script itself
  template '/usr/local/bin/haproxy-marathon-bridge' do
    source 'haproxy-marathon-bridge.erb'
    action :create
  end

  # Collect every master and slave node in the cluster
  master_ips = mesos_masters_ip
  slave_ips = mesos_slaves_ip

  all_ips = master_ips
  all_ips += slave_ips

  # Write the list of Marathon endpoints the bridge script reads
  template '/etc/haproxy-marathon-bridge/marathons' do
    source 'marathons.erb'
    variables(
      haproxy_server_list: all_ips
    )
    action :create
  end

  execute 'configure haproxy' do
    command 'chkconfig haproxy on; service haproxy start'
  end

  execute 'setup haproxy-marathon-bridge' do
    command 'chmod 755 /usr/local/bin/haproxy-marathon-bridge; /usr/local/bin/haproxy-marathon-bridge install_cronjob'
  end

There is also a bit of supporting code needed for the mesos_masters_ip and mesos_slaves_ip calls above, which was added to /opt/serengeti/chef/cookbooks/mesos/libraries/default.rb:

module Mesosphere

  def mesos_masters_ip
    servers = all_providers_fqdn_for_role("mesos_master")
    Chef::Log.info("Mesos master nodes in cluster #{node[:cluster_name]} are: #{servers.inspect}")
    servers
  end

  def mesos_slaves_ip
    servers = all_providers_fqdn_for_role("mesos_slave")
    Chef::Log.info("Mesos slave nodes in cluster #{node[:cluster_name]} are: #{servers.inspect}")
    servers
  end

end

class Chef::Recipe; include Mesosphere; end

The last thing needed is a new template file for the /etc/haproxy-marathon-bridge/marathons file that is needed by the script provided by Mesosphere. I created the file /opt/serengeti/chef/cookbooks/mesos/templates/default/marathons.erb:

# Configuration file for haproxy-marathon-bridge script
<%
  ha_url_list = []
  @haproxy_server_list.each do |ha_server|
    ha_url_list << "#{ha_server}"
  end
%>
<%= ha_url_list.join(":8080\n") + ":8080" %>
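To see what this template produces without deploying a cluster, it can be rendered locally with Ruby's standard ERB library. This is a rough sketch rather than how Chef renders it: the MarathonsContext class and render_marathons helper are hypothetical stand-ins for Chef's template variables, and the host names are sample values borrowed from the output later in this post.

```ruby
require 'erb'

# Template body copied from marathons.erb above. The single-quoted
# heredoc keeps "#{...}" and "\n" literal so ERB evaluates them later.
TEMPLATE = <<~'ERB_BODY'
  # Configuration file for haproxy-marathon-bridge script
  <%
    ha_url_list = []
    @haproxy_server_list.each do |ha_server|
      ha_url_list << "#{ha_server}"
    end
  %>
  <%= ha_url_list.join(":8080\n") + ":8080" %>
ERB_BODY

# Hypothetical stand-in for Chef's template context: it exposes the
# server list as the instance variable the template expects.
class MarathonsContext
  def initialize(servers)
    @haproxy_server_list = servers
  end

  def render(template)
    ERB.new(template).result(binding)
  end
end

def render_marathons(servers)
  MarathonsContext.new(servers).render(TEMPLATE)
end

puts render_marathons(%w[hadoopvm378.localdomain hadoopvm381.localdomain])
```

Each host ends up on its own line with :8080 appended. One thing the sketch makes visible: with an empty server list the last expression still emits a bare ":8080", so the recipe relies on the cluster always having at least one node.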

At this point, all of the modifications can be uploaded to the Chef server with the command knife cookbook upload -a and a new cluster can be deployed with HAProxy support.

After deploying an nginx workload and scaling it out, you can check the /etc/haproxy/haproxy.cfg file on a master node and see entries like:

[root@hadoopvm388 haproxy]# cat haproxy.cfg
global
  log local0
  log local1 notice
  maxconn 4096
  log            global
  retries             3
  maxconn          2000
  timeout connect  5000
  timeout client  50000
  timeout server  50000
listen stats
  mode http
  stats enable
  stats auth admin:admin
listen nginx-80
  mode tcp
  option tcplog
  balance leastconn
  server nginx-10 hadoopvm382.localdomain:31000 check
  server nginx-9 hadoopvm390.localdomain:31000 check
  server nginx-8 hadoopvm387.localdomain:31000 check
  server nginx-7 hadoopvm389.localdomain:31000 check
  server nginx-6 hadoopvm386.localdomain:31000 check
  server nginx-5 hadoopvm383.localdomain:31000 check
  server nginx-4 hadoopvm378.localdomain:31001 check
  server nginx-3 hadoopvm381.localdomain:31000 check
  server nginx-2 hadoopvm385.localdomain:31000 check
  server nginx-1 hadoopvm378.localdomain:31000 check
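
If you want to confirm a scale-out without reading the file by eye, the listing can be summarized with a few lines of Ruby. This is a minimal sketch that assumes only the line shapes shown above (an unindented "listen NAME" line followed by indented "server NAME HOST:PORT check" lines); the backends_by_listener helper is hypothetical and is not a general HAProxy configuration parser.

```ruby
# Hypothetical helper: map each "listen" section name to the host:port
# targets of its "server" lines. Sections without servers are omitted.
def backends_by_listener(cfg_text)
  result = Hash.new { |hash, key| hash[key] = [] }
  current = nil
  cfg_text.each_line do |line|
    words = line.split
    next if words.empty?
    if line !~ /\A\s/
      # An unindented line starts a new section; track only "listen" ones.
      current = words[0] == 'listen' ? words[1] : nil
    elsif current && words[0] == 'server'
      result[current] << words[2]
    end
  end
  result
end

# Shortened sample in the same shape as the output above.
SAMPLE_CFG = <<~CFG
  listen stats
    mode http
    stats enable
  listen nginx-80
    mode tcp
    server nginx-2 hadoopvm385.localdomain:31000 check
    server nginx-1 hadoopvm378.localdomain:31000 check
CFG

backends_by_listener(SAMPLE_CFG).each do |name, targets|
  puts "#{name}: #{targets.length} backends (#{targets.join(', ')})"
end
```

Pointing the same helper at File.read('/etc/haproxy/haproxy.cfg') on a master node should report one backend per running nginx instance.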