VMware Photon TP2 + Mesos Cluster Deployment

photon-dfad9617

VMware Photon TP2 was released on August 27. The new version contains native support for running Mesos and therefore should have allowed the Photon OS to run as a Mesos slave immediately after installation. I would like to think my earlier blog post detailing how to deploy Mesos on-top of Photon influenced this functionality.

Download the ISO here.

After conversations with people involved in the project, the idea is for Photon to act only as a Mesos slave, with external Mesos masters and Zookeeper running on an Ubuntu/CentOS/Red Hat nodes. Logically the architecture of a Mesos cluster with Photon OS would look like the following.

 

mesos-photon-cluster

 

In order to deploy the cluster in this fashion, I wanted to find a method for automating as much of it as possible. Currently, one limitation with VMware Big Data Extensions is the single template VM limit. How awesome would it be if you could have multiple template VMs within the vApp and choose which template to deploy based on a pre-defined role? Definitely something to look into.

Regardless, working within the current limitations of BDE, I will describe in detail how I am now deploying Photon OS nodes into a Mesos cluster as automated as possible.

Configuring Big Data Extensions

I decided to create a new cluster map for a Mesos cluster that only deployed the Zookeeper and Mesos master nodes. The idea is similar to a Compute-only Hadoop or HDFS-only Hadoop cluster deployment through BDE. All that is required to accomplish this is a JSON file with the new cluster definition and an entry in the /opt/serengeti/www/specs/map file.

/opt/serengeti/www/specs/Ironfan/mesos/master/spec.json

  1 {
  2   "nodeGroups":[
  3     {
  4       "name": "Zookeeper",
  5       "roles": [
  6         "zookeeper"
  7       ],
  8       "groupType": "zookeeper",
  9       "instanceNum": "[3,3,3]",
 10       "instanceType": "[SMALL]",
 11       "cpuNum": "[1,1,64]",
 12       "memCapacityMB": "[7500,3748,min]",
 13       "storage": {
 14         "type": "[SHARED,LOCAL]",
 15         "sizeGB": "[2,2,min]"
 16       },
 17       "haFlag": "on"
 18     },
 19     {
 20       "name": "Master",
 21       "description": "The Mesos master node",
 22       "roles": [
 23         "mesos_master",
 24         "mesos_chronos",
 25         "mesos_marathon"
 26       ],
 27       "groupType": "master",
 28       "instanceNum": "[2,1,2]",
 29       "instanceType": "[MEDIUM,SMALL,LARGE,EXTRA_LARGE]",
 30       "cpuNum": "[1,1,64]",
 31       "memCapacityMB": "[7500,3748,max]",
 32       "storage": {
 33         "type": "[SHARED,LOCAL]",
 34         "sizeGB": "[1,1,min]"
 35       },
 36       "haFlag": "on"
 37     }
 38   ]
 39 }

/opt/serengeti/www/specs/map

 17     "vendor" : "Mesos",
 18     "version" : "^(\\d)+(\\.\\w+)*",
 19     "type" : "Mesos Master-Only Cluster",
 20     "appManager" : "Default",
 21     "path" : "Ironfan/mesos/master/spec.json"
 22   },

Normally, editing the two files would have been all that was required, however I have modified the Chef cookbooks to include the HAProxy package. I had included it in the install.rb cookbook for Mesos and this causes a problem if there are no slave nodes. I moved the code to the master.rb cookbook and updated the Chef server.

/opt/serengeti/chef/cookbooks/mesos/recipes/master.rb

166 directory "/etc/haproxy-marathon-bridge" do
167   owner 'root'
168   group 'root'
169   mode '0755'
170   action :create
171 end
172 
173 template '/usr/local/bin/haproxy-marathon-bridge' do
174   source 'haproxy-marathon-bridge.erb'
175   action :create
176 end
177 
178 all_ips = mesos_masters_ip
179 
180 template '/etc/haproxy-marathon-bridge/marathons' do
181   source 'marathons.erb'
182   variables(
183     haproxy_server_list: all_ips
184   )
185   action :create
186 end
187 
188 execute 'configure haproxy' do
189   command 'chkconfig haproxy on; service haproxy start'
190 end
191 
192 execute 'setup haproxy-marathon-bridge' do
193   command 'chmod 755 /usr/local/bin/haproxy-marathon-bridge; /usr/local/bin/haproxy-marathon-bridge install_cronjob'
194 end
195 
196 template '/usr/local/bin/haproxy-marathon-bridge' do
197   source 'haproxy-marathon-bridge.erb'
198   action :create
199 end

Restart Tomcat on the management server and then the new cluster definition is available for use.

My new cluster, minus the slave nodes looks like this now.

mesos-no-slaves

Using the new deployment option to deploy the Apache Mesos cluster. Once the cluster is configured and available, note the IP addresses of the two Mesos master nodes. We are going to use those IP addresses within the Photon nodes to pre-populate configuration files so the Photon nodes automatically join the cluster.

Photon Node Configuration

The next step is to configure a Photon node template that will automatically join the Mesos cluster deployed previously. After installing a node with the new TP2 release of Photon, I enabled root login over SSH so that I could quickly configure the node — be sure to turn it back off after you perform the following tasks.

Unfortunately, the version of Mesos that shipped in the ISO file released is 0.22.0 and there is a known conflict with the newer versions of Docker. The Photon TP2 ISO included Docker version 1.8.1 and it threw the following error when I tried to start the node as a Mesos slave:

root [ /etc/systemd/system ]# /usr/sbin/mesos-slave --master=zk://192.168.1.126:2181,192.168.1.127:2181,192.168.1.128:2181/mesos_cell --hostname=$(/usr/bin/hostname) --log_dir=/var/log/mesos_slave --containerizers=docker,mesos --docker=/usr/bin/docker --executor_registration_timeout=5mins --ip=$(/usr/sbin/ip -o -4 addr list | grep eno | grep global | awk 'NR==1{print $4}' | cut -d/ -f1)
I0905 18:42:16.588754  4269 logging.cpp:172] INFO level logging started!
I0905 18:42:16.591898  4269 main.cpp:156] Build: 2015-08-20 20:33:22 by 
I0905 18:42:16.592162  4269 main.cpp:158] Version: 0.22.1
Failed to create a containerizer: Could not create DockerContainerizer: Insufficient version of Docker! Please upgrade to >= 1.0.0

The bug was already noted in the updated code on the Photon GitHub repo, however there is not an update ISO available. That meant I needed to build my own ISO file from the latest code on the repo.

Note: Make sure the Ubuntu node has plenty of CPU and memory for compiling the ISO image. I was using a 1vCPU and 1GB memory VM in my lab and it took a long time to build the ISO.

photon-iso

After successfully building an updated ISO image, I used it to build a new VM. I really enjoy how quickly the Photon OS builds, even in my limited home lab environment.

photon-build-time

I wanted to configure the mesos-slave service to start each time the VM is booted and automatically join the master-only Mesos cluster I deployed above using BDE. That meant I needed to configure the mesos-slave.service file on the Photon node.

/etc/systemd/system/mesos-slave.service

  1 [Unit]
  2 Description=Photon Mesos Slave node
  3 After=network.target,docker.service
  4 
  5 [Service]
  6 Restart=on-failure
  7 RestartSec=10
  8 TimeoutStartSec=0
  9 ExecStartPre=/usr/bin/rm -f /tmp/mesos/meta/slaves/latest
 10 ExecStart=/bin/bash -c "/usr/sbin/mesos-slave \
 11 --master=zk://192.168.1.126:2181,192.168.1.127:2181,192.168.1.128:2181/mesos_cell \
 12 --hostname=$(/usr/bin/hostname) \
 13 --log_dir=/var/log/mesos_slave \
 14 --containerizers=docker,mesos \
 15 --docker=/usr/bin/docker \
 16 --executor_registration_timeout=5mins \
 17 --ip=$(/usr/sbin/ip -o -4 addr list | grep eno | grep global | awk 'NR==1{print $4}' | cut -d/ -f1)"
 18 
 19 [Install]
 20 WantedBy=multi-user.target

After creating the service file for systemd, it was then possible to start service and see it join the Mesos cluster in the UI.

meson-running

mesos-cluster-1

I shutdown the VM and cloned it to a template for use with the next step.

Final step is now to run a workload on the cluster, with Photon providing the Docker containers.

Workload Deployment

Launching a container workload on the new cluster was rather straightforward. I used a simple NGiNX container and exposed it over port 80.

meson-running-workload

marathon-running-workload

 

A few things, like automatic hostname configuration within Photon based on the DHCP address, are still left to do. But this is a working solution and let’s me do some next-level deployment testing using Photon as the mechanism for deploying the Docker containers.

If you have any questions on what I did here, feel free to reach out to me over Twitter.

Apache Mesos and Marathon Deployment Demo at VMworld

VMworld 2015

Andrew Nelson and Tom Twyman  spoke on Wednesday morning about Apache Mesos and Marathon at VMworld. During their session they showed a demo of a cluster deployment — although they experienced a couple technical difficulties. The session covered the overall basics of how to operationalize Cloud Native Applications using Apache Mesos, Mesosphere Marathon and Docker on a VMware private cloud.

Here is an alternate cut of the demo they showed yesterday.

 

The video walks a user through a deployment of a Apache Mesos cluster using VMware Big Data Extensions, shows the running UI for Apache Mesos, Mesosphere Marathon and Chronos. Behind the scenes, HAProxy has been installed to automatically add any workloads launched and Docker support on each node. After the deployment is complete, a NGiNX Docker workload is launched into Marathon using the API. The workload is scaled from 1 to 6 instances and shows the HAProxy ruleset being updated to include each instance that is running. Finally, the video shows the Apache Mesos cluster itself being scaled while the same NGiNX workload is still running.

A quick 3-minute video showing how versatile Cloud Native Apps on top of VMware infrastructure can be to enable developers to take advantages of the newest technologies for running containers.

Exposing Apache Mesos on VMware Big Data Extensions v2.2

The VMware Big Data Extensions v2.2 release included the cookbooks for Apache Mesos and Kubernetes from the Fling released this past spring. However, those cookbooks are not exposed when you deploy the new version. Fortunately, unlocking them only takes a few minutes before they can be made available! I will cover exactly what is needed in order to begin using these Cloud Native App cluster deployments below.

If you jump onto your v2.2 management server and look in the /opt/serengeti/chef/cookbooks directory, you will see all of the Cloud Native App additions.

BDE-Cookbooks-v2.2

A quick look to be sure the Chef roles are still defined tells us that they are.

Chef-BDE-v2.2-Roles

 

They even did us the favor of including the JSON spec files in the /opt/serengeti/www/specs/Ironfan directory.

BDE-v2.2-Mesos-JSON

The missing pieces are the entries in the /opt/serengeti/www/specs/map and /opt/serengeti/www/distros/manifest files. Those are rather easy to copy out of the VMware Fling itself or re-create manually. If you want to edit the files yourself, here is what needs to be added to the files.

/opt/serengeti/www/specs/map

{
  "vendor" : "Kubernetes",
  "version" : "^(\\d)+(\\.\\w+)*",
  "type" : "Basic Kubernetes Cluster",
  "appManager" : "Default",
  "path" : "Ironfan/kubernetes/basic/spec.json"
},
{
  "vendor" : "Mesos",
  "version" : "^(\\d)+(\\.\\w+)*",
  "type" : "Basic Mesos Cluster",
  "appManager" : "Default",
  "path" : "Ironfan/mesos/basic/spec.json"
},

/opt/serengeti/www/distros/manifest

{
  "name" : "kubernetes",
  "vendor" : "KUBERNETES",
  "version" : "0.5.4",
  "packages" : [
    {
      "tarball": "kubernetes/kubernetes-0.5.4.tar.gz",
      "roles": [
        "kubernetes_workstation",
        "kubernetes_master",
        "kubernetes_minion"
      ]
    }
  ]
},
{
  "name" : "mesos",
  "vendor" : "MESOS",
  "version" : "0.21.0",
  "packages" : [
    {
      "package_repos": [ "https://0.0.0.0/yum/mesos.repo"],
      "roles" : [
        "zookeeper",
        "mesos_master",
        "mesos_slave",
        "mesos_docker",
        "mesos_chronos",
        "mesos_marathon"
      ]
    }
  ]
}

The repos built into the Fling are not present (unfortunately) on the management server. This was the only tedious portion of the entire process. The easiest method is to grab the files out of an existing BDE Fling management server and copy them into the new one. The other option is find the latest RPMs on the Internet and add them to the management server manually. In either case, you’ll need to run the CentOS syntax for creating the repository.

Create local repo for Apache Mesos

# su - serengeti
$ cd /opt/serengeti/www/yum
$ vim mesos.repo
[a-mesos]
name=Apache Mesos
baseurl=https://0.0.0.0/yum/repos/mesos/current/
enabled=1
gpgcheck=0
sslverify=1
sslcacert=/etc/chef/trusted_certs/serengeti-base.pem

$ mkdir -p repos/mesos/current/RPMS
$ cd repos/mesos/current

The Fling included the following files:
- bigtop-utils-0.8.0.4-1.el6.noarch.rpm
- chronos-2.3.0-0.1.20141121000021.x86_64.rpm
- docker-io-1.3.1-2.el6.x86_64.rpm
- marathon-0.7.5-1.0.x86_64.rpm
- mesos-0.21.0-1.0.centos65.x86_64.rpm
- subversion-1.6.11-10.el6_5.x86_64.rpm
- zookeeper-3.4.5.4-1.el6.noarch.rpm
- zookeeper-server-3.4.5.4-1.el6.noarch.rpm

$ createrepo .

A restart of Tomcat is all that is needed and then you will be able to start deploying Apache Mesos and Kubernetes clusters through BDE v2.2.

If you want to take advantage of the Instant Clone functionality, you will need to be running vSphere 6.0 and BDE v2.2. There are also a couple adjustments to the /opt/serengeti/conf/serengeti.properties files that will be need to be made. I will be going over those in a future post discussing how to use the Photon OS as the template for BDE to deploy.

Installing Apache Mesos on Photon Linux OS

Ever since the VMware Photon Technical Preview was made available, I have wanted to install Apache Mesos on a node. The Photon OS is a very minimal Linux installation and so my initial attempts to work through the process led me down the old-fashioned rabbit hole of manually compiling packages. It was very reminiscent of the later 1990’s and using Debian installed via floppy disk. I finally found the time to work my way down and back out of the rabbit hole and have been able to successfully get Mesos to install and run on Photon! This is a good first step towards building a Photon template to be used inside Big Data Extensions for deploying Cloud Native Apps with Mesos|Marathon|Chronos or Kubernetes.

The lab environment I used is running vSphere 5.5 and consists of a small set of nested ESXi hypervisors. I am not going to cover install Photon on a VM, but just be sure you have one that was installed with the full OS — not the minimal installation. After the Photon VM is configured to communicate with the Internet, you can follow these instructions to get Apache Mesos installed. The end result of the guide will be a working Mesos node running on Photon that can launch Docker containers.

Apache Mesos on Photon

Be sure to follow it in the order listed as the ordering of the packages is important. Also have a Photon VM with at least 3GB of Memory allocated to it for the compile processes.

HOWTO Guide

Missing prerequisites
APR Library
# wget http://apache.claz.org//apr/apr-1.5.2.tar.gz
# tar zxvf apr-1.5.2.tar.gz
# cd apr-1.5.2
# ./configure —prefix=/usr/local/lib/apr
# make
# make test
# make install

APR-UTIL Library
# wget http://apache.claz.org//apr/apr-util-1.5.4.tar.gz
# tar zxvf apr-util-1.5.4.tar.gz
# cd apr-util-1.5.4
# ./configure —prefix=/usr/local/lib/apr —with-apr=/usr/local/lib/apr
# make
# make install

Subversion
# wget http://apache.osuosl.org/subversion/subversion-1.8.13.tar.gz
# tar zxvf subversion-1.8.13.tar.gz
# cd subversion-1.8.13
# ./configure —prefix=/usr/local/lib/subversion —with-apr=/usr/local/lib/apr —with-apr-util=/usr/local/lib/apr
# make
# make install

OpenJDK Java
Download the Java JDK source tarball from the Oracle website (http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz)
# tar zxvf jdk-7u79-linux-x64.tar.gz
# mv jdk1.7.0_79 /usr/local/java
# echo JAVA_HOME=/usr/local/java >> /etc/environment
# source /etc/environment

Apache Maven Library
# wget http://apache.mirrors.ionfish.org//ant/binaries/apache-ant-1.9.5-bin.tar.gz
# tar zxvf apache-ant-1.9.5-bin.tar.gz
# mv apache-ant-1.9.5 /usr/local
# ln -s /usr/local/apache-ant-1.9.5 /usr/local/apache-ant
# wget http://apache.cs.utah.edu/maven/maven-3/3.3.3/source/apache-maven-3.3.3-src.tar.gz
# tar zxvf apache-maven-3.3.3-src.tar.gz
# cd apache-maven-3.3.3
# /usr/local/apache-ant/bin/ant -Dmaven.home=“/usr/local/maven-3.3.3"
# echo MAVEN_HOME=/usr/local/maven-3.3.3 >> /etc/environment
# export /etc/environment

Install Apache Mesos
# wget http://www.apache.org/dist/mesos/0.22.1/mesos-0.22.1.tar.gz
# tar zxvf mesos-0.22.1.tar.gz
# cd mesos-0.22.1
# mkdir build
# cd build
# ../configure --prefix=/usr/local/mesos —with-apr=/usr/local/lib/apr —with-svn=/usr/local/lib/subversion
# make
# make check
# make install
After Apache Mesos is installed, you can start both the master and slave processes on the node to run a quick test.
# /usr/local/mesos/bin/mesos-master.sh —ip=127.0.0.1 —work_dir=/var/lib/mesos
# /usr/local/mesos/bin/mesos-slave.sh —master=127.0.0.1:5050
Afterwards, open a web browser and point to to http://127.0.0.1:5050 and you will see the Apache Mesos interface. The next step will be to deploy multiple Photon nodes and configure them to be a part of a single cluster. I will likely orchestrate all of this through Chef next and incorporate it into the Big Data Extensions framework.
There is always more to do, but I am glad to have gotten this working and sharing it with everyone. Reach out to me on Twitter if you have questions.

SequenceIQ Cloudbreak Hadoop Cluster Deployments

I saw activity on Twitter talking about Cloudbreak from SequenceIQ as a method for deploying Hadoop clusters into several public cloud providers. My interest was peaked and I started reading through the Cloudbreak documentation on Saturday night. About 20 minutes later, I had an account setup, tied it to my AWS account and started deploying a cluster. Approximately 15 minutes later, the cluster was deployed and Ambari was setup running and I could log into the UI.

The interface is truly brilliant! I am pretty impressed with the functionality it provides, the ease of setup and the blueprinting functionality it provides. The project is only in a public beta state right now, but I plan to keep my eyes on the project and see how it develops. If you are interested in reading more about my experience with the very first cluster and some screenshots of the interface, keep reading!

Cloudbreak UI & Cluster Deployment

Upon deployment of the cluster, the Dashboard UI immediately began showing me status updates on the ribbon and a small square widget. When the widget was clicked, it expanded to show further details of the cluster and the status.

sequenceiq cloudbreak

cloudbreak-screen-02

 

I logged into Ambari after the cluster reported that it was deployed and Ambari was accessible. The Cloudbreak dashboard had a link sending me to the Ambari UI. Once logged in, I was pleased to see some basic status items displayed, along with a list of services available in the cluster.

Note: The Ambari login credentials for the Ambari UI are the defaults (admin/admin). The Cloudbreak UI did not tell me this anywhere that I saw, so I had to Google what the defaults were.

cloudbreak-screen-03Poking around the Ambari interface, it is pretty easy to see what nodes the Cloudbreak service had deployed inside AWS and their relevant information.

cloudbreak-screen-04

Noticing the metric widgets were reporting they were unable to report/collect data because the Ganglia service was not deployed, I started through the process of adding services to the running cluster. The Ambari interface went through  a multi-step process, allowing me to select the services I wanted and checking prerequisites for those services. In my case, the cluster also needed the Tez service in order to deploy the services I select.

cloudbreak-screen-05

cloudbreak-screen-06

cloudbreak-screen-07

Here is where things went a little sideways — the deployment of the new services stalled at 2/10.

cloudbreak-screen-08

After about 30 minutes and several attempts to refresh the page, I closed the page and went back into Ambari through the Cloudbreak dashboard. When Ambari loaded, I was presented with the following screen and no way to interact with Ambari — it appeared wedged.

cloudbreak-screen-09

This allowed me to test the termination functionality of Cloudbreak and seeing how it removes all of the objects from my AWS account.

cloudbreak-screen-10

cloudbreak-screen-11

All-in-all a great first experience with Cloudbreak. I am going to continue to play with it — especially since I am interested in how they are using Docker containers in the service.