VMware Photon TP2 + Mesos Cluster Deployment

photon-dfad9617

VMware Photon TP2 was released on August 27. The new version contains native support for running Mesos and therefore should have allowed the Photon OS to run as a Mesos slave immediately after installation. I would like to think my earlier blog post detailing how to deploy Mesos on-top of Photon influenced this functionality.

Download the ISO here.

After conversations with people involved in the project, the idea is for Photon to act only as a Mesos slave, with external Mesos masters and Zookeeper running on an Ubuntu/CentOS/Red Hat nodes. Logically the architecture of a Mesos cluster with Photon OS would look like the following.

 

mesos-photon-cluster

 

In order to deploy the cluster in this fashion, I wanted to find a method for automating as much of it as possible. Currently, one limitation with VMware Big Data Extensions is the single template VM limit. How awesome would it be if you could have multiple template VMs within the vApp and choose which template to deploy based on a pre-defined role? Definitely something to look into.

Regardless, working within the current limitations of BDE, I will describe in detail how I am now deploying Photon OS nodes into a Mesos cluster as automated as possible.

Configuring Big Data Extensions

I decided to create a new cluster map for a Mesos cluster that only deployed the Zookeeper and Mesos master nodes. The idea is similar to a Compute-only Hadoop or HDFS-only Hadoop cluster deployment through BDE. All that is required to accomplish this is a JSON file with the new cluster definition and an entry in the /opt/serengeti/www/specs/map file.

/opt/serengeti/www/specs/Ironfan/mesos/master/spec.json

  1 {
  2   "nodeGroups":[
  3     {
  4       "name": "Zookeeper",
  5       "roles": [
  6         "zookeeper"
  7       ],
  8       "groupType": "zookeeper",
  9       "instanceNum": "[3,3,3]",
 10       "instanceType": "[SMALL]",
 11       "cpuNum": "[1,1,64]",
 12       "memCapacityMB": "[7500,3748,min]",
 13       "storage": {
 14         "type": "[SHARED,LOCAL]",
 15         "sizeGB": "[2,2,min]"
 16       },
 17       "haFlag": "on"
 18     },
 19     {
 20       "name": "Master",
 21       "description": "The Mesos master node",
 22       "roles": [
 23         "mesos_master",
 24         "mesos_chronos",
 25         "mesos_marathon"
 26       ],
 27       "groupType": "master",
 28       "instanceNum": "[2,1,2]",
 29       "instanceType": "[MEDIUM,SMALL,LARGE,EXTRA_LARGE]",
 30       "cpuNum": "[1,1,64]",
 31       "memCapacityMB": "[7500,3748,max]",
 32       "storage": {
 33         "type": "[SHARED,LOCAL]",
 34         "sizeGB": "[1,1,min]"
 35       },
 36       "haFlag": "on"
 37     }
 38   ]
 39 }

/opt/serengeti/www/specs/map

 17     "vendor" : "Mesos",
 18     "version" : "^(\\d)+(\\.\\w+)*",
 19     "type" : "Mesos Master-Only Cluster",
 20     "appManager" : "Default",
 21     "path" : "Ironfan/mesos/master/spec.json"
 22   },

Normally, editing the two files would have been all that was required, however I have modified the Chef cookbooks to include the HAProxy package. I had included it in the install.rb cookbook for Mesos and this causes a problem if there are no slave nodes. I moved the code to the master.rb cookbook and updated the Chef server.

/opt/serengeti/chef/cookbooks/mesos/recipes/master.rb

166 directory "/etc/haproxy-marathon-bridge" do
167   owner 'root'
168   group 'root'
169   mode '0755'
170   action :create
171 end
172 
173 template '/usr/local/bin/haproxy-marathon-bridge' do
174   source 'haproxy-marathon-bridge.erb'
175   action :create
176 end
177 
178 all_ips = mesos_masters_ip
179 
180 template '/etc/haproxy-marathon-bridge/marathons' do
181   source 'marathons.erb'
182   variables(
183     haproxy_server_list: all_ips
184   )
185   action :create
186 end
187 
188 execute 'configure haproxy' do
189   command 'chkconfig haproxy on; service haproxy start'
190 end
191 
192 execute 'setup haproxy-marathon-bridge' do
193   command 'chmod 755 /usr/local/bin/haproxy-marathon-bridge; /usr/local/bin/haproxy-marathon-bridge install_cronjob'
194 end
195 
196 template '/usr/local/bin/haproxy-marathon-bridge' do
197   source 'haproxy-marathon-bridge.erb'
198   action :create
199 end

Restart Tomcat on the management server and then the new cluster definition is available for use.

My new cluster, minus the slave nodes looks like this now.

mesos-no-slaves

Using the new deployment option to deploy the Apache Mesos cluster. Once the cluster is configured and available, note the IP addresses of the two Mesos master nodes. We are going to use those IP addresses within the Photon nodes to pre-populate configuration files so the Photon nodes automatically join the cluster.

Photon Node Configuration

The next step is to configure a Photon node template that will automatically join the Mesos cluster deployed previously. After installing a node with the new TP2 release of Photon, I enabled root login over SSH so that I could quickly configure the node — be sure to turn it back off after you perform the following tasks.

Unfortunately, the version of Mesos that shipped in the ISO file released is 0.22.0 and there is a known conflict with the newer versions of Docker. The Photon TP2 ISO included Docker version 1.8.1 and it threw the following error when I tried to start the node as a Mesos slave:

root [ /etc/systemd/system ]# /usr/sbin/mesos-slave --master=zk://192.168.1.126:2181,192.168.1.127:2181,192.168.1.128:2181/mesos_cell --hostname=$(/usr/bin/hostname) --log_dir=/var/log/mesos_slave --containerizers=docker,mesos --docker=/usr/bin/docker --executor_registration_timeout=5mins --ip=$(/usr/sbin/ip -o -4 addr list | grep eno | grep global | awk 'NR==1{print $4}' | cut -d/ -f1)
I0905 18:42:16.588754  4269 logging.cpp:172] INFO level logging started!
I0905 18:42:16.591898  4269 main.cpp:156] Build: 2015-08-20 20:33:22 by 
I0905 18:42:16.592162  4269 main.cpp:158] Version: 0.22.1
Failed to create a containerizer: Could not create DockerContainerizer: Insufficient version of Docker! Please upgrade to >= 1.0.0

The bug was already noted in the updated code on the Photon GitHub repo, however there is not an update ISO available. That meant I needed to build my own ISO file from the latest code on the repo.

Note: Make sure the Ubuntu node has plenty of CPU and memory for compiling the ISO image. I was using a 1vCPU and 1GB memory VM in my lab and it took a long time to build the ISO.

photon-iso

After successfully building an updated ISO image, I used it to build a new VM. I really enjoy how quickly the Photon OS builds, even in my limited home lab environment.

photon-build-time

I wanted to configure the mesos-slave service to start each time the VM is booted and automatically join the master-only Mesos cluster I deployed above using BDE. That meant I needed to configure the mesos-slave.service file on the Photon node.

/etc/systemd/system/mesos-slave.service

  1 [Unit]
  2 Description=Photon Mesos Slave node
  3 After=network.target,docker.service
  4 
  5 [Service]
  6 Restart=on-failure
  7 RestartSec=10
  8 TimeoutStartSec=0
  9 ExecStartPre=/usr/bin/rm -f /tmp/mesos/meta/slaves/latest
 10 ExecStart=/bin/bash -c "/usr/sbin/mesos-slave \
 11 --master=zk://192.168.1.126:2181,192.168.1.127:2181,192.168.1.128:2181/mesos_cell \
 12 --hostname=$(/usr/bin/hostname) \
 13 --log_dir=/var/log/mesos_slave \
 14 --containerizers=docker,mesos \
 15 --docker=/usr/bin/docker \
 16 --executor_registration_timeout=5mins \
 17 --ip=$(/usr/sbin/ip -o -4 addr list | grep eno | grep global | awk 'NR==1{print $4}' | cut -d/ -f1)"
 18 
 19 [Install]
 20 WantedBy=multi-user.target

After creating the service file for systemd, it was then possible to start service and see it join the Mesos cluster in the UI.

meson-running

mesos-cluster-1

I shutdown the VM and cloned it to a template for use with the next step.

Final step is now to run a workload on the cluster, with Photon providing the Docker containers.

Workload Deployment

Launching a container workload on the new cluster was rather straightforward. I used a simple NGiNX container and exposed it over port 80.

meson-running-workload

marathon-running-workload

 

A few things, like automatic hostname configuration within Photon based on the DHCP address, are still left to do. But this is a working solution and let’s me do some next-level deployment testing using Photon as the mechanism for deploying the Docker containers.

If you have any questions on what I did here, feel free to reach out to me over Twitter.