VMware Photon TP2 was released on August 27. The new version includes native support for running Mesos, which should allow Photon OS to act as a Mesos slave immediately after installation. I would like to think my earlier blog post detailing how to deploy Mesos on top of Photon influenced this functionality.
After conversations with people involved in the project, the idea is for Photon to act only as a Mesos slave, with the external Mesos masters and Zookeeper running on Ubuntu/CentOS/Red Hat nodes. Logically, the architecture of a Mesos cluster with Photon OS looks like the following.
In order to deploy the cluster in this fashion, I wanted to automate as much of it as possible. Currently, one limitation of VMware Big Data Extensions is its single-template-VM limit. How awesome would it be if you could have multiple template VMs within the vApp and choose which template to deploy based on a pre-defined role? Definitely something to look into.
Regardless, working within the current limitations of BDE, I will describe in detail how I am now deploying Photon OS nodes into a Mesos cluster with as much automation as possible.
Configuring Big Data Extensions
I decided to create a new cluster map for a Mesos cluster that deploys only the Zookeeper and Mesos master nodes. The idea is similar to a compute-only Hadoop or HDFS-only Hadoop cluster deployment through BDE. All that is required is a JSON file with the new cluster definition and an entry in the /opt/serengeti/www/specs/map file.
/opt/serengeti/www/specs/Ironfan/mesos/master/spec.json
```json
{
  "nodeGroups":[
    {
      "name": "Zookeeper",
      "roles": [
        "zookeeper"
      ],
      "groupType": "zookeeper",
      "instanceNum": "[3,3,3]",
      "instanceType": "[SMALL]",
      "cpuNum": "[1,1,64]",
      "memCapacityMB": "[7500,3748,min]",
      "storage": {
        "type": "[SHARED,LOCAL]",
        "sizeGB": "[2,2,min]"
      },
      "haFlag": "on"
    },
    {
      "name": "Master",
      "description": "The Mesos master node",
      "roles": [
        "mesos_master",
        "mesos_chronos",
        "mesos_marathon"
      ],
      "groupType": "master",
      "instanceNum": "[2,1,2]",
      "instanceType": "[MEDIUM,SMALL,LARGE,EXTRA_LARGE]",
      "cpuNum": "[1,1,64]",
      "memCapacityMB": "[7500,3748,max]",
      "storage": {
        "type": "[SHARED,LOCAL]",
        "sizeGB": "[1,1,min]"
      },
      "haFlag": "on"
    }
  ]
}
```
/opt/serengeti/www/specs/map
```json
{
  "vendor" : "Mesos",
  "version" : "^(\\d)+(\\.\\w+)*",
  "type" : "Mesos Master-Only Cluster",
  "appManager" : "Default",
  "path" : "Ironfan/mesos/master/spec.json"
},
```
Normally, editing those two files would have been all that was required; however, I had modified the Chef cookbooks to include the HAProxy package. I had originally placed that code in the install.rb recipe for Mesos, which causes a problem when there are no slave nodes, so I moved it into the master.rb recipe and updated the Chef server.
/opt/serengeti/chef/cookbooks/mesos/recipes/master.rb
```ruby
directory "/etc/haproxy-marathon-bridge" do
  owner 'root'
  group 'root'
  mode '0755'
  action :create
end

template '/usr/local/bin/haproxy-marathon-bridge' do
  source 'haproxy-marathon-bridge.erb'
  action :create
end

all_ips = mesos_masters_ip

template '/etc/haproxy-marathon-bridge/marathons' do
  source 'marathons.erb'
  variables(
    haproxy_server_list: all_ips
  )
  action :create
end

execute 'configure haproxy' do
  command 'chkconfig haproxy on; service haproxy start'
end

execute 'setup haproxy-marathon-bridge' do
  command 'chmod 755 /usr/local/bin/haproxy-marathon-bridge; /usr/local/bin/haproxy-marathon-bridge install_cronjob'
end
```
Restart Tomcat on the management server and the new cluster definition becomes available for use.
My new cluster, minus the slave nodes, now looks like this.
Use the new deployment option to deploy the Apache Mesos cluster. Once the cluster is configured and available, note the IP addresses of the two Mesos master nodes. We are going to use those IP addresses within the Photon nodes to pre-populate configuration files so the Photon nodes automatically join the cluster.
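As a quick sanity check, the ZooKeeper connection string that the Photon nodes will need can be assembled from those master IPs. A minimal sketch (the addresses and the mesos_cell name are the values from my lab; substitute your own):

```shell
# Build the zk:// connection string that mesos-slave expects from the
# master IPs recorded after the BDE deployment (example addresses).
MASTER_IPS="192.168.1.126 192.168.1.127 192.168.1.128"
ZK_PORT=2181
CELL="mesos_cell"

zk="zk://"
for ip in $MASTER_IPS; do
  zk="${zk}${ip}:${ZK_PORT},"
done
zk="${zk%,}/${CELL}"   # drop the trailing comma, append the cell path

echo "$zk"
```

This prints the full `--master` value used in the systemd unit later on.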
Photon Node Configuration
The next step is to configure a Photon node template that will automatically join the Mesos cluster deployed previously. After installing a node with the new TP2 release of Photon, I enabled root login over SSH so that I could quickly configure the node (be sure to turn it back off after you perform the following tasks).
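For reference, the temporary change is the usual sshd_config toggle; revert it once the node is configured:

```
# /etc/ssh/sshd_config -- temporary change, revert when finished
PermitRootLogin yes
```

Restart the sshd service for the change to take effect.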
Unfortunately, the version of Mesos included on the released ISO is 0.22.1, and there is a known conflict with newer versions of Docker. The Photon TP2 ISO includes Docker 1.8.1, which caused the following error when I tried to start the node as a Mesos slave:
```
root [ /etc/systemd/system ]# /usr/sbin/mesos-slave --master=zk://192.168.1.126:2181,192.168.1.127:2181,192.168.1.128:2181/mesos_cell --hostname=$(/usr/bin/hostname) --log_dir=/var/log/mesos_slave --containerizers=docker,mesos --docker=/usr/bin/docker --executor_registration_timeout=5mins --ip=$(/usr/sbin/ip -o -4 addr list | grep eno | grep global | awk 'NR==1{print $4}' | cut -d/ -f1)
I0905 18:42:16.588754  4269 logging.cpp:172] INFO level logging started!
I0905 18:42:16.591898  4269 main.cpp:156] Build: 2015-08-20 20:33:22 by
I0905 18:42:16.592162  4269 main.cpp:158] Version: 0.22.1
Failed to create a containerizer: Could not create DockerContainerizer: Insufficient version of Docker! Please upgrade to >= 1.0.0
```
The bug was already addressed in the updated code on the Photon GitHub repo; however, no updated ISO was available. That meant I needed to build my own ISO file from the latest code in the repo.
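The build itself runs on an Ubuntu host. Roughly, the commands look like the following (a sketch of what I ran, assuming the repo's Makefile targets have not changed):

```shell
git clone https://github.com/vmware/photon.git
cd photon
sudo make iso   # builds the full ISO; expect this to take a while
```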
Note: Make sure the Ubuntu node has plenty of CPU and memory for compiling the ISO image. I was using a VM with 1 vCPU and 1 GB of memory in my lab, and it took a long time to build the ISO.
After successfully building an updated ISO image, I used it to build a new VM. I really enjoy how quickly the Photon OS builds, even in my limited home lab environment.
I wanted to configure the mesos-slave service to start each time the VM is booted and automatically join the master-only Mesos cluster I deployed above using BDE. That meant I needed to configure the mesos-slave.service file on the Photon node.
/etc/systemd/system/mesos-slave.service
```
[Unit]
Description=Photon Mesos Slave node
After=network.target docker.service

[Service]
Restart=on-failure
RestartSec=10
TimeoutStartSec=0
ExecStartPre=/usr/bin/rm -f /tmp/mesos/meta/slaves/latest
ExecStart=/bin/bash -c "/usr/sbin/mesos-slave \
  --master=zk://192.168.1.126:2181,192.168.1.127:2181,192.168.1.128:2181/mesos_cell \
  --hostname=$(/usr/bin/hostname) \
  --log_dir=/var/log/mesos_slave \
  --containerizers=docker,mesos \
  --docker=/usr/bin/docker \
  --executor_registration_timeout=5mins \
  --ip=$(/usr/sbin/ip -o -4 addr list | grep eno | grep global | awk 'NR==1{print $4}' | cut -d/ -f1)"

[Install]
WantedBy=multi-user.target
```
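The --ip pipeline in the unit file simply pulls the first global IPv4 address assigned to an eno* interface. It can be sanity-checked against a captured line of `ip -o -4 addr list` output (the sample values below are illustrative):

```shell
# Sample `ip -o -4 addr list` output from a Photon VM (illustrative values)
sample='2: eno16777984    inet 192.168.1.150/24 brd 192.168.1.255 scope global eno16777984'

# Same filter chain as the unit file: keep eno*/global lines, take the
# address field, strip the CIDR prefix length.
echo "$sample" | grep eno | grep global | awk 'NR==1{print $4}' | cut -d/ -f1
```

After placing the unit file, run systemctl daemon-reload and then systemctl enable mesos-slave so the slave starts on every boot.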
After creating the service file for systemd, it was then possible to start the service and watch the node join the Mesos cluster in the UI.
I shut down the VM and cloned it to a template for use in the next step.
The final step is to run a workload on the cluster, with Photon providing the Docker containers.
Workload Deployment
Launching a container workload on the new cluster was rather straightforward. I used a simple NGiNX container and exposed it over port 80.
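For completeness, the app definition posted to Marathon looked something along these lines (a minimal sketch; the image name, port mapping, and sizing are illustrative):

```json
{
  "id": "nginx",
  "instances": 1,
  "cpus": 0.25,
  "mem": 128,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "nginx",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 80, "hostPort": 0, "protocol": "tcp" }
      ]
    }
  }
}
```

Submitting this with a POST to Marathon's /v2/apps endpoint schedules the container onto one of the Photon slaves, with container port 80 mapped to a host port.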
A few things, like automatic hostname configuration within Photon based on the DHCP address, are still left to do. But this is a working solution, and it lets me do some next-level deployment testing using Photon as the mechanism for deploying the Docker containers.
If you have any questions on what I did here, feel free to reach out to me over Twitter.