Let me start off by saying that adding Apache Kafka into the framework of VMware vSphere Big Data Extensions (BDE) has been the most challenging addition of them all. Not from a framework perspective, but from a Chef cookbook and configuration one. There were a few resources to rely on for the overall configuration of Kafka; however, many of them contradicted one another. It took a solid eight hours of testing and re-testing the recipes before I was able to get a working multi-node Kafka cluster online.
All that being said, it was important for me to establish a standardized method for deploying Apache Kafka clusters within the BDE framework. I am aware of several teams that are manually configuring Kafka within an environment today, each with its own opinions on how that should be accomplished, and few of them share their methods with one another. Frankly, I feel the lack of collaboration between teams is the biggest challenge for any large-scale organization to overcome. Very rarely is a problem too difficult to solve with technology; it is generally difficult to solve because of a lack of knowledge-sharing between teams and/or organizations.
As I hope all of my readers have come to expect, what follows includes the JSON files necessary to add Apache Kafka into BDE; the Chef recipes, templates, and libraries; and a link to the Virtual Elephant GitHub repository where you can download all of the pieces to add to your own deployments of BDE and further expand your service catalog.
The cluster definition file for an Apache Kafka cluster includes two roles: zookeeper and kafka_node. I created the requisite file /opt/serengeti/www/specs/Ironfan/kafka/spec.json and populated it with the following:
1 { 2 "nodeGroups":[ 3 { 4 "name": "Zookeeper", 5 "roles": [ 6 "zookeeper" 7 ], 8 "groupType": "zookeeper", 9 "instanceNum": "[3,3,3]", 10 "instanceType": "[SMALL]", 11 "cpuNum": "[1,1,64]", 12 "memCapacityMB": "[7500,3748,min]", 13 "storage": { 14 "type": "[SHARED,LOCAL]", 15 "sizeGB": "[2,2,min]" 16 }, 17 "haFlag": "on" 18 }, 19 { 20 "name": "Kafka", 21 "description": "Apache Kafka Node", 22 "roles": [ 23 "kafka_node" 24 ], 25 "groupType": "master", 26 "instanceNum": "[2,1,max]", 27 "instanceType": "[MEDIUM,SMALL,LARGE,EXTRA_LARGE]", 28 "cpuNum": "[1,1,64]", 29 "memCapacityMB": "[7500,3748,max]", 30 "storage": { 31 "type": "[SHARED,LOCAL]", 32 "sizeGB": "[1,1,min]" 33 }, 34 "haFlag": "on" 35 } 36 ] 37 }
The corresponding /opt/serengeti/www/distros/manifest entry:
93 { 94 "name": "kafka", 95 "vendor": "APACHE", 96 "version": "0.8.2", 97 "packages": [ 98 { 99 "roles": [ 100 "zookeeper", 101 "kafka_node" 102 ], 103 "package_repos": [ 104 "https://hadoop-mgmt.localdomain/yum/bigtop.repo" 105 ] 106 } 107 ] 108 },
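The bigtop.repo file referenced above lives on the BDE management server. For anyone building out a similar environment, a minimal yum repository definition would look something like this; the repository name and baseurl are assumptions based on my environment, so adjust them to match yours:

[bigtop]
name=Bigtop repository on the BDE management server
baseurl=https://hadoop-mgmt.localdomain/yum/
enabled=1
gpgcheck=0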
The /opt/serengeti/www/specs/map entry:
30 { 31 "vendor" : "Apache", 32 "version" : "^(\\d)+(\\.\\w+)*", 33 "type" : "Apache Kafka Cluster", 34 "appManager" : "Default", 35 "path" : "Ironfan/kafka/spec.json" 36 },
For any new readers, those are the three files/entries required for any new application offering to appear within the BDE GUI in the vSphere Web Client. At this point, the Chef recipes are all that stand between you and a new service catalog entry. I created the directory structure on the BDE management server in /opt/serengeti/chef/cookbooks/kafka and began by creating the new Chef role necessary for the Kafka node listed in the JSON file.
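For orientation, the role and cookbook layout ends up looking roughly like this; the template names match the files discussed later in this post, and I am assuming the conventional .rb extension on the role file:

/opt/serengeti/chef/
  roles/
    kafka_node.rb
  cookbooks/
    kafka/
      recipes/
        default.rb
      libraries/
        default.rb
      templates/
        default/
          server.properties.erb
          kafka.initd.erb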
/opt/serengeti/chef/roles/kafka_node:
name 'kafka_node'
description 'A role for running Apache Kafka'

run_list *%w[
  kafka
]
The primary Chef recipe file (/opt/serengeti/chef/cookbooks/kafka/recipes/default.rb) has the following code in it:
Note: It is a beast and probably needs to be broken up into multiple recipes at a future date.
#
# Cookbook Name:: kafka
# Recipe:: default
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

include_recipe "java::sun"
include_recipe "hadoop_common::pre_run"
include_recipe "hadoop_common::mount_disks"
include_recipe "hadoop_cluster::update_attributes"

node.default[:kafka][:zookeeper_server_list] = zookeepers_ip

set_bootstrap_action(ACTION_INSTALL_PACKAGE, 'kafka', true)

remote_file "/tmp/kafka-0.8.2.0-src.tgz" do
  source "http://hadoop-mgmt.localdomain/yum/kafka/kafka-0.8.2.0-src.tgz"
  action :create
end

remote_file "/tmp/scala-2.11.5.rpm" do
  source "http://hadoop-mgmt.localdomain/yum/kafka/scala-2.11.5.rpm"
  action :create
end

remote_file "/etc/yum.repos.d/sbt.repo" do
  source "http://hadoop-mgmt.localdomain/yum/kafka/sbt.repo"
  action :create
end

package "scala" do
  source "/tmp/scala-2.11.5.rpm"
  action :install
  provider Chef::Provider::Package::Rpm
end

# Dependency packages
%w{gcc gcc-c++ libtool make unzip automake sbt}.each do |pkg|
  package pkg do
    action :install
  end
end

# Create user and group
group "kafka"
user "kafka" do
  comment "Kafka User"
  gid "kafka"
  shell "/bin/bash"
  home "/home/kafka"
  supports :manage_home => true
end

# A package called Gradle is required for Kafka
remote_file "/tmp/gradle-2.3-all.zip" do
  source "https://services.gradle.org/distributions/gradle-2.3-all.zip"
  action :create
end

script "install_gradle" do
  interpreter "bash"
  user "root"
  cwd "/tmp"
  code <<-EOH
    unzip gradle-2.3-all.zip -d /opt/
    ln -s /opt/gradle-2.3 /opt/gradle
    printf "export GRADLE_HOME=/opt/gradle\nexport PATH=\$PATH:\$GRADLE_HOME/bin\nexport SCALA_VERSION=2.11.5\nexport SCALA_BINARY_VERSION=2.11\n" > /etc/profile.d/gradle.sh
    . /etc/profile.d/gradle.sh
  EOH
end

# Build and install the Kafka package from source
script "install_kafka" do
  interpreter "bash"
  user "root"
  cwd "/tmp"
  code <<-EOH
    tar zxf kafka-0.8.2.0-src.tgz
    mv /tmp/kafka-0.8.2.0-src /opt/kafka-0.8.2.0
    ln -s /opt/kafka-0.8.2.0 /opt/kafka
    cd /opt/kafka
    /opt/gradle/bin/gradle -PscalaVersion=2.11.5 -PscalaBinaryVersion=2.11
    ./gradlew -PscalaVersion=2.11.5 -PscalaBinaryVersion=2.11 jar
    chown -R kafka:kafka /opt/kafka-0.8.2.0
  EOH
end

# Set up the variables for the server.properties file;
# each node needs a unique broker_id
node.default[:kafka][:broker_id] = rand(1..65535)

template '/opt/kafka/config/server.properties' do
  source 'server.properties.erb'
  variables(
    zookeeper_server_list: node.default[:kafka][:zookeeper_server_list],
    broker: node.default[:kafka][:broker_id]
  )
  action :create
end

# Create init.d script for kafka
template "/etc/init.d/kafka" do
  source "kafka.initd.erb"
  owner "root"
  group "root"
  mode 00755
end

execute "Starting Kafka Service" do
  command "service kafka start"
end

clear_bootstrap_action
There was also a need for a Chef library file — /opt/serengeti/chef/cookbooks/kafka/libraries/default.rb — which has the following code in it:
module Kafka
  def is_kafka
    node.role?("kafka_node")
  end

  def kafka_nodes_ip
    servers = all_providers_fqdn_for_role("kafka_node")
    Chef::Log.info("Apache Kafka nodes in cluster #{node[:cluster_name]} are: #{servers.inspect}")
    servers
  end
end

class Chef::Recipe; include Kafka; end
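Because the module is mixed into Chef::Recipe, any recipe in the run list can call these helpers directly. Nothing in the default recipe above uses kafka_nodes_ip yet, but a hypothetical use would look something like this (the brokers.list file is purely illustrative, not part of the cookbook):

# Hypothetical example: render the list of Kafka broker FQDNs into a file
file "/opt/kafka/config/brokers.list" do
  content kafka_nodes_ip.join("\n")
  owner "kafka"
end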
In addition to the Chef recipes themselves, a couple of template files were created to support the recipes and handle the configuration of the cluster. Those have been added to the GitHub repository and will be included when you download the files.
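For reference, a minimal server.properties.erb consuming the broker and zookeeper_server_list variables passed in by the recipe might look something like the following. Treat this as a sketch rather than the actual template from the repository; the ZooKeeper port, log directory, and the assumption that zookeepers_ip returns an array of addresses are all mine:

broker.id=<%= @broker %>
port=9092
log.dirs=/var/kafka-logs
zookeeper.connect=<%= @zookeeper_server_list.map { |ip| "#{ip}:2181" }.join(",") %>
zookeeper.connection.timeout.ms=6000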
Do not forget to run the knife commands to upload the files to the Chef server and to restart the Tomcat service on the BDE management server.
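In my environment, that amounted to something like the following; the paths assume a default BDE installation and the .rb extension on the role file:

# Upload the new cookbook and role to the Chef server
knife cookbook upload kafka
knife role from file /opt/serengeti/chef/roles/kafka_node.rb

# Restart Tomcat so BDE picks up the new spec and manifest entries
service tomcat restart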
Download the files here from GitHub.
The framework now has all of the pieces required to begin deploying Apache Kafka clusters within your VMware private cloud environment in a uniform and repeatable manner. An upcoming post will cover performance testing Apache Kafka clusters within your environment to better understand how a high-throughput distributed messaging system behaves when virtualized.
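Once a cluster is online, a quick smoke test from any Kafka node will confirm the brokers and the ZooKeeper quorum are talking to each other. Something like the following should work with the 0.8.2 build above; the topic name and ZooKeeper address are placeholders for your environment:

# Create a test topic (replace zk1.localdomain with one of your ZooKeeper nodes)
/opt/kafka/bin/kafka-topics.sh --create --zookeeper zk1.localdomain:2181 \
  --replication-factor 2 --partitions 1 --topic smoke-test

# Verify the topic exists and has an elected leader
/opt/kafka/bin/kafka-topics.sh --describe --zookeeper zk1.localdomain:2181 --topic smoke-test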
Enjoy!
Note: Originally I had planned to have Kafka started via the supervisord service, as I did with Apache Storm; however, the method Kafka uses to run did not allow supervisord to monitor the PID. If I can figure out how to do that in the future, I'll modify the code.
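If you want to experiment with this yourself, one avenue worth exploring (a sketch only, and not something I have working) is to run kafka-server-start.sh in the foreground, since supervisord can only supervise processes that do not daemonize:

[program:kafka]
; kafka-server-start.sh stays in the foreground when run without the -daemon flag
command=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
user=kafka
autostart=true
autorestart=true
stopsignal=TERM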