Apache Kafka Installation Guide in vSphere Big Data Extensions



Let me start off by saying that adding Apache Kafka into the framework of VMware vSphere Big Data Extensions (BDE) has been the most challenging addition so far. Not from a framework perspective, but from a Chef cookbook and configuration one. There were a few resources for me to rely on for the overall configuration of Kafka; however, many of them contained contradictory statements. It took a solid 8 hours of testing and re-testing the recipes before I was able to get a working multi-node Kafka cluster online.

All that being said, it was important for me to get a standardized method for deploying Apache Kafka clusters within the BDE framework. I am aware of several teams that are manually configuring Kafka within an environment today, each with their own insights on how that should be accomplished, and few of them are sharing their methods with one another. Frankly, I feel the lack of collaboration between teams is the biggest challenge for any large-scale organization to overcome. Very rarely is a problem too difficult to solve with technology; it is generally difficult to solve because of a lack of knowledge-sharing between teams and/or organizations.

As I hope all of my readers have come to expect, what follows includes the JSON files necessary to add Apache Kafka into BDE, the Chef recipes, templates, and libraries, and a link to the GitHub repository for Virtual Elephant, where you can download all of the pieces to add to your own deployments of BDE and further expand your service catalog.

The cluster definition file for an Apache Kafka cluster includes two roles: zookeeper and kafka_node. I created the requisite file /opt/serengeti/www/specs/Ironfan/kafka/spec.json and populated it with the following:

    {
      "nodeGroups":[
        {
          "name": "Zookeeper",
          "roles": [
            "zookeeper"
          ],
          "groupType": "zookeeper",
          "instanceNum": "[3,3,3]",
          "instanceType": "[SMALL]",
          "cpuNum": "[1,1,64]",
          "memCapacityMB": "[7500,3748,min]",
          "storage": {
            "type": "[SHARED,LOCAL]",
            "sizeGB": "[2,2,min]"
          },
          "haFlag": "on"
        },
        {
          "name": "Kafka",
          "description": "Apache Kafka Node",
          "roles": [
            "kafka_node"
          ],
          "groupType": "master",
          "instanceNum": "[2,1,max]",
          "instanceType": "[MEDIUM,SMALL,LARGE,EXTRA_LARGE]",
          "cpuNum": "[1,1,64]",
          "memCapacityMB": "[7500,3748,max]",
          "storage": {
            "type": "[SHARED,LOCAL]",
            "sizeGB": "[1,1,min]"
          },
          "haFlag": "on"
        }
      ]
    }

The corresponding /opt/serengeti/www/distros/manifest entry:

    {
      "name": "kafka",
      "vendor": "APACHE",
      "version": "0.8.2",
      "packages": [
        {
          "roles": [
            "zookeeper",
            "kafka_node"
          ],
          "package_repos": [
            "https://hadoop-mgmt.localdomain/yum/bigtop.repo"
          ]
        }
      ]
    },
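Since the spec file and the manifest entry both name roles, a typo in either one is easy to miss. Here is a minimal sketch of a sanity check, assuming the role names used in spec.json are expected to line up with the roles listed in the manifest entry. The helper name missing_roles and the trimmed JSON copies are mine for illustration; they are not part of BDE.

```ruby
require 'json'
require 'set'

# Trimmed copies of the two entries shown above; only the fields this check reads.
SPEC_JSON = <<~EOS
  {
    "nodeGroups": [
      { "name": "Zookeeper", "roles": ["zookeeper"] },
      { "name": "Kafka",     "roles": ["kafka_node"] }
    ]
  }
EOS

MANIFEST_ENTRY_JSON = <<~EOS
  {
    "name": "kafka",
    "vendor": "APACHE",
    "version": "0.8.2",
    "packages": [
      { "roles": ["zookeeper", "kafka_node"],
        "package_repos": ["https://hadoop-mgmt.localdomain/yum/bigtop.repo"] }
    ]
  }
EOS

# Hypothetical helper: returns the roles used in the spec that no manifest
# package declares. An empty array means every spec role is covered.
def missing_roles(spec, manifest_entry)
  spec_roles     = spec.fetch('nodeGroups').flat_map { |group| group.fetch('roles') }.to_set
  manifest_roles = manifest_entry.fetch('packages').flat_map { |pkg| pkg.fetch('roles') }.to_set
  (spec_roles - manifest_roles).to_a.sort
end

missing = missing_roles(JSON.parse(SPEC_JSON), JSON.parse(MANIFEST_ENTRY_JSON))
if missing.empty?
  puts 'all spec roles are covered by the manifest entry'
else
  puts "roles missing from the manifest entry: #{missing.join(', ')}"
end
```

Pointing the two JSON.parse calls at the real files instead of the inline strings would turn this into a quick pre-flight check before restarting anything.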

The /opt/serengeti/www/specs/map entry:

    {
      "vendor" : "Apache",
      "version" : "^(\\d)+(\\.\\w+)*",
      "type" : "Apache Kafka Cluster",
      "appManager" : "Default",
      "path" : "Ironfan/kafka/spec.json"
    },
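The "version" value in the map entry is a regular expression rather than a literal version. As a rough illustration of what it accepts, here is a small sketch that tries the pattern against a few candidate strings. It assumes Ruby's regex engine treats this simple pattern the same way BDE does, which I have not verified; the helper name version_matches? is hypothetical. Note that the pattern has no end anchor, so this sketch only checks that a string starts with a matching version.

```ruby
# The pattern from the "version" field. In the JSON file each backslash is
# doubled; once the JSON is parsed the regex is ^(\d)+(\.\w+)*
VERSION_PATTERN = Regexp.new('^(\d)+(\.\w+)*')

# Hypothetical helper for trying candidate version strings.
def version_matches?(version)
  VERSION_PATTERN.match?(version)
end

%w[0.8.2 0.8.2.0 2 kafka-0.8].each do |candidate|
  puts format('%-10s %s', candidate, version_matches?(candidate))
end
```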

For any new readers, those are the three files/entries required for any new application offering to appear within the BDE GUI in the vSphere Web Client. At this point, the Chef recipes are all that are standing between you and a new service catalog entry. I created the directory structure on the BDE management server in /opt/serengeti/chef/cookbooks/kafka and began by creating the new Chef role necessary for the Kafka node listed in the JSON file.

/opt/serengeti/chef/roles/kafka_node:

    name        'kafka_node'
    description 'A role for running Apache Kafka'

    run_list *%w[
      kafka
    ]

The primary Chef recipe file (/opt/serengeti/chef/cookbooks/kafka/recipes/default.rb) has the following code in it:

Note: It is a beast and probably needs to be broken up into multiple recipes at a future date.

    #
    # Cookbook Name:: kafka
    # Recipe:: default
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    include_recipe "java::sun"
    include_recipe "hadoop_common::pre_run"
    include_recipe "hadoop_common::mount_disks"
    include_recipe "hadoop_cluster::update_attributes"

    node.default[:kafka][:zookeeper_server_list] = zookeepers_ip

    set_bootstrap_action(ACTION_INSTALL_PACKAGE, 'kafka', true)

    # Pull the Kafka source, Scala RPM and sbt repo file from the management server
    remote_file "/tmp/kafka-0.8.2.0-src.tgz" do
      source "http://hadoop-mgmt.localdomain/yum/kafka/kafka-0.8.2.0-src.tgz"
      action :create
    end

    remote_file "/tmp/scala-2.11.5.rpm" do
      source "http://hadoop-mgmt.localdomain/yum/kafka/scala-2.11.5.rpm"
      action :create
    end

    remote_file "/etc/yum.repos.d/sbt.repo" do
      source "http://hadoop-mgmt.localdomain/yum/kafka/sbt.repo"
      action :create
    end

    package "scala" do
      source "/tmp/scala-2.11.5.rpm"
      action :install
      provider Chef::Provider::Package::Rpm
    end

    # Dependency packages
    %w{gcc gcc-c++ libtool make unzip automake sbt}.each do |pkg|
      package pkg do
        action :install
      end
    end

    # Create User and Group
    group "kafka"
    user "kafka" do
      comment "Kafka User"
      gid "kafka"
      shell "/bin/bash"
      home "/home/kafka"
      supports :manage_home => true
    end

    # A package called Gradle is required for Kafka
    remote_file "/tmp/gradle-2.3-all.zip" do
      source "https://services.gradle.org/distributions/gradle-2.3-all.zip"
      action :create
    end

    script "install_gradle" do
      interpreter "bash"
      user "root"
      cwd "/tmp"
      code <<-EOH
        unzip gradle-2.3-all.zip -d /opt/
        ln -s /opt/gradle-2.3 /opt/gradle
        printf "export GRADLE_HOME=/opt/gradle\nexport PATH=\$PATH:\$GRADLE_HOME/bin\nexport SCALA_VERSION=2.11.5\nexport SCALA_BINARY_VERSION=2.11\n" > /etc/profile.d/gradle.sh
        . /etc/profile.d/gradle.sh
      EOH
    end

    # Install the Kafka package from source
    script "install_kafka" do
      interpreter "bash"
      user "root"
      cwd "/tmp"
      code <<-EOH
        tar zxf kafka-0.8.2.0-src.tgz
        mv /tmp/kafka-0.8.2.0-src /opt/kafka-0.8.2.0
        ln -s /opt/kafka-0.8.2.0 /opt/kafka
        cd /opt/kafka
        # Bootstrap the Gradle wrapper, then build the Kafka jars with it
        /opt/gradle/bin/gradle -PscalaVersion=2.11.5 -PscalaBinaryVersion=2.11
        ./gradlew -PscalaVersion=2.11.5 -PscalaBinaryVersion=2.11 jar
        chown -R kafka:kafka /opt/kafka-0.8.2.0
      EOH
    end

    # Setup the variables for the server.properties file
    # Need a unique broker_id for each node
    # Note: rand is evaluated on every run of this recipe, and a random value
    # makes collisions between nodes unlikely but not impossible
    node.default[:kafka][:broker_id] = rand(1..65535)

    template '/opt/kafka/config/server.properties' do
      source 'server.properties.erb'
      variables(
        zookeeper_server_list: node.default[:kafka][:zookeeper_server_list],
        broker: node.default[:kafka][:broker_id]
      )
      action :create
    end

    # Create init.d script for kafka
    template "/etc/init.d/kafka" do
      source "kafka.initd.erb"
      owner "root"
      group "root"
      mode  00755
    end

    execute "Starting Kafka Service" do
      command "service kafka start"
    end

    clear_bootstrap_action
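One thing worth calling out in the recipe is the broker ID. It is drawn with rand, so it changes whenever the recipe runs again and is only probably unique across nodes. A possible alternative, which is not what the recipe above does, is to derive the ID from the node's IP address. The sketch below keeps the low 16 bits of an IPv4 address; the helper name broker_id_for is hypothetical, and the approach assumes all brokers in a cluster sit within the same /16 network.

```ruby
require 'ipaddr'

# Hypothetical helper: maps an IPv4 address to a broker id by keeping the low
# 16 bits of the address. Nodes within the same /16 network get distinct ids,
# and a given address always maps to the same id.
def broker_id_for(ip_address)
  addr = IPAddr.new(ip_address)
  raise ArgumentError, "expected an IPv4 address, got #{ip_address}" unless addr.ipv4?

  addr.to_i & 0xFFFF
end

%w[10.0.1.15 10.0.1.16 10.0.2.15].each do |ip|
  puts "#{ip} -> broker id #{broker_id_for(ip)}"
end
```

Inside a recipe this could presumably be fed from Ohai's node[:ipaddress] attribute in place of the rand call, though I have not tested that within BDE.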

There was also a need for a Chef library file — /opt/serengeti/chef/cookbooks/kafka/libraries/default.rb — which has the following code in it:

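    module Kafka
      # True when the current node carries the kafka_node role
      def is_kafka
        node.role?("kafka_node")
      end

      # Returns the Kafka nodes in the cluster, as reported by
      # all_providers_fqdn_for_role, and logs them
      def kafka_nodes_ip
        servers = all_providers_fqdn_for_role("kafka_node")
        Chef::Log.info("Apache Kafka nodes in cluster #{node[:cluster_name]} are: #{servers.inspect}")
        servers
      end
    end

    # Make the helpers available inside recipes
    class Chef::Recipe; include Kafka; end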
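For anyone new to Chef libraries, the last line of that file is what makes the helper methods callable from a recipe: the module is mixed into Chef::Recipe. Here is a minimal sketch of the same pattern that runs outside Chef. FakeNode and FakeRecipe are stand-ins I made up for Chef's node object and Chef::Recipe, and only the is_kafka helper is reproduced.

```ruby
# Same shape as the library module, reduced to one helper so the example
# runs without Chef or the Serengeti cookbooks.
module Kafka
  def is_kafka
    node.role?('kafka_node')
  end
end

# Stand-in for Chef's node object (assumption for this sketch): it only
# knows which roles it holds.
FakeNode = Struct.new(:roles) do
  def role?(name)
    roles.include?(name)
  end
end

# Stand-in for Chef::Recipe: including the module gives every instance
# access to the helper, which in turn calls the instance's own `node`.
class FakeRecipe
  include Kafka
  attr_reader :node

  def initialize(node)
    @node = node
  end
end

broker    = FakeRecipe.new(FakeNode.new(%w[kafka_node]))
zookeeper = FakeRecipe.new(FakeNode.new(%w[zookeeper]))
puts "broker is_kafka:    #{broker.is_kafka}"
puts "zookeeper is_kafka: #{zookeeper.is_kafka}"
```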

In addition to the Chef recipes themselves, a couple of template files were created to support the recipes and handle the configuration of the cluster. Those have been added to the GitHub repository and will be included when you download the files.
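To give a feel for how the two variables passed by the recipe end up in the configuration, here is a minimal sketch that renders a cut-down template with Ruby's standard ERB library. The template body is hypothetical and the real server.properties.erb in the repository may look different. I am also assuming the ZooKeeper list is an array of host strings and that ZooKeeper listens on its default client port, 2181. Chef exposes template variables as instance variables (for example @broker); the sketch uses plain locals so it runs without Chef.

```ruby
require 'erb'

# Hypothetical template body. broker.id and zookeeper.connect are standard
# Kafka broker settings; everything else a real file needs is omitted.
TEMPLATE = <<~'EOS'
  broker.id=<%= broker %>
  zookeeper.connect=<%= zookeeper_server_list.map { |host| "#{host}:2181" }.join(',') %>
EOS

# Hypothetical helper: renders the template with the same two variable names
# the recipe passes to its template resource.
def render_server_properties(broker:, zookeeper_server_list:)
  ERB.new(TEMPLATE).result_with_hash(
    broker: broker,
    zookeeper_server_list: zookeeper_server_list
  )
end

puts render_server_properties(
  broker: 271,
  zookeeper_server_list: %w[10.0.1.11 10.0.1.12 10.0.1.13]
)
```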

Do not forget to run the knife commands to upload the new cookbook and role to the Chef server, and to restart the Tomcat service on the BDE management server.

Download the files here from GitHub.

The framework now has all of the pieces required to begin deploying Apache Kafka clusters within your VMware private cloud environment in a uniform and consistent manner. An upcoming post will cover performance testing of Apache Kafka clusters to better understand how a high-throughput distributed messaging system performs in a virtualized environment.

Enjoy!

Note: Originally I had planned to have Kafka started via the supervisord service, like I did with Apache Storm; however, the way Kafka runs did not allow supervisord to monitor the PID. If I can figure out how to do that in the future, I’ll modify the code.