For a while now, I have wanted to spend some time with the Application Manager feature built into Big Data Extensions and use Cloudera Manager to manage the SDLC of a Hadoop cluster in my lab environment. Over the weekend I found time to install Cloudera Manager on a virtual machine and tie it into the Big Data Extensions deployment in my vSphere lab. The genius of Cloudera Manager is that it gives a consumer (administrator, engineer or data scientist) a single pane of glass for managing a Hadoop cluster. The Cloudera website states:
Cloudera Manager makes it easy to manage Hadoop deployments of any scale in production. Quickly deploy, configure, and monitor your cluster through an intuitive UI – complete with rolling upgrades, backup and disaster recovery, and customizable alerting.
I remember trying to get Cloudera Manager working with a previously deployed Hadoop cluster back when BDE was on v1.0, and it was not successful. Having this functionality built into BDE now extends its reach to all types of Hadoop environments, whether they are one-off clusters or a Hadoop-as-a-Service offering.
This post goes through the steps required to install Cloudera Manager for Hadoop deployments through VMware Big Data Extensions.
Installing Cloudera Manager
Note: CentOS 6 is my preferred Linux distribution and I run it in my lab environment for almost all of my Linux management VM roles. The instructions that follow are specific to running Cloudera Manager on a CentOS 6 VM. Your mileage may vary.
The installer file for Cloudera Manager needs to be downloaded from Cloudera before it can be installed on a virtual machine in the environment. The Cloudera website has a link for downloading the bin file, as seen in the following screenshot.
Once the installer has been downloaded onto the VM designated for running Cloudera Manager, a few steps are required to complete the installation.
Disable SELinux
SELinux needs to be disabled for Cloudera Manager to install successfully. Edit the /etc/sysconfig/selinux file and change the SELINUX setting (line 7 on my VM) to disabled, so the line reads SELINUX=disabled. The change takes effect after a reboot.
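The same edit can be scripted. This is a minimal sketch, assuming GNU sed (the default on CentOS 6); the function name disable_selinux_in is my own, made up for illustration. The --follow-symlinks flag matters because /etc/sysconfig/selinux is normally a symlink to /etc/selinux/config, and a plain sed -i would replace the link with a regular file.

```shell
# Hypothetical helper: rewrite the SELINUX= line in the given config file.
# Assumes GNU sed; --follow-symlinks edits the link target instead of
# replacing the symlink itself. SELINUXTYPE= is left untouched because the
# pattern requires "=" directly after SELINUX.
disable_selinux_in() {
    sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Usage on the Cloudera Manager VM, as root:
#   disable_selinux_in /etc/sysconfig/selinux
#   reboot    # the new mode is only picked up at boot
```

Taking the file path as an argument makes it easy to try the helper on a copy of the config before touching the real one.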
Disable IPTABLES
Initially, I added an IPTABLES ruleset to allow incoming traffic on ports 80, 443 and 7180 for Cloudera Manager. Although that allowed the UI to run correctly, early test deployments of Hadoop failed. Only after I stopped the iptables service altogether were the agents able to install on the Hadoop nodes.
    # service iptables stop
    # chkconfig iptables off
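If agent installs fail the way they did for me, it can help to confirm from one of the Hadoop nodes that the Cloudera Manager ports are actually reachable. A rough sketch, assuming bash and the coreutils timeout command are available on the node. The function name port_open and the hostname cm.lab.local are placeholders. Port 7182 is, as far as I know, the default agent heartbeat port for Cloudera Manager; it is not one of the ports I opened above, so treat it as an assumption to verify against your own install.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens
# within 3 seconds. Relies on bash's /dev/tcp pseudo-device.
# Arguments: host, port.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage from a Hadoop node (cm.lab.local is a placeholder hostname):
#   for port in 7180 7182; do
#       if port_open cm.lab.local "$port"; then
#           echo "$port reachable"
#       else
#           echo "$port blocked or closed"
#       fi
#   done
```

A failed check only says the connection did not open; it cannot tell a firewall drop from a service that is not listening.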
The final step is to run the installer (cloudera-manager-installer.bin) from the command line.
    # chmod u+x cloudera-manager-installer.bin
    # ./cloudera-manager-installer.bin
Accept the EULAs and the process is off to the races. Upon completion, the following screen should appear in the terminal.
The screenshot shows the web UI address and the username/password for the newly installed instance of Cloudera Manager.
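The web UI may not answer the moment the installer exits, so a small polling loop can save some browser refreshing. This is only a sketch, assuming curl is installed and the UI is on port 7180 as above; wait_for_cm_ui is a made-up name, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: poll the Cloudera Manager web UI until it answers.
# Arguments: host, port, max attempts, seconds between attempts.
# Returns 0 as soon as any HTTP response comes back, 1 if every try fails.
wait_for_cm_ui() {
    local host=$1 port=$2 attempts=$3 delay=$4
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -o /dev/null: discard the body,
        # --max-time: bound each individual try to 5 seconds
        if curl -s -o /dev/null --max-time 5 "http://$host:$port/"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage on the Cloudera Manager VM: try 30 times, 10 seconds apart.
#   wait_for_cm_ui localhost 7180 30 10 && echo "UI is up"
```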
Adding an Application Manager to BDE
Once Cloudera Manager is installed, the next step is to tie it into the Big Data Extensions installation in the vSphere environment. To do so, log in to the vSphere Web UI and go to the Big Data Extensions tab. Under the Application Managers selection on the left-side menu, click the plus icon and fill out the form.
Now the Big Data Extensions framework is capable of using the Cloudera Manager for installing and managing Hadoop clusters.
Deploy a Hadoop Cluster using Cloudera Manager
In the vSphere Web UI, deploy a Hadoop cluster using Big Data Extensions — the only difference now is selecting the CDH5 Manager as the Application Manager.
The deployment process will initially proceed in the same way it would without using Cloudera Manager. The Big Data Extensions framework will clone the template VM, configure it based on the memory, disk and CPU specified and power on all of the VMs. Once the VMs have their initial configuration, BDE hands them off to Cloudera Manager for installing the local agent and then the proper Hadoop applications.
Once the deployment is complete, the newly deployed Hadoop cluster is visible in Cloudera Manager.
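The cluster can also be checked without the UI, through the Cloudera Manager REST API. A hedged sketch: it assumes the API is served on the same port as the UI (7180) and that the server supports API version v6, which is what I would expect from a 5.x release (a GET on /api/version should report the highest version the server supports). The function name cm_api_url, the CM_PORT and CM_API_VERSION variables, the hostname and the credential variables are all placeholders of mine.

```shell
# Hypothetical helper: build a Cloudera Manager REST API URL.
# Arguments: host, resource path (for example "clusters").
# CM_PORT and CM_API_VERSION are illustrative overrides for this sketch,
# not settings that Cloudera Manager itself reads.
cm_api_url() {
    printf 'http://%s:%s/api/%s/%s\n' \
        "$1" "${CM_PORT:-7180}" "${CM_API_VERSION:-v6}" "$2"
}

# Usage (host and credentials are placeholders):
#   curl -s -u "$CM_USER:$CM_PASS" "$(cm_api_url cm.lab.local clusters)"
```

The response is JSON, so the cluster deployed through BDE should show up by name in the returned list.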
There were a few other minor tweaks within Cloudera Manager I found necessary to have it working ‘just so’ in my vSphere environment. I will be posting what those tweaks were and going over other parts of Cloudera Manager that will assist in the SDLC management of Hadoop clusters in other posts this week.
Enjoy!