
For a while now, I have wanted to spend some time with the Application Manager feature built into Big Data Extensions (BDE) and use Cloudera Manager to manage the SDLC of a Hadoop cluster in my lab environment. Over the weekend I found time to install Cloudera Manager on a virtual machine and tie it into the Big Data Extensions deployment in my vSphere lab environment. The genius of Cloudera Manager is that it gives a consumer (administrator, engineer or data scientist) a single pane of glass for managing a Hadoop cluster. The Cloudera website states:

Cloudera Manager makes it easy to manage Hadoop deployments of any scale in production. Quickly deploy, configure, and monitor your cluster through an intuitive UI – complete with rolling upgrades, backup and disaster recovery, and customizable alerting.

I remember trying to get Cloudera Manager working with a previously deployed Hadoop cluster back when BDE was on v1.0, and it was not successful. Having that functionality built into BDE now extends its capabilities to all types of Hadoop environments, whether they are one-off clusters or a Hadoop-as-a-Service offering.

This post goes through the steps required to install Cloudera Manager and use it for Hadoop deployments through VMware Big Data Extensions.

Installing Cloudera Manager

Note: CentOS 6 is my preferred Linux distribution and I run it in my lab environment for almost all of my Linux management VM roles. The instructions that follow are specific to running Cloudera Manager on a CentOS 6 VM. Your mileage may vary.

The installer file for Cloudera Manager needs to be downloaded from Cloudera before it can be installed on a virtual machine in the environment. The Cloudera website has a link for downloading the bin file, as seen in the following screenshot.

cloudera manager download

Once the installer has been downloaded onto the VM designated for running Cloudera Manager, a few steps are required to complete the installation.

Disable SELinux

SELinux needs to be disabled for Cloudera Manager to install successfully. Edit the /etc/sysconfig/selinux file and change the SELINUX= setting (line 7 of the file on my VM) to disabled. The change takes full effect after a reboot.

cloudera manager
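The same edit can be scripted instead of made by hand. The following is a minimal sketch, not the only way to do it: set_selinux_disabled is a hypothetical helper name, and it assumes the file contains a single line starting with SELINUX=.

```shell
#!/bin/sh
# Sketch: switch the SELINUX= setting in an SELinux config file to "disabled".
# set_selinux_disabled is a hypothetical helper name used for illustration.
set_selinux_disabled() {
    cfg="$1"
    # Keep a backup copy of the file before touching it.
    cp -p "$cfg" "$cfg.bak" || return 1
    # Rewrite only the SELINUX= line; SELINUXTYPE= does not match the pattern.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg.bak" > "$cfg"
}

# On the Cloudera Manager VM, as root, this would be called as:
#   set_selinux_disabled /etc/sysconfig/selinux
#   setenforce 0   # permissive for the running system; "disabled" applies after a reboot
```

Reading from the backup and writing back into the original path keeps the file in place, so it should not matter whether the path given is the real file or a link to it.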

Disable IPTABLES

Initially, I added an IPTABLES ruleset to allow incoming traffic on ports 80, 443 and 7180 for Cloudera Manager. Although that allowed the UI to run correctly, early test deployments of Hadoop failed. Only after I stopped the service altogether were the agents able to install on the Hadoop nodes.

[root@cloudera ~]# service iptables stop
[root@cloudera ~]# chkconfig iptables off

The final step is to run the installer (cloudera-manager-installer.bin) from the command line.

[root@cloudera ~]# chmod u+x cloudera-manager-installer.bin
[root@cloudera ~]# ./cloudera-manager-installer.bin

Accept the EULAs and the process is off to the races. Upon completion, the following screen should appear in the terminal.

cloudera manager

The screenshot shows the web UI address and the username/password for the newly installed instance of Cloudera Manager.
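The web UI can take a little while to start answering after the installer exits. Below is a minimal sketch of a polling loop, assuming curl is installed on the machine running it and that the UI listens on port 7180 (the port opened in the firewall rules above). The function name wait_for_cm and the host name in the example are placeholders of my own.

```shell
#!/bin/sh
# Sketch: poll a URL until it responds or the attempts run out.
# wait_for_cm is a hypothetical helper name used for illustration.
wait_for_cm() {
    url="$1"            # e.g. http://cloudera.example.com:7180/
    tries="${2:-30}"    # number of attempts before giving up
    delay="${3:-10}"    # seconds to sleep between attempts
    i=0
    while [ "$i" -lt "$tries" ]; do
        # curl exits 0 as soon as the server answers with any HTTP response.
        if curl -s -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (placeholder host name):
#   wait_for_cm http://cloudera.example.com:7180/ && echo "Cloudera Manager UI is up"
```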

Adding an Application Manager to BDE

Once Cloudera Manager is installed, the next step is to tie it into the Big Data Extensions installation in the vSphere environment. To do so, log in to the vSphere Web UI and go to the Big Data Extensions tab. Under the Application Managers selection on the left-side menu, click the plus icon and fill out the form.

cloudera manager

Now the Big Data Extensions framework is capable of using the Cloudera Manager for installing and managing Hadoop clusters.

Deploy a Hadoop Cluster using Cloudera Manager

In the vSphere Web UI, deploy a Hadoop cluster using Big Data Extensions. The only difference now is selecting the CDH5 Manager as the Application Manager.

cloudera manager

The deployment process will initially proceed in the same way it would without using Cloudera Manager. The Big Data Extensions framework will clone the template VM, configure it based on the memory, disk and CPU specified and power on all of the VMs. Once the VMs have their initial configuration, BDE hands them off to Cloudera Manager for installing the local agent and then the proper Hadoop applications.

Once the deployment is complete, the newly deployed Hadoop cluster is visible in Cloudera Manager.

cloudera manager
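The new cluster can also be confirmed from the command line through the Cloudera Manager REST API. This is a minimal sketch, assuming the API is served on the same port as the web UI (7180) and that curl is available. The function names, host name and credential variables are placeholders of my own; use the username and password reported by the installer.

```shell
#!/bin/sh
# Sketch: list the clusters Cloudera Manager knows about via its REST API.
# cm_clusters_url and list_cm_clusters are hypothetical helper names.
cm_clusters_url() {
    # $1 = Cloudera Manager host, $2 = port (assumed default: 7180)
    printf 'http://%s:%s/api/v1/clusters\n' "$1" "${2:-7180}"
}

list_cm_clusters() {
    # $1 = host, $2 = username, $3 = password
    curl -s -u "$2:$3" "$(cm_clusters_url "$1")"
}

# Example (placeholder host and credentials):
#   list_cm_clusters cloudera.example.com "$CM_USER" "$CM_PASS"
```

The response is JSON, and the cluster deployed through BDE should appear in it by name.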

There were a few other minor tweaks within Cloudera Manager I found necessary to have it working ‘just so’ in my vSphere environment. I will be posting what those tweaks were and going over other parts of Cloudera Manager that will assist in the SDLC management of Hadoop clusters in other posts this week.

Enjoy!