Setting up Big Data Extensions Orchestrator workflows for Hadoop

The default VMware vCenter Orchestrator (vCO) plugin for Big Data Extensions is set up to deploy only Apache Hadoop clusters. That may be enough for your organization, but if you have already set up additional Hadoop distributions, you will want them available in your vCloud Automation Center (vCAC) catalog. There are two ways to do this:

1. Edit the existing workflows to take a variable that specifies the Hadoop distribution.
2. Duplicate the workflows and edit each copy to work only with a specific Hadoop distribution.

I chose option #2 for my Hadoop Platform-as-a-Service (PaaS) offering. The PaaS offering built in vCAC makes three distributions available to end users: Apache, Cloudera and Pivotal PHD.

I am not going to cover installing the plugin; the installation documentation walks through it once you have downloaded the plugin. After that, log in to your external vCO appliance and begin modifying the workflows found in the Hadoop Cluster As A Service directory tree.

Within the Cluster Operation Service folder you will see a set of workflows named:

Create Basic Hadoop Cluster
Create Compute Only Cluster
Create Data Compute Separated Cluster

Depending on the business or technical decisions you have made, you will need to duplicate any or all of those workflows and rename them for the distributions you’ve chosen to support. I created the following for the Cloudera distribution:

Create Cloudera Basic Hadoop Cluster
Create Cloudera Compute Only Cluster
Create Cloudera Data Compute Separated Cluster

The next step is to duplicate the Execute workflows and edit them to specify the alternate Hadoop distribution. These workflows are in the Configuration/Execute Operations directory:

Execute Create Cluster Operation
Execute Create Compute Only Cluster Operation
Execute Create Data Compute Separated Cluster Operation

Create the following new workflows for the Cloudera distribution:

Execute Create Cloudera Cluster Operation
Execute Create Cloudera Compute Only Cluster Operation
Execute Create Cloudera Data Compute Separated Cluster Operation
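The naming pattern above is mechanical: the distribution label goes right after the first "Create " in each original name. The plain JavaScript sketch below captures that rule so you can check names before duplicating anything. The helper name distroWorkflowName is hypothetical and not part of the BDE plugin; the code runs as-is under, for example, Node, and in a vCO scriptable task you would use System.log instead of console.log.

```javascript
// Hypothetical helper: derive a distribution-specific workflow name from an
// original BDE workflow name by inserting the label after the first "Create ".
function distroWorkflowName(originalName, distroLabel) {
  var marker = "Create ";
  var idx = originalName.indexOf(marker);
  if (idx === -1) {
    throw new Error("No 'Create ' in workflow name: " + originalName);
  }
  var cut = idx + marker.length;
  return originalName.slice(0, cut) + distroLabel + " " + originalName.slice(cut);
}

// The six original workflow names listed in this post.
var originals = [
  "Create Basic Hadoop Cluster",
  "Create Compute Only Cluster",
  "Create Data Compute Separated Cluster",
  "Execute Create Cluster Operation",
  "Execute Create Compute Only Cluster Operation",
  "Execute Create Data Compute Separated Cluster Operation"
];

// Print each original name next to its Cloudera counterpart.
originals.forEach(function (name) {
  console.log(name + "  ->  " + distroWorkflowName(name, "Cloudera"));
});
```

Applied to the six originals, this produces exactly the six Cloudera names used in this post.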

Now for the critical part: getting the workflows to work with one another. Because you have merely duplicated the workflows, the copies still point to the original elements in their Schemas. Start with the Create Cloudera Basic Hadoop Cluster workflow you duplicated. Open its Schema tab and perform the following tasks:

Look for the ‘Execute Create Cluster’ icon.
Press CTRL-E to go into Edit mode within vCO.
Hover over the icon and select Edit (pencil icon).
Under the ‘Info’ tab, locate the label and update it to the new Execute workflow name.
Click the ‘Change workflow’ link, find the new ‘Execute Create Cloudera Cluster Operation’ workflow you duplicated and select it as the target.
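Because a duplicated workflow silently keeps calling the original Execute workflow, it can help to write the intended wiring down and check it. This is a plain JavaScript sketch that does not query vCO. The wiring table is a hypothetical, hand-maintained record using the Cloudera names from this post, and findStaleLinks is an illustrative helper that flags any Create workflow whose target Execute workflow name lacks the distribution label.

```javascript
// Hypothetical, hand-maintained record of the intended Schema wiring:
// each duplicated Create workflow -> the Execute workflow its Execute
// element should call. Names are the Cloudera ones used in this post.
var clouderaWiring = {
  "Create Cloudera Basic Hadoop Cluster": "Execute Create Cloudera Cluster Operation",
  "Create Cloudera Compute Only Cluster": "Execute Create Cloudera Compute Only Cluster Operation",
  "Create Cloudera Data Compute Separated Cluster": "Execute Create Cloudera Data Compute Separated Cluster Operation"
};

// Return the Create workflows whose target Execute workflow name does not
// contain the distribution label, i.e. links still pointing at an original.
function findStaleLinks(wiring, distroLabel) {
  var needle = " " + distroLabel + " ";
  return Object.keys(wiring).filter(function (createName) {
    return wiring[createName].indexOf(needle) === -1;
  });
}

// Example: the Compute Only copy was never re-pointed.
var partlyFixed = {
  "Create Cloudera Basic Hadoop Cluster": "Execute Create Cloudera Cluster Operation",
  "Create Cloudera Compute Only Cluster": "Execute Create Compute Only Cluster Operation"
};

console.log(findStaleLinks(clouderaWiring, "Cloudera").length); // 0
console.log(findStaleLinks(partlyFixed, "Cloudera").join(", ")); // Create Cloudera Compute Only Cluster
```

The check is only as good as the table: it confirms that what you recorded is consistent, so fill it in from what you actually see on each Schema tab.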

The remaining step is to specify the actual distribution. On the Schema tab of the Execute Create Cloudera Cluster Operation workflow, edit the Generate Create Cluster element. Select the Scripting tab and add the distro and distroVendor lines to the end of the object at the bottom of the JavaScript, so that it ends as follows:

"password":password,
"distro":"CDH4",
"distroVendor":"CDH"
};

Make sure distro and distroVendor exactly match the values you set when you added the additional distributions to your BDE management server.
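To make the effect of those two properties concrete, here is a hedged sketch in plain JavaScript. It does not reproduce the actual Generate Create Cluster script. The names withDistro, isRegistered and registeredDistros are hypothetical, the name field is a stand-in for whatever the real script already sets, and only the CDH4/CDH pair comes from this post. The registry check mirrors the advice above by failing early if the values do not match a distribution registered on the BDE management server.

```javascript
// Copy an existing cluster-spec object and add the two distribution fields.
// ES5 style (no Object.assign) in case the vCO script engine lacks it.
function withDistro(spec, distro, distroVendor) {
  var out = {};
  for (var key in spec) {
    if (Object.prototype.hasOwnProperty.call(spec, key)) {
      out[key] = spec[key];
    }
  }
  out.distro = distro;
  out.distroVendor = distroVendor;
  return out;
}

// Hypothetical local copy of the distributions registered on the BDE
// management server. Only the CDH4/CDH pair comes from the post.
var registeredDistros = [
  { name: "CDH4", vendor: "CDH" }
];

// Exact, case-sensitive match on both the distro name and the vendor.
function isRegistered(distro, distroVendor, registry) {
  return registry.some(function (d) {
    return d.name === distro && d.vendor === distroVendor;
  });
}

// "name" is a hypothetical stand-in for fields the real script already sets.
var spec = withDistro({ name: "cdh-cluster-01", password: "changeme" }, "CDH4", "CDH");
if (!isRegistered(spec.distro, spec.distroVendor, registeredDistros)) {
  throw new Error("Distro " + spec.distro + "/" + spec.distroVendor +
    " is not registered on the BDE management server");
}
console.log(spec.distro + "/" + spec.distroVendor); // CDH4/CDH
```

The comparison is deliberately strict (exact string equality), so a near-miss such as "cdh" is reported instead of being passed through to the cluster request.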

Repeat these steps for every new workflow, for each distribution you are adding support for.

All of the pieces should now be in place, and you can create additional blueprints in vCAC for your end users.

Setting up Big Data Extensions Orchestrator workflows for Hadoop - Virtual Elephant
