VMworld 2014 US session schedule

With VMworld 2014 in the United States fast approaching, I have been working on building out my schedule based on my personal objectives and checking the popular blogger sites for their recommendations. In that spirit, I thought I would share the sessions I am most excited about this year in San Francisco.

Last year was my first year at VMworld, and I focused on the Hands-on Labs (HOLs) and general sessions to better understand the VMware ecosystem. This year I am focused on three primary topics:

  • VMware NSX
  • OpenStack, Docker, and containers with VMware
  • VMware VSAN

Here are the sessions I am focused on:

  • SEC1746 NSX Distributed Firewall Deep Dive
  • NET1966 Operational Best Practices for VMware NSX
  • NET1949 VMware NSX for Docker, Containers & Mesos
  • SDDC3350 VMware and Docker — Better Together
  • SDDC2370 Why Openstack runs best with the vCloud suite
  • STO1279 Virtual SAN Architecture Deep Dive
  • STO1424 Massively Scaling Virtual SAN implementations

In addition to that, I am also excited for my own sessions at VMworld this year around Hadoop, VMware BDE, and building a Hadoop-as-a-Service!

  • VAPP1428 Hadoop-as-a-Service: Utilizing VMware Cloud Automation Center and Big Data Extensions at Adobe (Monday & Wednesday sessions)

I'm excited for the week to kick off and to see all the great things coming to our virtualized world.

Linux VM boot error workaround

This is not specifically related to Hadoop or Big Data Extensions, but I came across this bug tonight. There is a KB article on the VMware website (here), but the syntax it lists is incorrect.

The error I was seeing on the VM console was “vmsvc [warning] [guestinfo] RecordRoutingInfo: Unable to collect IPv4 routing table” immediately after eth0 was brought online. The workaround, beyond upgrading arping in the guest OS, is to add the following line to the virtual machine's .vmx file:

rtc.diffFromUTC = "0"

The quotes are missing from the VMware knowledge base article and are indeed necessary to fix the issue and get the virtual machine past this point in the boot process.
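
As a minimal sketch of applying this from the ESXi shell (the datastore path and VM name below are placeholders, and the VM should be powered off first), you can append the setting and confirm it was written with plain straight quotes:

# VM must be powered off; path and VM name are placeholders for this example
echo 'rtc.diffFromUTC = "0"' >> /vmfs/volumes/datastore1/myvm/myvm.vmx
# verify the line landed with straight quotes
grep rtc.diffFromUTC /vmfs/volumes/datastore1/myvm/myvm.vmx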

Adding Hadoop Jobtracker History retention in BDE

As we've been working on very large datasets tied back to an Isilon array for the HDFS layer, we discovered that the history server functionality was missing from BDE (both 1.1 and 2.0). After talking to a few individuals and getting some direction, but no solution, I realized the ability to turn the feature on was there; it just was not enabled by default.

In order to turn on the jobtracker history server functionality, so that you can see job logs after they complete, add the following properties to the mapred-site.xml.erb template for your BDE version:

  • BDE 1.1: /opt/serengeti/cookbooks/cookbooks/hadoop_cluster/templates/default/mapred-site.xml.erb
  • BDE 2.0: /opt/serengeti/chef/cookbooks/hadoop_cluster/templates/default/mapred-site.xml.erb

<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value><%= @resourcemanager_address %>:19888</value>
</property>

<property>
  <name>mapreduce.jobhistory.address</name>
  <value><%= @resourcemanager_address %>:10020</value>
</property>

As always, be sure to run 'knife cookbook upload -a' after editing the file; the updated template will then be used during your cluster deployments.
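
As a quick sanity check after a fresh cluster deployment, you can confirm the JobHistory web UI is answering on port 19888. This assumes the history server ends up on the ResourceManager node, as the template above implies, and the hostname here is only a placeholder:

# expect an HTTP response from the JobHistory web UI on the ResourceManager node
curl -I http://resourcemanager.example.com:19888/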

VMworld 2014 Session information

The schedule has been announced for VMworld 2014 and I will be speaking with Andrew Nelson at two different times.

VAPP1428 – Hadoop as a Service: Utilizing VMware Cloud Automation Center and Big Data Extensions at Adobe

Looking forward to discussing Hadoop-as-a-Service in great detail. Hope to see you all there!

UPDATE: I’ve been informed that our session has also been picked up for VMworld EMEA in Barcelona, Spain this October!

Setting up Big Data Extensions Orchestrator workflows for Hadoop

The default VMware Orchestrator plugin for Big Data Extensions is set up to deploy only Apache Hadoop clusters. That may be enough for your organization, but if you have already set up additional Hadoop distributions, you ought to have them available in your vCloud Automation Center catalog. In order to do so, there are a couple of options available to you:

  1. Edit existing workflows to take a variable where you specify the Hadoop distribution.
  2. Duplicate the workflows and edit them to work only with a specific Hadoop distribution.

I chose to go with option #2 within my Hadoop Platform-as-a-Service offerings.