Category: Virtualization

With VMworld 2014 in the United States fast approaching, I have been working on building out my schedule based on my personal objectives and checking the popular blogger sites for their recommendations. In that spirit, I thought I would share the sessions I am most excited about this year in San Francisco.

Last year was my first year at VMworld and I focused on the Hands-on-Labs (HoLs) and generic sessions to better understand the VMware ecosystem. This year I am focused on three primary topics:

  • VMware NSX
  • OpenStack|Docker|Containers with VMware
  • VMware VSAN

Here are the sessions I am focused on:

  • SEC1746 NSX Distributed Firewall Deep Dive
  • NET1966 Operational Best Practices for VMware NSX
  • NET1949 VMware NSX for Docker, Containers & Mesos
  • SDDC3350 VMware and Docker — Better Together
  • SDDC2370 Why Openstack runs best with the vCloud suite
  • STO1279 Virtual SAN Architecture Deep Dive
  • STO1424 Massively Scaling Virtual SAN implementations

In addition, I am excited about my own sessions at VMworld this year, which cover Hadoop, VMware BDE, and building Hadoop-as-a-Service!

  • VAPP1428 Hadoop-as-a-Service: Utilizing VMware Cloud Automation Center and Big Data Extensions at Adobe (Monday & Wednesday sessions)

I am looking forward to the week kicking off and to seeing everything that is coming to our virtualized world.


This is not specifically related to Hadoop or Big Data Extensions, but I came across this bug tonight. There is a KB article about it on the VMware website, but the syntax it lists is incorrect.

The error I was seeing on the VM console was “vmsvc [warning] [guestinfo] RecordRoutingInfo: Unable to collect IPv4 routing table” immediately after it brought eth0 online. The workaround to fix the issue, beyond upgrading arping in the OS, is to add the following line in the virtual machine .vmx file:

rtc.diffFromUTC = "0"

The quotes are missing from the VMware knowledge base article and are indeed necessary to fix the issue and get the virtual machine past this point in the boot process.
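
If you would rather script the change than edit the file by hand, here is a minimal Python sketch that works on the text of a .vmx file. The helper name `set_vmx_option` is hypothetical, and the sketch assumes the file uses one `key = "value"` entry per line; how you read and write the file on your datastore is left to you.

```python
import re


def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` present, quoted as in the workaround.

    An existing entry for the key is rewritten; otherwise the entry is appended.
    """
    entry = f'{key} = "{value}"'
    # Match an existing line for this key, with or without quotes around the value.
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(vmx_text):
        return pattern.sub(lambda _match: entry, vmx_text)
    # No entry yet: append one, making sure the previous line is terminated.
    if vmx_text and not vmx_text.endswith("\n"):
        vmx_text += "\n"
    return vmx_text + entry + "\n"


# Example: an unquoted entry (the form that did not work for me) gets quoted.
sample = 'displayName = "node1"\nrtc.diffFromUTC = 0\n'
print(set_vmx_option(sample, "rtc.diffFromUTC", "0"))
```

Keeping the function text-in, text-out makes it easy to check the result before writing anything back to the real .vmx file.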


A specific use case at work required me to modify the Chef recipe templates for mapred-site.xml and yarn-site.xml so that the memory allocations are configured correctly. The container sizes depend on the size of the VMs you create. BDE ships with generic default settings, but because every workload is different, you need to tune these parameters just as you would on a physical Hadoop cluster.

The virtual machines within this compute-only (Isilon-backed HDFS + NameNode) cluster utilized the ‘Medium’ sized node within BDE. That means:

  • 2 vCPU
  • 7.5GB RAM
  • 100GB drives

The specific YARN and MapReduce settings I used to take advantage of the total memory allocated to the cluster were:

/opt/serengeti/cookbooks/cookbooks/hadoop_cluster/templates/default/mapred-site.xml.erb
<% else %>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>

<!-- <property> -->
<!--  <name>mapred.child.ulimit</name> -->
<!--  <value><%= node[:hadoop][:java_child_ulimit] %></value> -->
<!-- </property> -->

<property>
  <description>MapReduce map memory, in MB</description>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>

<property>
  <description>MapReduce map java options</description>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx819m</value>
</property>

<property>
  <description>MapReduce reduce memory, in MB</description>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>

<property>
  <description>MapReduce reduce java options</description>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1638m</value>
</property>

<property>
  <description>MapReduce task IO sort, in MB</description>
  <name>mapreduce.task.io.sort.mb</name>
  <value>409</value>
</property>

<% end %>
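
As a quick sanity check on the mapred-site.xml values above, the JVM heap sizes work out to roughly 80% of their container sizes, which leaves headroom for non-heap memory inside each container. The Python sketch below only reproduces that arithmetic from the listed values; the helper names `heap_mb` and `heap_ratio` are hypothetical, and the 80% figure is an observation about these numbers, not a rule enforced by Hadoop.

```python
import re


def heap_mb(java_opts: str) -> int:
    """Extract the -Xmx value, in MB, from an options string such as '-Xmx819m'."""
    match = re.search(r"-Xmx(\d+)m", java_opts)
    if match is None:
        raise ValueError(f"no -Xmx<N>m setting found in {java_opts!r}")
    return int(match.group(1))


def heap_ratio(container_mb: int, java_opts: str) -> float:
    """Fraction of the container memory that is given to the JVM heap."""
    return heap_mb(java_opts) / container_mb


# (container size in MB, java opts) exactly as they appear in the template.
settings = {
    "map": (1024, "-Xmx819m"),
    "reduce": (2048, "-Xmx1638m"),
}

for task, (container_mb, java_opts) in settings.items():
    ratio = heap_ratio(container_mb, java_opts)
    print(f"{task}: heap is {ratio:.0%} of the {container_mb} MB container")

# The sort buffer (mapreduce.task.io.sort.mb = 409) relative to the map container.
print(f"sort buffer: {409 / 1024:.0%} of the map container")
```

If you resize the containers for a different VM size, rerunning the same check is an easy way to confirm the heap settings were scaled along with them.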

/opt/serengeti/cookbooks/cookbooks/hadoop_cluster/templates/default/yarn-site.xml.erb
<property>
  <description>Amount of physical memory, in MB, that can be allocated
    for containers.</description>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- <value><%= node[:yarn][:nm_resource_mem] %></value> -->
  <value>6122</value>
</property>

<property>
  <description>The amount of memory the MR AppMaster needs.</description>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <!-- <value><%= node[:yarn][:am_resource_mem] %></value> -->
  <value>2048</value>
</property>

<property>
  <description>Scheduler minimum memory, in MB, that can be allocated.</description>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>

<property>
  <description>Scheduler maximum memory, in MB, that can be allocated.</description>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>6122</value>
</property>

<property>
  <description>Application master options</description>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx1638m</value>
</property>
...
<property>
  <description>Disable the vmem check that is turned on by default in Yarn.</description>
  <name>yarn.nodemanager.vmem-check.enabled</name>
  <value>false</value>
</property>
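
To see how the YARN values above translate into containers on a single ‘Medium’ node, here is a rough Python sketch of the arithmetic. It considers memory only and assumes the scheduler rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb; that rounding depends on which scheduler you run, so treat it as an assumption. The helper names are hypothetical.

```python
NODE_MEMORY_MB = 6122      # yarn.nodemanager.resource.memory-mb
MIN_ALLOCATION_MB = 1024   # yarn.scheduler.minimum-allocation-mb


def round_up_to_minimum(request_mb: int, minimum_mb: int = MIN_ALLOCATION_MB) -> int:
    """Round a request up to the next multiple of the minimum allocation (assumed behavior)."""
    return -(-request_mb // minimum_mb) * minimum_mb


def containers_per_node(request_mb: int, node_mb: int = NODE_MEMORY_MB) -> int:
    """Number of containers of this size that fit on one NodeManager, by memory alone."""
    return node_mb // round_up_to_minimum(request_mb)


for label, request_mb in [("map", 1024), ("reduce", 2048), ("MR AppMaster", 2048)]:
    count = containers_per_node(request_mb)
    unused = NODE_MEMORY_MB - count * round_up_to_minimum(request_mb)
    print(f"{label}: {count} per node, {unused} MB unused")
# map: 5 per node, 1002 MB unused
# reduce: 2 per node, 2026 MB unused
# MR AppMaster: 2 per node, 2026 MB unused
```

In practice a node runs a mix of these containers, so the unused figure will vary with the job, but the sketch gives a quick upper bound when you change VM sizes.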

Again, your mileage will vary depending on your Hadoop workload, but these configuration settings should allow you to use most of the memory within a cluster deployed with the ‘Medium’ sized nodes in BDE.

I used the following articles as guidelines when tuning my cluster, along with trial and error.
