Virtualized Hadoop + Isilon HDFS Benchmark Testing

During the VMworld EMEA presentation (Tuesday, October 14, 2014), the performance question came up again with regard to using Isilon as the data warehouse layer, and what the positives and negatives are of leveraging Isilon as that HDFS layer. As with any benchmark or performance testing, results will vary based on the data set you have, the hardware you are leveraging, and how you have the clusters configured. However, there are some things I’ve learned over the last year and a half that apply broadly and show the advantages of leveraging Isilon as the HDFS layer, especially when you have very large data sets (10+ petabytes).

There are two benchmarking tests I want to focus on for this post. The tests themselves demonstrate the need to understand the workload (Hadoop job), the size of the data set, and the individual configuration settings (YARN, MapReduce, and Java) for the compute worker nodes.

Continue reading “Virtualized Hadoop + Isilon HDFS Benchmark Testing”

Workload-based cluster sizing for Hadoop

There is a quote in the book “Hadoop Operations” by Eric Sammer (O’Reilly) that states:

“The complexity of sizing a cluster comes from knowing — or more commonly, not knowing — the specifics of such a workload: its CPU, memory, storage, disk I/O, or frequency of execution requirements. Worse, it’s common to see a single cluster support many diverse types of jobs with conflicting resource requirements.”

In my experience that is a factual statement. It does not, however, preclude one from determining that very information so that an intelligent decision can be made. In fact, VMware vCenter Operations Manager becomes an invaluable tool in the toolbox for maintaining a Hadoop cluster across its entire SDLC.

Initial sizing of the Hadoop cluster in the Engineering|Pre-Production|Chaos environment of your business will include some amount of guessing. You can stick with the tried and true methodology of answering the following two questions: “How much data do I have for HDFS initially?” and “How much data do I need to ingest into HDFS daily|monthly?” From that point you’ll need to start monitoring the workload(s) placed on the Hadoop cluster and begin determining the cluster size needed once it moves into the QE, Staging, and Production environments.
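
As a starting point for those two questions, a quick back-of-the-envelope calculation can at least bound the raw capacity discussion until real monitoring data arrives. The sketch below uses purely illustrative numbers (my own assumptions, not a formula from any vendor); note that with Isilon-backed HDFS the 3x replication factor goes away, since OneFS applies its own protection overhead instead.

#!/bin/bash
# Rough HDFS capacity estimate -- illustrative numbers only, adjust for your environment.

INITIAL_TB=100          # data loaded into HDFS on day one
DAILY_INGEST_TB=2       # new data ingested per day
RETENTION_DAYS=365      # how long the data is kept
REPLICATION=3           # HDFS default replication on DAS; use 1 for Isilon-backed HDFS
HEADROOM_PCT=25         # free-space headroom for temp/shuffle data and growth

RAW_TB=$(( (INITIAL_TB + DAILY_INGEST_TB * RETENTION_DAYS) * REPLICATION ))
TOTAL_TB=$(( RAW_TB + RAW_TB * HEADROOM_PCT / 100 ))

echo "Raw capacity needed:    ${RAW_TB} TB"
echo "With ${HEADROOM_PCT}% headroom:      ${TOTAL_TB} TB"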

Continue reading “Workload-based cluster sizing for Hadoop”

What is the virtualization penalty with Hadoop?

After a long week off, I am back and should be posting 2-3x per week leading up to VMworld 2014 in August.

I keep getting this question from various software engineers, system engineers and managers so I thought it would be a good topic to address here.

Disclaimer: Mileage will vary depending on your compute hardware, disk systems (DAS or NAS) and individual Hadoop workloads.

Now that the disclaimer is out of the way, let me spend some time answering the general form of the question. First, there are several whitepapers that show what virtualizing Hadoop looks like with various workloads, Hadoop distributions and hardware.

Generally speaking, running Hadoop within a virtual machine incurs a performance penalty of less than 5%. However, that figure assumes no modification to the configuration of the Hadoop cluster, which usually means you don’t fully understand the workload you are hosting. If you saw my earlier post on YARN containers, you will hopefully come to the same conclusion I have: customizing the infrastructure is key. Hadoop is very much not a one-size-fits-all system.

The whitepaper from Dell, Intel, and VMware shows that in several of the tests they ran, the virtualized clusters outperformed a physical cluster utilizing the same hardware. The one exception was the DFSIOE-READ test, which ran significantly worse when virtualized.

In the tests I have been running against different sized data sets with the very same Hive job, the performance within the virtual cluster has been within the +/- 5% threshold we aim for when working with an Engineering team to show them the benefits of virtualizing. The advantage we then have with virtualizing lies in rightsizing both the size and number of VMs within the cluster to outperform a physical cluster, while keeping the total cost of ownership lower than trying to create/deploy/scale a physical cluster.

Bottom line: Is there a penalty to virtualizing Hadoop?

Simple answer: Yes, a 5-10% penalty in general.

Better answer: Yes. However, if you understand your workload and customize the cluster (using the tools available within BDE), the penalty quickly becomes nonexistent, and a virtualized Hadoop cluster should outperform a physical cluster running on the very same physical hardware.

Rightsizing YARN containers for virtual machines

Working on a specific use case at work has required that I modify the Chef recipe templates for mapred-site.xml and yarn-site.xml to configure the memory allocations correctly. The container sizes themselves will depend on the size of the VMs you are creating, and BDE has some generic settings by default, but because every workload is different it is necessary to tune these parameters just as you would on a physical Hadoop cluster.

The virtual machines within this compute-only (Isilon-backed HDFS + NameNode) cluster utilized the ‘Medium’ sized node within BDE. That means:

  • 2 vCPU
  • 7.5GB RAM
  • 100GB drives

The specific YARN and MapReduce settings I used to take advantage of the total memory allocated to the cluster were:

/opt/serengeti/cookbooks/cookbooks/hadoop_cluster/templates/default/mapred-site.xml.erb
<% else %>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>

<!-- <property> -->
<!--  <name>mapred.child.ulimit</name> -->
<!--  <value><%= node[:hadoop][:java_child_ulimit] %></value> -->
<!-- </property> -->

<property>
  <description>MapReduce map memory, in MB</description>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>

<property>
  <description>MapReduce map java options</description>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx819m</value>
</property>

<property>
  <description>MapReduce reduce memory, in MB</description>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>

<property>
  <description>MapReduce reduce java options</description>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1638m</value>
</property>

<property>
  <description>MapReduce task IO sort, in MB</description>
  <name>mapreduce.task.io.sort.mb</name>
  <value>409</value>
</property>

<% end %>

/opt/serengeti/cookbooks/cookbooks/hadoop_cluster/templates/default/yarn-site.xml.erb
<property>
  <description>Amount of physical memory, in MB, that can be allocated
    for containers.</description>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- <value><%= node[:yarn][:nm_resource_mem] %></value> -->
  <value>6122</value>
</property>

<property>
  <description>The amount of memory the MR AppMaster needs.</description>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <!-- <value><%= node[:yarn][:am_resource_mem] %></value> -->
  <value>2048</value>
</property>

<property>
  <description>Scheduler minimum memory, in MB, that can be allocated.</description>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>

<property>
  <description>Scheduler maximum memory, in MB, that can be allocated.</description>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>6122</value>
</property>

<property>
  <description>Application master options</description>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx1638m</value>
</property>
...
<property>
  <description>Disable the vmem check that is turned on by default in YARN.</description>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
Again, mileage will vary depending on your Hadoop workload, but these configuration settings should allow you to utilize the majority of the memory resources in a cluster deployed with the ‘Medium’ sized nodes in BDE.
I used the following articles as guidelines when tuning my cluster, along with trial and error.
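
For reference, here is the back-of-the-envelope math behind the values above for a ‘Medium’ (2 vCPU / 7.5GB RAM) node. The reserved overhead and the 0.8 heap-to-container ratio are my own working assumptions for this cluster, not BDE defaults:

#!/bin/bash
# Sketch of how the container values above relate to a 7.5GB 'Medium' node.

NODE_MB=7680                                     # 7.5GB RAM presented to the guest
RESERVED_MB=1558                                 # assumed OS + daemon overhead
NM_RESOURCE_MB=$(( NODE_MB - RESERVED_MB ))      # -> 6122 (yarn.nodemanager.resource.memory-mb)

MAP_CONTAINER_MB=1024                            # mapreduce.map.memory.mb (= scheduler minimum)
REDUCE_CONTAINER_MB=$(( MAP_CONTAINER_MB * 2 ))  # -> 2048 (mapreduce.reduce.memory.mb)

# JVM heap set to ~80% of the container, leaving room for non-heap overhead.
MAP_HEAP_MB=$(( MAP_CONTAINER_MB * 8 / 10 ))        # -> 819  (-Xmx819m)
REDUCE_HEAP_MB=$(( REDUCE_CONTAINER_MB * 8 / 10 ))  # -> 1638 (-Xmx1638m)
IO_SORT_MB=$(( MAP_CONTAINER_MB * 4 / 10 ))         # -> 409  (mapreduce.task.io.sort.mb)

echo "NodeManager memory:     ${NM_RESOURCE_MB} MB"
echo "Map container/heap:     ${MAP_CONTAINER_MB}/${MAP_HEAP_MB} MB"
echo "Reduce container/heap:  ${REDUCE_CONTAINER_MB}/${REDUCE_HEAP_MB} MB"
echo "io.sort.mb:             ${IO_SORT_MB} MB"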

Performance Tuning for Hadoop Clusters

As I stated previously, the session I learned the most from at Hadoop Summit was about performance tuning the OS to ensure the cluster gets the most from the infrastructure (slides can be found here). In order to apply those changes, I had to modify the Chef recipes on the BDE management server so the updates are installed on all new clusters.

  • Disable swappiness and increase the proc and file limits in /opt/serengeti/cookbooks/cookbooks/hadoop_cluster/recipes/dedicated_server_tuning.rb
ulimit_hard_nofile = 32768
ulimit_soft_nofile = 32768
ulimit_hard_nproc = 32768
ulimit_soft_nproc = 32768
vm_swappiness = 0
redhat_transparent_hugepage = "never"
vm_swappiness_line = "vm.swappiness = 0"

def set_proc_sys_limit desc, proc_path, limit
  bash desc do
    not_if{ File.exists?(proc_path) && (File.read(proc_path).chomp.strip == limit.to_s) }
    code  "echo #{limit} > #{proc_path}"
  end
end

def set_swap_sys_limit desc, file_path, limit
  bash desc do
    not_if{ File.exists?(file_path) && (File.read(file_path).chomp.strip == limit.to_s) }
    code  "echo #{limit} > #{file_path}"
  end
end

set_proc_sys_limit "VM overcommit memory", '/proc/sys/vm/overcommit_memory', overcommit_memory
set_proc_sys_limit "VM overcommit ratio",  '/proc/sys/vm/overcommit_ratio',  overcommit_ratio
set_proc_sys_limit "VM swappiness", '/proc/sys/vm/swappiness', vm_swappiness
set_proc_sys_limit "Redhat transparent hugepage defrag", '/sys/kernel/mm/redhat_transparent_hugepage/defrag', redhat_transparent_hugepage
set_proc_sys_limit "Redhat transparent hugepage enable", '/sys/kernel/mm/redhat_transparent_hugepage/enabled', redhat_transparent_hugepage

set_swap_sys_limit "SYSCTL swappiness setting", '/etc/sysctl.conf', vm_swappiness_line
  • Remove root reserved space from the filesystems in /opt/serengeti/cookbooks/cookbooks/hadoop_common/libraries/default.rb

function format_disk_internal()
{
  kernel=`uname -r | cut -d'-' -f1`
  first=`echo $kernel | cut -d '.' -f1`
  second=`echo $kernel | cut -d '.' -f2`
  third=`echo $kernel | cut -d '.' -f3`
  num=$[ $first*10000 + $second*100 + $third ]

  # we cannot use [[ "$kernel" < "2.6.28" ]] here because linux kernel
  # has versions like "2.6.5"
  if [ $num -lt 20628 ];
  then
    mkfs -t ext3 -b 4096 -m 0 $1;
  else
    mkfs -t ext4 -b 4096 -m 0 $1;
  fi;
}
Once the recipes have been updated, be sure to execute the following command for the changes to take effect:
# knife cookbook upload -a
At that point, BDE will be configured to include several of the commonly missed performance enhancements for a Hadoop cluster. There are several more configuration changes that can be made, which I will cover in a future post.
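
After a new cluster is deployed, it is worth spot-checking a worker node to confirm the tuning actually landed. A minimal sketch, assuming a RHEL/CentOS guest and /dev/sdb1 as one of the data disks (adjust the device to your layout):

#!/bin/bash
# Quick sanity check for the OS tuning applied by the modified recipes.

echo "swappiness:        $(cat /proc/sys/vm/swappiness)"                              # expect 0
echo "THP defrag:        $(cat /sys/kernel/mm/redhat_transparent_hugepage/defrag)"    # expect [never]
echo "THP enabled:       $(cat /sys/kernel/mm/redhat_transparent_hugepage/enabled)"   # expect [never]
echo "open files (hard): $(ulimit -Hn)"                                               # expect 32768
echo "max procs (hard):  $(ulimit -Hu)"                                               # expect 32768

# Reserved blocks should be 0 on data disks formatted with '-m 0'.
tune2fs -l /dev/sdb1 | grep -i 'reserved block count'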