Working on a specific use-case at work has required that I modify the Chef recipe templates for mapred-site.xml and yarn-site.xml to configure the memory allocations correctly. The container sizes themselves will depend on the size of VMs you are creating, and BDE has some generic settings by default, but again with each workload being different it is necessary to tune these parameters just as you would with a physical Hadoop cluster.

The virtual machines within this compute-only (Isilon-backed HDFS + NameNode) cluster utilized the ‘Medium’ sized node within BDE. That means:

  • 2 vCPU
  • 7.5GB RAM
  • 100GB drives

The specific YARN and MapReduce settings I have used to take advantage of the total memory allocated to the cluster was:

/opt/serengeti/cookbooks/cookbooks/hadoop_cluster/templates/default/mapred-site.xml.erb
155 <% else %>
156 <property>
157   <name>mapred.child.java.opts</name>
158   <value>-Xmx1024m</value>
159 </property>
160
161 <!-- <property> -->
162 <!--  <name>mapred.child.ulimit</name> -->
163 <!--  <value><%= node[:hadoop][:java_child_ulimit] %></value> -->
164 <!-- </property> -->
165
166 <property>
167   <description>MapReduce map memory, in MB</description>
168   <name>mapreduce.map.memory.mb</name>
169   <value>1024</value>
170 </property>
171
172 <property>
173   <description>MapReduce map java options</description>
174   <name>mapreduce.map.java.opts</name>
175   <value>-Xmx819m</value>
176 </property>
177
178 <property>
179   <description>MapReduce reduce memory, in MB</description>
180   <name>mapreduce.reduce.memory.mb</name>
181   <value>2048</value>
182 </property>
183
184 <property>
185   <description>MapReduce reduce java options</description>
186   <name>mapreduce.reduce.java.opts</name>
187   <value>-Xmx1638m</value>
188 </property>
189
190 <property>
191   <description>MapReduce task IO sort, in MB</description>
192   <name>mapreduce.task.io.sort.mb</name>
193   <value>409</value>
194 </property>
195
196 <% end %>

/opt/serengeti/cookbooks/cookbooks/hadoop_cluster/templates/default/yarn-site.xml.erb
 72 <property>
 73   <description>Amount of physical memory, in MB, that can be allocated
 74     for containers.</description>
 75   <name>yarn.nodemanager.resource.memory-mb</name>
 76   <!-- <value><%= node[:yarn][:nm_resource_mem] %></value> -->
 77   <value>6122</value>
 78 </property>
 79
 80 <property>
 81   <description>The amount of memory the MR AppMaster needs.</description>
 82   <name>yarn.app.mapreduce.am.resource.mb</name>
 83   <!-- <value><%= node[:yarn][:am_resource_mem] %></value> -->
 84   <value>2048</value>
 85 </property>
 86
 87 <property>
 88   <description>Scheduler minimum memory, in MB, that can be allocated.</description>
 89   <name>yarn.scheduler.minimum-allocation-mb</name>
 90   <value>1024</value>
 91 </property>
 92
 93 <property>
 94   <description>Scheduler maximum memory, in MB, that can be allocated.</description>
 95   <name>yarn.scheduler.maximum-allocation-mb</name>
 96   <value>6122</value>
 97 </property>
 98
 99 <property>
100   <description>Application master options</description>
101   <name>yarn.app.mapreduce.am.command-opts</name>
102   <value>-Xmx1638m</value>
103 </property>
...
126 <property>
127   <description>Disable the vmem check that is turned on by default in Yarn.</description>
128   <name>yarn.nodemanager.vmem-check.enabled</name>
129   <value>false</value>
130 </property>
Again, mileage will vary depending on your Hadoop workload, but these configuration settings should allow you to utilize the majority of the memory resources within a cluster deployed with the ‘Medium’ sized nodes within BDE.
I used the following articles as guidelines when tuning my cluster, along with trial and error.