A specific use case at work required me to modify the Chef recipe templates for mapred-site.xml and yarn-site.xml so that the memory allocations are configured correctly. The container sizes depend on the size of the VMs you create. BDE ships with some generic default settings, but because every workload is different, you need to tune these parameters just as you would on a physical Hadoop cluster.
The virtual machines in this compute-only cluster (Isilon-backed HDFS + NameNode) used the ‘Medium’ node size in BDE, which means:
- 2 vCPU
- 7.5GB RAM
- 100GB drives
The specific YARN and MapReduce settings I used to take advantage of the total memory allocated to the cluster were:
/opt/serengeti/cookbooks/cookbooks/hadoop_cluster/templates/default/mapred-site.xml.erb
<% else %>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>

<!-- <property> -->
<!--   <name>mapred.child.ulimit</name> -->
<!--   <value><%= node[:hadoop][:java_child_ulimit] %></value> -->
<!-- </property> -->

<property>
  <description>MapReduce map memory, in MB</description>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>

<property>
  <description>MapReduce map java options</description>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx819m</value>
</property>

<property>
  <description>MapReduce reduce memory, in MB</description>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>

<property>
  <description>MapReduce reduce java options</description>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1638m</value>
</property>

<property>
  <description>MapReduce task IO sort, in MB</description>
  <name>mapreduce.task.io.sort.mb</name>
  <value>409</value>
</property>

<% end %>
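The heap and sort-buffer values in mapred-site.xml are consistent with a simple ratio: each -Xmx value is about 80% of its container size, and the sort buffer is about 40% of the map container. Those ratios are my reading of the numbers rather than anything BDE enforces, and the helper names below are hypothetical. A minimal Python sketch of the arithmetic, under those assumptions:

```python
# Assumed ratios, inferred from the values in the template above.
# They are a rule of thumb, not something Hadoop or BDE mandates.
HEAP_PERCENT = 80   # JVM heap as a percentage of the container size
SORT_PERCENT = 40   # io.sort.mb as a percentage of the map container size


def heap_opts(container_mb: int, percent: int = HEAP_PERCENT) -> str:
    """Return a -Xmx flag sized to a percentage of the container, rounded down."""
    return f"-Xmx{container_mb * percent // 100}m"


def sort_buffer_mb(map_container_mb: int, percent: int = SORT_PERCENT) -> int:
    """Return a sort buffer size (MB) as a percentage of the map container."""
    return map_container_mb * percent // 100


if __name__ == "__main__":
    map_mb, reduce_mb = 1024, 2048  # container sizes used in this post
    settings = {
        "mapreduce.map.memory.mb": map_mb,
        "mapreduce.map.java.opts": heap_opts(map_mb),        # -Xmx819m
        "mapreduce.reduce.memory.mb": reduce_mb,
        "mapreduce.reduce.java.opts": heap_opts(reduce_mb),  # -Xmx1638m
        "mapreduce.task.io.sort.mb": sort_buffer_mb(map_mb), # 409
    }
    for name, value in settings.items():
        print(f"{name} = {value}")
```

Leaving headroom between the heap and the container size gives the JVM room for non-heap memory, so the container is less likely to be killed for exceeding its limit. If you change the container sizes for a different VM size, the same arithmetic gives a reasonable starting point to tune from.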
/opt/serengeti/cookbooks/cookbooks/hadoop_cluster/templates/default/yarn-site.xml.erb
<property>
  <description>Amount of physical memory, in MB, that can be allocated
  for containers.</description>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- <value><%= node[:yarn][:nm_resource_mem] %></value> -->
  <value>6122</value>
</property>

<property>
  <description>The amount of memory the MR AppMaster needs.</description>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <!-- <value><%= node[:yarn][:am_resource_mem] %></value> -->
  <value>2048</value>
</property>

<property>
  <description>Scheduler minimum memory, in MB, that can be allocated.</description>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>

<property>
  <description>Scheduler maximum memory, in MB, that can be allocated.</description>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>6122</value>
</property>

<property>
  <description>Application master options</description>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx1638m</value>
</property>
...
<property>
  <description>Disable the vmem check that is turned on by default in Yarn.</description>
  <name>yarn.nodemanager.vmem-check.enabled</name>
  <value>false</value>
</property>
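To sanity-check these numbers, it helps to work out how much of each VM's RAM is handed to YARN and how many containers of each size fit on a node. The Python sketch below does that arithmetic with the values from this post. It assumes 7.5GB equals 7680MB and uses plain floor division, which ignores any rounding the scheduler applies to requests, so treat the counts as an approximation. The function name is hypothetical.

```python
# Values taken from the 'Medium' node size and the yarn-site.xml settings above.
NODE_RAM_MB = 7680    # 7.5 GB, assuming 1 GB = 1024 MB
NM_MEMORY_MB = 6122   # yarn.nodemanager.resource.memory-mb
MAP_MB = 1024         # mapreduce.map.memory.mb
REDUCE_MB = 2048      # mapreduce.reduce.memory.mb
AM_MB = 2048          # yarn.app.mapreduce.am.resource.mb


def containers_per_node(nm_memory_mb: int, container_mb: int) -> int:
    """Approximate how many containers of one size fit in a NodeManager."""
    return nm_memory_mb // container_mb


if __name__ == "__main__":
    share = 100 * NM_MEMORY_MB / NODE_RAM_MB
    print(f"YARN gets {share:.1f}% of node RAM")  # 79.7%

    print("map containers per node:", containers_per_node(NM_MEMORY_MB, MAP_MB))        # 5
    print("reduce containers per node:", containers_per_node(NM_MEMORY_MB, REDUCE_MB))  # 2

    # On the node that also hosts an application master, less memory is left for tasks.
    left_after_am = NM_MEMORY_MB - AM_MB
    print("map containers next to an AM:", containers_per_node(left_after_am, MAP_MB))  # 3
```

The remaining RAM on each VM is left for the operating system and the NodeManager daemon itself.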
Again, your mileage will vary depending on your Hadoop workload, but these settings should let you use the majority of the memory in a cluster deployed with BDE's ‘Medium’ sized nodes.
I used the following articles as guidelines when tuning my cluster, along with trial and error.