With the growing popularity of the “datacenter as an OS” concept and the newer application-level, or container-level, resource managers (Mesos, YARN, Pivotal CF), what role does VMware vSphere Distributed Resource Scheduler (DRS) play? I spend each day working in a multi-tenant cloud environment, and being able to answer this question effectively is becoming increasingly important. Engineers want to take advantage of the new application methodology that a resource manager like Mesos offers them as they look to re-architect legacy applications.
First off, when we talk about these application clusters, two terms tend to get mixed together, which makes the topic confusing to discuss. For that reason, I rely on these two definitions to keep the conversation clear:
- Cluster – A set of compute resources presented externally as an IaaS layer.
- Cell – An application-level or container-level cluster that consumes resources from the IaaS cluster.
In the most basic sense, DRS exists at the cluster level and Mesos exists at the cell level. The next piece to address is the gap between how a cell is perceived to operate and how it actually operates.
Engineers generally believe that a cell provides them with a dedicated set of compute resources onto which they can load their containerized applications, leaving the cell’s resource manager to take care of them. While that view is true, it is limited in scope. In most cases, the resources the cell provides are shared with other tenants within the same IaaS layer, whether you are using a public cloud (AWS, vCloud Air, etc.) or a private cloud environment. The reality is that your dedicated set of resources can become constrained by the reservations the cloud provider has made for other end users.
Therefore, the cell is making decisions based on a limited view and is unaware of the other cells around it. This is where VMware DRS comes into play and offers an additional level of resource management. DRS has a complete view (compute, storage and network) of everything going on within the cluster, so it can ensure the workloads generated by the cells are distributed adequately across all of the resources. It also enables an IaaS offering to make intelligent decisions about the initial placement of new nodes as cells are created or expanded, providing a higher level of service.
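To make the two-layer idea concrete, here is a minimal toy sketch in Python. It is not how DRS or Mesos actually schedule anything; the `VM`, `Host`, `cell_place` and `cluster_rebalance` names, the host names and the CPU numbers are all made up for illustration. Each cell places its nodes while seeing only its own load, and a separate cluster-level pass, which sees every tenant, then evens things out.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    cell: str   # which cell (tenant) owns this node
    cpu: int    # demand in arbitrary, made-up units


@dataclass
class Host:
    name: str
    vms: list = field(default_factory=list)

    def load(self, cell=None):
        """Total demand on this host; restricted to one cell if given."""
        return sum(vm.cpu for vm in self.vms if cell is None or vm.cell == cell)


def cell_place(hosts, vm):
    """Cell-level placement: the cell only sees the load of its own nodes."""
    target = min(hosts, key=lambda h: h.load(cell=vm.cell))
    target.vms.append(vm)


def cluster_rebalance(hosts, tolerance=0):
    """Cluster-level pass: migrate VMs from the busiest to the idlest host.

    A deliberately simple greedy loop, standing in for whatever the real
    IaaS-layer scheduler does.
    """
    moves = []
    while True:
        busiest = max(hosts, key=lambda h: h.load())
        idlest = min(hosts, key=lambda h: h.load())
        gap = busiest.load() - idlest.load()
        if gap <= tolerance:
            break
        # Only consider moves that strictly narrow the gap between the two hosts.
        candidates = [vm for vm in busiest.vms if 0 < vm.cpu < gap]
        if not candidates:
            break
        # Prefer the VM whose move leaves the two hosts closest to equal.
        vm = min(candidates, key=lambda v: abs(gap - 2 * v.cpu))
        busiest.vms.remove(vm)
        idlest.vms.append(vm)
        moves.append((vm.name, busiest.name, idlest.name))
    return moves


def demo():
    hosts = [Host(f"esx-{i}") for i in range(1, 5)]
    # Two cells each deploy two nodes, unaware of one another.
    for cell in ("cell-a", "cell-b"):
        for n in (1, 2):
            cell_place(hosts, VM(f"{cell}-node{n}", cell, cpu=4))
    before = [h.load() for h in hosts]
    moves = cluster_rebalance(hosts)
    after = [h.load() for h in hosts]
    return before, after, moves


if __name__ == "__main__":
    before, after, moves = demo()
    print("per-host load before:", before)
    print("per-host load after: ", after)
    for move in moves:
        print("migrated", *move)
```

In this toy run both cells pick the same two hosts, because each one thinks those hosts are empty, and the other two hosts sit idle until the cluster-level pass migrates a node onto each of them. That is the whole argument in miniature: the cell's decisions are reasonable given what it can see, but only the layer underneath can see the contention between tenants.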
There are ongoing discussions around how cells can or should be presented by a VMware IaaS layer: whether it looks like a traditional offering or is more along the lines of how VMware Integrated OpenStack (VIO) approaches the problem. The VIO approach is to offer up an entire cluster as one giant compute resource to OpenStack, whereas Big Data Extensions still looks at individual ESXi hosts within a DRS cluster.
It will be interesting to see where things shake out, and I am not sure one is truly better than the other. The most important thing to remember is that using two layers of resource schedulers should allow a multi-tenant environment to operate more smoothly and efficiently. VMware has invested a great deal in DRS, and to believe that it is going to be replaced entirely by the newer, less mature application resource schedulers is naive at best.
Note: VMware Big Data Extensions (BDE) currently makes initial placement decisions for storage and compute without using VMware DRS, instead relying on definitions within the JSON file for the cluster. DRS can still be leveraged after the fact by clusters deployed using BDE as the framework.