Multi-tenant OpenStack with NSX – Part 1


I have been working on an OpenStack architecture design using VMware Integrated OpenStack (VIO) for the past several months. The design is being developed for an internal cloud service offering and will also serve as the design for my VCDX certification pursuit in 2017. As the design has moved through the Proof-of-Concept and later the Pilot phases, determining how to deliver a multi-tenant/personal cloud offering has proven challenging. The design relies heavily on the NSX software-defined networking (SDN) platform, both because of VIO requirements and because of service offering requirements. As such, all north-south traffic passes through a set of NSX Edge devices. Before reaching the north-south boundary, however, the east-west tenant traffic needed to be addressed.

I had seen other designs discussed where all OpenStack tenants relied upon a single, large (/22 or greater) external network. However, for a personal cloud or multi-tenant cloud offering based on OpenStack, I felt this was not a good design choice for this environment. I wanted a higher level of separation between tenants, and one option involved a secondary pair of NSX Edge devices for each tenant. The following diagram describes the logical approach.

OpenStack NSX

The above logical representation expresses how the deployed tenant NSX Edges are connected upstream to the security or L3 network boundary and downstream to the OpenStack Project that is created for each tenant. At a small to medium scale, I believe this model works operationally — the tenant NSX Edges create a logical separation between each OpenStack tenant and (if assigned on a per-team basis) should remain relatively manageable as the environment scales to dozens of tenants.

The NSX Edge devices are deployed inside the same OpenStack Edge cluster specified during the VIO deployment; however, they are managed outside of OpenStack and the tenant has no control over them. Each tenant's secondary pair of NSX Edge devices is configured with two interfaces — an uplink to the north-south NSX Edge security gateway and a single internal interface that becomes the external network inside OpenStack. It is this internal interface on which the OpenStack tenant deploys their own micro-segmented networks and from which they consume floating IP addresses.
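As a rough sketch of what that mapping looks like on the OpenStack side, the following neutron CLI commands create a per-tenant external network and subnet that sit behind the tenant Edge's internal interface. The project name, network names, and address ranges are placeholders for illustration, and the exact provider attributes will depend on how your Neutron plugin exposes the NSX-backed segment.

# Look up the tenant/project ID (project name is an example)
$ TENANT1_ID=$(openstack project show tenant1 -f value -c id)

# External network owned by the tenant, backed by the tenant Edge's internal interface
$ neutron net-create tenant1-external --router:external --tenant-id $TENANT1_ID

# Subnet whose allocation pool becomes the tenant's floating IP range
$ neutron subnet-create tenant1-external 192.168.10.0/24 \
    --name tenant1-external-subnet --tenant-id $TENANT1_ID \
    --gateway 192.168.10.1 --disable-dhcp \
    --allocation-pool start=192.168.10.10,end=192.168.10.200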

An upcoming post will describe the deployment and configuration of these tenant NSX Edge devices and their corresponding configuration inside OpenStack.

I am concerned with this approach once the environment grows to 50+ OpenStack tenants. At that point, operational management difficulties will begin to surface. Specifically, there is a lack of available automation for creating, linking, and configuring both the NSX component deployment and the tenant creation within OpenStack itself (projects, users, role-based access, external networks, subnets, etc.).
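To make the automation gap concrete, here is a minimal sketch of the per-tenant OpenStack onboarding that would need to be scripted for every new tenant; the project, user, and role names are illustrative, and the NSX Edge deployment and linking would be a separate workflow against the NSX API.

# Project, user, and role assignment for a new tenant (names are examples)
$ openstack project create tenant1 --description "Tenant 1 personal cloud"
$ openstack user create tenant1-admin --project tenant1 --password-prompt
$ openstack role add --project tenant1 --user tenant1-admin _member_

# ...followed by the per-tenant external network and subnet creation shown
# earlier, plus deployment and wiring of the tenant NSX Edge pair outside OpenStack.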


After the initial article in this series was written, further testing was performed that influenced how multi-tenant OpenStack external networks can be implemented. Part 2 of the series covers the new information, and for completeness I’ve decided to include it here as well for archival purposes.

Part 2

Yesterday's post discussed a method for segmenting multi-tenant networks inside of OpenStack. As a series of test cases was worked through with a setup of this nature, a gaping hole in OpenStack came into view.

What do the previously described multiple external networks look like inside OpenStack?

Admin view with multiple external networks defined in OpenStack.
Networks from Tenant 1 view.
Networks from Tenant 2 view.

In the second and third screenshots, you can see that both tenants see both external networks, but each only sees a subnet listed for the external network that was created with their respective tenant-id. At first glance, this would seem to be doing what was intended — each tenant receiving their own external network from which to consume floating IP addresses. Unfortunately, it begins to break down when a tenant goes to Compute –> Access & Security –> Floating IPs in Horizon.

Multiple tenant floating IP addresses.

The above screenshot shows a tenant being assigned a floating IP address from an external network they should not have had access to.

I felt pretty much like Captain Picard after working through the test cases. Surely, OpenStack would allow a design where tenants have segmented external networks — right?

Unfortunately, OpenStack does not honor this type of segmented external networking design — it will allow any tenant to consume/claim a floating IP address from any of the other external networks. To understand how OpenStack implements external networks, you can read the documentation here. The issue is highlighted in this passage:



Nevertheless, the concept of ‘external’ implies some forms of sharing, and this has some bearing on the topologies that can be achieved. For instance it is not possible at the moment to have an external network which is reserved to a specific tenant.

Essentially, OpenStack Neutron thinks of external networks differently than, I believe, most architects do. It does not honor the tenant-id attribute specified when the network is created, nor the fact that the shared attribute is not enabled on the external network. The methodology OpenStack Neutron uses is more in line with the AWS consumption model — everyone drinks from the same pool and there is no segmentation between the tenants. I personally do not believe that model works in a private cloud where there are multiple tenants.
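For anyone who wants to reproduce the behavior, the following sketch shows the cross-tenant allocation that should fail but does not. It assumes you are authenticated as a Tenant 1 user and that tenant2-external is the external network created with Tenant 2's tenant-id (names follow the examples above).

# Run with Tenant 1 credentials; tenant2-external is not shared and carries
# Tenant 2's tenant-id, yet Neutron still hands out a floating IP from it
$ neutron floatingip-create tenant2-external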

The next post in the series will discuss a potential design for working around the issue inside OpenStack Neutron.

VMware Integrated OpenStack – Collapse Compute & Edge Clusters


VMware Integrated OpenStack (VIO) introduced the ability to deploy to multiple vCenter Servers with version 2.5. The feature allows the OpenStack management VMs to be deployed inside a control-plane vCenter Server while the data plane uses a separate vCenter Server. The architecture model still required three clusters:

  • Management Cluster (Management vCenter Server)
  • Compute Cluster(s) (Workload vCenter Server)
  • Edge Cluster (Workload vCenter Server)

The three-cluster architecture follows the published best practices from both VIO and NSX. Having a dedicated Edge cluster should free up tenant resources and prevent potential noisy-neighbor issues on the network. However, a dedicated cluster just for NSX Edge VMs can be overkill in some environments from both a cost and a compute perspective. If you are also using Virtual SAN for hyper-converged infrastructure (HCI), the cost increases considerably, with vSphere, NSX, and Virtual SAN licensing for hosts that will be extremely under-utilized.

So how can you collapse the compute and edge clusters in a VMware Integrated OpenStack environment?

In version 3.0 there is a configuration change that makes it possible to collapse these two clusters. Performing the following steps will allow you to deploy a smaller-footprint OpenStack environment using VIO.

$ sudo vim /opt/vmware/vio/etc/

Add the following lines to the end of the configuration file:

## Collapse the Edge/Compute clusters
oms.allow_shared_edge_cluster = true

Restart the OMS services:
$ sudo restart oms

Once the OMS services have been restarted, the VIO Deployment UI will allow you to deploy the Edge VMs inside the same Compute cluster on the workload vCenter Server instance.

A couple of caveats to be aware of with this approach:

  • All tenant-deployed Edge VMs will live in the collapsed Edge/Compute cluster. As the environment scales to include multiple compute clusters, only this initial Edge/Compute cluster will host the Edge VMs.
  • The OpenStack Horizon UI is unaware of these tenant-deployed Edge VMs, so the compute cluster utilization it reports will show discrepancies, and the size of those discrepancies depends on how large the environment is.

Your mileage may vary, but this option allows for some additional flexibility when deploying VMware Integrated OpenStack.

Deploying a VMware NSX Controller through the API


The current iteration of my home lab includes three Intel NUC DCCP847DYE systems, each with a single dual-core CPU and 16GB of RAM. As I started to venture down the path of introducing VMware NSX into the lab, the limitations of these systems when deploying a cluster of NSX Controllers became apparent. The default size NSX Manager uses for an NSX Controller (when deploying a controller through the Web Client) is a medium-sized VM (4 vCPU, 4GB RAM). The deployment of the VM was always successful, but the workflow failed when the system went to power on the new NSX Controller. At first I thought that meant I was stuck, but a co-worker mentioned using the API to deploy a smaller-sized VM.

After leaving it alone for a few weeks, I got back to the deployment this week — I also happen to be taking the ICM course for NSX, so the timing was right. A quick Google search for the NSX 6.2 API guide turned up the necessary documentation.

Looking at example 2-3 in the API documentation, building out the XML raw data for the creation of a new NSX controller was the first step. The final product for my environment included the following XML:

 1  <controllerSpec>
 2   <name>nsx-controller-node1</name>
 3   <description>nsx-controller</description>
 4   <ipPoolId>ipaddresspool-1</ipPoolId>
 5   <resourcePoolId>resgroup-29</resourcePoolId>
 6   <hostId>host-22</hostId>
 7   <datastoreId>datastore-10</datastoreId>
 8   <deployType>small</deployType>
 9   <networkId>dvportgroup-541</networkId>
10   <password>12-char-password</password>
11 </controllerSpec>

The interesting pieces here are lines 4-8 and line 9. Chris Wahl had a good post from about 18 months ago that provided some guidance on the process as well, and I recommend reading it. The biggest challenge was using the MOB to determine the object names for <resourcePoolId>, <hostId>, <datastoreId> and <networkId>. If you are unfamiliar with the MOB, access to it is granted via the /mob URL for your vCenter Server (e.g. https://vcenter.local.domain/mob).
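If clicking through the MOB feels tedious, the same managed object references can also be pulled from the command line. The sketch below uses govc, which is an alternative to the MOB rather than part of the original workflow; the inventory paths are examples for a lab with a single datacenter and cluster, and it assumes GOVC_URL, GOVC_USERNAME and GOVC_PASSWORD point at your vCenter Server.

# Print managed object references alongside inventory objects
$ govc ls -i /Lab/host/Cluster01              # hosts          -> host-22
$ govc ls -i /Lab/host/Cluster01/Resources    # resource pools -> resgroup-29
$ govc ls -i /Lab/datastore                   # datastores     -> datastore-10
$ govc ls -i /Lab/network                     # portgroups     -> dvportgroup-541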

The other interesting piece I learned (and Tweeted about) was that the identifier for the IP Address Pool has changed since the API guide was written. The string used for <ipPoolId> is ipaddresspool-X, where X is an integer starting at 1. In my environment there is only a single IP Address Pool, so the correct value to use was ipaddresspool-1.
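Rather than guessing at the integer, the pool identifiers can be read back from the NSX Manager API. This is a hedged sketch; the hostname and credentials are placeholders, and the endpoint is the 6.2-era IP pool listing, so verify it against the API guide for your version.

# List configured IP address pools and note the objectId values (e.g. ipaddresspool-1)
$ curl -k -u 'admin:password' \
    https://nsx-manager.local.domain/api/2.0/services/ipam/pools/scope/globalroot-0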

Once you have the XML code built for your environment, sending it to the NSX API is simple enough. I used CocoaRestClient on my MBP, entering the URL for the controller endpoint and setting the method to POST:


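If you would rather skip a GUI REST client, the same request can be sent with curl. A minimal sketch, assuming the controller spec above is saved as controller-spec.xml and substituting your own NSX Manager hostname and credentials; the /api/2.0/vdn/controller endpoint is the one described in the 6.2 API guide.

# POST the controller spec to NSX Manager
$ curl -k -u 'admin:password' \
    -H 'Content-Type: application/xml' \
    -X POST -d @controller-spec.xml \
    https://nsx-manager.local.domain/api/2.0/vdn/controller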
After the call succeeds, a job number will be displayed in the Response Body window. It will look something like jobdata-713. The job number can be used to monitor the progress of the deployment through the API as well.
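To poll that job from the command line as well, a GET against the controller progress endpoint returns the deployment status. Again a sketch with placeholder hostname and credentials, using the job ID from the example above.

# Check deployment progress for the job returned by the POST
$ curl -k -u 'admin:password' \
    https://nsx-manager.local.domain/api/2.0/vdn/controller/progress/jobdata-713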


You can reuse the same XML payload in subsequent API calls to create additional NSX Controller VMs. NSX will assign a unique identifier to each one and pull an IP address for each from the pool specified in the XML payload.

Progress can also be monitored through the vSphere Web Client. Once the deployment is complete, the Networking and Security UI will show the new NSX Controller(s).

NSX controller

Being able to use the API to deploy the NSX Controllers was a great way to get it working in the limited lab environment my Intel NUCs are currently providing. If you found this post helpful, please let me know over Twitter (@chrismutchler).