Troubleshoot OpenStack Neutron + NSX


OpenStack Neutron likes to use some pretty awesome reference IDs for the tenant network objects. You know, helpful strings like ec43c520-bfc6-43d5-ba2b-d13b4ef5a760. The first time I saw one, I said to myself that it was going to be a nightmare when trying to troubleshoot an issue.


Fortunately, VMware NSX also uses a similar character string when it creates logical switches. If NSX is being used in conjunction with OpenStack Neutron, magic happens. The logical switch is created with a string like vxw-dvs-9-virtualwire-27-sid-10009-ec43c520-bfc6-43d5-ba2b-d13b4ef5a760.


A keen eye will have noticed the OpenStack Neutron reference ID is included in the NSX logical switch name!

From there you can reference the NSX Edge virtual machines and see which interface the NSX logical switch is attached to. This tidbit proved useful today when I was troubleshooting an issue for a developer, and it is going into my VCDX SOP document.
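If you want to script the lookup, a quick way to pivot from the NSX name back to the Neutron object is to pull the UUID out of the portgroup name and feed it to the neutron CLI. A minimal sketch, using the example strings from above — the portgroup name is illustrative, and the regex simply extracts the trailing UUID:

$ PG="vxw-dvs-9-virtualwire-27-sid-10009-ec43c520-bfc6-43d5-ba2b-d13b4ef5a760"
$ NET_ID=$(echo "$PG" | grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}')
$ neutron net-list | grep "$NET_ID"
$ neutron net-show "$NET_ID"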



Deploying NSX Virtual Network for OpenStack


Continuing the OpenStack + NSX series (Part 1, Part 2 and Part 3) on deploying a multi-tenant OpenStack environment that relies upon NSX, this post will cover the details of the deployment and configuration.

There have been a couple of options discussed throughout the series, including a logical virtual network design that relies on an NSX DLR without ECMP Edges:


Or a logical virtual network design with a DLR and ECMP Edges:


Regardless of which virtual network design you choose, the NSX Distributed Logical Router and its tie into OpenStack will need to be configured. In the course of building out a few VMware Integrated OpenStack labs, proof-of-concepts and pilot environments, I’ve learned a few things.

Rather than go through all 30+ steps to implement the entire stack, I want to simply highlight a few key points. When you configure the DLR, you should end up with two interfaces — an uplink to either the ECMP layer or the physical VLAN, and an internal interface to the OpenStack external VXLAN network.


Once the DLR is deployed, you can log into any of the ESXi hosts within the NSX transport zone and verify the routes are properly in place with a few simple CLI commands.
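For reference, this is the sort of check I mean — a sketch from an NSX-prepared ESXi host, where the DLR instance name (default+edge-1 below) is a placeholder and the exact options can vary between NSX versions:

# list the DLR instances running on this host
net-vdr --instance -l

# list the routing table for a specific DLR instance
net-vdr --route -l default+edge-1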


The NSX components are now ready to be tied into OpenStack. I prefer to use the API method via the neutron CLI — log into the VIO management server and then into either of the Controller VMs.



Key points to remember here:

  • The physical_network parameter is just the virtualwire-XX string from the NSX-created portgroup.
  • The name for the network must exactly match the NSX Logical Switch that was created for the OpenStack external network.

The commands I used here to create the network inside OpenStack:

$ source <cloudadmin_v3>
$ neutron net-list
$ neutron net-create --provider:network_type=portgroup --provider:physical_network=virtualwire-XX nsx_logical_switch_name
$ neutron net-list

All that remains is adding a subnet to the external network inside OpenStack, which can be performed through the Neutron CLI or the Horizon UI — for example, along the lines shown below. All-in-all it is a pretty easy implementation; just make sure you remember to reference the proper object names in NSX when creating the OpenStack network objects.
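Something like the following would do it from the neutron CLI — the CIDR, gateway and allocation pool are purely illustrative placeholders, and DHCP is disabled because the upstream network owns address assignment on this segment:

$ neutron subnet-create nsx_logical_switch_name 192.0.2.0/24 \
    --name external-subnet \
    --gateway 192.0.2.1 \
    --allocation-pool start=192.0.2.50,end=192.0.2.200 \
    --disable-dhcp
$ neutron subnet-list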


NSX DLR Designated Instance


While it shares a name with a great show, we are going to talk about something slightly different — the NSX Distributed Logical Router (DLR) Designated Instance. NSX has many great features, and also many caveats when implementing some of those great features — like needing a Designated Instance when using a DLR.

So what is a Designated Instance? Honestly, I did not know what it was until a conversation earlier today with a few co-workers who are a bit more knowledgeable about NSX than me. Essentially, a Designated Instance is an elected ESXi host that initially answers all new requests — also known as a single point of failure.

Let’s look at the logical network diagram I posted yesterday.


Pretty sweet, right?

The issue is when the DLR is connected directly to a VLAN. While technically not a problem — it does exactly what you’d expect it to do — it requires one of the ESXi hosts in the transport zone to act as the Designated Instance. The result is that if the Designated Instance ESXi host encounters a failure, any new traffic will fail until the election process is complete and a new Designated Instance is chosen.
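If you are curious which host currently holds the role, the LIF listing for the DLR instance on an ESXi host is where it shows up for VLAN-backed interfaces. A rough sketch — the instance name is a placeholder and the output fields differ a bit between NSX versions:

# list the logical interfaces for a DLR instance; the VLAN-backed LIF entry
# reports the current Designated Instance
net-vdr --lif -l default+edge-1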

So is it possible to not need a Designated Instance when using a DLR? Yes.

It involves introducing another logical NSX layer into the virtual network design. If you saw my tweet earlier, this is what I meant.

I like NSX, but sometimes I think it adds a little too much complexity for operational simplicity.

Adding a set of ECMP Edges above the DLR and connecting the two together will eliminate the requirement for NSX to use a Designated Instance. Here is what an alternative to the previous design would look like.


Essentially what I’ve done is create another VXLAN, with a corresponding NSX Logical Switch, and connect the uplink of the DLR to it. The ECMP Edges then use the same Logical Switch as their internal interface. It is on the uplink side of the ECMP Edges where the P2V layer takes place and the VLAN is connected.

Using this design allows the environment to use a dynamic routing protocol both between the DLR and the ECMP Edges and between the ECMP Edges and the upstream physical network — although mileage may vary depending on your physical network. The ECMP Edges introduce additional scalability — although limited to eight Edges — based on the amount of north-south network traffic and the bandwidth required to meet tenant needs. Features like vSphere anti-affinity rules can mitigate the failure of a single ESXi host, which you cannot do when there is a Designated Instance. The design can also take into consideration an N+x scenario for when to scale the ECMP Edges.

So many options open up when NSX is introduced into an architecture, along with a lot of extra complexity. Ultimately the decision should be based on the requirements and the stakeholders’ risk acceptance. Relying on a Designated Instance may be acceptable to a stakeholder, while adding more complexity to the design may not be.

Until next time, enjoy!

Multi-tenant OpenStack with NSX – Part 3


This next post in the series about multi-tenant OpenStack with NSX will discuss the use of a Distributed Logical Router as the bridge between OpenStack and the physical network. If you have not read the previous posts in the series, you can catch up by reading this one.

Originally the plan had been to segment each OpenStack tenant off with their own HA pair of NSX Edges. However, after discovering that OpenStack honors neither the tenant-id parameter nor a disabled Shared parameter on the external network object, an adjustment had to be made. Working through the problem, it became clear that an NSX Distributed Logical Router (DLR) could be leveraged and would also scale as the environment grows beyond a few dozen tenants. The new multi-tenant network design for OpenStack now looks like this:


The logical diagram shows how the uplink of the DLR is the upstream (north-south) boundary for the environment. The internal interface on the DLR is the external OpenStack network, and is leveraging VXLAN to provide the floating IP addresses the OpenStack tenants will consume. If you are unfamiliar with how a DLR operates, I recommend reading this post from Roie Ben Haim on his blog.

Basically, the DLR relies upon two components — the control VMs and a kernel module inside vSphere that injects routing into each ESXi host within the NSX transport zone. It is the inclusion of this kernel module on every ESXi host that allows this virtual network design to scale as the environment does. In the previous design, the individual NSX Edges would merely be deployed in an HA pair, and only the active VM would handle all of the traffic for that particular tenant. With the DLR, although all tenant traffic still goes through a single logical layer, that layer is distributed across the entire workload environment.

After the DLR is created, the routes can be seen within each ESXi host, as shown in the following image.


The downside that still remains is the shared pool of IP addresses for all tenants to consume from. Operationally it will mean having to manage the tenant quotas for floating IP addresses and making sure there is no over-allocation. I would still like to see the OpenStack community take on the extra work of honoring the tenant-id parameter when creating an external network within OpenStack so that the option would exist to have individual tenant floating IP address pools.
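For the quota piece, the neutron CLI covers it — a quick sketch, with the tenant ID and limit as placeholders:

$ neutron quota-show --tenant-id <tenant_uuid>
$ neutron quota-update --tenant-id <tenant_uuid> --floatingip 10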

Tuesday’s post will include detailed instructions on how to deploy and configure both the NSX portion of this deployment and the OpenStack pieces to tie it all together. Enjoy!

Multi-tenant OpenStack with NSX – Part 2


The post yesterday discussed a method for having segmented multi-tenant networks inside of OpenStack. As a series of test cases was worked through with a setup of this nature, a large gaping hole in OpenStack came into view.

What does the previously described multiple external networks look like inside OpenStack?

Admin view with multiple external networks defined in OpenStack.
Networks from Tenant 1 view.
Networks from Tenant 2 view.

In the second and third screenshots, you can see that the two tenants see both external networks, but each only sees a subnet listed for the external network that was created with its respective tenant-id. At first glance, this would seem to be doing what was intended — each tenant receiving its own external network to consume floating IP addresses from. Unfortunately, it begins to break down when a tenant goes to Compute –> Access & Security –> Floating IPs in Horizon.

Multiple tenant floating IP addresses.

The above screenshot shows a tenant being assigned a floating IP address from what should have been an external network it did not have access to.
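The CLI equivalent makes the behavior plain. Run as one tenant against the other tenant’s external network, a request like this succeeds instead of being rejected — the credentials file and network name are placeholders:

$ source <tenant1_openrc>
$ neutron floatingip-create ext-net-tenant2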

I felt pretty much like Captain Picard after working through the test cases. Surely, OpenStack would allow a design where tenants have segmented external networks — right?

Unfortunately, OpenStack does not honor this type of segmented external networking design — it will allow any tenant to consume/claim a floating IP address from any of the other external networks. To see how OpenStack implements external networks, you can read the documentation here. The issue is highlighted in this passage:



Nevertheless, the concept of ‘external’ implies some forms of sharing, and this has some bearing on the topologies that can be achieved. For instance it is not possible at the moment to have an external network which is reserved to a specific tenant.

Essentially, OpenStack Neutron thinks of external networks differently than, I believe, most architects do. It does not honor the tenant-id attribute specified when the network is created, nor the shared attribute being left disabled on the external network. The methodology OpenStack Neutron uses is more in line with the AWS consumption model — everyone drinks from the same pool and there is no segmentation between the tenants. I personally do not believe that model works in a private cloud where there are multiple tenants.
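For the record, this is the sort of command that sets those attributes in the first place — the network name and tenant ID are placeholders, and while the API happily accepts them, they do not restrict which tenants can draw floating IPs from the network:

$ neutron net-create ext-net-tenant1 --router:external=True --tenant-id <tenant1_uuid>
$ neutron net-show ext-net-tenant1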

The next post in the series will discuss a potential design for working around the issue inside OpenStack Neutron.