Multi-tenant OpenStack with NSX – Part 1


I have been working on an OpenStack architecture design using VMware Integrated OpenStack (VIO) for the past several months. The design is being developed for an internal cloud service offering and will also serve as the design for my VCDX certification pursuit in 2017. As the design has moved through the Proof-of-Concept and later the Pilot phases, determining how to deliver a multi-tenant/personal cloud offering has proven challenging. The design relies heavily on the NSX software-defined networking (SDN) platform, both because VIO requires it and because of the service offering's requirements. As such, all north-south traffic goes through a set of NSX Edge devices. Before traffic ever reaches that north-south boundary, however, the east-west tenant traffic needed to be addressed.

I had seen other designs discussed in which a single, large (/22 or greater) external network is shared by all OpenStack tenants. However, for a personal cloud or multi-tenant cloud offering based on OpenStack, I felt this was not a good design choice for this environment. I wanted a higher level of separation between tenants, and one option involved a secondary pair of NSX Edge devices for each tenant. The following diagram describes the logical approach.

[Diagram: OpenStack NSX multi-tenant logical topology]

The above logical representation expresses how the deployed tenant NSX Edges are connected upstream to the security or L3 network boundary and downstream to the OpenStack Project that is created for each tenant. At a small to medium scale, I believe this model works operationally: the tenant NSX Edges create a logical separation between each OpenStack tenant and (if assigned on a per-team basis) should remain relatively manageable as the environment scales to dozens of tenants.

The NSX Edge devices are deployed inside the same OpenStack Edge cluster specified during the VIO deployment; however, they are managed outside of OpenStack and the tenant has no control over them. Each tenant's secondary pair of NSX Edge devices is configured with two interfaces: an uplink to the north-south NSX Edge security gateway and a single internal interface that becomes the external network inside OpenStack. It is this internal interface that the OpenStack tenant attaches their micro-segmented networks to and consumes floating IP addresses from.
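
For reference, the OpenStack side of that wiring can be sketched with the openstacksdk. This is only a sketch of the intent, not the exact commands from the design; the cloud entry, project ID, network names, and CIDR are placeholders, and how the network maps onto the tenant Edge's internal interface depends on the Neutron/NSX plugin configuration in your deployment.

# Hypothetical sketch: create a per-tenant, non-shared external network
# and a floating IP subnet (all names, IDs, and addresses are placeholders).
import openstack

conn = openstack.connect(cloud='vio-admin')  # admin credentials from clouds.yaml

net = conn.network.create_network(
    name='tenant1-external',
    project_id='TENANT1_PROJECT_ID',   # the tenant's project
    is_router_external=True,           # router:external = True
    is_shared=False,                   # not shared with other tenants
)

subnet = conn.network.create_subnet(
    network_id=net.id,
    name='tenant1-external-subnet',
    ip_version=4,
    cidr='192.168.10.0/24',            # carved from the Edge internal interface subnet
    gateway_ip='192.168.10.1',         # the tenant Edge internal interface
    is_dhcp_enabled=False,
    allocation_pools=[{'start': '192.168.10.50', 'end': '192.168.10.200'}],
)

print(net.id, subnet.id)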

An upcoming post will describe the deployment and configuration of these tenant NSX Edge devices and their corresponding configuration inside OpenStack.

My concern with this approach is what happens once the environment grows to 50+ OpenStack tenants. At that point, operational management difficulties will begin to surface, specifically the lack of available automation for creating, linking, and configuring both the NSX components and the corresponding tenant constructs within OpenStack itself (projects, users, role-based access, external networks, subnets, etc.). A rough sketch of the OpenStack-side objects that would need automating is shown below.
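
This is only a hypothetical outline using the openstacksdk (the external network and subnet half was sketched earlier); the role name, user details, and cloud entry are placeholders, and the NSX Edge deployment itself would still have to be automated separately against the NSX API.

# Hypothetical sketch of the per-tenant OpenStack objects that pair with
# each tenant NSX Edge (all names and credentials are placeholders).
import openstack

conn = openstack.connect(cloud='vio-admin')

project = conn.identity.create_project(
    name='tenant1', description='Tenant 1 personal cloud project')
user = conn.identity.create_user(
    name='tenant1-admin', default_project_id=project.id, password='CHANGE_ME')
role = conn.identity.find_role('_member_')  # role name varies by deployment
conn.identity.assign_project_role_to_user(project, user, role)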


Update

After writing the initial article, further testing was performed that influenced how multi-tenant OpenStack external networks can be implemented. Part 2 of the series includes the new information; for completeness, I have decided to include that information here as well for archival purposes.

Part 2

The post yesterday discussed a method for having segmented multi-tenant networks inside of OpenStack. As I worked through a series of test cases with a setup of this nature, a large gaping hole in OpenStack came into view.

What do the previously described multiple external networks look like inside OpenStack?

[Screenshot: Admin view with multiple external networks defined in OpenStack.]
[Screenshot: Networks from the Tenant 1 view.]
[Screenshot: Networks from the Tenant 2 view.]

In the second and third screenshots, you can see that both tenants see both external networks, but each only sees a subnet listed for the external network that was created with its own tenant-id. At first glance, this would seem to be doing what was intended: each tenant receiving its own external network to consume floating IP addresses from. Unfortunately, it begins to break down when a tenant goes to Compute -> Access & Security -> Floating IPs in Horizon.

[Screenshot: Multiple tenant floating IP addresses.]

The above screenshot shows a tenant being assigned a floating IP address from what should have been an external network it did not have access to.
[Image: facepalm]

I felt pretty much like Captain Picard after working through the test cases. Surely, OpenStack would allow a design where tenants have segmented external networks — right?

Unfortunately, OpenStack does not honor this type of segmented external networking design; it will allow any tenant to consume/claim a floating IP address from any of the other external networks. The documentation describing how OpenStack implements external networks can be read here. The issue is highlighted in this passage:

Nevertheless, the concept of ‘external’ implies some forms of sharing, and this has some bearing on the topologies that can be achieved. For instance it is not possible at the moment to have an external network which is reserved to a specific tenant.

Essentially, OpenStack Neutron thinks of external networks differently than I believe most architects do. It does not honor the tenant-id attribute specified when the network is created, nor the fact that the shared attribute is not enabled on the external network. The methodology OpenStack Neutron uses is more in line with the AWS consumption model: everyone drinks from the same pool and there is no segmentation between the tenants. I personally do not believe that model works in a private cloud where there are multiple tenants.
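
The behavior is easy to reproduce from the API side. The following is a minimal, hypothetical sketch using the openstacksdk (the cloud entry and network name are placeholders): even though the external network was created under a different tenant-id with shared disabled, Neutron happily allocates the floating IP.

# Hypothetical sketch: Tenant 2 allocating a floating IP from Tenant 1's
# "private" external network (names and credentials are placeholders).
import openstack

conn = openstack.connect(cloud='vio-tenant2')  # Tenant 2 credentials

# The other tenant's external network is still visible...
other_net = conn.network.find_network('tenant1-external')

# ...and a floating IP can be allocated from it, despite shared=False
# and a foreign tenant-id on the network.
fip = conn.network.create_ip(floating_network_id=other_net.id)
print(fip.floating_ip_address)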

The next post in the series will discuss a potential design for working around the issue inside OpenStack Neutron.

Deploying a VMware NSX Controller through the API


The current iteration of my home lab includes three Intel NUC DCCP847DYE systems, each with a single dual-core CPU and 16GB of RAM. As I started to venture down the path of introducing VMware NSX into the lab, the limitations of these systems became apparent when deploying a cluster of NSX Controllers. The default size NSX Manager uses for an NSX Controller (when deploying a controller through the Web Client) is a medium-sized VM (4 vCPU, 4GB RAM). The deployment of the VM was always successful, but the workflow failed when the system went to power on the new NSX Controller. At first, I thought that meant I was stuck, but a co-worker mentioned using the API to deploy a smaller-sized VM.

After leaving it be for a few weeks, I got back to the deployment this week. I also happen to be taking the NSX ICM course, so the timing was right. A quick Google search for the NSX 6.2 API guide turned up the necessary documentation.

The first step was building out the raw XML for creating a new NSX Controller, following example 2-3 in the API documentation. The final payload for my environment included the following XML:

 1  <controllerSpec>
 2   <name>nsx-controller-node1</name>
 3   <description>nsx-controller</description>
 4   <ipPoolId>ipaddresspool-1</ipPoolId>
 5   <resourcePoolId>resgroup-29</resourcePoolId>
 6   <hostId>host-22</hostId>
 7   <datastoreId>datastore-10</datastoreId>
 8   <deployType>small</deployType>
 9   <networkId>dvportgroup-541</networkId>
10   <password>12-char-password</password>
11 </controllerSpec>

The interesting pieces here are lines 4-8 and line 9. Chris Wahl wrote a good post about 18 months ago that provides guidance on the process as well, and I recommend reading it. The biggest challenge was using the MOB to determine the object names for the <resourcePoolId>, <hostId>, <datastoreId> and <networkId>. If you are unfamiliar with the MOB, it is accessed via the /mob URL on your vCenter Server (e.g. https://vcenter.local.domain/mob).

The other interesting piece I learned (and Tweeted about) was that the identifier for the IP Address Pool has changed since the API guide was written. The string used for the <ipPoolId> is ipaddresspool-X, where X is an integer starting at 1. My environment has only a single IP Address Pool, so the correct value to use was ipaddresspool-1.

Once the XML is built for your environment, sending it to the NSX API is simple enough. I used CocoaRestClient on my MBP, entering the URL for the controller and setting the method to POST:

https://nsxmanager.local.domain/api/2.0/vdn/controller

After the call succeeds, a job number will be displayed in the Response Body window. It will look something like jobdata-713. The job number can be used to monitor the progress of the deployment through the API as well:

https://nsxmanager.local.domain/api/2.0/vdn/controller/progress/jobdata-713
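
Any REST client will do. For reference, the following is a rough Python equivalent of those two calls using the requests library; the credentials are placeholders, and certificate verification is disabled on the assumption of a lab with self-signed certificates.

# Rough equivalent of the CocoaRestClient calls (credentials are placeholders).
import requests

NSX_MANAGER = 'https://nsxmanager.local.domain'
AUTH = ('admin', 'nsx-manager-password')
HEADERS = {'Content-Type': 'application/xml'}

with open('controller-spec.xml') as f:   # the controllerSpec XML shown above
    spec = f.read()

# POST the controllerSpec; the response body contains the job id (e.g. jobdata-713).
resp = requests.post(NSX_MANAGER + '/api/2.0/vdn/controller',
                     data=spec, auth=AUTH, headers=HEADERS, verify=False)
job_id = resp.text.strip()
print('Deployment job:', job_id)

# Poll the job for deployment progress.
status = requests.get(NSX_MANAGER + '/api/2.0/vdn/controller/progress/' + job_id,
                      auth=AUTH, verify=False)
print(status.text)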

The same XML payload can be used in the API call over and over again to have NSX create additional NSX Controller VMs. NSX assigns a unique identifier to each controller and pulls its IP address from the pool specified in the XML payload.

Progress can also be monitored through the vSphere Web Client. Once the deployment is complete, the Networking and Security UI will show the new NSX Controller(s).

[Screenshot: the new NSX Controllers shown in the Networking and Security UI.]

Being able to use the API to deploy the NSX Controllers was a great way to get it working in the limited lab environment my Intel NUCs are currently providing. If you found this post helpful, please let me know over Twitter (@chrismutchler).

IGMP, Multicast and learning a lesson

In a recent conversation, it became clear to me that my knowledge of the inner workings of VXLAN and VSAN was not as deep as it could be. Since I am also studying for my VCAP exams, I knew spending additional time educating myself on these two technologies was a necessity. As a result, I've spent the last day diving into the IGMP protocol, multicast traffic, and how they are utilized within both VXLAN and VMware VSAN. I wanted to capture what I've learned in a blog post as much for myself as for anyone else who might be interested in the subject. Writing down what I've learned is one way I can absorb and retain information long-term.

IGMP

IGMP is a layer 3 network protocol. It is a communications protocol used to establish multicast group memberships. It is encapsulated within an IP packet and does not use a transport layer, similar to ICMP. It is also used to register a router for receiving multicast traffic. There are two important pieces within the IGMP protocol that VXLAN and VSAN take advantage of: the IGMP Querier and IGMP Snooping. Without these two pieces, multicast traffic would behave as little more than a broadcast transmission and lack the efficiency required.

IGMP Querier

The IGMP Querier is the router or switch that acts as the master for the IGMP filter lists. It will check and track membership by sending queries on a timed interval.

IGMP Snooping

On a layer 2 switch, IGMP Snooping allows passive monitoring of the IGMP packets sent between routers and hosts. The switch learns which of its ports have interested receivers and forwards multicast traffic only to those ports, without generating any additional traffic on the wire, making multicast passing through the network more efficient.
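
To make the membership mechanics concrete, here is a small, generic Python sketch; it is not specific to VXLAN or VSAN, and the group address and port are arbitrary examples. Joining a group is what triggers the IGMP Membership Report that the querier tracks and that a snooping switch uses to decide which ports should receive the group's traffic.

# Generic multicast receiver: joining the group causes the OS to send an
# IGMP Membership Report on the wire (group address and port are arbitrary).
import socket
import struct

GROUP = '239.1.1.1'
PORT = 5001

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('', PORT))

# Join the multicast group on the default interface.
mreq = struct.pack('4s4s', socket.inet_aton(GROUP), socket.inet_aton('0.0.0.0'))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Any datagrams sent to the group now reach this host; a snooping switch
# forwards them only to ports where a join was observed.
data, addr = sock.recvfrom(1500)
print(addr, len(data))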

VXLAN

That’s IGMP in a nutshell — so how is it used in VXLAN?

In order for VXLAN to act as an overlay network, multicast traffic is used to enable the L2-over-L3 connectivity, effectively spanning the entire logical network VXLAN has defined. When a virtual machine connects to a VXLAN logical network, it behaves as though everything is within a single broadcast domain. The ESXi hosts configured for VXLAN register themselves as VTEPs. Only those VTEPs that register with a given VXLAN logical network participate in its multicast traffic. This is accomplished through IGMP Snooping and the IGMP Querier. If you have 1000 ESXi hosts configured for VXLAN, but only a subset (say 100) of the hosts participate in a specific VXLAN logical network, you wouldn't want to send that network's multicast traffic out to all 1000 ESXi hosts; that would unnecessarily increase the multicast traffic on the network.

There is a really good VMware Blog 4-part series on VXLAN and how it operates here.

VMware VSAN

The implementation for VSAN is very similar to that of VXLAN. A VSAN cluster requires a method for learning which ESXi hosts are adjacent to each other and participating in the same VSAN cluster. VMware uses layer 2 multicast traffic for host discovery within VSAN.

Once again, the IGMP Querier and IGMP Snooping play a beneficial role. VMware states that relying on multicast flooding is not a best practice. By leveraging both IGMP Snooping and an IGMP Querier, VSAN is able to understand which hosts want to participate in the multicast group. This is particularly beneficial when multiple network devices exist on the same VLAN that VSAN is operating on.

If you have multiple VSAN clusters operating on the same VLAN, it is recommended you change the multicast addresses used by each cluster so they are not identical. This prevents one VSAN cluster from receiving another cluster's traffic. It can also help prevent the Misconfiguration detected error under the Network status section of a VSAN cluster.

For a better understanding of how VSAN operates, please check out the VMware blog entry here.

For a seasoned network professional, I highly doubt any of this was new or mind-blowing. For someone who does not generally dive into the various network protocols, but should probably start doing so, this information was both a good refresher on IGMP and helped me understand both VXLAN and VSAN a bit better.

Did I get something wrong? Let me know on Twitter.

Thoughts on the VMware Project Photon Announcement


VMware announced a new open source project called Project Photon today. The full announcement can be seen here. Essentially, Project Photon is a lightweight Linux operating system built to support Docker and rkt (formerly Rocket) containers. The footprint is less than 400MB and it can run containers immediately upon instantiation. I had heard rumors the announcement today was going to include some sort of OS, but I was not very excited about it until I started reading the material being released prior to the launch event in a few hours.

Having seen the demo and read the material, my mind went into overdrive over the possibilities both open source projects offer organizations venturing down the Cloud Native Apps (or Platform 3) road. I believe VMware has a huge opportunity here to cement itself as the foundation for running robust, secure and enterprise-ready Cloud Native Applications. If you think about the performance gains vSphere 6.0 has provided, and then look at how VMware is playing in the OpenStack space with both VIO and NSX, the choice becomes obvious.

The area of focus now needs to be on tying all of the pieces together to offer organizations an enterprise-class, end-to-end Platform-as-a-Service solution. This is where, I believe, the VMware Big Data Extensions framework should play an enormous part. The framework already allows deployment of Hadoop, Mesos and Kubernetes clusters. Partner the framework with Project Photon and you now have a minimal-installation VM that can be launched within seconds with VMFork. From there, the resource plugin Virtual Elephant launched today could be mainstreamed (and improved) to allow an entire Mesos stack, backed by Project Photon, to be deployed through the open source Heat API that OpenStack offers.

Epic win!

There is still work VMware could do with the Big Data Extensions framework to improve its capabilities, especially with newcomers like SequenceIQ and their Cloudbreak offering providing stiff competition. Expanding BDE to deploy clusters not only within an internal vSphere environment but also to the major public cloud environments, including VMware's own vCloud Air, will be key going forward. The code for BDE is already an open source project; by launching these two new open source projects, VMware is showing the open source community it is serious.

This is a really exciting time in virtualization and I just got even more excited today!