VMware Cloud Foundation Configuration Bits

[Image: VMware Cloud Foundation poster snapshot]

The VMware Cloud Foundation (VCF) platform automates the deployment and lifecycle management of the software-defined data center (SDDC). It can take an organization from installation of the physical rack to a workload-ready vSphere environment in a matter of hours. The VCF platform includes the following VMware products:

  • VMware vSphere Hypervisor
  • VMware vCenter Server
  • VMware NSX-v
  • VMware vRealize Operations
  • VMware vRealize Log Insight

As the previous post mentioned, there are several management components VCF relies upon for its automation and workflow framework. After the initial deployment is complete, a vSphere Administrator will still need to perform several tasks to fully configure the environment and make it ready for a production workload. Some of those steps include:

  • Configuring LDAP or Active Directory authentication sources.
  • Creating local accounts.
  • Configuring the network uplinks on the physical network equipment.
  • Configuring NSX and/or the Virtual Distributed Switch for upstream network connectivity.
  • Configuring a jump host for accessing the OOB network where the iDRAC interfaces exist.
    • Multiple jump hosts will be required, one for each physical rack, since the OOB network is duplicated within each rack.
  • Network I/O Control (NIOC) will need to be configured.
  • Properly configuring the Resource Pools VCF creates; no reservations or shares are set after the initial deployment (see the sketch after this list).
  • Log Insight management packs, where necessary, will need to be configured.
  • vRealize Operations will need to be configured.
  • DNS integration.
  • Adjusting the Virtual SAN storage policies to your environment's requirements.
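Since the Resource Pools VCF creates come up with no reservations or shares, the following pyVmomi sketch shows roughly what setting them could look like. This is a minimal sketch, not anything VCF provides; the vCenter address, credentials, pool name, and allocation values are all placeholders, so substitute your own environment's values and policies.

```python
# Minimal sketch: set CPU/memory shares and a reservation on a resource pool
# that VCF created with defaults. All names and values below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)

# Find the resource pool by name (placeholder name).
view = si.content.viewManager.CreateContainerView(
    si.content.rootFolder, [vim.ResourcePool], True)
pool = next(rp for rp in view.view if rp.name == "SDDC-Workload-Pool")

spec = vim.ResourceConfigSpec()
spec.cpuAllocation = vim.ResourceAllocationInfo(
    reservation=10000,                 # MHz, placeholder value
    expandableReservation=True,
    limit=-1,                          # no limit
    shares=vim.SharesInfo(level='high', shares=0))   # shares field ignored unless level is 'custom'
spec.memoryAllocation = vim.ResourceAllocationInfo(
    reservation=32768,                 # MB, placeholder value
    expandableReservation=True,
    limit=-1,
    shares=vim.SharesInfo(level='high', shares=0))

pool.UpdateConfig(config=spec)
Disconnect(si)
```

Whatever values you land on, the point is simply that the pools need deliberate settings after deployment rather than the defaults VCF leaves behind.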

A few key points to remember:

  • Do not modify the cluster structure outside the VRM workflows — which means no creating new clusters or splitting existing clusters up.
  • Do not modify the names of any of the management virtual machines.
  • Do not modify the name of the Virtual Distributed Switches.
  • Do not modify the pre-configured portgroup names.
  • All expansion of hosts/capacity needs to be initiated from the VRM interface.
  • The management cluster initially deploys with only 3 nodes, which is barely enough for any true fault tolerance with Virtual SAN. I highly encourage you to expand it to the VMware-recommended best practice of 4 hosts.
  • Upgrades always occur in the management cluster first, then the workload domains — which I personally believe to be a bit backwards.

The VCF product is a great first step along the path of fully automated deployments and lifecycle management. The biggest challenge to adopting it will be drawing the line between what VCF manages and what a typical vSphere Administrator is used to doing themselves. Operationally it will take some adjustment, especially when using the lifecycle management workflows for the first time.

Happy Thanksgiving!

VMware Cloud Foundation Management Stack

The VMware Cloud Foundation platform includes a management stack made up of several virtual appliances that handle the automation and SDDC lifecycle management of the platform. These virtual appliances are deployed inside the management cluster before the VCF rack is shipped to the customer.

The VCF management stack includes:

  • vRack Manager (vRM): 4 vCPU, 12 GB memory, Disk1 100 GB, Disk2 50 GB
  • vRack-ISVM[1-3]: 4 vCPU, 4 GB memory, Disk1 20 GB
  • vRack-LCM_Repository: 4 vCPU, 4 GB memory, Disk1 200 GB, Disk2 900 GB
  • vRack-LCM_Backup_Repository: 4 vCPU, 4 GB memory, Disk1 30 GB

The following diagram is a logical representation of the VCF management stack.

[Diagram: VMware Cloud Foundation management stack]

The platform creates a private, non-routable VLAN for communication between the management virtual appliances. All of the backend traffic between them flows over this network.

These VMs are managed by the VCF platform and should not be modified in any manner. Doing so can have dire consequences and may prevent the platform from operating as intended.

The vRack Manager is the primary interface the user will interact with. The GUI HTTP server runs on this virtual appliance and hosts all of the workflows for creating, modifying, and deleting workload domains within VCF. I have seen the VRM become entirely unresponsive, or the GUI become painfully slow; a reboot of the virtual appliance has resolved these issues when they occurred.

In addition to the VCF management stack, the first rack of VCF hardware will also include a vCenter Server, a Platform Services Controller, an NSX Manager, a set of NSX Controllers, a vRealize Operations virtual appliance, and a vRealize Log Insight virtual appliance. It is important to take these virtual appliances into consideration when performing sizing calculations for the environment.
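For a quick back-of-the-envelope view of that footprint, the short Python sketch below tallies the vRack appliances from the spec list above. The vCenter Server, PSC, NSX, and vRealize components would be added on top at whatever sizes you deploy them; those figures are not included here.

```python
# Rough footprint of the VCF management appliances listed above.
# The per-VM figures come from the spec list; everything else in the
# environment (vCenter, PSC, NSX, vRealize) must be added separately.
vrack_appliances = {
    "vRack Manager":               {"vcpu": 4, "mem_gb": 12, "disk_gb": 150},
    "vRack-ISVM1":                 {"vcpu": 4, "mem_gb": 4,  "disk_gb": 20},
    "vRack-ISVM2":                 {"vcpu": 4, "mem_gb": 4,  "disk_gb": 20},
    "vRack-ISVM3":                 {"vcpu": 4, "mem_gb": 4,  "disk_gb": 20},
    "vRack-LCM_Repository":        {"vcpu": 4, "mem_gb": 4,  "disk_gb": 1100},
    "vRack-LCM_Backup_Repository": {"vcpu": 4, "mem_gb": 4,  "disk_gb": 30},
}

total_vcpu = sum(vm["vcpu"] for vm in vrack_appliances.values())
total_mem  = sum(vm["mem_gb"] for vm in vrack_appliances.values())
total_disk = sum(vm["disk_gb"] for vm in vrack_appliances.values())

print(f"vRack appliances: {total_vcpu} vCPU, {total_mem} GB RAM, {total_disk} GB disk")
# vRack appliances: 24 vCPU, 32 GB RAM, 1340 GB disk
```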

This management stack exists only in the first management cluster, created on the first physical rack of VCF. The management clusters created in subsequent physical racks will leverage these same virtual appliances for their operations and will only include a VRM, vCenter Server, NSX Manager, and NSX Controllers.

Update: VMware released a VMware Cloud Foundation Architecture Poster, which can be downloaded here in PDF form.

Note – this information is accurate as of November 2016 and may be subject to change based on future VMware Cloud Foundation platform changes.

NSX DLR Designated Instance

[Image: NSX Designated Instance]

The name might bring a certain TV show to mind, but we are going to talk about something slightly different: the NSX Distributed Logical Router (DLR) Designated Instance. NSX has many great features, but also many caveats when implementing some of them, such as requiring a Designated Instance when using a DLR.

So what is a Designated Instance? Honestly, I did not know what it was until a conversation earlier today with a few co-workers who are a bit more knowledgeable about NSX than I am. Essentially, a Designated Instance is an elected ESXi host that initially answers all new requests, which also makes it a single point of failure.

Let’s look at the logical network diagram I posted yesterday.

[Diagram: NSX DLR with OpenStack logical network]

Pretty sweet, right?

The issue arises when the DLR is connected directly to a VLAN. While technically not a problem (it does exactly what you'd expect it to do), it requires one of the ESXi hosts in the transport zone to act as the Designated Instance. The result is that if the Designated Instance ESXi host encounters a failure, any new traffic will fail until the election process completes and a new Designated Instance is chosen.

So is it possible to not need a Designated Instance when using a DLR? Yes.

It involves introducing another logical NSX layer into the virtual network design. If you saw my tweet earlier, this is what I meant.

I like NSX, but sometimes I think it adds a little too much complexity for operational simplicity.

Adding a set of ECMP Edges above the DLR and connecting the two together will eliminate the requirement for NSX to use the Designated Instance. Here is what an alternative to the previous design would look like.

[Diagram: External OpenStack connectivity through ECMP Edges and the DLR]

Essentially what I've done is create another VXLAN, with a corresponding NSX Logical Switch, and connect the uplink of the DLR to it. The ECMP Edges then use the same Logical Switch for their internal interfaces. The uplink side of the ECMP Edges is where the physical-to-virtual (P2V) boundary sits and the VLAN is connected.

Using this design allows the environment to run a dynamic routing protocol both between the DLR and the ECMP Edges and between the ECMP Edges and the upstream physical network, although mileage may vary depending on your physical network. The ECMP Edges add scalability (limited to 8 Edges) based on the amount of north-south network traffic and the bandwidth required to meet tenant needs. Features like vSphere anti-affinity rules can mitigate the failure of a single ESXi host, which you cannot do when there is a Designated Instance. The design can also take into consideration an N+x scenario for deciding when to scale out the ECMP Edges.
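To illustrate the anti-affinity point above, here is a minimal pyVmomi sketch that adds a DRS anti-affinity rule keeping two ECMP Edge VMs on separate ESXi hosts. The vCenter address, credentials, cluster name, and Edge VM names are all placeholders, not names NSX or VCF creates for you.

```python
# Minimal sketch: DRS anti-affinity rule so two ECMP Edge VMs never share a host.
# Cluster/VM names and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
content = si.content

def find_by_name(vimtype, name):
    """Return the first inventory object of the given type with the given name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    return next(obj for obj in view.view if obj.name == name)   # raises if not found

cluster = find_by_name(vim.ClusterComputeResource, "Edge-Cluster")
edge_vms = [find_by_name(vim.VirtualMachine, n)
            for n in ("ecmp-edge-01", "ecmp-edge-02")]

rule = vim.cluster.AntiAffinityRuleSpec(
    name="ecmp-edge-anti-affinity",
    enabled=True,
    mandatory=True,          # DRS will refuse placements that violate the rule
    vm=edge_vms)

spec = vim.cluster.ConfigSpecEx(
    rulesSpec=[vim.cluster.RuleSpec(operation='add', info=rule)])

task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
# A real script would wait on the task; omitted here for brevity.
Disconnect(si)
```

Making the rule mandatory is the stricter choice; a non-mandatory ("should") rule is the softer alternative if you would rather DRS bend the rule than fail a placement.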

So many options open up when NSX is introduced into an architecture, along with a lot of extra complexity. Ultimately the decision should be based on the requirements and the stakeholders' risk acceptance. Relying on a Designated Instance may be acceptable to a stakeholder, while adding more complexity to the design may not be.

Until next time, enjoy!

Multi-tenant OpenStack with NSX – Part 3

This next post in the series about multi-tenant OpenStack with NSX will discuss the use of a Distributed Logical Router as the bridge between OpenStack and the physical network. If you have not read the previous posts in the series, you can catch up by reading this one.

Originally the plan had been to segment each OpenStack tenant off with its own HA pair of NSX Edges. However, after discovering that OpenStack honors neither the tenant-id parameter nor the disabling of the Shared parameter on the external network object, an adjustment had to be made. Working through the problem, it became clear that an NSX Distributed Logical Router (DLR) could be leveraged and would also scale as the environment grows beyond a few dozen tenants. The new multi-tenant network design for OpenStack now looks like this:

[Diagram: NSX DLR with OpenStack logical network]

The logical diagram shows how the uplink of the DLR is the upstream (north-south) boundary for the environment. The internal interface on the DLR is the external OpenStack network, and it leverages VXLAN to provide the floating IP addresses the OpenStack tenants will consume. If you are unfamiliar with how a DLR operates, I recommend reading this post from Roie Ben Haim on his blog.
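For context, creating that external network and its floating IP pool might look something like the openstacksdk sketch below. The cloud profile, names, and address range are placeholders, and any NSX-plugin-specific provider attributes that tie the network to the logical switch are omitted. As discussed above, the project scoping on such an external network is not honored, which is why it ends up shared across tenants.

```python
# Minimal sketch: create the external network and its floating IP pool.
# Names, CIDR, and allocation range are placeholders; NSX-plugin-specific
# provider attributes are intentionally left out.
import openstack

conn = openstack.connect(cloud="mycloud")   # assumes a clouds.yaml entry named "mycloud"

ext_net = conn.network.create_network(
    name="ext-net",
    is_router_external=True)                # marks the network as external in Neutron

conn.network.create_subnet(
    network_id=ext_net.id,
    name="ext-subnet",
    ip_version=4,
    cidr="192.0.2.0/24",                        # placeholder range
    allocation_pools=[{"start": "192.0.2.10",   # floating IPs tenants will consume
                       "end": "192.0.2.250"}],
    gateway_ip="192.0.2.1",                     # the DLR's internal interface
    is_dhcp_enabled=False)
```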

Basically, the DLR relies upon two components: control VMs and a kernel module inside vSphere that injects routes on each ESXi host within the NSX transport zone. It is the inclusion of this kernel module on every ESXi host that will allow this virtual network design to scale as the environment does. In the previous design, the individual NSX Edges would merely be deployed in an HA pair, and only the active VM would handle all of the traffic for that particular tenant. With the DLR, although all tenant traffic still passes through a single logical layer, that layer is distributed across the entire workload environment.

After the DLR is created, the routes can be seen on each ESXi host, as shown in the following image.

[Screenshot: DLR routes on an ESXi host]
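If you would rather pull that information from a shell than a screenshot, a small paramiko sketch such as the one below can run the net-vdr commands on a host over SSH. It assumes SSH is enabled on the ESXi host and that the net-vdr utility and the options shown exist in your NSX-v build; the host name, credentials, and DLR instance name are placeholders.

```python
# Minimal sketch: SSH to an ESXi host and dump the DLR instances and routes.
# Host, credentials, and the VDR instance name are placeholders; the net-vdr
# options shown are assumed to match your NSX-v build.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("esxi01.example.com", username="root", password="changeme")

for cmd in ("net-vdr --instance -l",                 # list DLR instances on the host
            "net-vdr --route -l default+edge-1"):    # routes for one instance (placeholder name)
    stdin, stdout, stderr = client.exec_command(cmd)
    print(f"### {cmd}")
    print(stdout.read().decode())

client.close()
```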

The downside that remains is the shared pool of IP addresses all tenants consume from. Operationally it will mean managing the tenant quotas for floating IP addresses and making sure there is no over-allocation. I would still like to see the OpenStack community take on the extra work of honoring the tenant-id parameter when creating an external network within OpenStack, so that the option would exist to have individual tenant floating IP address pools.
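Here is a rough openstacksdk sketch of that quota housekeeping: walk the projects, report each one's floating IP quota, and cap anything above a chosen limit so the shared pool cannot be over-allocated. The cloud name and the cap value are placeholders, and it assumes admin credentials that can list projects.

```python
# Minimal sketch: cap each project's floating IP quota so the shared pool
# cannot be over-allocated. Cloud name and the cap value are placeholders.
import openstack

FLOATING_IP_CAP = 10   # placeholder per-project cap

conn = openstack.connect(cloud="mycloud")

for project in conn.identity.projects():             # requires admin credentials
    quota = conn.network.get_quota(project.id)
    print(f"{project.name}: floating IP quota = {quota.floating_ips}")
    if quota.floating_ips is None or quota.floating_ips > FLOATING_IP_CAP:
        conn.network.update_quota(project.id, floating_ips=FLOATING_IP_CAP)
        print(f"  -> capped at {FLOATING_IP_CAP}")
```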

Tuesday’s post will include detailed instructions on how to deploy and configure both the NSX portion of this deployment and the OpenStack pieces to tie it all together. Enjoy!

MIT OpenCourseWare

[Image: MIT OpenCourseWare]

I am a member of the last generation that remembers what it was like before the Internet took over nearly all aspects of our lives. I remember cassette players, the Walkman, 8-bit video games, and going outside from early morning to after sundown without an electronic device anywhere. My kids — I have six — are growing up in a completely different world from the one my wife and I grew up in. I learned about computers in the very early 1990s, when an i386 40 MHz processor was screaming fast and 4 MB of memory was enormous! Everything I’ve learned about computers has been self-taught, from manually building a computer, to writing a program in Pascal, to building large-scale private cloud infrastructures. I have wanted to pass on that same passion for computers and technology to my children, but it has been a challenge to find a way to do so.

My 12-year-old son is a Boy Scout and recently asked me to help him with the Programming Merit Badge. Finally! As I went through the requirements, I quickly realized it was going to be much more challenging than I anticipated. But I am determined to expose him to all the things I love about computers and my career.

Fortunately I found MIT’s OpenCourseWare program. They have over 2300 courses online to help anyone learn a new subject or expand their knowledge in one they already know. Specifically, they have an introductory programming course, which covers Python and is geared toward those with no programming experience. I will be going through the course with my son (and daughter, if I can convince her) to help them learn how to program. And when I say “with”, I mean that literally. The first programming language I was exposed to was BASIC, and the first programming course I took in high school used Pascal — who else remembers learning Pascal? Anyway, I became a huge fan of Perl in the late 1990s and never looked back — sure, I’ve dabbled in Python and Ruby, but I never spent the time to seriously learn them. So I will be learning Python right alongside my kids and hopefully fostering a love of computer science at the same time.

Continual education is something all of us should be doing. Over the years we have all probably taken a certification or training class — maybe even finished up a Bachelor’s or graduate degree. Whatever your interests are, I encourage you to constantly be learning and reading, and to share that passion and love with those around you — especially your children.

Their generation has never known what it was like to use a rotary phone, or to dial 411 to look up a phone number. The things we had to learn and challenge ourselves with, they often take for granted. We are the generation that had to write the programming languages that the apps they install on their cell phones and tablets use!

May we remember that passion and may we help them to have the same passion that drove so many of us in this great community. God bless.