VMware Cloud Foundation Configuration Bits


The VMware Cloud Foundation (VCF) platform automates the deployment and lifecycle management of the software-defined data center (SDDC). The deployment takes an organization from installation of the physical rack to a ready-to-use vSphere environment in a matter of hours. The VCF platform includes the following VMware products:

  • VMware vSphere Hypervisor
  • VMware vCenter Server
  • VMware NSX-v
  • VMware vRealize Operations
  • VMware vRealize Log Insight

As the previous post mentioned, there are several management components VCF relies upon for its automation and workflow framework. After the initial deployment is complete, a vSphere Administrator will still need to perform several tasks to fully configure the environment and make it ready for a production workload. Some of those steps include:

  • Configuring LDAP or Active Directory authentication sources.
  • Creating local accounts.
  • Configuring the network uplinks on the physical network equipment.
  • Configuring NSX and/or the Virtual Distributed Switch for upstream network connectivity.
  • Configuring a jump host for accessing the out-of-band (OOB) network where the iDRAC interfaces exist.
    • Multiple jump hosts will be required, one per physical rack, since the OOB network is duplicated within each rack.
  • Network I/O Control (NIOC) will need to be configured.
  • Proper configuration of the Resource Pools VCF creates will need to be completed; no reservations or shares exist after the initial deployment (see the sketch following this list).
  • Log Insight management packs, where necessary, will need to be configured.
  • vRealize Operations will need to be configured.
  • DNS integration.
  • Adjusting the Virtual SAN storage policies per your environment's requirements.
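To illustrate the Resource Pool item above, here is a minimal sketch of setting reservations and shares with Python and the pyVmomi SDK. The vCenter hostname, credentials, pool name and allocation values are hypothetical placeholders, not anything VCF ships with; substitute values appropriate to your environment.

```python
#!/usr/bin/env python
# Minimal sketch: give a VCF-created resource pool reservations and shares.
# Assumes Python with pyVmomi and a reachable vCenter Server; all names and
# numbers below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()   # lab only; use valid certs in production
si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                  pwd='changeme', sslContext=context)
content = si.RetrieveContent()

# Locate the resource pool VCF created for the workload domain.
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.ResourcePool], True)
pool = next(rp for rp in view.view if == 'Tenant-RP')
view.Destroy()

# VCF leaves reservations and shares unset; configure them explicitly.
spec = vim.ResourceConfigSpec()
spec.cpuAllocation = vim.ResourceAllocationInfo(
    reservation=10000, expandableReservation=True, limit=-1,   # MHz, no limit
    shares=vim.SharesInfo(level='high', shares=8000))
spec.memoryAllocation = vim.ResourceAllocationInfo(
    reservation=16384, expandableReservation=True, limit=-1,   # MB, no limit
    shares=vim.SharesInfo(level='high', shares=327680))
pool.UpdateConfig(name=None, config=spec)

Disconnect(si)
```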

A few key points to remember:

  • Do not modify the cluster structure outside the VRM workflows — which means no creating new clusters or splitting existing clusters up.
  • Do not modify the names of any of the management virtual machines.
  • Do not modify the name of the Virtual Distributed Switches.
  • Do not modify the pre-configured portgroup names.
  • All expansion of hosts/capacity needs to be initiated from the VRM interface.
  • The management cluster will initially deploy with only 3 nodes, barely enough for any true fault tolerance with Virtual SAN; a single host failure leaves no spare host to rebuild components onto. I highly encourage you to expand it to the VMware recommended best practice of four hosts (see the sketch following this list).
  • Upgrades always occur in the management cluster first, then the workload domains — which I personally believe to be a bit backwards.
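For those curious about the four-host recommendation above, here is a quick sketch of the Virtual SAN host arithmetic, assuming the default RAID-1 mirroring storage policy (this is standard Virtual SAN sizing math, not anything VCF-specific):

```python
# Virtual SAN RAID-1 host math: each failure tolerated (FTT) needs one more
# data copy plus witness components, for a minimum of 2 * FTT + 1 hosts.
def vsan_hosts_required(ftt: int) -> int:
    return 2 * ftt + 1

for ftt in (1, 2):
    minimum = vsan_hosts_required(ftt)
    # One extra host provides somewhere to re-protect (rebuild) components
    # after a host failure, instead of running degraded until repairs.
    print(f"FTT={ftt}: minimum {minimum} hosts, recommended {minimum + 1}")
# FTT=1: minimum 3 hosts, recommended 4 -- hence the advice above.
```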

The VCF product is a great first step along the path of fully automated deployments and lifecycle management. The biggest challenge to adopting it will be balancing the line between what VCF manages and what a typical vSphere Administrator is going to be used to doing. Operationally it will take some adjustment, especially when using the lifecycle management workflows for the first time.

Happy Thanksgiving!

VMware Cloud Foundation Management Stack


The VMware Cloud Foundation platform includes a management stack made up of several virtual appliances that handle the automation and SDDC lifecycle management of the platform. These virtual appliances are deployed inside the management cluster before the VCF rack is shipped to the customer.

The VCF management stack includes:

  • vRack Manager (vRM): 4 vCPU, 12GB Memory, Disk1 100GB, Disk2 50GB
  • vRack-ISVM[1-3]: 4 vCPU, 4GB Memory, Disk1 20GB
  • vRack-LCM_Repository: 4 vCPU, 4GB Memory, Disk1 200GB, Disk2 900GB
  • vRack-LCM_Backup_Repository: 4 vCPU, 4GB Memory, Disk1 30GB

The following diagram is a logical representation of the VCF management stack.

[Diagram: VMware Cloud Foundation management stack]

The platform creates a private, non-routable VLAN for communication between the management virtual appliances. This network carries all of the backend traffic.

These VMs are managed by the VCF platform and should not be modified in any manner. Doing so can have dire consequences for the platform and may prevent it from operating as intended.

The vRack Manager (vRM) is the primary interface the user will interact with. The GUI HTTP server runs on this virtual appliance, as do all of the workflows for creating, modifying and deleting workload domains within VCF. I have seen the vRM become entirely unresponsive, or the GUI become painfully slow, when interacting with it; a reboot of the virtual appliance has resolved these issues when they have occurred.

In addition to the VCF management stack, the first rack of VCF hardware will also include vCenter Server, a Platform Services Controller, NSX Manager, a set of NSX Controllers, a vRealize Operations virtual appliance and a vRealize Log Insight virtual appliance. It is important to take these virtual appliances into consideration when performing sizing calculations for the environment.
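As a starting point, here is a quick back-of-the-envelope sum of the four VCF management appliances listed earlier. The additional appliances (vCenter Server, PSC, NSX, vRealize Operations, Log Insight) are deliberately left out, since their sizes depend on deployment options not covered here:

```python
# Footprint of the four VCF management appliances listed above.
appliances = {
    'vRack Manager':               {'count': 1, 'vcpu': 4, 'mem_gb': 12, 'disk_gb': 100 + 50},
    'vRack-ISVM':                  {'count': 3, 'vcpu': 4, 'mem_gb': 4,  'disk_gb': 20},
    'vRack-LCM_Repository':        {'count': 1, 'vcpu': 4, 'mem_gb': 4,  'disk_gb': 200 + 900},
    'vRack-LCM_Backup_Repository': {'count': 1, 'vcpu': 4, 'mem_gb': 4,  'disk_gb': 30},
}

totals = {metric: sum(a['count'] * a[metric] for a in appliances.values())
          for metric in ('vcpu', 'mem_gb', 'disk_gb')}
print(f"{totals['vcpu']} vCPU, {totals['mem_gb']} GB memory, {totals['disk_gb']} GB disk")
# 24 vCPU, 32 GB memory, 1340 GB disk -- before the vCenter/PSC/NSX/vRealize VMs.
```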

This management stack only exists on the first management cluster created on the first physical rack of VCF. Management clusters created in subsequent physical racks leverage these same virtual appliances for their operations and will only include a vRM, vCenter Server, NSX Manager and NSX Controllers.

Update: VMware released a VMware Cloud Foundation Architecture Poster, which can be downloaded here in PDF form.

Note – this information is accurate as of November 2016 and may be subject to change based on future VMware Cloud Foundation platform changes.

MIT OpenCourseWare


I am a member of the last generation that remembers what it was like before the Internet took over nearly all aspects of our lives. I remember cassette players, the Walkman, 8-bit video games and going outside from early morning to after sundown without an electronic device anywhere. My kids (I have six) are growing up in a completely different world from the one my wife and I grew up in. I learned about computers in the very early 1990s, when an i386 40MHz processor was screaming fast and 4 MB of memory was enormous! Everything I’ve learned about computers has been self-taught, from manually building a computer, to writing a program in Pascal, to building large-scale private cloud infrastructures. I have wanted to pass on that same passion for computers and technology to my children, but it has been a challenge to find a way to do so.

My 12-year-old son is a Boy Scout and recently asked me to help him with the Programming Merit Badge. Finally! As I went through the requirements, I quickly realized it was going to be much more challenging than I anticipated. But I am determined to expose him to all the things I love about computers and my career.

Fortunately I found MIT’s OpenCourseWare program. They have over 2300 courses online to help anyone learn a new subject or expand their knowledge in one they already know. Specifically, they have an introductory programming course, which covers Python and is geared toward those with no programming experience. I will be going through the course with my son (and daughter, if I can convince her) to help them learn how to program. And when I say “with”, I mean that literally. The first programming language I was exposed to was Basic, and the first course I ever took in high school used Pascal (who else remembers learning Pascal?). Anyway, I became a huge fan of Perl in the late 1990s and never looked back. Sure, I’ve dabbled in Python and Ruby, but I never spent the time to seriously learn them. So I will be learning Python right alongside my kids and hopefully fostering a love of computer science at the same time.

Continual education is something all of us should be doing. Over the years we have all probably taken a certification or training class, maybe even finished up our Bachelor’s degree or Graduate degree. Whatever your interests are, I encourage you to constantly be learning, be reading and be sharing that passion and love with those around you, especially your children.

Their generation has never known what it was like to use a rotary phone, or to dial 411 to look up a phone number. The things we had to learn and challenge ourselves with, they often take for granted. We are the generation that wrote the programming languages behind the apps they install on their cell phones and tablets!

May we remember that passion and may we help them to have the same passion that drove so many of us in this great community. God bless.

Recommended Read – PunchingClouds


The PunchingClouds site is maintained by Rawlinson Rivera. His site focuses on the breakthrough technologies within VMware, specifically from the Storage and Availability business unit. His recent posts on Virtual SAN are must-reads for anyone looking at moving towards a hyper-converged infrastructure within their environments. He is also the author of several white papers which serve as great references, including one on using the Brocade VCS Fabric with Virtual SAN.

You can also watch several of his videos on YouTube discussing different virtualization topics.

I hope you find his blog as useful as I have over the last year.

Designing for a SLA Metric

Over the weekend I focused on two things: taking care of my six kids while my wife was out of town and documenting my VCDX design. During the course of working through the Monitoring portion of the design, I found myself focusing on the technical reasons for some of the design decisions I was making to meet the SLA requirements of the design. That prompted the tweet you see to the left. When working on any design, you have to understand where the goal posts are in order to make intelligent decisions. With regards to an SLA, that means understanding what the SLA target is and on what frequency the SLA is being calculated. As you can see from the image, an SLA calculated against a daily metric will vary a considerable amount from an SLA calculated on a weekly or monthly basis.

So what can be done to meet the target SLA? If the monitoring solution is inside the environment, shouldn’t it have a higher target SLA than the thing it is monitoring? As I looked at the downtime numbers, I realized there were places where vSphere HA would not be adequate (by itself) to meet the SLA requirement of the design if it was being calculated on a daily or weekly basis. The ever-elusive 99.99% SLA target eliminates vSphere HA altogether if it is calculated on anything less than a yearly basis.
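The downtime arithmetic is easy to sketch. This little Python snippet (months approximated as 30 days) computes the allowed downtime for a given SLA target over different calculation windows:

```python
# Allowed downtime for an SLA target over different calculation windows.
WINDOWS = {'daily': 1, 'weekly': 7, 'monthly': 30, 'yearly': 365}   # days

def downtime_budget_seconds(sla_pct: float, days: int) -> float:
    """Seconds of downtime permitted in the window at the given SLA target."""
    return days * 24 * 3600 * (1 - sla_pct / 100)

for sla in (99.9, 99.99):
    for window, days in WINDOWS.items():
        print(f"{sla}% {window:<8}: {downtime_budget_seconds(sla, days):8.1f} s")

# At 99.99%, the daily budget is ~8.6 seconds and the weekly budget ~60 seconds.
# A vSphere HA restart takes minutes, so HA alone only fits a 99.99% target
# when the SLA is calculated yearly (~52.6 minutes of budget).
```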

As the architect of a project it is important to discuss the SLA requirements with the stakeholders and understand where the goal posts are. Otherwise you are designing in the vacuum of space with no GPS to guide you to the target.

SLAs within SLAs

The design I am currently working on had requirements for a central log repository and an SLA target of 99.9% for the tenant workload domain, calculated on a monthly basis. As I worked through the design decisions, however, I came to realize that the central logging capability vRealize Log Insight provides to the environment should be more resilient than the 99.9% uptime of the workload domain it supports. This type of SLA within an SLA is the sort of thing you may find yourself having to design against. So how could I increase the uptime of Log Insight to support a higher target SLA?
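Some rough probability shows why clustering helps. Assuming node outages are independent (which the anti-affinity rules described below approximate) and that the cluster stays up as long as two of its three nodes are up (a simplifying model for illustration, not a statement of Log Insight internals), a sketch of the math:

```python
from math import comb

# Availability of a 3-node cluster that survives one node failure, assuming
# independent node outages and 99.9% availability per node. The 2-of-3
# survival model is a simplifying assumption for illustration.
a = 0.999            # single-node availability (the 99.9% infrastructure SLA)
n, k = 3, 2          # 3 nodes, at least 2 must be up

cluster = sum(comb(n, i) * (a ** i) * ((1 - a) ** (n - i)) for i in range(k, n + 1))
print(f"cluster availability: {cluster * 100:.4f}%")   # ~99.9997%
```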

The post on Friday discussed the clustering capabilities of Log Insight, and that came about as I was working through this problem. If the clustering capability of Log Insight could be leveraged to increase the uptime of the solution, even on physical infrastructure designed to provide only a lower 99.9% SLA, then I could meet the higher target sub-SLA. By including a 3-node Log Insight cluster and creating anti-affinity rules on the vSphere cluster to ensure the Log Insight virtual appliances were never located on the same physical node, I was able to increase the SLA potential of the solution; a sketch of such a rule follows below. The last piece of the puzzle was incorporating the internal load balancing mechanism of Log Insight and using its VIP as the target for all of the systems’ remote logging. This allowed me to create a central logging repository with a higher target SLA than the underlying infrastructure SLA.
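Here is a minimal sketch of creating such an anti-affinity rule with Python and pyVmomi. The vCenter hostname, credentials, cluster name, VM naming convention and rule name are all hypothetical placeholders:

```python
#!/usr/bin/env python
# Minimal sketch: pin the three Log Insight nodes to different hosts with a
# hard anti-affinity rule. Assumes pyVmomi and a reachable vCenter Server;
# all names below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()   # lab only
si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                  pwd='changeme', sslContext=context)
content = si.RetrieveContent()

def find_all(vimtype):
    """Return all inventory objects of the given type."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    objs = list(view.view)
    view.Destroy()
    return objs

cluster = next(c for c in find_all(vim.ClusterComputeResource) if == 'WD01-Cluster')
li_nodes = [v for v in find_all(vim.VirtualMachine) if'loginsight')]
assert len(li_nodes) == 3, 'expected the 3-node Log Insight cluster'

# mandatory=True makes this a hard rule: DRS will never co-locate the nodes.
rule = vim.cluster.AntiAffinityRuleSpec(
    name='loginsight-anti-affinity', enabled=True, mandatory=True, vm=li_nodes)
spec = vim.cluster.ConfigSpecEx(
    rulesSpec=[vim.cluster.RuleSpec(operation='add', info=rule)])
cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)

Disconnect(si)
```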

Designing for and justifying the decisions made to support an SLA is one of the more trying issues in any architecture, at least in my mind. Understanding how each decision positively or negatively influences the SLA goals of the design is something every architect will need to do. This is one area where I was weak during my previous VCDX defense and was not able to articulate accurately. After spending significant time thinking through the key points of my current design, I have definitely learned more and now understand the effects the choices I am making have.

The opinions expressed above are my own and as I have not yet acquired my VCDX certification, these opinions may not be shared by those who have.