The VMware Cloud Foundation platform includes a management stack made up of several virtual appliances that handle the automation and SDDC lifecycle management of the platform. The virtual appliances are deployed inside the management cluster before the VCF rack is shipped to the customer.
The VCF management stack includes:
- vRack Manager (vRM): 4 vCPU, 12GB Memory, Disk1 100GB, Disk2 50GB
- vRack-ISVM[1-3]: 4 vCPU, 4GB Memory, Disk1 20GB
- vRack-LCM_Repository: 4 vCPU, 4GB Memory, Disk1 200GB, Disk2 900GB
- vRack-LCM_Backup_Repository: 4 vCPU, 4GB Memory, Disk1 30GB
The following diagram is a logical representation of the VCF management stack.
The platform creates a private, non-routable VLAN within the network for communication between the management virtual appliances. This network is used for all of the backend network communication.
These VMs are managed by the VCF platform and should not be modified in any manner. Modifying the VMs in any manner will have dire consequences on the platform and may prevent it from operating as intended.
The vRack Manager is the primary interface the user will deal with. The GUI HTTP server runs on this virtual appliance, along with all of the workflows for creating, modifying and deleting workload domains within VCF. I have seen the vRM become entirely unresponsive, or the GUI become painfully slow when interacting with it. A reboot of the virtual appliance has solved these issues when they have occurred.
In addition to the VCF management stack, the first rack of VCF hardware will also include vCenter Server, a Platform Services Controller, NSX Manager, a set of NSX Controllers, vRealize Operations virtual appliance and a vRealize Log Insight virtual appliance. It is important to take these virtual appliances into consideration when performing sizing calculations for the environment.
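As a rough aid for those sizing calculations, the sketch below totals the resources of the management stack appliances listed above. The figures come straight from that list; the `Appliance` class and the `stack_totals` helper are hypothetical names used only for illustration. vCenter Server, the Platform Services Controller, NSX and the vRealize appliances are not included because their sizes are not given here, so they would need to be added with their own values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Appliance:
    """One management appliance type (hypothetical helper class)."""
    name: str
    count: int
    vcpu: int
    memory_gb: int
    disks_gb: Tuple[int, ...]


# Values copied from the appliance list in this post.
MANAGEMENT_STACK: List[Appliance] = [
    Appliance("vRack Manager (vRM)", 1, 4, 12, (100, 50)),
    Appliance("vRack-ISVM", 3, 4, 4, (20,)),
    Appliance("vRack-LCM_Repository", 1, 4, 4, (200, 900)),
    Appliance("vRack-LCM_Backup_Repository", 1, 4, 4, (30,)),
]


def stack_totals(stack: List[Appliance]) -> Dict[str, int]:
    """Sum vCPU, memory and disk across every instance of every appliance."""
    return {
        "vcpu": sum(a.count * a.vcpu for a in stack),
        "memory_gb": sum(a.count * a.memory_gb for a in stack),
        "disk_gb": sum(a.count * sum(a.disks_gb) for a in stack),
    }


print(stack_totals(MANAGEMENT_STACK))
```

With the listed values, the stack alone accounts for 24 vCPU, 32GB of memory and 1340GB of provisioned disk before any of the other appliances are counted.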
This management stack only exists on the first management cluster created on the first physical rack of VCF. The additional management clusters created in subsequent physical racks will leverage these same virtual appliances for their operations. The additional management clusters will only include a vRM, vCenter Server, NSX Manager and NSX Controllers.
Update: VMware released a VMware Cloud Foundation Architecture Poster, which can be downloaded here in PDF form.
Note – this information is accurate as of November 2016 and may be subject to change based on future VMware Cloud Foundation platform changes.
I am a member of the last generation that remembers what it was like before the Internet took over nearly all aspects of our lives. I remember cassette players, the Walkman, 8-bit video games and going outside from early morning to after sundown without an electronic device anywhere. My kids (I have six) are growing up in a completely different world from the one my wife and I grew up in. I learned about computers in the very early 1990s, when an i386 40MHz processor was screaming fast and 4 MB of memory was enormous! Everything I’ve learned about computers has been self-taught, from manually building a computer, to writing a program in Pascal, to building large-scale private cloud infrastructures. I have wanted to pass on that same passion for computers and technology to my children, but it has been a challenge to find a way to do so.
My 12-year-old son is a Boy Scout and has recently asked me to help him with the Programming Merit Badge. Finally! As I went through the requirements, I quickly realized it was going to be much more challenging than I anticipated. But I am determined to expose him to all the things I love about computers and my career.
Fortunately, I found MIT’s OpenCourseWare program. They have over 2300 courses online to help anyone learn a new subject or expand their knowledge in one they already know. Specifically, they have an introductory programming course, which covers Python and is geared to those with no programming experience. I will be going through the course with my son (and daughter if I can convince her) to help them learn how to program. And when I say “with”, I mean that literally. The first programming language I was exposed to was BASIC, and the first course I ever took in high school used Pascal (who else remembers learning Pascal?). Anyway, I became a huge fan of Perl in the late 1990s and never looked back. Sure, I’ve dabbled in Python and Ruby, but I never spent the time to seriously learn them. So I will be learning Python right alongside my kids and hopefully fostering a love of computer science at the same time.
Continual education is something all of us should be doing. Over the years we have all probably taken a certification or training class, and maybe even finished up a Bachelor’s or Graduate degree. Whatever your interests are, I encourage you to constantly be learning, reading and sharing that passion and love with those around you, especially your children.
Their generation has never known what it was like to use a rotary phone, or to dial 411 to look up a phone number. The things we had to learn and challenge ourselves with they often take for granted. We are the generation that had to write the programming languages the apps they install on their cell phones and tablets use!
May we remember that passion and may we help them to have the same passion that drove so many of us in this great community. God bless.
The PunchingCloud site is maintained by Rawlinson Rivera. His site focuses on the breakthrough technologies within VMware and specifically the Storage and Availability business unit. His recent posts on Virtual SAN are must-reads for anyone looking at moving towards a hyper-converged infrastructure within your environments. He is the author of several white papers which are great references as well, including one on using the Brocade VCS Fabric with Virtual SAN.
You can also watch several of his videos on YouTube discussing different virtualization topics.
- Ask the Experts: Deliver Agility and Control to your Data Center with VMware and SolidFire
- Rawlinson Rivera & Yanbing Li, VMware – #VMworld – #theCUBE
- VCDX Defense Preparation: Design Scenario Examples
- Ask the Experts: SPBM and VVols Storage with SolidFire
I hope you find his blog as useful as I have over the last year.
Over the weekend I focused on two things: taking care of my six kids while my wife was out of town, and documenting my VCDX design. While working through the Monitoring portion of the design, I found myself focusing on the technical reasons for some of the design decisions I was making to meet the SLA requirements of the design. That prompted the tweet you see to the left. When working on any design, you have to understand where the goal posts are in order to make intelligent decisions. With regard to an SLA, it means understanding what the SLA target is and on what frequency the SLA is being calculated. As you can see from the image, an SLA calculated against a daily metric will vary a considerable amount from an SLA calculated on a weekly or monthly basis.
So what can be done to meet the target SLA? If the monitoring solution is inside the environment, shouldn’t it have a higher target SLA than the thing it is monitoring? As I looked at the downtime numbers, I realized there were places where vSphere HA would not be adequate (by itself) to meet the SLA requirement of the design if it was being calculated on a daily or weekly basis. The ever-elusive 99.99% SLA target eliminates vSphere HA altogether if it is being calculated over any period shorter than a year.
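To make the effect of the measurement window concrete, here is a minimal sketch of the downtime arithmetic. The 30-day month and 365-day year are simplifying assumptions, and `allowed_downtime_seconds` is a hypothetical helper name, not something from a VMware tool.

```python
# Seconds in each SLA measurement window.
# Assumption: a 30-day month and a 365-day year.
PERIOD_SECONDS = {
    "daily": 24 * 3600,
    "weekly": 7 * 24 * 3600,
    "monthly": 30 * 24 * 3600,
    "yearly": 365 * 24 * 3600,
}


def allowed_downtime_seconds(sla_percent: float, period: str) -> float:
    """Return the downtime budget, in seconds, for one measurement window."""
    return PERIOD_SECONDS[period] * (100.0 - sla_percent) / 100.0


# Print the budget for two common targets across every window.
for sla in (99.9, 99.99):
    for period in PERIOD_SECONDS:
        budget = allowed_downtime_seconds(sla, period)
        print(f"{sla}% {period:<8} {budget:10.1f} s ({budget / 60:8.2f} min)")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes per month, while a 99.99% target measured daily leaves under 9 seconds, which is the kind of budget a restart-based recovery mechanism has to fit inside.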
As the architect of a project it is important to discuss the SLA requirements with the stakeholders and understand where the goal posts are. Otherwise you are designing in the vacuum of space with no GPS to guide you to the target.
SLAs within SLAs
The design I am currently working on had requirements for a central log repository and an SLA target of 99.9% for the tenant workload domain, calculated on a monthly basis. As I worked through the design decisions, however, I came to realize that the central logging capability vRealize Log Insight provides to the environment should be more resilient than the 99.9% uptime of the workload domain it is supporting. This type of SLA within an SLA is the sort of thing you may find yourself having to design against. So how could I increase the uptime to be able to support a higher target SLA for Log Insight?
Friday’s post discussed the clustering capabilities of Log Insight, and it came about as I was working through this problem. If the clustering capability of Log Insight could be leveraged to increase the uptime of the solution, even on physical infrastructure only designed to provide a lower 99.9% SLA, then I could meet the higher target sub-SLA. By including a 3-node Log Insight cluster and creating anti-affinity rules on the vSphere cluster to ensure the Log Insight virtual appliances were never located on the same physical node, I was able to increase the SLA potential of the solution. The last piece of the puzzle was incorporating the internal load balancing mechanism of Log Insight and using the VIP as the target for all of the systems’ remote logging functionality. This allowed me to create a central logging repository with a higher target SLA than the underlying infrastructure SLA.
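The reasoning behind the higher sub-SLA can be sketched with a simple probability model. This is an illustration under stated assumptions, not a description of how Log Insight itself behaves: each node is assumed to be available 99.9% of the time, node failures are assumed to be independent (which is what the anti-affinity rules are meant to approximate), and the service is assumed to stay up as long as a chosen minimum number of nodes is running. The `cluster_availability` function is a hypothetical helper.

```python
from math import comb


def cluster_availability(node_availability: float, nodes: int, required: int) -> float:
    """Probability that at least `required` of `nodes` independent nodes are up."""
    a = node_availability
    return sum(
        comb(nodes, k) * a**k * (1.0 - a) ** (nodes - k)
        for k in range(required, nodes + 1)
    )


# Assumption: every node individually achieves 99.9% availability.
single = 0.999
print(f"1 of 3 nodes required: {cluster_availability(single, 3, 1):.9f}")
print(f"2 of 3 nodes required: {cluster_availability(single, 3, 2):.9f}")
```

Even when two of the three nodes must be up, the modeled availability is well above the 99.9% of any single node. Real-world figures will be lower wherever failures are correlated, for example through shared storage or networking.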
Designing for and justifying the decisions made to support an SLA is one of the more trying issues in any architecture, at least in my mind. Understanding how each decision positively or negatively influences the SLA goals of the design is something every architect will need to do. This is one area where I was weak during my previous VCDX defense and was not able to articulate accurately. After spending significant time thinking through the key points of my current design, I have definitely learned more and have been able to understand what effects the choices I am making have.
The opinions expressed above are my own and as I have not yet acquired my VCDX certification, these opinions may not be shared by those who have.
Finding a topic for today’s #vDM30in30 post was a challenge. When I set out to complete the challenge I knew the later posts would become more difficult as the weeks wore on, but I didn’t think the difficulty would arise so quickly (i.e. the end of week 2). For whatever reason, I could not decide on a topic that I wanted to write about until late this evening. As I was working on the portion of my VCDX design that covers Monitoring and the supporting infrastructure, I found myself thinking about how to incorporate a proper vRealize Log Insight system into the design. That led to tonight’s topic, Log Insight clusters.
I have learned a VCDX design should never include a VMware product just for the sake of including it. The need for vRealize Log Insight in the current design I am working on is justified by the requirements. As I have learned to use Log Insight more extensively over the past year and a half, the strengths of the product continue to amaze me. One such strength is the ease with which it is possible to incorporate a high availability feature into the platform. If you are unfamiliar with vRealize Log Insight, it is an analytics and remote logging platform that acts as a remote syslog server capable of parsing hundreds of thousands of log messages per day. The regular expression capabilities of the product are second-to-none — much better and more reliable than similar products like Splunk (IMHO).
The design I am working on is leveraging VMware Cloud Foundation (VCF) as the hardware and SDDC platform. With this requirement come certain constraints, including the deployment method VCF uses for vRealize Log Insight. When VCF creates the management domain, it deploys a single vRealize Log Insight virtual appliance. Because I have a requirement to store all relevant log files in a central location, leveraging the existing vRealize Log Insight virtual appliance makes sense. However, a single node is a single point of failure, which is not adequate for a production architecture, let alone a VCDX design.
So how can vRealize Log Insight be enhanced to handle a failure? Why a cluster of course! The Engineering team responsible for vRealize Log Insight were kind enough to build a clustering feature into the product and even included an internal load balancer as well! Having a cluster of nodes allows the environment to handle an eventual failure event — whether it is because the VM operating system becomes unresponsive or the underlying ESXi node fails altogether. Once configured, the VIP specified for use by the internal load balancer should be the IP and/or FQDN all of the downstream services use for sending syslog messages.
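As an illustration of pointing a downstream service at the VIP rather than at an individual node, the sketch below configures Python's standard `logging.handlers.SysLogHandler`. The function name `build_syslog_logger` and the hostname in the usage comment are hypothetical, and UDP on the standard syslog port 514 is assumed; adjust the protocol and port to whatever the environment actually uses.

```python
import logging
import logging.handlers
import socket


def build_syslog_logger(vip_host: str, port: int = 514, name: str = "app") -> logging.Logger:
    """Return a logger that forwards records to a syslog endpoint over UDP.

    `vip_host` should be the load balancer VIP (IP or FQDN), never an
    individual cluster node, so a node failure does not interrupt logging.
    """
    handler = logging.handlers.SysLogHandler(
        address=(vip_host, port),
        socktype=socket.SOCK_DGRAM,  # assumption: plain UDP syslog
    )
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# Example usage (hypothetical FQDN for the cluster VIP):
# log = build_syslog_logger("loginsight-vip.example.com")
# log.info("application started")
```

Because every sender only knows the VIP, nodes can be added to or removed from the cluster without reconfiguring the downstream systems.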
Configure a Log Insight Cluster
The creation of a Log Insight cluster is relatively straightforward and I will quickly go through the steps. Remember the Log Insight nodes have a requirement to exist on the same L2 network — no L3 support for multiple geographic clusters currently. Simply deploy three Log Insight virtual appliances and power them on. Once the OS has been started, log into the web UI for the additional instances and perform the following steps.
Add a third node in and you have a working vRealize Log Insight cluster, capable of distributing incoming log messages between multiple nodes. Depending on the SLA for the environment, you can increase the number of nodes within the cluster to meet the requirements.
Fortunately for me, the weekend posts were written on election night and are scheduled to auto-publish. Hopefully that will allow me to spend some much needed time working on VCDX design documentation. The December 1 deadline is fast approaching!