Category: VCDX

This is the first post in what I plan to be a sporadic yet ongoing series highlighting certain aspects of a VCDX / Architect skillset. These VCDX Quick Hits will cover a range of topics and key in on certain aspects of the VCDX blueprint. I hope they will trigger some critical thinking on the reader's part and help readers improve their own skillsets.

The idea for this post came after listening to a post-mortem call for a recent incident at work. The incident itself was a lower-priority Severity 2 incident, meaning it impacted only a small subset of customers in a small failure domain (a single vCenter Server). As architects, we know monitoring is a key component of any architecture design, whether it is intended for a VCDX submission or not.

In IT Architect: Foundation in the Art of Infrastructure Design, the authors state:

“A good monitoring solution will identify key metrics of both the physical and virtual infrastructure across all key resources compute, storage, and networking.”

The post-mortem call got me thinking about maturity within our monitoring solutions and improving our architecture designs by striving to understand the components better earlier in the design and pilot phases.

It is common practice to identify the components and services of an architecture we designed (or are responsible for) and to outline which of them are key to supporting the service offering. When I wrote the VMware Integrated OpenStack design documentation, which later became the basis for my VCDX defense, I identified specific OpenStack services that needed to be monitored. The following screen capture shows how I documented those services.

As you can see from the above graphic, I gave each service definition a unique ID and documented the component/service, where the service should be running, and a brief description of it. This information was used to create the sprint story for the monitoring team to build the alert definitions within the monitoring solution.

All good, right?

The short answer is: not really. What I provided in my design was adequate for an early service offering, but it left room for further maturity. Going back to the post-mortem call, this is where additional maturity in the architecture design would have helped reduce the mean time to resolution (MTTR) of the incident.

During the incident, two processes running on a single appliance were being monitored only to determine whether they were running. Just like the services in my VMware Integrated OpenStack design, these processes had been identified and were being monitored per the architecture specification. However, the dependency between the two processes was not documented. In this case, process B depended on process A, and although process A was running, it was not properly responding to queries from process B. As a result, the monitoring system believed everything was running correctly (which, from an alert definition perspective, it was), and the incident was not discovered immediately. Once process A was restarted, it began responding to the queries from process B and service was restored.

So what could have been done?

First, the architecture design could have specified an alert definition for the key services (or processes) that went beyond simply checking whether the service was running.
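A check that goes beyond "is it running?" can be sketched as two stages: a liveness test and a functional probe. This is a minimal bash sketch, not the check used in the incident; the function name, the process name and the health URL in the usage example are hypothetical, and the return codes assume the common Nagios plugin convention (0 = OK, 2 = CRITICAL).

```shell
# check_service NAME LIVENESS_CMD PROBE_CMD
#   LIVENESS_CMD answers "is the process running?" (e.g. a pgrep call).
#   PROBE_CMD answers "is it actually doing its job?" (e.g. a test query).
# Return codes follow the Nagios plugin convention: 0 = OK, 2 = CRITICAL.
check_service() {
  local name="$1" liveness_cmd="$2" probe_cmd="$3"

  # Stage 1: the traditional "is it running?" check.
  if ! sh -c "$liveness_cmd" >/dev/null 2>&1; then
    echo "CRITICAL: ${name} is not running"
    return 2
  fi

  # Stage 2: a functional probe. A process can pass stage 1 and still fail
  # here, which is the situation described in the incident above.
  if ! sh -c "$probe_cmd" >/dev/null 2>&1; then
    echo "CRITICAL: ${name} is running but not responding"
    return 2
  fi

  echo "OK: ${name} is running and responding"
  return 0
}
```

For example, `check_service "process-a" "pgrep -x process-a" "curl -sf --max-time 5 http://127.0.0.1:8080/health"` (hypothetical name and endpoint) would alert when the process is up but does not answer. Keeping the two stages separate also keeps the alert message specific about which kind of failure occurred.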

Second, the architecture design could have captured the inter-dependencies between these two processes and specified a more detailed alert definition. In this case, a log entry was written each time process A did not correctly respond to process B. An alert definition for that log entry would have allowed the monitoring system to generate an alert.
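A log-based alert definition of this kind can be sketched as a small check script. The log path and error string in the usage example are hypothetical placeholders, not the actual entry from the incident, and the return codes assume the Nagios plugin convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).

```shell
# check_log NAME LOGFILE PATTERN
# Returns 2 (CRITICAL) if PATTERN appears in LOGFILE, 0 (OK) if it does not,
# and 3 (UNKNOWN) if the log file cannot be read.
check_log() {
  local name="$1" logfile="$2" pattern="$3"
  local count

  if [ ! -r "$logfile" ]; then
    echo "UNKNOWN: cannot read ${logfile}"
    return 3
  fi

  # -c counts matching lines, -F treats the pattern as a fixed string.
  # grep exits 1 when nothing matches (it still prints 0), hence "|| true".
  count=$(grep -cF -- "$pattern" "$logfile" || true)
  if [ "$count" -gt 0 ]; then
    echo "CRITICAL: ${name} logged ${count} failed response(s)"
    return 2
  fi

  echo "OK: no failed responses logged by ${name}"
  return 0
}
```

For example: `check_log "process-a" /var/log/process-a.log "did not respond"` (hypothetical path and message). As written it scans the whole file, so one old entry would keep the alert firing. A production check would more likely look only at recent entries or use the monitoring tool's own log-matching feature.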

Third, the architecture design could have included canary testing as part of a more mature monitoring solution. It may be necessary to clarify what I mean by the term canary testing.

“Well into the 20th century, coal miners brought canaries into coal mines as an early-warning signal for toxic gases, primarily carbon monoxide. The birds, being more sensitive, would become sick before the miners, who would then have a chance to escape or put on protective respirators.” (Wikipedia)

Canary testing, then, is a method of checking the service for issues before a customer discovers them. Canary tests should include the common platform operations a customer would typically perform, so they can also be thought of as end-to-end testing.

For example, a VMware Integrated OpenStack service offering with NSX would need to ensure not only that the NSX Manager is online, but also that the OpenStack Neutron service is able to communicate with it. A good test could be to make an OpenStack Neutron API call to deploy an NSX Edge Services Gateway, or to create a new tenant network (NSX logical switch).
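The tenant-network canary can be sketched with the neutron CLI's `net-create` and `net-delete` commands. This is a minimal sketch that assumes admin credentials have already been sourced and that, as described above, creating a tenant network results in an NSX logical switch. The `canary-net-` naming scheme and the return codes (Nagios convention) are assumptions for illustration.

```shell
# canary_network: create and then delete a throwaway tenant network.
# Assumes OpenStack credentials are already sourced in this shell.
# Return codes: 0 = OK, 1 = WARNING (leaked resource), 2 = CRITICAL.
canary_network() {
  # Unique, clearly labelled name so any leftovers are easy to find.
  local name="canary-net-$(date +%s)"

  # If Neutron cannot create the network (for example because it cannot
  # reach NSX Manager), a customer doing the same thing would fail too.
  if ! neutron net-create "$name" >/dev/null 2>&1; then
    echo "CRITICAL: could not create network ${name}"
    return 2
  fi

  # Clean up; a failed delete means the canary left a resource behind.
  if ! neutron net-delete "$name" >/dev/null 2>&1; then
    echo "WARNING: created ${name} but could not delete it"
    return 1
  fi

  echo "OK: created and deleted ${name}"
  return 0
}
```

Scheduled from the monitoring system, a non-zero return code suggests that a customer creating a network would likely hit the same failure. Deleting the network in the same run keeps the canary from cluttering the environment.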

There are likely numerous ways a customer will interact with your service offering, and defining these additional tests within the architecture design itself is something I challenge you to consider.


In preparing for my recent VCDX Defense, I read a great many articles and a few books to better understand how to properly document and justify the design decisions I was making. One book in particular provided valuable insight that has helped me not just with the VCDX certification, but also in becoming a better infrastructure architect.

In IT Architect: Foundation in the Art of Infrastructure Design, the authors state:

“Design Decisions will support the project requirements directly or indirectly…When a specific technology is required to meet a design goal, justification is important and should be provided. With each design decision there is a direct, intended impact, but there are also other areas that may be affected…These options and their respective value can add quality to the design you make and provide insight into why you took a specific path.”

As I thought through the impact of each design decision, I tried to identify several key points, including:

  • Justification
  • Impact
  • Decision Risks
  • Risk Mitigation
  • Requirements Achieved

After identifying those key points for each category (in some cases more than one), I made sure they were properly documented. The book provides an example table to draw inspiration from, as does one of Derek Seaman's blog articles. I modified the examples to fit my writing style and then included a table for each design decision at the end of each major section or heading within my architecture documentation.

The following example from my VMware Integrated OpenStack VCDX architecture document shows the table and categories capturing the reasoning behind a set of design decisions:

[Figure: sample VCDX design decision table]

Now, when I need to revisit a design decision or another architect is reviewing the decisions within the design, there is additional information to provide insight into the thought process. It also helps to highlight what impact the decision has on the architecture as a whole.

Beyond the table and the relevant information for each design decision, it may be necessary to highlight the alternatives that were considered. As we know, there are usually multiple ways to meet a requirement, and “showing your work” by explaining why you chose X over Y is an important aspect of the VCDX Defense. I found doing so within my documentation useful, and you may too.

Enjoy!

The opinions expressed in this article are entirely my own and based solely on my own VCDX certification experience. They may or may not reflect the opinions of other VCDX certification holders or the VMware VCDX program itself.


Arrasjid, John Y., Mark Gabryjelski, and Chris McCain. “Chapter 2, Design Decisions.” IT Architect: Foundation in the Art of Infrastructure Design; a Practical Guide for IT Architects. Upper Saddle River, NJ: IT Architect Resource, 2016. 49. Print.


During the process of writing the documentation necessary for the VCDX certification, I read several books and a fair number of blog articles. One article in particular that I found helpful was from Derek Seaman’s blog.

Sample VCDX-DCV Architecture Outline

In the spirit of paying it forward, I am going to share my own table of contents for others to use as a starting point. No two will be the same and some of the things I included may not be necessary in your own design — you may even feel there are sections that are missing from my own. If nothing else, I hope it can be a starting point for you in the journey towards earning the VCDX certification.

Enjoy!



“Nothing in the world is worth having or worth doing unless it means effort, pain, difficulty… I have never in my life envied a human being who led an easy life. I have envied a great many people who led difficult lives and led them well.” -Theodore Roosevelt

That quote from Theodore Roosevelt sums up the VCDX certification rather well. The VCDX certification takes a great deal of effort, pain and difficulty to accomplish. My personal journey included multiple defense attempts, much to my dismay and benefit. Fortunately, it was all worth it!

I am VCDX #257!

The VCDX certification requires a significant amount of time to earn. If I had to estimate it, I would say I spent more than 200 hours working on my design documentation, defense presentation, mock defenses, Q&A sessions and general research. The submitted design was also an actual work project, so some of that time investment was part of my job, an added benefit not all candidates have.

The one lesson I would share with others thinking about or pursuing their own VCDX certification is this: be careful whom you ask for advice and whose advice you take. If they have not been a panelist in the past, their view of what to do (or not to do) is going to be mostly opinion. The VCDX program held a Q&A call the Friday before the defenses began in May.

On the call were Joe Silvagi, Simon Long and Karl Childs, all three of whom are heavily involved in the program. The most frequent questions from candidates started with the phrase “My mentor says” or “The community says”. In nearly every instance, Joe's response was along the lines of “That isn’t right.”

Attend one or more VCDX workshops before submitting, so that you can ask questions and reach out to the people running the workshops for trustworthy answers.

That’s all the advice I have to give.

There is an African proverb, posted outside one of the VMware conference rooms, that says:

“If you want to go fast, go alone. If you want to go far, go together.”

This is true of the VCDX certification. I got to this point not because I went alone, but because I went with others.

My wife – No one on this earth has supported me more, through countless late nights over the past two decades as I strove to advance my career. This is as much her certification as it is my own.

Rich Steck (Adobe) – He mentored me during one of the most difficult years in my career. He challenged me to figure out where I wanted to go and to find paths to get there. Most importantly, he listened.

Frans van Rooyen (Adobe) – Already a brilliant cloud architect in his own right, he mentored me in my role as a Compute Platform Engineer for two years. He let me constantly challenge all of the decisions we were making (on-the-fly) as we built a rather large private cloud across the globe. He introduced me to VMware technologies and helped me gain the skills I would need to land my dream job at VMware in two short years.

Andrew Nelson (VMware) – While I was at Adobe, Frans introduced me to Andrew. Andy and I spoke together at VMworld in San Francisco and Barcelona in 2014. We briefly worked on a book together, during which time he told me that if I wanted a job at VMware, I’d be surprised how quickly it would happen. I had an offer for my current role barely a month later.

OneCloud Architecture Team (VMware) – My dream job came with the opportunity to work with three double-VCDX certification holders. On the first architecture review board call I attended, they tore into another architect over his vRA design, and at that moment I knew I was going to have to step up my game significantly to play with them. What a blessing it has been to work with them for the past two years; each of them has helped me grow my skills as an architect immensely. They taught me to critically challenge a design decision, not for the sake of arguing, but to understand the rationale behind it.

Their support continued from afar as I went through the process of submitting and defending my design for my own certification. When I got the email saying I was now VCDX #257, they were right there celebrating my success with me.

Thank you to each of you for helping me realize my dreams and earn the VCDX certification!



I am currently pursuing my VCDX certification, and the design I have submitted is based on VMware Cloud Foundation and VMware Integrated OpenStack. As part of the required documentation, I included a deployment guide. Unfortunately, the deployment is not as simple as laying down the SDDC components and the VIO vApp.

This blog post covers a couple of items that are needed to get the two pieces working together.


Shared Edge & Workload Cluster

The VCF architecture currently has a limitation: a vCenter Server can only have a single vSphere cluster (a 1:1 relationship). VMware Integrated OpenStack requires either three clusters in a single vCenter Server, or a management cluster in one vCenter Server instance and two clusters in a second vCenter Server. Neither of these options is compatible with VMware Cloud Foundation as it currently stands.

To make it work, we use a two-vCenter Server deployment of VMware Integrated OpenStack and modify the OMS configuration so that the NSX Edge and workload clusters can be combined into one. We do this by editing a single configuration file and restarting the oms service running on the VIO vApp Management (OMS) VM.

$ cd /opt/vmware/vio/etc
$ sudo vim moms.properties

Add the following line to the end of the file:
oms.allow_shared_edge_cluster = true

$ sudo restart oms
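If you prefer not to edit the file interactively, the same change could be made with a small idempotent helper. This is a sketch and not part of VIO: `add_property` is a hypothetical name, and the file path is passed in rather than hard-coded, so you can point it at the properties file named in the steps above.

```shell
# add_property FILE LINE
# Appends LINE to FILE unless an identical line is already present,
# so running it more than once does not duplicate the entry.
add_property() {
  local file="$1" line="$2"

  # -q quiet, -x whole-line match, -F fixed string (no regex).
  if grep -qxF -- "$line" "$file" 2>/dev/null; then
    echo "already present: ${line}"
    return 0
  fi

  # If the file does not end in a newline, add one so the new property
  # is not glued onto the last existing line.
  if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
    printf '\n' >> "$file"
  fi

  printf '%s\n' "$line" >> "$file"
  echo "added: ${line}"
}
```

Run it from a root shell on the OMS VM, for example `add_property /opt/vmware/vio/etc/moms.properties "oms.allow_shared_edge_cluster = true"`, and then restart the oms service as shown above. It only recognizes an exact match, so a line written without the spaces around `=` would not be detected as already present.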

VMware Integrated OpenStack can now be deployed on top of VMware Cloud Foundation.


VXLAN-backed External Network

This one is a bit trickier and is an obstacle whether or not you are using VMware Cloud Foundation as the infrastructure layer.

Logically, the OpenStack external network should end up attached to a VXLAN port group created by NSX. That NSX logical switch is attached to the internal interface on an NSX Distributed Logical Router.

The following is the logical diagram for the architecture.

[Figure: logical diagram of the OpenStack external network]

The issue is that when deploying OpenStack with VMware Integrated OpenStack, you have to specify an external network. However, VMware Integrated OpenStack will not allow a vSphere Administrator to select a VXLAN port group during the deployment. I got around this by creating a non-VXLAN port group on the DVS to be used only for the deployment.

Once the OpenStack deployment is complete, I needed to attach the actual VXLAN-backed port group as the external network.

SSH to the OMS server
$ ssh -l viouser oms.domain.local

SSH to an OpenStack controller VM
$ ssh controller01
$ sudo cp /root/cloudadmin_v3.rc .
$ source cloudadmin_v3.rc
$ neutron

(neutron) net-list
(neutron) net-create --provider:network_type=portgroup --provider:physical_network=virtualwire-XX vio-external-network
(neutron) net-list

The network will now appear in the OpenStack network list. Go ahead and create your subnet for the external IP addresses, based on the network assignment in your environment.
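Outside the interactive `(neutron)` prompt, the subnet could be created with `neutron subnet-create`. The sketch below wraps it in a hypothetical helper. The 192.0.2.0/24 documentation range, the gateway and the allocation pool are placeholders for your environment's network assignment, and disabling DHCP reflects a common choice for external networks rather than a documented VIO requirement.

```shell
# create_external_subnet NETWORK CIDR GATEWAY POOL_START POOL_END
# Creates a subnet named "<NETWORK>-subnet" on an existing Neutron network.
# Assumes OpenStack admin credentials are already sourced in this shell.
create_external_subnet() {
  local net="$1" cidr="$2" gateway="$3" pool_start="$4" pool_end="$5"

  # DHCP is disabled because addresses on an external network are usually
  # handed out as router gateways and floating IPs from the pool below.
  neutron subnet-create --name "${net}-subnet" --disable-dhcp \
    --gateway "$gateway" \
    --allocation-pool "start=${pool_start},end=${pool_end}" \
    "$net" "$cidr"
}
```

For example, after sourcing `cloudadmin_v3.rc` on a controller: `create_external_subnet vio-external-network 192.0.2.0/24 192.0.2.1 192.0.2.100 192.0.2.200`. Depending on how the network will be used, it may also need the Neutron `router:external` attribute set before routers can use it as a gateway.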

If you have questions or issues with implementing these changes in your environment, please reach out.
