Understanding a Design Decision


The last couple of months leading into the end of the year have seen me focusing once again on earning the VCDX certification. After a fair amount of examination of my skills, especially my areas of weakness, I knew a new design was needed. Fortunately, a new project at work had me building an entirely new VMware Integrated OpenStack service offering. Being able to work on the design from inception to POC to pilot has provided me a great learning opportunity. One of my weaknesses has been making sure I understand the ramifications of each design decision made in the architecture. As I worked through the process of documenting all of the design decisions, I settled on a template within the document.

The following table is replicated for each design decision within the architecture.


One of the ways I worked to improve my understanding of how to document a proper design was reading the book IT Architect: Foundation in the Art of Infrastructure Design. In the book, I noticed the authors made sure to highlight the design justifications throughout every chapter. I wanted to incorporate the same kind of justifications within my VCDX architecture document, and also to document the risks, the impacts and the requirements that were achieved by each decision.

In the design I am currently working on, an example of the above table in action can be found in the following image.


Here a decision was made to use the Dell PowerEdge R630 server as the compute platform. Requirements like the SLA also had to be taken into consideration, which you see in the risks and risk mitigation. The table helps to highlight when a design decision actually adds requirements to the architecture, which usually show up in the Impact or Decision Risks section of the table. In the case of the example, the table notes,

Dell hardware has been prone to failures, including drives, SD cards and controller failures.

I documented the risk based on knowledge acquired over nearly a decade of using Dell hardware, most recently in my current role. Recording it as a risk meant it had to be addressed, which created an ancillary requirement. The subsequent Risk Mitigation fulfills that new requirement.

A 4-hour support contract is purchased for each compute node. In addition, an on-site hardware locker is maintained at the local data center, which contains common components to reduce the mean-time-to-resolution when a failure occurs.

The subsequent decision to purchase a 4-hour support contract from Dell, combined with the on-site hardware locker, allows the design to account for the SLA requirements of the service offering while also addressing a known risk: hardware failure. In my previous VCDX attempt, I did not do a good enough job working through this thought process, which is a key reason why I was not successful.
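
For anyone who prefers to keep design decisions in a form that can be checked by a script, the table could be sketched as a small Python record. This is only an illustration: the class name, the field names and the `unmitigated` helper are assumptions based on the sections mentioned in this post (justification, requirements, impact, decision risks, risk mitigation), not necessarily the exact headings of the template. The justification is left empty because it is not quoted in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DesignDecision:
    """One design decision table. Field names are assumed, not the template's."""
    decision: str
    justification: str = ""  # not quoted in the post, so left empty here
    requirements_met: List[str] = field(default_factory=list)
    impacts: List[str] = field(default_factory=list)
    risks: List[str] = field(default_factory=list)
    risk_mitigations: List[str] = field(default_factory=list)

    def unmitigated(self) -> bool:
        """True when at least one risk is recorded but no mitigation is."""
        return bool(self.risks) and not self.risk_mitigations


# The compute platform example from this post, using only what the post states.
compute = DesignDecision(
    decision="Use the Dell PowerEdge R630 server as the compute platform.",
    requirements_met=["SLA requirements of the service offering"],
    risks=[
        "Dell hardware has been prone to failures, including drives, "
        "SD cards and controller failures.",
    ],
    risk_mitigations=[
        "A 4-hour support contract is purchased for each compute node.",
        "An on-site hardware locker with common components is maintained "
        "at the local data center.",
    ],
)

if __name__ == "__main__":
    status = "needs mitigation" if compute.unmitigated() else "risks addressed"
    print(f"{compute.decision} -> {status}")
```

A helper like `unmitigated` makes it easy to scan a whole list of decisions for any risk that was written down without a matching mitigation, which is the gap this template is meant to expose.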

The process of documenting the table has helped me make sure the proper amount of time is spent thinking through every decision. I am also finding that documenting all the decisions is helpful as I review the design with others. All in all, it has been a great process to work through, and it is helping me know and comprehend every aspect of the design.

As noted previously, I am still pursuing my VCDX certification, so these opinions may not be shared by those who have already earned theirs.