Designing for a SLA Metric

Over the weekend I focused on two things: taking care of my six kids while my wife was out of town and documenting my VCDX design. While working through the Monitoring portion of the design, I found myself focusing on the technical reasons for some of the design decisions I was making to meet the SLA requirements of the design. That prompted the tweet you see to the left. When working on any design, you have to understand where the goal posts are in order to make intelligent decisions. With regard to an SLA, it means understanding what the SLA target is and how frequently the SLA is being calculated. As you can see from the image, the downtime allowed by a SLA calculated against a daily metric will vary a considerable amount from a SLA calculated on a weekly or monthly basis.

So what can be done to meet the target SLA? If the monitoring solution is inside the environment, shouldn’t it have a higher target SLA than the thing it is monitoring? As I looked at the downtime numbers, I realized there were places where vSphere HA would not be adequate (by itself) to meet the SLA requirement of the design if it was being calculated on a daily or weekly basis. The ever-elusive 99.99% SLA target eliminates vSphere HA altogether if it is being calculated on anything shorter than a yearly basis.
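To make the downtime numbers concrete, here is a minimal sketch of the arithmetic. The 30-day month, the 365-day year, and the helper names are my own assumptions for illustration. The five-minute recovery time in the usage note is hypothetical and is not a measured vSphere HA restart time.

```python
from datetime import timedelta

# Measurement windows. The 30-day month and 365-day year are simplifying assumptions.
WINDOWS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=30),
    "yearly": timedelta(days=365),
}


def allowed_downtime(sla_percent: float, window: timedelta) -> timedelta:
    """Return the downtime budget for an SLA target over one measurement window."""
    unavailable_fraction = 1 - sla_percent / 100
    return timedelta(seconds=window.total_seconds() * unavailable_fraction)


def fits_budget(recovery: timedelta, sla_percent: float, window: timedelta) -> bool:
    """Check whether a single outage of length `recovery` stays within the budget."""
    return recovery <= allowed_downtime(sla_percent, window)


# Print the downtime budget in seconds for each target and window.
for sla in (99.9, 99.99):
    for name, window in WINDOWS.items():
        budget = allowed_downtime(sla, window)
        print(f"{sla:>6}% {name:<8} {budget.total_seconds():>12.1f} s")
```

Calling `fits_budget(timedelta(minutes=5), 99.99, WINDOWS["monthly"])` with that hypothetical five-minute recovery shows how a single restart event can consume the entire budget when the window is short.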

As the architect of a project it is important to discuss the SLA requirements with the stakeholders and understand where the goal posts are. Otherwise you are designing in the vacuum of space with no GPS to guide you to the target.

SLAs within SLAs

The design I am currently working on had requirements for a central log repository and a SLA target of 99.9% for the tenant workload domain, calculated on a monthly basis. As I worked through the design decisions, I came to realize, however, that the central logging capability vRealize Log Insight provides to the environment should be more resilient than the 99.9% uptime of the workload domain it is supporting. This type of SLA within a SLA is the sort of thing you may find yourself having to design against. So how could I increase the uptime to be able to support a higher target SLA for Log Insight?

The post on Friday discussed the clustering capabilities of Log Insight, and it came about as I was working through this problem. If the clustering capability of Log Insight could be leveraged to increase the uptime of the solution, even on physical infrastructure only designed to provide a lower 99.9% SLA, then I could meet the higher target sub-SLA. By including a 3-node Log Insight cluster and creating anti-affinity rules on the vSphere cluster to ensure the Log Insight virtual appliances were never located on the same physical node, I was able to increase the SLA potential of the solution. The last piece of the puzzle was incorporating the internal load balancing mechanism of Log Insight and using the VIP as the target for the remote logging functionality of all the systems. This allowed me to create a central logging repository with a higher target SLA than the underlying infrastructure SLA.
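A rough way to reason about why the cluster helps is a k-of-n availability model. The sketch below assumes every node has the same availability and that node failures are independent, which is what the anti-affinity rules are trying to approximate. Shared storage, networking and the load balancer itself are ignored, so treat the result as an optimistic upper bound. How many nodes must remain up for the service to count as available is also an assumption you would need to confirm for your own design.

```python
from math import comb


def k_of_n_availability(node_availability: float, n: int, k: int) -> float:
    """Probability that at least k of n independent, identical nodes are up."""
    a = node_availability
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


# Three nodes, each sitting on infrastructure designed for 99.9%.
for required in (1, 2, 3):
    value = k_of_n_availability(0.999, n=3, k=required)
    print(f"at least {required} of 3 nodes up: {value:.9f}")
```

The comparison between requiring one, two, or all three nodes shows how much of the gain depends on the service tolerating node loss. If every node must be up, clustering lowers availability instead of raising it.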

Designing for and justifying the decisions made to support a SLA is one of the more trying issues in any architecture, at least in my mind. Understanding how each decision positively or negatively influences the SLA goals of the design is something every architect will need to do. This is one area where I was weak during my previous VCDX defense, and I was not able to articulate it accurately. After spending significant time thinking through the key points of my current design, I have definitely learned more and better understand the effects of the choices I am making.

The opinions expressed above are my own and as I have not yet acquired my VCDX certification, these opinions may not be shared by those who have.

 

Building a Log Insight Cluster

Finding a post for today’s #vDM30in30 post was a challenge. When I set out to complete the challenge I knew the later posts would become more difficult as the weeks wore on, but I didn’t think the challenge would arise so quickly (i.e. the end of week 2). For whatever reason, I could not decide on a topic that I wanted to write about until late this evening. As I was working on the portion of my VCDX design that covers Monitoring and the supporting infrastructure, I found myself thinking about how to incorporate a proper vRealize Log Insight system into the design. That led to tonight’s topic, Log Insight clusters.

I have learned a VCDX design should never include a VMware product just for the sake of including it. The need for vRealize Log Insight in the current design I am working on is justified by the requirements. As I have learned to use Log Insight more extensively over the past year and a half, the strengths of the product continue to amaze me. One such strength is the ease with which it is possible to incorporate a high availability feature into the platform. If you are unfamiliar with vRealize Log Insight, it is an analytics and remote logging platform that acts as a remote syslog server capable of parsing hundreds of thousands of log messages per day. The regular expression capabilities of the product are second-to-none — much better and more reliable than similar products like Splunk (IMHO).

The design I am working on is leveraging VMware Cloud Foundation (VCF) as the hardware and SDDC platform. With this requirement come certain constraints, including the deployment method VCF uses for vRealize Log Insight. When VCF creates the management domain, it deploys a single vRealize Log Insight virtual appliance. Because I have a requirement to store all relevant log files in a central location, leveraging the existing vRealize Log Insight virtual appliance makes sense. However, a single node is a single point of failure, which is not adequate for a production architecture, let alone a VCDX design.

So how can vRealize Log Insight be enhanced to handle a failure? Why a cluster of course! The Engineering team responsible for vRealize Log Insight were kind enough to build a clustering feature into the product and even included an internal load balancer as well! Having a cluster of nodes allows the environment to handle an eventual failure event — whether it is because the VM operating system becomes unresponsive or the underlying ESXi node fails altogether. Once configured, the VIP specified for use by the internal load balancer should be the IP and/or FQDN all of the downstream services use for sending syslog messages.

Configure a Log Insight Cluster

The creation of a Log Insight cluster is relatively straightforward and I will quickly go through the steps. Remember the Log Insight nodes have a requirement to exist on the same L2 network — no L3 support for multiple geographic clusters currently. Simply deploy three Log Insight virtual appliances and power them on. Once the OS has been started, log into the web UI for the additional instances and perform the following steps.

  1. Select Next to proceed with the configuration on the new node.
  2. Select the Join Existing Deployment option.
  3. Enter the FQDN of the existing master Log Insight node and click Go.
  4. Once joined, select the hyperlink to take you to the master node UI.
  5. Log in using the Administrator credentials for Log Insight.
  6. Select Allow to join the new node to the cluster.
  7. Configure a Virtual IP address for the Integrated Load Balancer.

Repeat the join steps for a third node and you have a working vRealize Log Insight cluster, capable of distributing incoming log messages between multiple nodes. Depending on the SLA for the environment, you can increase the number of nodes within the cluster to meet the requirements.
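Once the VIP is in place, downstream systems point their remote syslog configuration at it instead of at an individual node. As a minimal sketch of what that looks like from an application, the snippet below uses Python's standard `logging.handlers.SysLogHandler` over UDP. The function name, the logger name, and the `loginsight-vip.example.com` FQDN in the usage note are hypothetical. Port 514 is only the conventional syslog port, so confirm the port and protocol for your own deployment.

```python
import logging
import logging.handlers


def build_syslog_logger(vip_host: str, port: int = 514, name: str = "app") -> logging.Logger:
    """Return a logger that forwards records to a remote syslog endpoint over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler uses UDP when no socktype is given.
    handler = logging.handlers.SysLogHandler(address=(vip_host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

A caller would do something like `build_syslog_logger("loginsight-vip.example.com").info("service started")`. Because the target is the VIP and not a node address, the client configuration does not need to change when a node fails or the cluster grows.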

Fortunately for me, the weekend posts were written on election night and are scheduled to auto-publish. Hopefully that will allow me to spend some much needed time working on VCDX design documentation. The December 1 deadline is fast approaching!

Recommended Read – Simon Long’s SLOG

Simon Long – http://www.simonlong.co.uk/blog/

As I put thought into how and what I wanted to discuss during the vDM 30-in-30 challenge, I decided the weekend posts will be a break from the technical posts I have always favored. For the Saturday installments I am going to recommend other bloggers’ sites or physical books that have enriched my career or work experience. I’ve decided Simon Long will be the first Saturday topic.

I met Simon for the first time last summer after having joined VMware in June. We shared an office for the first year and I’ve come to appreciate his knowledge and his attitude toward sharing information and advice. Simon was one of the early double VCDXs (he may have been the first) and is currently a panelist for many VCDX panels. He has been blogging far longer than I have and many of his posts focus on VCDX related topics. If you are thinking about pursuing the VCDX certification, his blog is one you should be reading, especially his recent posts on common VCDX mistakes.

In addition to his VCDX knowledge, Simon is heavily focused on EUC and Desktop-as-a-Service architectures. I encourage you to follow him on Twitter and reach out when you have questions.

vExpert 2016 and VCDX Preparation

January is already over and I did not have a single post for the entire month. Yesterday, the vExpert 2016 awards were announced and I was happy to see my name on the list for a second year in a row. There is an amazing group of people in the community contributing to such a wide variety of topics and I am grateful to be considered a part of that. I really want to step up my game this year and cover in even more detail the Hadoop/Big Data and Cloud Native Apps topics happening within the VMware ecosystem and beyond!

The preparation for my VCDX defense is winding down. With a little over 10 days before I defend, there really isn’t much more I could try to learn beforehand. I feel pretty confident in how well I know the design itself and I’ve gone to considerable lengths the past few weeks to highlight areas where it is lacking and/or what I would do differently had some of the constraints not been in place. I am blessed to work with some amazing people and they have given me some great advice over the past few months on what to do and what not to do as I have journeyed down this path. The VCDX community is really strong and there are a lot of differing opinions on what a candidate should do to prepare. Part of the experience for me has been learning which voices to ignore and which to place value in.

The part that has been the most stressful has been the slides themselves. I talked to several people, and I am grateful to each for taking the time. The best advice I received was the following:

  1. Keep the deck short and to the point.
  2. Use it as a warm-up to get comfortable in the room.
  3. The defense is about how you communicate, not how many slides you have or how pretty they look.

All that said, I believe I took a unique approach to preparing my slides, one that plays to my strengths. I am already comfortable talking in front of crowds both large and small, and my current position at VMware affords me the opportunity to defend design decisions to a really strong group of architects, including three double-VCDXs. I anticipate the experience will afford me an opportunity for significant growth, personally and professionally.

February 16th @8:30AM really could not come soon enough for me!

2015 Year-End Review for Virtual Elephant

2015 Goals & Accomplishments

The year was one of the very best in recent memory across many aspects of my life, and it also saw some of the biggest changes! When 2015 began I set out to accomplish several goals both personally and professionally, nearly all of which were realized in a much shorter time period than expected. The biggest goal, and change, that came about this year for my professional career was to become a full-time cloud architect. Initially, I did not think that goal would be realized until late Q3 or early Q4. Much to my surprise, however, an unreal opportunity came in Q2 and I suddenly found myself (and family) moving to Northern California to join VMware as an architect on their private cloud team. It happened so fast that much of the remaining seven months of the year has been a whirlwind and left me with a feeling of whiplash when I look back!

The other two professional goals I set for myself included growing the Virtual Elephant audience and gaining both VCAP-DCV certifications. I have been very pleased with the success I have achieved in both areas — neither of which could have been achieved without a great support system of friends and family. The Virtual Elephant blog began in Q3 of 2014 with very modest goals of being a place I could talk about Hadoop and generic virtualization topics without re-writing much of the great work that is already on the internet. As 2015 began, I felt as though I had found my niche in the blogosphere, focusing on Hadoop and Cloud Native Applications. The year has been a great one for both sets of technologies, and I have immensely enjoyed writing about these topics. I get excited whenever a new feature preview comes about or an idea pops into my head for something else to try. Not every idea has made it onto the blog, but those that have I am extremely pleased with.

The most important accomplishment in 2015 was celebrating my 15th wedding anniversary with my wife! It was not a typical “goal” you set at the start of the year, but a major accomplishment nonetheless. The last 15 years of marriage have been amazing and we have a beautiful family where we have been blessed with six truly amazing children. Most of the credit goes to her for these accomplishments. Without her support, faith, care and love I would not be the man I am today nor have seen the success the year brought!

Blog Statistics

At the time of this writing, the blog had seen roughly 20,000 page views and 6,500 unique sessions during the course of 2015.

The audience was all over the world, with the following graphic coming from the site’s annual WordPress report.

[Image: 2015 readership map]

The site saw ~50 posts published throughout the course of the year and the top posts on the site were:

  1. Docker Container for IO Benchmarking
  2. Multiple Storage I/O Profiles in Apache Mesos & Mesosphere Marathon
  3. Docker Minecraft Containers to the Rescue!

Not having statistics from 2014 makes judging these numbers difficult, but I am confident the traffic to the site more than doubled this year. The goal for 2016 is to see those numbers double again, but the only way that happens is if I continue to generate good, interesting and useful content for the community!

I truly believe the most important thing to do with a blog, especially if you are looking to differentiate your voice from the masses, is to generate new, creative and unique content. I hope to continue to do so in the future.

What’s Coming in 2016?

I do not yet know what 2016 holds for me, but I am excited by the things I am working on and look forward to seeing them through to completion. The biggest professional goal I have for 2016 is to gain my VCDX certification. The past year has helped me grow significantly and I have been truly blessed to be surrounded by great individuals at work each day who challenge me and help me to become a better architect. Hopefully the February VCDX defenses will be my opportunity to achieve this next milestone in my career.

I also hope to continue writing, both for the blog and on the book I have been working on throughout this year. The idea of being a published author is something I had never fathomed until a couple of years ago, and what a learning experience it has been! I look forward to sharing more details on the book project with you in the future.

Finally, I hope to return to the VMworld stage this year to share my experiences running very large-scale infrastructure, along with best practices and interesting information surrounding the Cloud Native Apps space.

I hope you all have a wonderful New Year and look forward to talking to each of you in 2016!