Docker ‘ubuntu-ansible’ update

I have been working with Ansible and all of the vSphere modules an enormous amount recently. As part of that work, I’ve extended the functionality of the Docker container I use for all of my development work. The container can be downloaded from Docker Hub and consumed by anyone — there is no proprietary information within the container.

The updated version includes two vSAN Python modules required for an updated vSAN Ansible module I am working on. In addition, the container now pulls the upstream NSX-v Ansible module from VMware, instead of my cloned repo on GitHub.com/virtualelephant, since all of the code I've written for NSX-v is now in the upstream module.

The full Dockerfile can be obtained on GitHub.

# Dockerfile for creating an Ansible Control Server with
# the VMware modules necessary to build a complete Kubernetes
# stack.
# Blog details available: http://virtualelephant.com

FROM ubuntu:artful
MAINTAINER Chris Mutchler <chris@virtualelephant.com>

RUN \
  apt-get -y update && \
  apt-get -y dist-upgrade && \
  apt-get -y install software-properties-common python-software-properties vim && \
  apt-add-repository ppa:ansible/ansible

# Install packages needed for NSX modules in Ansible
RUN \
  apt-get -y update && \
  apt-get -y install ansible python-pip python-dev libxml2 libxml2-dev libxslt1-dev zlib1g-dev npm git && \
  pip install --upgrade pyvmomi && \
  pip install pysphere && \
  pip install nsxramlclient && \
  npm install -g https://github.com/yfauser/raml2html && \
  npm install -g raml-fleece

# Get NSXRAML

# Add additional Ansible modules for NSX and VM folders
RUN \
  git clone -b 6.4 https://github.com/vmware/nsxraml.git /opt/nsxraml && \
  git clone https://github.com/vmware/nsxansible && \
  git clone https://github.com/vmware/ansible-modules-extras-gpl3.git && \
  rm -rf nsxansible/library/__init__.py && \
  cp nsxansible/library/*.py /usr/lib/python2.7/dist-packages/ansible/modules/cloud/vmware/ && \
  git clone https://github.com/openshift/openshift-ansible-contrib && \
  /bin/cp openshift-ansible-contrib/reference-architecture/vmware-ansible/playbooks/library/vmware*.py /usr/lib/python2.7/dist-packages/ansible/modules/cloud/vmware/

# Add vSAN Python API modules - must be done after pyVmomi installation
COPY vsanmgmtObjects.py /usr/lib/python2.7/
COPY vsanapiutils.py /usr/lib/python2.7/

# Setup container to properly use SSH bastion host for Ansible
RUN mkdir /root/.ssh
RUN chmod 740 /root/.ssh
COPY config /root/.ssh/config
COPY ansible.cfg /etc/ansible/

# Edit MOTD to give container consumer info
COPY motd /etc/motd
RUN echo '[ ! -z "$TERM" -a -r /etc/motd ] && cat /etc/issue && cat /etc/motd' >> /etc/bash.bashrc

I am still mounting a local volume that contains the Ansible playbooks within it. For reference, I run the container with the following command:

$ docker run -it --rm --name ansible-sddc -v /PATH/TO/ANSIBLE:/opt/ansible virtualelephant/ubuntu-ansible

If you run into any issues with the Docker container, please let me know on Twitter. Enjoy!

NSX Ansible Updates

It has been a hectic few months for me. I relocated my family to Colorado last month, and as a result all of my side-projects were put on the back-burner. During the time I was away, I was selected for the fourth year in a row as a vExpert! I am grateful to be a part of this awesome community! I strive to make the work I do and submit here worthwhile and informative for others.

Now that I am back into the swing of things, I was able to jump back into improving the NSX-v Ansible module. Recently a member of the community opened an issue regarding the implementation method I had used for Edge NAT rules. Sure enough, they were correct in that the method I was using was really an append and not a creation.

Note: I wrote and tested the code against a pre-release version of NSX-v 6.4.1. There are no documented differences in the two API calls used by the nsx_edge_nat.py Ansible module between NSX 6.4.x and 6.3.x.

When I looked at the NSX API, I realized there were two methods for adding NAT rules to an NSX Edge:

PUT /api/4.0/edges/{edgeId}/nat/config

URI Parameters:
edgeId (required)
Specify the ID of the edge in edgeId.

Description:
Configure NAT rules for an Edge.

If you use this method to add new NAT rules, you must include all existing rules in the request body. Any rules that are omitted will be deleted.

And also:

POST /api/4.0/edges/{edgeId}/nat/config/rules

URI Parameters:
edgeId (required)
Specify the ID of the edge in edgeId.

Query Parameters:
aboveRuleId (optional)
Specified rule ID. If no NAT rules exist, you can specify rule ID 0.

Description:
Add a NAT rule above a specific rule in the NAT rules table (using aboveRuleId query parameter) or append NAT rules to the bottom.
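
To make the difference concrete, here is a minimal sketch of how the two calls might look through the nsxramlclient library the container installs. The RAML file path and the resource names ('edgeNat' and 'edgeNatRules') are my assumptions, not something taken from the nsx_edge_nat.py module itself, so verify them against the RAML spec before relying on this.

# Illustrative sketch only - the RAML path, resource names, and body nesting
# below are assumptions; check /opt/nsxraml and the NSX API guide for the
# exact values.
from nsxramlclient.client import NsxClient

client_session = NsxClient('/opt/nsxraml/nsxvapi.raml',
                           'nsxmanager.example.com', 'admin', 'password')

edge_id = 'edge-1'
rule = {'action': 'dnat', 'vnic': '0', 'protocol': 'tcp',
        'originalAddress': '10.180.138.131', 'originalPort': '80',
        'translatedAddress': '192.168.0.2', 'translatedPort': '80'}

# PUT /api/4.0/edges/{edgeId}/nat/config - replaces the entire NAT rule set,
# so any existing rules omitted from the body are deleted.
client_session.update('edgeNat', uri_parameters={'edgeId': edge_id},
                      request_body_dict={'natRules': {'natRule': [rule]}})

# POST /api/4.0/edges/{edgeId}/nat/config/rules - appends rules to the bottom
# of the table (or above a given rule via the aboveRuleId query parameter).
client_session.create('edgeNatRules', uri_parameters={'edgeId': edge_id},
                      request_body_dict={'natRules': {'natRule': [rule]}})

# client_session.extract_resource_body_example() can be used to confirm the
# exact body structure each resource expects.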

The original code was using the second method, which meant each time an Ansible playbook was run, it would return an OK status because it was adding the rules — even if they already existed.

I decided to dive into the issue and spent a few more hours than I anticipated improving the code. It is now possible to use either method: one creates a full set of rules (deleting any existing rules), and the other appends new rules to the existing ruleset.

In order to create multiple rules, I modified how the Ansible playbook is interpreted. The example is in the readme.md file, but I want to highlight it here:

---
- hosts: localhost
  connection: local
  gather_facts: False
  vars_files:
    - nsxanswer.yml
    - envanswer.yml

  tasks:
  - name: Create NAT rules
    nsx_edge_nat:
      nsxmanager_spec: '{{ nsxmanager_spec }}'
      mode: 'create'
      name: '{{ edge_name }}'
      rules:
        dnat0: { description: 'Ansible created HTTP NAT rule',
            loggingEnabled: 'true',
            rule_type: 'dnat',
            nat_enabled: 'true',
            dnatMatchSourceAddress: 'any',
            dnatMatchSourcePort: 'any',
            vnic: '0',
            protocol: 'tcp',
            originalAddress: '10.180.138.131',
            originalPort: '80',
            translatedAddress: '192.168.0.2',
            translatedPort: '80'
          }
        dnat1: { description: 'Ansible created HTTPS NAT rule',
            loggingEnabled: 'true',
            rule_type: 'dnat',
            vnic: '0',
            nat_enabled: 'true',
            dnatMatchSourceAddress: 'any',
            dnatMatchSourcePort: 'any',
            protocol: 'tcp',
            originalAddress: '10.180.138.131',
            originalPort: '443',
            translatedAddress: '192.168.0.2',
            translatedPort: '443'
          }

Please note the identifiers, dnat0 and dnat1, are merely that — identifiers for your playbook. They do not influence the API call made to the NSX Manager.

A new function was required in the Ansible module to combine multiple rules so that a single API call adds each rule. The data structure used to build this dictionary of lists was rather convoluted, since Python struggles to convert this sort of structure to XML properly. With some help from a few people in the NSBU, I was able to get it working.

create_init_nat_rules()

def create_init_nat_rules(client_session, module):
    """
    Create single dictionary with all of the NAT rules, both SNAT and DNAT, to be used
    in a single API call. Should be used when wiping out ALL existing rules or when
    a new NSX Edge is created.
    :return: return dictionary with the full NAT rules list
    """
    nat_rules = module.params['rules']
    params_check_nat_rules(module)

    nat_rules_info = {}
    nat_rules_info['natRule'] = []

    # The playbook identifiers (e.g. 'dnat0', 'dnat1') are only keys in the rules
    # dictionary; they are not sent to the NSX Manager.
    for nat_rule in nat_rules.values():
        rule_type = nat_rule['rule_type']
        if rule_type == 'snat':
            nat_rules_info['natRule'].append(
                {'action': rule_type, 'vnic': nat_rule['vnic'],
                 'originalAddress': nat_rule['originalAddress'],
                 'translatedAddress': nat_rule['translatedAddress'],
                 'loggingEnabled': nat_rule['loggingEnabled'],
                 'enabled': nat_rule['nat_enabled'],
                 'protocol': nat_rule['protocol'],
                 'originalPort': nat_rule['originalPort'],
                 'translatedPort': nat_rule['translatedPort'],
                 'snatMatchDestinationAddress': nat_rule['snatMatchDestinationAddress'],
                 'snatMatchDestinationPort': nat_rule['snatMatchDestinationPort'],
                 'description': nat_rule['description']
                }
            )
        elif rule_type == 'dnat':
            nat_rules_info['natRule'].append(
                {'action': rule_type, 'vnic': nat_rule['vnic'],
                 'originalAddress': nat_rule['originalAddress'],
                 'translatedAddress': nat_rule['translatedAddress'],
                 'loggingEnabled': nat_rule['loggingEnabled'],
                 'enabled': nat_rule['nat_enabled'],
                 'protocol': nat_rule['protocol'],
                 'originalPort': nat_rule['originalPort'],
                 'translatedPort': nat_rule['translatedPort'],
                 'dnatMatchSourceAddress': nat_rule['dnatMatchSourceAddress'],
                 'dnatMatchSourcePort': nat_rule['dnatMatchSourcePort'],
                 'description': nat_rule['description']
                }
            )

        # ICMP rules carry an icmpType value; attach it to the rule that was just appended.
        if nat_rule['protocol'] == 'icmp':
            nat_rules_info['natRule'][-1]['icmpType'] = nat_rule['icmpType']

    return nat_rules_info

I also took the opportunity to clean up some of the excessively long lines of code to make it more readable. The result is a new working playbook for initial Edge NAT rule creation and the ability to add new rules later. A few items remain where I would like to see improvements; mainly, I would like to add logic so that, when performing an append, the module checks whether a rule already exists and skips it.
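
As a rough idea of what that check could look like, the helper below reads the Edge's current NAT rules and compares the fields that define a rule. It is only a sketch under my own assumptions about the response structure (the 'nat'/'natRules'/'natRule' nesting), and nat_rule_exists() is a hypothetical helper, not part of the module today.

# Sketch of a duplicate-rule check for append mode. The response nesting is an
# assumption; verify it against the RAML spec and a live NSX Manager.
def nat_rule_exists(client_session, edge_id, candidate_rule):
    """Return True if a rule matching the candidate already exists on the Edge."""
    response = client_session.read('edgeNat', uri_parameters={'edgeId': edge_id})
    nat_config = response['body'].get('nat') or {}
    nat_rules = nat_config.get('natRules') or {}
    existing = nat_rules.get('natRule') or []

    # A single rule may come back as a dict instead of a list.
    if isinstance(existing, dict):
        existing = [existing]

    # Compare only the fields that define the rule, ignoring ruleId, ruleTag, etc.
    keys = ('action', 'vnic', 'protocol', 'originalAddress', 'originalPort',
            'translatedAddress', 'translatedPort')
    for rule in existing:
        if all(rule.get(k) == candidate_rule.get(k) for k in keys):
            return True
    return False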

In the meantime, the code has been checked into GitHub and the Docker image I use for running Ansible has been updated.

Enjoy!

Ansible NSX module for creating NAT rules

After working on the code last weekend and testing the functionality this past week, I am proud to announce that creating NAT rules on an NSX Edge is now possible through Ansible! The ability to create SNAT and DNAT rules on the NSX Edge was a necessity for the Infrastructure-as-Code project, as each environment deployed uses its own micro-segmented network. I am doing that so each environment can be stood up multiple times within the same vSphere environment and be dependent only on itself.

The current Ansible module allows for the creation of both SNAT and DNAT rules. I used the NSX API Guide to determine which variables can be passed for either type of NAT rule and included each of them in the function. As such, there are no features missing from the module today.

The module can be downloaded from the GitHub virtualelephant/nsxansible repo.

To use the module, I have created an example Ansible playbook (also available on GitHub):

test_edge_nat.yml

---
- hosts: localhost
  connection: local
  gather_facts: False
  vars_files:
    - nsxanswer.yml

  tasks:
  - name: Create SSH DNAT rule
    nsx_edge_nat:
      nsxmanager_spec: '{{ nsxmanager_spec }}'
      mode: 'create'
      name: '{{ edge_name }}'
      rule_type: 'dnat'
      vnic: '0'
      protocol: 'tcp'
      originalAddress: '10.0.0.1'
      originalPort: '22'
      translatedAddress: '192.168.0.2'
      translatedPort: '22'

  - name: Create default outbound SNAT rule
    nsx_edge_nat:
      nsxmanager_spec: '{{ nsxmanager_spec }}'
      mode: 'create'
      name: '{{ edge_name }}'
      rule_type: 'snat'
      vnic: '0'
      protocol: 'any'
      originalAddress: '192.168.0.0/20'
      originalPort: 'any'
      translatedAddress: '10.0.0.1'
      translatedPort: 'any'

Update Jan 30th:

The module was updated a few days ago to include the ability to delete a NAT rule from an Edge. The functionality allows the consumer to write a playbook with the following information to delete an individual rule.

---
- hosts: localhost
  connection: local
  gather_facts: False
  vars_files:
    - nsxanswer.yml
    - envanswer.yml

  tasks:
  - name: Delete HTTP NAT rule
    nsx_edge_nat:
      nsxmanager_spec: '{{ nsxmanager_spec }}'
      mode: 'delete'
      name: '{{ edge_name }}'
      ruleId: '196622'

Let me know if you are using the NSX Ansible modules and what other functionality you would like to see added.

Enjoy!

Docker for Ansible + VMware NSX Automation

I am writing this as I sit and watch the annual viewing of The Hobbit and The Lord of the Rings trilogy over the Christmas holiday. The next couple of weeks should provide the time necessary to hopefully complete the Infrastructure-as-Code project I undertook last month. As part of that project, I spoke previously about how Ansible is being used to provide the automation layer for the deployment and configuration of the SDDC Kubernetes stack. As part of the bootstrapping effort, I have decided to create a Docker image with the necessary components to perform the initial virtual machine deployment and NSX configuration.

The Dockerfile for the Ubuntu-based Docker container is hosted both on Docker Hub and within the Github repository for the larger Infrastructure-as-Code project.

When the Docker container is launched, it includes the necessary components to interact with the VMware stack, including additional modules for VM folders, resource pools and VMware NSX.

To launch the container, I am running it with the following options to include the local copies of the Infrastructure-as-Code project.

$ docker run -it --name ansible -v /Users/cmutchler/github/vsphere-kubernetes/ansible/:/opt/ansible virtualelephant/ubuntu-ansible

The Docker container is a bit on the larger side, but it is designed to run locally on a laptop or desktop. The image includes the required Python and NSX bits so that the additional GitHub repositories cloned into the image will operate correctly. The OpenShift project provides additional modules for interacting with vSphere folders and resource pools, while the NSX modules from the VMware GitHub repository provide the necessary bits for leveraging Ansible with NSX.

Once running, the Docker container is then able to bootstrap the deployment of the Infrastructure-as-Code project using the Ansible playbooks I’ve published on Github. Enjoy!

VCDX Quick Hit – Monitoring and Alerting

This is the first post in what I plan to be a sporadic, yet on-going series highlighting certain aspects of a VCDX / Architect skillset. These VCDX Quick Hits will cover a range of topics and key in on certain aspects of the VCDX blueprint. It is my hope they will trigger some level of critical thinking on the reader's part and help them improve their skillset.

The idea for this post came after listening to a post-mortem call for a recent incident that occurred at work. The incident itself was a lower priority Severity 2 incident, meaning it only impacted a small subset of customers in a small failure domain (a single vCenter Server). As architects, we know monitoring is a key component of any architecture design — whether it is intended for a VCDX submission or not.

In IT Architect: Foundation in the Art of Infrastructure Design (Amazon link), the authors state:

“A good monitoring solution will identify key metrics of both the physical and virtual infrastructure across all key resources: compute, storage, and networking.”

The post-mortem call got me thinking about maturity within our monitoring solutions and improving our architecture designs by striving to understand the components better earlier in the design and pilot phases.

It is common practice to identify the components and services of an architecture we designed, or are responsible for, and to outline which of them are key to supporting the service offering. When I wrote the VMware Integrated OpenStack design documentation, which later became the basis for my VCDX defense, I identified specific OpenStack services which needed to be monitored. The following screen capture shows how I captured the services within the documentation.

As you can see from the above graphic, I identified each service definition with a unique ID, documented the component/service and where it should be running, and included a brief description of the component/service. The information was used to create the Sprint story for the monitoring team to create the alert definitions within the monitoring solution.
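
Since the screen capture itself is not reproduced here, the following is a purely illustrative example of that format; the IDs and entries are hypothetical and not taken from the actual design documentation.

ID        Component/Service   Runs On                   Description
MON-001   nova-api            OpenStack controller VM   Hypothetical entry: compute API service for the offering
MON-002   neutron-server      OpenStack controller VM   Hypothetical entry: networking API service that communicates with NSX Manager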

All good, right?

The short answer is, not really. What I provided in my design was adequate for an early service offering, but left room for further maturity. Going back to the post-mortem call, this is where additional maturity in the architecture design would have helped reduce the MTTR of the incident.

During the incident, two processes running on a single appliance were being monitored to determine if they were running. Just like my VMware Integrated OpenStack design, these services had been identified and were being monitored per the architecture specification. However, what was not documented was the dependency between the two processes. In this case, process B was dependent on process A and although process A was running, it was not properly responding to the queries from process B. As a result, the monitoring system believed everything was running correctly — it was from an alert definition perspective — and the incident was not discovered immediately. Once process A was restarted, it began responding to the queries from process B and service was restored.

So what could have been done?

First, the architecture design could have included an alert definition for the key services (or processes) that went beyond just measuring whether the service is running.

Second, the architecture design could have captured the inter-dependencies between these two processes and specified a more detailed alert definition. In this case, a log entry was written each time process A did not correctly respond to process B. Having an alert definition for that log entry would have allowed the monitoring system to generate an alert.

Third, the architecture design could have used canary testing as a way to provide a mature monitoring solution. It may be necessary to clarify what I mean when I use the term canary testing.

“Well into the 20th century, coal miners brought canaries into coal mines as an early-warning signal for toxic gases, primarily carbon monoxide. The birds, being more sensitive, would become sick before the miners, who would then have a chance to escape or put on protective respirators.” (Wikipedia link)

Canary testing would then imply a method of checking the service for issues before a customer discovers them. Canary testing should include common platform operations a customer would typically perform; this can also be thought of as end-to-end testing.

For example, a VMware Integrated OpenStack service offering with NSX would need to ensure not only that the NSX Manager is online, but also that the OpenStack Neutron service is able to communicate with it. A good test could be to make an OpenStack Neutron API call to deploy an NSX Edge Services Gateway, or to create a new tenant network (NSX logical switch).
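
As a sketch of what such a canary could look like, the snippet below creates and then deletes a throwaway tenant network through the Neutron API. The endpoint URL, token handling, and network name are hypothetical placeholders; a production canary would authenticate against Keystone and feed its result into the monitoring system on a schedule.

# Minimal canary-test sketch, assuming an NSX-backed Neutron endpoint.
# The endpoint and token below are placeholders, not real values.
import sys
import requests

NEUTRON_URL = 'https://vio.example.com:9696/v2.0'   # hypothetical endpoint
TOKEN = 'REPLACE_WITH_KEYSTONE_TOKEN'                # hypothetical token
HEADERS = {'X-Auth-Token': TOKEN, 'Content-Type': 'application/json'}


def canary_create_network():
    """Create and delete a tenant network; a failure indicates a broken
    Neutron -> NSX Manager path even if both processes appear to be running."""
    body = {'network': {'name': 'canary-test-network'}}
    create = requests.post(NEUTRON_URL + '/networks', json=body,
                           headers=HEADERS, timeout=30)
    if create.status_code != 201:
        return False

    network_id = create.json()['network']['id']
    delete = requests.delete('%s/networks/%s' % (NEUTRON_URL, network_id),
                             headers=HEADERS, timeout=30)
    return delete.status_code == 204


if __name__ == '__main__':
    # Exit non-zero so the monitoring system can alert on a failed canary run.
    sys.exit(0 if canary_create_network() else 1)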

There are likely numerous ways a customer will interact with your service offering, and defining these additional tests within the architecture design itself is something I challenge you to consider.