Apache Mesos Clusters – Part 2

Building Mesosphere & Apache Mesos into BDE:

After spending the week playing with Mesosphere in AWS and getting familiar with the packages and the deployment process, I have begun the real work: getting the Mesosphere stack (Apache Mesos, Apache ZooKeeper, Mesosphere Marathon, Chronos and HAProxy) deployed through VMware Big Data Extensions. Fortunately, BDE v2.1 ships with example JSON cluster definition files for deploying different types of clusters, and these are well suited to modification for this use case.

The example files are located in the directory /opt/serengeti/samples. I used the basic_cluster.json file in that directory as the template and modified it to match what the Mesosphere stack deployed in AWS, with some slight changes. I chose to have a base Mesos cluster include 3 master nodes and 6 worker nodes. Each master node is allocated 2 vCPUs, 8GB of RAM and 50GB of disk space. Each worker node is allocated 2 vCPUs, 8GB of RAM and 100GB of disk space.
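
As a rough illustration of the sizing described above, a cluster definition might look something like the sketch below. The field names follow the layout I would expect from the Serengeti sample files (nodeGroups, instanceNum, cpuNum, memCapacityMB, storage), but treat them as assumptions and check them against basic_cluster.json on your own management server. The role names are hypothetical placeholders, and the storage types and haFlag values are my guesses rather than anything taken from the real definition.

```json
{
  "nodeGroups": [
    {
      "name": "master",
      "roles": ["placeholder_mesos_master_role"],
      "instanceNum": 3,
      "cpuNum": 2,
      "memCapacityMB": 8192,
      "storage": {
        "type": "SHARED",
        "sizeGB": 50
      },
      "haFlag": "on"
    },
    {
      "name": "worker",
      "roles": ["placeholder_mesos_worker_role"],
      "instanceNum": 6,
      "cpuNum": 2,
      "memCapacityMB": 8192,
      "storage": {
        "type": "LOCAL",
        "sizeGB": 100
      },
      "haFlag": "off"
    }
  ]
}
```

Only the node counts, vCPU counts, memory (8GB expressed as 8192MB) and disk sizes come from the numbers above; everything else is illustrative.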

The remainder of the post will go through all the various pieces that are necessary to utilize the Big Data Extensions framework to offer the Mesosphere stack within a VMware virtual environment.

Apache Mesos Clusters – Part 1

I watched a webinar today by Ken Sipe (@kensipe) of Mesosphere on Mesos, Marathon and Chronos. The topics covered included how Mesos works and how to configure and stand up a Mesos cluster in various public cloud offerings. If you are unfamiliar with Mesos, I would direct you to Mesosphere and the Apache Mesos Project.

The basic explanation from the Apache Mesos Project page states:

Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.

Think of it as turning an entire datacenter of compute resources into a single pool to be consumed. Instead of carving out individual pieces of compute, Mesos handles the scheduling and helps you scale an application across all of the resources available to it.

So how quickly can you deploy a cluster and begin using Mesos?

Quick and dirty PowerCLI cmdlets

I am preparing for my VCAP-DCA exam and having to automate more and more of my daily tasks within our VMware environment at work, so I am using PowerCLI constantly. As a result, I thought I would share a couple of quick and dirty little scripts that I have had to use lately.

Mileage will vary, but they’ve proven useful for me at work (some variable values have been edited).

Changing Network Label

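> # Move the network adapters of prefix1.site .. prefix20.site to a new port group
> $i = 1
> $VLAN = "VLAN1012"
> while ($i -le 20) {
>     $VMName = "prefix" + $i + ".site"
>     # Every adapter on the VM is changed; Out-Null suppresses the returned objects
>     Get-VM -Name $VMName | Get-NetworkAdapter | Set-NetworkAdapter -Portgroup $VLAN -Confirm:$false | Out-Null
>     $i++
> }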
>

Deleting and Adding a New Disk

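> # Replace the disks on prefix1.site .. prefix20.site with a single new 525GB disk
> $i = 1
> while ($i -le 20) {
>     $VMName = "prefix" + $i + ".site"
>     # Caution: Get-HardDisk with no filter returns every disk attached to the VM,
>     # so this removes them all. Without -DeletePermanently the VMDK files are
>     # detached from the VM but left on the datastore.
>     Get-HardDisk -VM $VMName | Remove-HardDisk -Confirm:$false | Out-Null
>     New-HardDisk -VM $VMName -CapacityGB 525 -Datastore "DRSCluster1" -Confirm:$false | Out-Null
>     $i++
> }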
>

Changing Allocated RAM

> # Set the allocated memory on prefix1.site .. prefix20.site to 32GB
> $i = 1
> while ($i -le 20) {
>     $VMName = "prefix" + $i + ".site"
>     Set-VM -VM $VMName -MemoryGB 32 -Confirm:$false | Out-Null
>     $i++
> }
>

Virtualized Hadoop + Isilon HDFS Benchmark Testing

During the VMworld EMEA presentation (Tuesday, October 14, 2014), the question of performance came up again with regard to using Isilon as the data warehouse layer, along with the positives and negatives of leveraging Isilon as that HDFS layer. As with any benchmark or performance testing, results will vary based on the data set you have, the hardware you are leveraging and how you have the clusters configured. However, there are some things I’ve learned over the last year and a half that apply on a broad scale and show the advantages of leveraging Isilon as the HDFS layer, especially when you have very large data sets (10+ petabytes).

There are two benchmarking tests I want to focus on for this post. The tests themselves demonstrate the necessity for understanding the workload (Hadoop job), the size of the data set, and the individual configuration settings (YARN, MapReduce, and Java) for the compute worker nodes.

VCP5: Creating an iSCSI lab environment for vSphere

As I worked through the VCP5-DVC blueprint, the necessity to revisit iSCSI storage configuration and management became a key point of my study efforts. I had not used iSCSI storage before within a VMware vSphere environment, so learning how to tie it all into the infrastructure was totally new to me. In fact, the last time I had used iSCSI storage was with my previous employer 5+ years ago within a customized CentOS OpenVZ environment.

Fortunately, Google did not fail me and there were many resources readily available for teaching me how to implement iSCSI storage within a CentOS Linux virtual machine. From there it was a matter of creating a storage adapter within vCenter and exporting the iSCSI datastores to the environment.

This post will go through the steps to configure the iSCSI storage within a Linux VM, export it to vCenter and add it into the IaaS offering as a VMFS datastore. I found this extremely helpful in my preparation for the exam and in learning how to troubleshoot misconfigured settings within the iSCSI VM. Making mistakes is often the best way to learn!
