The conference wrapped up just over two weeks ago, and since then I’ve had the opportunity to go through my notes, reflect on the sessions I attended and summarize the insights I gained while there.
My biggest takeaway from VMworld 2014, compared to last year, was seeing how the lessons learned in 2013 were applied in 2014. The key insight in 2013 was that many other VMware partners and customers were facing the same challenges around standardization, automation and self-service. It was helpful to learn that the things we were trying to accomplish within our department at Adobe were not unique to us.
This year, 2014, I learned that we have solved many of last year’s challenges and now have valuable insight to offer the community. As we build on the standardization, automation and self-service phases of delivering comprehensive IaaS and PaaS offerings, we are doing what we can to share that information with the broader community.
All of that is wonderful, but what are the next steps for our team, the market and others in the virtualization space? We heard a lot at the conference about OpenStack, Docker, VSAN and other emerging technologies. My personal focus for the next year will be on further implementing the Hadoop ecosystem using VMware technologies and building out larger, more comprehensive PaaS offerings.
There are many questions to be answered around how OpenStack and Docker play in this space. I am looking forward to the challenges coming to us as we work with our engineering teams.
Should be an exciting year!
Yesterday was another great day in San Francisco at VMworld 2014. My big takeaway revolved around Docker and VMware integration. There is a great article over on the Office of the CTO blog regarding this exact topic. Two key takeaways from the Docker CEO’s portion of the keynote (paraphrased):
- Use VMs for security and consistency and use Docker for speed of deployment.
- Docker + VMware gets you the best of both worlds when utilized together.
There are some exciting things, like Project Fargo, going on in the space right now that should enable Operations teams to incorporate Docker into their existing environments to give their applications the flexibility next-generation apps and engineering teams are starting to require.
Beyond the sessions, the CTO party last night was really amazing! Lots of networking and conversations were taking place, and I was able to gain some good insight into how Mesos could be used to replace YARN. I am excited to follow up on several of the conversations from last night.
Today kicked off the US VMworld 2014 conference in San Francisco, and it was pretty exciting. The first of two VAPP1428 sessions took place this afternoon, where I had an opportunity to talk about the exciting work I have been doing over the last year at Adobe to build out a Hadoop-as-a-Service offering. It was a great talk, and Andrew Nelson was an awesome co-presenter who offered great insight into what VMware is doing in the virtual Hadoop space.
A few hours later, I was fortunate to have an opportunity to sit on a panel with several other distinguished guests to talk about best practices around Hadoop and Big Data in a virtualized environment. In that session we were able to share our insights into the decisions each of us made for our organizations and the successes we have seen in the space, not only in utilizing VMware but also in understanding Hadoop for our individual workloads.
It was great to hear Chris from FedEx talk about how they too are utilizing the EMC Isilon HDFS plugin to offer a unified HDFS layer to the virtual environment for Big Data Extensions to build compute clusters. We have been able to do some great work with EMC the past few months — details to be provided in a future post — around how using the Isilon storage that is already within our data centers will allow us to offer a robust storage layer to our Hadoop clusters.
All in all, it was a great first day of the conference. There were many exciting announcements made during the keynote, not the least of which were the partnerships VMware now has with both Docker and OpenStack. There is a lot of exciting work happening in that space, and as I look to the future it is pretty awesome.
A big shout out to all the individuals who came and asked questions. It is my hope that everyone who attended walked away with at least one positive takeaway. If you have any additional questions, please reach out to me on Twitter (@chrismutchler) or through email.
There is a quote in the book Hadoop Operations by Eric Sammer (O’Reilly) that states:
“The complexity of sizing a cluster comes from knowing — or more commonly, not knowing — the specifics of such a workload: its CPU, memory, storage, disk I/O, or frequency of execution requirements. Worse, it’s common to see a single cluster support many diverse types of jobs with conflicting resource requirements.”
In my experience that is a factual statement. It does not, however, preclude one from determining that very information so that an intelligent decision can be made. In fact, VMware vCenter Operations Manager becomes an invaluable tool in the toolbox when developing the ability to maintain the entire SDLC of a Hadoop cluster.
Initial sizing of the Hadoop cluster in the Engineering|Pre-Production|Chaos environment of your business will involve some amount of guesswork. You can stick with the tried-and-true methodology of answering the following two questions — “How much data do I have for HDFS initially?” and “How much data do I need to ingest into HDFS daily|monthly?” At that point you’ll need to start monitoring the workload(s) placed on the Hadoop cluster and begin determining the cluster size needed once it moves into the QE, Staging and Production environments.
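Those two questions lend themselves to a quick back-of-the-envelope calculation. Here is a minimal sketch of that arithmetic; the replication factor, temp-space overhead, planning horizon, and per-node disk figures below are illustrative assumptions on my part, not numbers from any particular cluster:

```python
def raw_storage_tb(initial_tb, daily_ingest_tb, horizon_days,
                   replication=3, temp_overhead=0.25):
    """Rough raw-disk estimate for an HDFS cluster.

    initial_tb      -- data loaded into HDFS on day one
    daily_ingest_tb -- average daily ingest rate
    horizon_days    -- planning horizon for growth
    replication     -- HDFS replication factor (default 3)
    temp_overhead   -- fraction reserved for MapReduce intermediate data
    """
    logical = initial_tb + daily_ingest_tb * horizon_days
    return logical * replication * (1 + temp_overhead)


def nodes_needed(raw_tb, usable_tb_per_node):
    """Data nodes required, given usable disk per (virtual) node."""
    return -(-raw_tb // usable_tb_per_node)  # ceiling division


# Hypothetical example: 10 TB initial, 0.5 TB/day ingest,
# 180-day horizon, 12 TB usable disk per data node.
raw = raw_storage_tb(10, 0.5, 180)   # (10 + 90) * 3 * 1.25 = 375 TB
nodes = nodes_needed(raw, 12)        # ceil(375 / 12) = 32 nodes
```

It is a crude model — it ignores compression, skewed ingest, and CPU/memory-bound workloads — but it gives a defensible starting point that the monitoring data can then refine.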
With VMworld 2014 in the United States fast approaching, I have been working on building out my schedule based on my personal objectives and checking the popular blogger sites for their recommendations. In that spirit, I thought I would share the sessions I am most excited about this year in San Francisco.
Last year was my first year at VMworld and I focused on the Hands-on-Labs (HoLs) and generic sessions to better understand the VMware ecosystem. This year I am focused on three primary topics:
- VMware NSX
- OpenStack|Docker|Containers with VMware
- VMware VSAN
Here are the sessions I am focused on:
- SEC1746 NSX Distributed Firewall Deep Dive
- NET1966 Operational Best Practices for VMware NSX
- NET1949 VMware NSX for Docker, Containers & Mesos
- SDDC3350 VMware and Docker — Better Together
- SDDC2370 Why OpenStack runs best with the vCloud Suite
- STO1279 Virtual SAN Architecture Deep Dive
- STO1424 Massively Scaling Virtual SAN implementations
In addition to that, I am also excited for my own sessions at VMworld this year around Hadoop, VMware BDE and building a Hadoop-as-a-Service!
- VAPP1428 Hadoop-as-a-Service: Utilizing VMware Cloud Automation Center and Big Data Extensions at Adobe (Monday & Wednesday sessions)
Excited for the week to get kicked off and see all the exciting things coming to our virtualized world.