Having recently returned from Hadoop Summit 2014 in San Jose, I wanted to take some time to jot down my thoughts on the sessions. I primarily focused on the sessions that revolved around operational management of Hadoop to see how other companies are tackling the same problems I am facing. It is comforting to know that I am not alone in my quest to deliver a reliable Hadoop platform across the development lifecycle for my internal customers to consume. However, one of the frustrating things to witness was how few large-scale organizations appeared to be operating within their own private cloud environments. Many of the demonstrations relied on resources from AWS or assumed you would never run out of bare-metal hardware to deploy on. My experience is wholly different.
The challenging part of offering a true Hadoop-as-a-Service platform is the expectation that additional resources will always be available for an Engineering team or Operations team to consume at a moment's notice. To meet that expectation, in my experience, AWS becomes too expensive too quickly, and bare-metal hardware is difficult to procure at a moment's notice within a large, publicly traded organization. A private cloud environment is perfect for this, but no one wants to openly talk about running Hadoop on a virtual platform. That is quite humorous when you start thinking about it, because most demonstrations showed Hadoop running in AWS. What do they think an EC2 instance is, exactly?
My talk with Andrew Nelson on running a production Hadoop-as-a-Service platform using VMware vCenter Big Data Extensions went well. The audience was well-informed, and we received some good questions at the end. Virtualizing Hadoop has been a great way for my organization to solve many of the lifecycle management issues faced in today’s rapidly changing environment.
All that being said, here are the key takeaways and questions I came away with from Hadoop Summit 2014: