Pivotal first amongst equals on the Open Data Platform

By: Craig Wentworth, Principal Analyst, MWD Advisors
Posted: 25th February 2015
This work is licensed under a Creative Commons License

Ever thought the world of open source Big Data components was, well, a little uncoordinated; a little too open?
If you had the budget, the talent, and the inclination, you could always harden your own enterprise infrastructure, taking raw open source tools and making them work the way your business needs them to (with high availability, high performance, and robust security).
Or you could haul in some pre-hardened editions, where vendors have taken an open source favourite and added their own enterprise-friendly extras to the distribution (or replaced portions with API-compatible proprietary code), as MapR has with Hadoop and DataStax with Cassandra. But here each vendor is only taking care of its own part of the Big Data infrastructure puzzle; you'd still need to provide the care and feeding yourself to combine the parts into a whole platform.
Or you could trust your Big Data project to one of the big vendors, and end up with their take on Hadoop, NoSQL, streaming databases, relational data warehouses, and so on. The likes of IBM, Teradata, Oracle, and Amazon Web Services all have suites (available either as on-premise appliances or as hosted services in the cloud) designed to offer a platform upon which you can stack your analytics engines, business applications, and the like. But then, for the most part, you are beholden to a single vendor's take on the whole Big Data movement (albeit one as likely as not assembled through partnerships and acquisitions rather than grown organically from the ground up).
Depending on your outlook, any of these approaches might well be perfectly serviceable. However, if your particular take on the balance between openness and enterprise friendliness isn’t catered for above, you’ve probably been waiting for something else—essentially open, but with enterprise sensibilities; community-minded, but with development priorities skewed towards real business applications… a sort of downstream Apache Foundation more firmly serving the Ops side of DevOps.
Enter the newly announced Open Data Platform (ODP) initiative, proclaiming itself “a shared industry effort focused on promoting and advancing the state of Apache Hadoop and Big Data technologies for the enterprise”. Partners as varied as Pivotal, Hortonworks, IBM, Teradata, SAS, Splunk, Altiscale, EMC, VMware, Infosys, Capgemini, GE, CenturyLink, Verizon, and a so-far anonymous “international telco” have signed up to support it. The premise behind the initiative is that the current Big Data ecosystem suffers from fragmentation and duplication of effort that stymies innovation, and that a common core platform for Big Data (their platform) will free up both enterprises and other vendors with ecosystem interests to focus on value-adding business applications.
Of course, other vendors with Big Data interests beg to differ, suspecting that the ODP's view of commonality is destined less to serve the industry altruistically than to serve its founders' own established ecosystems. Cloudera, for example, declined to join, citing in a blog post that a) they just weren't hearing those complaints from their own customers and partners; and b) despite its supposed pan-organisational support, the initiative appeared to be driven more by the interests of vendors than by those of the ISVs at the sharp end of customer dissatisfaction, and by one vendor in particular (a quick who.is look-up reveals that the opendataplatform.org domain is registered to a certain Richard Snee, Pivotal's CMO).
The news certainly provides the backstory to the parallel announcement that Pivotal is open sourcing much of its Big Data offering (GemFire, GreenplumDB and HAWQ). Anyone is now free to tinker with that code, of course, but with the ODP in place Pivotal has co-founded a vehicle through which to help orchestrate much of the ensuing development. If the community enriches the core with helpful additions too, then so much the better for Pivotal. Whether the initiative also gives consumers more stable platforms for Hadoop and the rest remains to be seen.