Big Data: Is the cloud the enemy of Hadoop?

One in two companies has built or is building a ‘data lake’, a Big Data storage repository. Yet these projects are running into an unexpected obstacle: the cloud!

Within Big Data, Hadoop has seen unprecedented success, creating a market that Forrester estimates at $800 million in 2017. Driven by the three historical Big Data players, Hortonworks, Cloudera, and MapR, Hadoop is making its mark. The proof is the explosion of ‘data lakes’, the pools of data into which analytical tools are plunged.

How far along are companies with their data lake plans?
15% – Implemented and being expanded;
33% – In the process of being implemented;
31% – Planned within the next 12 months;
13% – Interested but no project yet;
5% – Not interested.
48% of companies, or about 1 out of 2, have deployed a data lake or are in the process of deploying one, and 79%, or 8 out of 10, expect to have done so before the end of the decade.
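The two headline figures follow directly from the survey shares listed above. A minimal sketch of the arithmetic, assuming the percentages as quoted; the dictionary keys are illustrative labels, not the survey's own wording:

```python
# Survey shares (percent of respondents) as quoted in the article.
# The key names are hypothetical labels chosen for this sketch.
survey = {
    "implemented": 15,
    "in_progress": 33,
    "planned_12_months": 31,
    "interested_no_project": 13,
    "not_interested": 5,
}

# "About 1 out of 2": already implemented plus currently being implemented.
deployed_or_deploying = survey["implemented"] + survey["in_progress"]

# "8 out of 10": the above plus those planning a project within 12 months.
with_planned = deployed_or_deploying + survey["planned_12_months"]

# Share not covered by any of the listed answers.
unaccounted = 100 - sum(survey.values())

print(f"Deployed or deploying: {deployed_or_deploying}%")
print(f"Including planned projects: {with_planned}%")
print(f"Not covered by the listed answers: {unaccounted}%")
```

The listed answers add up to 97%, so about 3% of respondents are not covered by the categories shown here.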
Change of strategy after 10 years of Hadoop
However, as it celebrates its 10th anniversary this year, the Apache Hadoop community seems to be pausing for breath. The cause is the sheer complexity of the projects and technologies associated with Hadoop and, more worryingly, the overlap that companies cite between architectures and cloud services.

In practice, companies are responding to the complexity of Hadoop by turning to the public cloud. Public cloud providers offer so-called ‘serverless’ services that run SQL queries or use Spark directly, without going through Hadoop.

The Big Data world must therefore contend with two trends: Hadoop presents itself as the natural choice for exploiting large volumes of data, while businesses want to use the public cloud for the same purpose. But… Hadoop was not designed for the cloud!

The Big Data dilemma: Hadoop or an alternative?
Companies carrying out Big Data projects thus find themselves facing a dilemma. Should they build projects on Hadoop, which meets their expectations despite its greater complexity but does not work well with the cloud? Or should they fold Big Data into their cloud projects by turning to the alternatives offered in the public cloud by Amazon, Google or IBM?

What is surprising here is that, while Hadoop will continue to plough its furrow and establish itself, the large central systems and data warehouses, which no one expected to disappear, will continue to survive and will probably adopt hybrid strategies outside of Hadoop. The reason is simple: even in the digital age, humans struggle to keep pace with digital change.