Are We Repeating the Hadoop Mistake with Databricks and Snowflake?
- Cameron Price
- Jul 8
- 4 min read
Updated: Jul 14

Can you remember when Hadoop was the future of big data? It promised a revolution in low-cost, commoditized compute, and it was going to solve all the challenges of traditional data warehousing. Organizations across industries embraced the opportunity, investing heavily in clusters, skills, infrastructure, and processes around this promising technology. In the process, we forgot the hard-learnt lessons of the past and ploughed on.
Fast forward just a few short years, and many companies found themselves stranded: locked into complex solutions and outdated technology, facing expensive migrations and painful transitions as the cloud revolution swept in with supposedly better answers.
As the saying goes, “history repeats itself”. Today, we're observing a familiar pattern emerging as vendors like Databricks and Snowflake put forward all-in-one Data & AI platform pitches.
"The definition of insanity is doing the same thing over and over again and expecting different results." — Albert Einstein.
I understand the strategy; from a vendor’s perspective it is sound: extend the stack to a broader set of functions and become indispensable, “sticky” to the customer. Don't get me wrong, these platforms are immensely powerful and offer simplicity, performance, and integration that make them highly attractive.
Yet there's a significant hidden cost to going "all-in" with a single vendor: considerable business risk. Vendor lock-in limits flexibility, complicates future technological transitions, and significantly escalates future costs.
A senior executive at one of these vendors told me, “in the next 4 years, every company will need to choose a new platform, and that’s our opportunity”. But do customers really need to choose a new platform? Are they getting anything genuinely new?
I tackled this question in an earlier blog: https://www.data-tiles.com/post/decades-of-data-evolution-same-old-roadblocks
Let’s consider the key risks:
• Flexibility Constraints. The pace of technological innovation is relentless. We don’t know what technology will be available in five years, but we do know businesses will want to take advantage of it without significant capital outlay. Organizations that lock all their data, analytics, and workflows into one proprietary system lose the ability to easily incorporate emerging tools. They become “stuck” as the switching cost becomes a barrier to change.
"Highly distributed and diversified enterprises... will continue to seek ways to reduce the risk of vendor lock-in and ensure that their BI platform remains cloud-/platform-independent." — Forrester Research, 2023
• Vendor Lock-In. What if your chosen vendor significantly increases prices, faces outages, or fails to innovate? Companies deeply entrenched in a single vendor ecosystem can find themselves without leverage or viable alternatives.
“The choice of tooling... helps mitigate the risk of asset re-development... and prevents you from vendor lock-ins.” — Deloitte UK
“One-third (33%) of businesses are concerned about vendor lock-in/lock-out as a top three concern when implementing industry cloud solutions.” — Deloitte US
• Hidden Costs and Complexity. Eventually, businesses outgrow platforms or need to pivot. At that point, migrating away becomes a major undertaking, requiring massive re-engineering of data, pipelines, applications, and processes. Sound familiar? It should. It’s the same scenario many faced during painful Hadoop exits.
"Too often, the simplicity of today translates into rigidity tomorrow." — Decades of Data Evolution — Data Tiles
You might argue that a single platform approach simplifies your stack, centralizes support, and reduces short-term complexity. That was the thinking in the data warehouse appliance era (pre-cloud), during the big data phase, and again with the rise of data lakes and cloud data warehouses. But the goal of simplicity often becomes a trap. Companies spend years ingesting data into one environment, only to run “repatriation” programs to bring it back on-prem due to spiralling costs and complexity.
“Technology should serve the business—not the other way around. Every investment must have a clear ROI.” — Data Tiles Blog
Too often, the simplicity of today translates into rigidity tomorrow. Remember, Hadoop initially appeared to simplify big data processing as well, only to evolve into a complicated ecosystem that many organizations struggled to unwind, one that drove an explosion of data engineering as a discipline, of questionable value, that customers are still dealing with today.
So, What Should We Be Doing Instead?
This isn't a rejection of platforms like Databricks or Snowflake. It’s about using platforms where appropriate, for use cases that make sense, rather than adopting technology for technology's sake. It's about learning from our history, consciously recognizing and mitigating the risks of deep vendor lock-in, and adopting a strategic, forward-looking approach.
“Vendor lock-in is a trap. Enterprises need to architect for freedom, not just speed.” — Zhamak Dehghani, Creator of Data Mesh
Organizations can maintain flexibility by employing open standards and modular architectures, and by keeping clear exit strategies, to avoid repeating the costly mistakes of the past.
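As a minimal sketch of what “open standards” can mean in practice, the snippet below lands data in Apache Parquet, an open columnar format that Spark, DuckDB, Trino, Snowflake, and Databricks can all read, so the data itself never belongs to any one engine. This is an illustrative assumption on my part, not a prescription: it assumes pyarrow is installed, and the table and column names are hypothetical.

```python
# A minimal sketch of keeping the storage layer on an open standard
# (Apache Parquet) so data stays readable by any engine — current or
# future — rather than only a vendor's proprietary table format.
# Assumes pyarrow is installed (pip install pyarrow); the "orders"
# table and its columns are illustrative only.

import pyarrow as pa
import pyarrow.parquet as pq

# Land data in Parquet, an open columnar format, instead of a
# proprietary internal format.
orders = pa.table({
    "order_id": [1001, 1002, 1003],
    "region": ["EMEA", "APAC", "AMER"],
    "amount": [250.0, 125.5, 980.0],
})
pq.write_table(orders, "orders.parquet")

# Any standards-compliant engine can now consume the same file. Here
# we read it back with pyarrow itself, but DuckDB, Spark, Trino, and
# the major cloud warehouses can all read Parquet directly, which is
# what keeps the exit door open.
restored = pq.read_table("orders.parquet")
print(restored.to_pydict())
```

The same principle scales up through open table formats such as Apache Iceberg or Delta Lake, which layer transactional metadata on top of Parquet while remaining engine-agnostic, so the platform can change without the data being re-engineered.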
"While it may come as no surprise that data and analytics are reshaping industry competition... the persistently lackluster response to this phenomenon by most companies should raise some eyebrows." — McKinsey & Company
“Only one-third [of leaders] say they have succeeded in creating a data-driven culture.” — Harvard Business Review
“Company culture is a harder hurdle to clear than any technical problem.” — Harvard Business Review
The lesson from the Hadoop era is clear. History often repeats itself, especially in technology. It’s a history I was deeply embedded in during my career, witnessing both the elation and the disappointment. Let’s approach today's popular platforms thoughtfully, balancing immediate convenience with long-term agility and adaptability. By doing so, we can build resilient data strategies designed to adapt and thrive amid ever-changing technologies and techniques, without wasting capital, which matters more than ever in a world of austerity.
Join me in a Data Conversation,
Cameron Price.
Related Reading on Data Tiles Website:
Decades of Data Evolution—Same Old Roadblocks – How today’s problems aren’t new, just repackaged
The Unstructured Data Bottleneck – Why the real breakthrough lies in accessibility and user empowerment
References
Harvard Business Review – Davenport & Bean (2018): Big Companies Are Embracing Analytics, But Most Still Don’t Have a Data-Driven Culture
Harvard Business Review – Bean (2022): Why Becoming a Data-Driven Organization Is So Hard
McKinsey (2019): How Leaders in Data and Analytics Have Pulled Ahead
Forrester (2023): Key Findings on BI Platform Independence
Forrester (2025): Enterprise Software Earnings Reveal AI-Driven Power Shift
Deloitte UK: Cloud Data Platform
Deloitte US: Multi-Cloud Adoption Concerns
Data Tiles: Decades of Data Evolution
Data Tiles: The Unstructured Data Bottleneck
Zhamak Dehghani, Principles of Data Mesh (martinfowler.com)