
Decades of Data Evolution, Same Old Roadblocks

Updated: Aug 21, 2024



The journey of data is filled with milestones and innovations that have transformed how we handle and interpret information. Yet, despite all the technological advancements, certain roadblocks remain stubbornly persistent: complexity, time, and cost.



The 1970s: The Birth of Data Marts

 

In the 1970s, A.C. Nielsen introduced the concept of the Data Mart. This innovation allowed specific departments within a company to access the data they needed without wading through vast amounts of information. It was a game-changer, offering businesses the ability to perform targeted analysis and make quicker decisions. Now think about this: it was before relational databases were even commercially available. They were working on a file-based system (sound familiar – big data anyone?). However, Data Marts often led to data silos, with different departments maintaining separate datasets, resulting in inconsistencies and the lack of a unified data view, and the skills required to operate such systems were held by very few.

 

The 1980s: The Rise of Relational Database Management Systems (RDBMS)

 

The 1980s saw the advent of Relational Database Management Systems (RDBMS), which revolutionized how data was stored and accessed. These systems provided a more efficient and reliable way to manage data, making it easier to retrieve and manipulate information. Despite these advancements, integrating data from various sources remained a significant challenge. Again, we faced the complexity of the technology, as organisations and practitioners had to come to grips with the concepts of data modelling. Silos persisted because models were stand-alone, and the capability remained in the hands of the few who understood data architecture and data modelling. This was exacerbated by the fact that these systems were designed as OLTP systems for record keeping rather than for analytical workloads.

 

The 1990s: Data Warehousing and OLAP

 

Enter the data warehouse, to provide some level of standardisation and a focus on analytics. In the 1990s, the data warehousing concept, championed by Bill Inmon (3NF), and the dimensional modelling behind Online Analytical Processing (OLAP), popularized by Ralph Kimball, took centre stage. Both were attempts to make this easier and faster, getting data into the hands of those who need it. Data warehouses promised a centralized repository for all data, breaking down the silos created by Data Marts and instilling the concept of a single source of truth. Likewise, OLAP enabled more complex queries and analysis, in a business context, enhancing business intelligence capabilities.

 

Unfortunately, the silos remained, and the issues of complexity, time, and cost did not change. Hyper skills were ever more needed to design and construct these data warehouses, and a significant amount of resource was lost in simply acquiring and ingesting data. In fact, many of the customers I have worked with around the world had an ingestion-first strategy: get all the data into one central place, one central data warehouse, and all my questions will be answered. All of those initiatives were doomed to fail, and they did.

 

It was during this period that I entered the data world with my first role at A.C. Nielsen, fresh out of university. The initial excitement of working with data to answer business questions and enable decisions was palpable. However, building and maintaining these data assets was a monumental task. ETL (Extraction, Transformation, and Loading) processes were complex and time-consuming. Data integration from various sources was fraught with challenges, often leading to delays and data quality issues.
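
To make the ETL of that era concrete, here is a minimal, hypothetical sketch in Python (pandas, with SQLite standing in for the warehouse). The file, table, and column names are all invented for illustration; the point is where the pain lived – extracting from inconsistent source exports, transforming and cleansing, and loading into a warehouse table – and every new source meant more of this hand-built plumbing.

```python
# A toy, hypothetical ETL job: extract a source export, clean it, load it.
import sqlite3           # stands in for the target warehouse
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    """Extract: pull raw records from a source system export."""
    return pd.read_csv(path)

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Transform: reconcile formats and enforce basic data quality rules."""
    df = raw.rename(columns=str.lower)
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    df = df.dropna(subset=["order_id", "order_date"])   # drop unusable rows
    df["amount"] = df["amount"].astype(float).round(2)  # normalise units/format
    return df.drop_duplicates(subset=["order_id"])

def load(df: pd.DataFrame, conn: sqlite3.Connection) -> None:
    """Load: append the conformed records into the warehouse table."""
    df.to_sql("fact_orders", conn, if_exists="append", index=False)

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")
    load(transform(extract("orders_export.csv")), conn)
```

Multiply this by dozens of sources, each with its own quirks and nightly batch windows, and the delays and data quality issues described above become easy to picture.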

 

The ability to achieve such tasks was siloed in the hands of a few technical specialists, and ultimately the business could not get access to the data.

 

 

The 2000s: The Advent of Big Data

 

In the early part of the 21st century we saw the advent of data warehouse appliances: dedicated hardware designed for analytical processing. Available only to a few, due to their expense and complexity, these were the first attempt to throw serious compute at the problem. But compute wasn’t the issue. The issue was still complexity, time, and cost – especially cost, as these systems cost millions to acquire and run and required very specific talent.

 

The early 2000s marked the beginning of the Big Data era. This was hailed as the revolution: finally, we could use open-source software and commodity compute to resolve the issues of the data warehouse era. Hadoop, developed by Doug Cutting and Mike Cafarella, enabled the processing of massive datasets, promising insights at an unprecedented scale. Companies like Cloudera, Hortonworks, and MapR emerged between 2008 and 2011, offering platforms to manage and analyse big data.

 

However, these advancements came with new challenges. The technology was never designed to be used for data warehouse use cases. It was designed to analyse unstructured data like search strings.

 

We saw a further fragmentation of data silos as data science sprang up, with data now in Hadoop land as well as in warehouses and marts across an organisation. Furthermore, these technologies, by their nature, were very immature, hence hyper skills were needed to operate and run them.

 

Integrating these systems and the data they produced across various departments remained a nightmare. Data governance and quality became critical issues, with different teams using different standards and formats.

 

The merger of Cloudera and Hortonworks in 2018 and the retreat of MapR in 2019 highlighted the ongoing struggles within the industry. And within a 10-year period, the Hadoop era had come to an end.

 

But we still had the issues of complexity, time, and cost, and we were still burning huge amounts of capital to solve these challenges with very little output.

 

The 2020s: Cloud Data Warehouses, Lakehouses, and Beyond

 

The 2020s have seen the rise of cloud data warehouses, offering scalability and flexibility previously unimaginable – and, once again, the promise of centralization to resolve these data challenges.

 

The concept of the Lakehouse emerged, combining the best features of the Hadoop era, data warehouses, and data lakes in an attempt to streamline data management and analytics.

 

But again, we haven’t learnt from the past. These conversations and narratives are vendor-led. As we move into 2024 and beyond, the focus on centralizing data is flawed and it will fail, but we are blinded by the hype.

 

But companies are starting to wake up. In a recent study conducted by Cube Research, over 54% of companies said they will no longer buy a solution that locks their data into one data vendor or cloud, with 50% stating that they will look for portability to ensure flexibility. A further 55% are not consolidating their data onto a single integrated tech stack because it sacrifices the flexibility to use different tools.

 

The Roadblocks That Won’t Budge

 

Despite decades of evolution, some roadblocks remain as formidable as ever:

 

  1. Complexity: This can be broken into two components. The first is technology. As we can see from the timeline outlined, technology is constantly changing, and that will not change. But it is happening at speed, and we can no longer afford to spend huge amounts of capital to take advantage of that technology. Our capabilities need to adapt and evolve at that velocity to ensure the organization remains competitive.


    The second is talent. Hyper skills to build, operate, and run such platforms have always been a major issue. This type of talent is hard to find, difficult to train, and expensive. A recent study by Gartner (https://gtnr.it/3VPwcct) indicated that out of 179,000 available candidates, adding just a few required skills left only 4,000 who were appropriate for the roles.


  2. Time: Digital transformation is all about speed. And companies are starting to realize that operating in a digital world means that transformation never ends. Organizations are constantly inventing or reinventing stakeholder experiences, whether those stakeholders be customers, suppliers, employees, or others.


    So, time is key. I can’t spend months developing new data capabilities and exploring the organization to develop new insights that enable decisions. It must happen now. The data needs to be in the hands of many.


  3. Cost: In a world where austerity is prevalent and compliance with ESG requirements needs to be met, we cannot continue to spend the levels of capital that we do for questionable returns.

 

Ensuring compliance with evolving data privacy regulations while balancing accessibility with security is a constant struggle. Managing costs and ensuring efficient performance at scale require continuous effort and optimization. Organisations require speed of analytics at an acceptable cost, while also delivering new capability, not just reinventing the past.

 


 

 

Repeating the Past: Lessons Not Learned

 

What’s more frustrating is the apparent lack of learning from the past. We keep repeating the same patterns and rebranding them as new. Or worse, reinventing features from the past and claiming that they are new.

 

A very common pattern I see as I work with customers across the world is to take data warehouse design patterns developed around 2010 and migrate them to the cloud with little to no change. This means the problems from that era were simply reinvented in the cloud, rather than being solved. The same issues of complexity, time, and cost have followed us into the cloud era, showcasing a lack of true innovation.

 

I have spent a huge amount of time consulting with customers to resolve these challenges.

 

Even more striking is how functionality from as far back as the 1960s is often reinvented in today's technology with no real advancement, marketed as if it were something entirely new.

 

For example, Databricks announced on July 24, 2024 that primary key and foreign key functionality was going GA so that queries could run faster. Isn’t that what we were doing with databases in the 1970s and 80s? This is hailed as a new feature, but it’s nothing new. We get nothing more than we already had.
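
To ground the point: declaring primary and foreign keys has been plain relational DDL for decades. Here is a minimal sketch using SQLite from the Python standard library (not the Databricks syntax itself, and with hypothetical tables and columns):

```python
# Decades-old relational DDL: primary and foreign key declarations in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")

conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount      REAL,
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
    )
""")
```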

 

This reinvention of the wheel begs the question: where is the innovation? Why don’t we demand more?



Changing the Game with Portability

 

I'm here, with my talented team, to change this story with Data Tiles and our solution, Latttice. And yes, I know, I am a vendor, but our team's hearts are in the right place: doing the right thing by the customer. In Australia, we call this the 'pub test' – applying common sense to our decision making.

 

The ability to easily move and access your data across different platforms is crucial for maintaining flexibility, reducing costs, and ensuring you can adapt to the best technologies available. This is portability, which breaks down into the three areas of storage, compute, and access (a small sketch after this list illustrates the idea):

 

  • Storage. Think of data storage like a digital suitcase that holds all your files, pictures, and documents. Just like you want a suitcase that you can easily take with you wherever you go, you want your data to be stored in a way that can be easily moved or accessed across different locations. Portability in storage means you’re not locked into one provider or system. If another provider offers a better deal or improved features, you can move your data there without hassle. This flexibility helps avoid vendor lock-in and allows you to adapt to changing needs.

 

  • Compute. Imagine compute power as the engine that processes your data. You want an engine that you can easily swap between different vehicles (platforms) to find the best performance and cost efficiency. Being able to move your computing tasks to different platforms or services means you can choose the most cost-effective and efficient option for processing your data. This is crucial for managing costs and optimising performance.

 

  • Access. Access is like having a universal key that lets you open the door to your data from anywhere. You want to be able to access your data no matter where it is stored or processed. Portability of access ensures that you can retrieve and work with your data seamlessly across different environments, using your tools of choice, making collaboration easier and more efficient.
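
To illustrate what portability of storage, compute, and access can look like in practice, here is a toy sketch using open-source tools (pyarrow, DuckDB, and pandas). It is not Latttice itself, and the file and column names are hypothetical; the point is simply that data kept in an open format can be written by one engine, queried by another, and accessed by a third without being moved or re-ingested.

```python
# A toy illustration of storage/compute/access portability over open formats.
import pyarrow as pa
import pyarrow.parquet as pq
import duckdb
import pandas as pd

# Storage: keep the data in an open format (Parquet) you can take anywhere.
events = pa.table({"user_id": ["a", "b", "a"], "spend": [10.0, 25.5, 7.25]})
pq.write_table(events, "events.parquet")

# Compute: swap engines over the same file without moving the data.
totals = duckdb.sql(
    "SELECT user_id, SUM(spend) AS total FROM 'events.parquet' GROUP BY user_id"
).fetchall()

# Access: reach the same data from another tool of choice.
df = pd.read_parquet("events.parquet")

print(totals)
print(df.head())
```

Swap any of the three tools for another and the data does not move – that is the kind of flexibility that avoids lock-in.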

 

At Data Tiles, our no-code/low-code, AI-powered data mesh platform, Latttice, is designed to address these challenges of complexity, time, and cost. How? Let’s see:


  • Complexity. Remember I mentioned technology and talent? We solve the technology challenge by providing a control plane over all your data, regardless of the technology, now and in the future. This allows you to use the right tool for the right job. You don’t have to centralize your data to get value; you can use the assets you have now and modernize. As for talent, our zero-code, AI-powered solution puts the power in the hands of everyone. There is no need to understand the technical challenges of finding and using trusted data, allowing even non-technical users to create data products without writing a single line of code.

 

  • Time. As I mentioned, speed to market is critical. We can no longer accept the data permafrost as the blocker to moving our organizations forward. The Data Tiles technology can be deployed in minutes and within those minutes you can be asking questions of your data. No more waiting months to hopefully get the data capability you need to do what you need to do.

 

  • Cost. I mentioned an acceptable cost. By applying the principles of price performant compute (PPC), which I discussed in depth in a previous blog, Drastically Reduce Your Modern Data Platform Costs with Price Performant Compute (PPC), organizations can reduce the cost of their data stack by over 89%. More importantly, organizations can do more with less. This is outlined in a recent McKinsey study, which highlights that by treating data as a product, subsequent use cases are 90% faster to develop. Imagine if you had that capability.

 

By addressing the persistent challenges of complexity, time, and cost, we aim to provide organizations with seamless access to their data, enabling them to make better decisions.

 

I am absolutely committed to revolutionizing data access with Latttice. Today, we draw a definitive line in the sand: Latttice democratizes data access! We empower our adopters to create their data products with unprecedented ease. Now, I throw down the gauntlet and implore others to tackle the further complex challenges that businesses face in their quest to become truly data-driven organizations.

 

 

 

 
