The Data Analytics Problems That AI Can’t Fix 

The last several years have been a time of both opportunity and pressure for data teams. The tools available to collect, store, explore and analyze data have grown steadily more powerful and effective, but expectations from line-of-business stakeholders have more than kept pace. 

Business leaders expect real-time information, user-friendly visualizations, and actionable insights that guide better business decisions. Data science teams are struggling to keep up with the pace of these demands, especially as data volumes keep rising. 

For many, the arrival of AI-powered business intelligence tools seemed like the answer to their prayers. “GenBI,” as it’s been nicknamed, allows line-of-business users to ask questions in natural language and receive meaningful answers. 

By democratizing access to data, GenBI lets non-experts independently generate the insights they need, while data analytics professionals are free to focus on more strategic work instead of just fielding tickets. At the same time, GenBI and LLM-powered tools can automate many basic and tedious data processing tasks, letting data science teams take on more complex work. 

However, AI is hardly a magic bullet. While its capabilities are certainly impressive and useful, there are still things that it can’t do.

Speaking about the rise of AI in data analytics on the DATAcated podcast, Pyramid Analytics CTO Avi Perez said that “You’ll be shocked when I tell you that the same data problems exist – it’s just at a different scale. We still have the problems of ‘garbage in, garbage out.’ We still have the headache of, ‘Is the data correct? I’ve got a copy of the data, you’ve got a copy of the data.’”

Will improvements in AI make these problems go away over time? Perez doesn’t think so. “The software and the industry has moved on a huge amount, and this is where the GenAI frameworks are going to move it even further,” he continued. “But I think the problems have metastasized, and they’re going to keep doing so, because we’re going to collect even more data in the future around everything we’re already doing.” 

It seems that even the best AI system needs human copilots to deliver useful data insights. Many organizations are discovering that their underlying data problems don’t suddenly go away once they invest in AI analytics. Let’s take a closer look at the data problems that AI can’t fix and that analytics pros mustn’t ignore. 

When Data Is Not Homogenized Effectively

Silos are a persistent problem plaguing data analytics. Now that AI tools can gather data from so many different systems, the issue is becoming more serious, not less. Departments and teams collect data from their own sources, then store it in their own repositories and locations. 

The problem doesn’t disappear when organizations force all the data to be fed into a single data lake. Unless data is homogenized effectively, it won’t suddenly merge into one organized, cataloged body of data. 

Varying formats mean that data might reside in the same lake, but it still sits in separate pools. Synchronizing these datasets takes hard human work; an AI model won’t do it on its own. 
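
As a rough illustration, here is a minimal sketch of what that human work looks like in practice. It uses pandas with entirely hypothetical department extracts and column names: someone has to agree on a canonical schema, reconcile naming, date formats and currency units, and only then can the records merge into one usable table.

```python
import pandas as pd

# Hypothetical extracts from two departments describing the same customers,
# but with different column names, date formats and currency units.
sales = pd.DataFrame({
    "cust_id": [101, 102],
    "signup": ["2024-01-05", "2024-02-11"],
    "revenue_usd": [1200.0, 640.0],
})
support = pd.DataFrame({
    "customer_id": [101, 103],
    "signup_date": ["05/01/2024", "20/03/2024"],
    "revenue": [1150.0, 300.0],  # recorded in euros, despite looking similar
})

# The canonical schema is a human agreement, not something a model infers for you.
CANONICAL = ["customer_id", "signup_date", "revenue_usd"]

def harmonize_sales(df: pd.DataFrame) -> pd.DataFrame:
    out = df.rename(columns={"cust_id": "customer_id", "signup": "signup_date"})
    out["signup_date"] = pd.to_datetime(out["signup_date"], format="%Y-%m-%d")
    return out[CANONICAL]

def harmonize_support(df: pd.DataFrame, eur_to_usd: float = 1.08) -> pd.DataFrame:
    out = df.copy()
    out["signup_date"] = pd.to_datetime(out["signup_date"], format="%d/%m/%Y")
    out["revenue_usd"] = out["revenue"] * eur_to_usd  # unit conversion is a human decision
    return out[CANONICAL]

# Only after harmonization do the separate "pools" become one dataset.
unified = pd.concat([harmonize_sales(sales), harmonize_support(support)], ignore_index=True)
print(unified)
```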

When the Data Is Low Quality 

Using an advanced analytics engine doesn’t obviate the reality of “garbage in, garbage out.” On a basic level, more data should mean better insights, because the model can detect patterns more effectively. However, that’s only true when that data is reliable. As data has grown in both volume and complexity, it’s also become more susceptible to being corrupted, misunderstood, and/or poorly defined. 
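
To make this concrete, here is a minimal sketch of the kind of basic quality checks that still have to happen before data reaches any model. It uses a hypothetical orders extract and made-up thresholds; the specific column names and ranges are assumptions for illustration.

```python
import pandas as pd

def quality_report(df: pd.DataFrame, key: str, numeric_ranges: dict) -> dict:
    """Summarize the usual 'garbage in' risks: duplicates, nulls, impossible values."""
    report = {
        "rows": len(df),
        "duplicate_keys": int(df[key].duplicated().sum()),
        "null_counts": {col: int(n) for col, n in df.isna().sum().items()},
    }
    # Flag values outside the agreed business range (nulls count as out of range here).
    for col, (lo, hi) in numeric_ranges.items():
        report[f"out_of_range_{col}"] = int((~df[col].between(lo, hi)).sum())
    return report

# Hypothetical orders data with the usual problems hiding in it.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "amount": [25.0, -5.0, 17.5, None],
})
print(quality_report(orders, key="order_id", numeric_ranges={"amount": (0, 10_000)}))
```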

What’s more, the hunger for more data, coupled with the need to avoid the copyright issues that come with human-generated data, has created interest in synthetic data. But over-reliance on this workaround brings problems of its own: researchers at Oxford University recently found that training models on synthetic data can lead to model collapse over time.

“Model collapse is a degenerative process affecting generations of learned generative models, in which the data they generate end up polluting the training set of the next generation,” wrote the researchers. “Being trained on polluted data, they then mis-perceive reality.”

When Data Enrichment Is Inconsistent 

Even when you’ve ensured that all departments are working with the same data, you still need to overcome concerns about inconsistent data enrichment. Different departments use the same data for different purposes, resulting in varying versions of the truth. 

For example, finance, marketing and compliance might all use the same source dataset, but when enriching this information, they will focus on different aspects, apply different weights to the data, label things differently, and use different enrichment sources. 

This means they each end up with slightly different datasets, so their data is no longer consistent. If there’s no longer a single source of truth for the entire organization, AI-powered analysis will deliver inconsistent and unreliable results.
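
One common mitigation is to run enrichment once, in a shared and versioned step, and have every team consume the same output. The sketch below assumes a hypothetical customer table and made-up segmentation rules; the point is simply that the labels and the rule version travel with the data instead of being re-derived by each department.

```python
import pandas as pd

# Shared base dataset: one source of truth before enrichment.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "annual_spend": [1200, 300, 8000],
})

# One agreed set of segmentation rules instead of per-team variants.
SEGMENT_BANDS = {"low": (0, 500), "mid": (500, 5000), "high": (5000, float("inf"))}
ENRICHMENT_VERSION = "segments-v1"

def enrich_segments(df: pd.DataFrame) -> pd.DataFrame:
    """Label customers with the agreed segments so every team uses identical definitions."""
    def band(spend: float) -> str:
        for name, (lo, hi) in SEGMENT_BANDS.items():
            if lo <= spend < hi:
                return name
        return "unknown"

    out = df.copy()
    out["segment"] = out["annual_spend"].map(band)
    out["enrichment_version"] = ENRICHMENT_VERSION  # lineage: which rules produced the labels
    return out

# Finance, marketing and compliance all start from the same enriched table.
enriched = enrich_segments(customers)
print(enriched)
```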

When Data Governance Is Weak

Teams need robust governance policies for data collection, management, storage, ownership and stewardship. It’s the only way to produce trustworthy, accessible data that has been processed consistently and cleaned correctly. But while AI can do many things, it can’t enforce compliance with good governance. 

Weak governance erodes data integrity and undermines trust in data. This is even more serious for AI analytics than for traditional analytics: AI systems rely on data to train and validate their models, so minor errors in an initial dataset can be magnified as they are copied and recopied. 
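
What AI can’t do is enforce the policy itself. Here is a minimal sketch of the kind of check that humans still have to define and run, using a hypothetical dataset catalog with made-up fields for owner, steward and validation date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class DatasetRecord:
    name: str
    owner: Optional[str]
    steward: Optional[str]
    last_validated: Optional[date]

def governance_issues(ds: DatasetRecord, max_age_days: int = 90) -> list:
    """List reasons a dataset shouldn't feed model training or BI until fixed."""
    issues = []
    if not ds.owner:
        issues.append("no named owner")
    if not ds.steward:
        issues.append("no data steward")
    if ds.last_validated is None:
        issues.append("never validated")
    elif date.today() - ds.last_validated > timedelta(days=max_age_days):
        issues.append(f"last validated more than {max_age_days} days ago")
    return issues

catalog = [
    DatasetRecord("crm_contacts", "sales-ops", "j.doe", date(2024, 1, 10)),
    DatasetRecord("web_clickstream", None, None, None),
]
for ds in catalog:
    print(ds.name, governance_issues(ds) or "ok")
```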

“Data governance is important for all data operations and analytics, strategy, and decision-making,” noted Faye Murray, the Chief Data Officer of Emrys. “However, AI is especially sensitive to poorly governed data, and without addressing data governance, the potential offered by AI will remain mostly untapped.” 

When Security Processes Kill Data Access 

Possessing the data is only part of the battle. You still need your line-of-business stakeholders to actually use the data you’ve collected and processed. Unfortunately, inefficient security policies can cut off data access for the people who need it most. 

In the enterprise, security processes can be long and complex. Only a few stakeholders might receive permission to access a given sensitive dataset, which means that the vast majority aren’t using the data they need. 

With data increasingly being recognized as a valuable asset, security has grown tighter, making access more difficult for regular users. When decision-makers can’t obtain the data they need, they’ll either end up with flawed insights, or they won’t use their shiny AI-powered insight engines at all. This leaves data science teams back where they started – answering a long list of queries from business users. 
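
One pragmatic middle ground, sketched below with a hypothetical role-to-column policy and customer table, is to mask sensitive columns per role rather than deny access to the whole dataset, so broader access doesn’t have to mean broader exposure.

```python
import pandas as pd

# Hypothetical policy: which columns each role may see in a customer table.
POLICY = {
    "analyst": {"customer_id", "region", "lifetime_value"},
    "marketing": {"customer_id", "region"},
}
MASK = "***"

def view_for_role(df: pd.DataFrame, role: str) -> pd.DataFrame:
    """Return the table with columns the role may not see masked rather than withheld."""
    allowed = POLICY.get(role, set())
    out = df.copy()
    for col in out.columns:
        if col not in allowed:
            out[col] = MASK
    return out

customers = pd.DataFrame({
    "customer_id": [1, 2],
    "region": ["EMEA", "APAC"],
    "email": ["a@example.com", "b@example.com"],
    "lifetime_value": [5400, 180],
})
print(view_for_role(customers, "marketing"))
```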

AI Analytics Isn’t Enough 

AI analytics can be a fantastic tool for business decision-making, but there’s a limit to its efficacy. Without coupling these solutions with a robust and comprehensive data management strategy, we end up with confusion, misdirection, and inconsistent results.

Tanvir Kour is a passionate technical blogger and open source enthusiast. She is a graduate in Computer Science and Engineering and has 4 years of experience in providing IT solutions. She is well-versed with Linux, Docker and cloud-native applications. You can connect with her on Twitter: https://x.com/tanvirkour