When it comes to analytics, enterprises today have a surplus of data but a shortage of insights.
Why the data surplus? Data warehouses, data lakes, and other repositories are brimming as volume, variety, and velocity continue to grow. Even as the tide continues to rise, enterprises are tapping into new data sources, such as social media and Internet of Things sensors, in order to gain new analytics opportunities.
This explosion of information has made it difficult for some IT teams to generate the necessary insights. Bottlenecks include cumbersome manual coding for Extract, Transform, and Load (ETL) processes and a lack of understanding about how data is being used and, more importantly, how it could be used. The result: dark data.
Dark data is collected and stored as part of typical business activities, but it’s not used for anything other than compliance and retention purposes.
The average enterprise analyzes just 37 percent of its structured data and 22 percent of its semi-structured and unstructured data.
And that dark data matters for two reasons:
- This data costs money to capture and manage, and it often necessitates capacity upgrades for premium data warehouses.
- Dark data can hold latent analytics insights that enterprises are failing to realize.
Today’s enterprises seek both to analyze more of their data and to reduce the costs related to their unanalyzed dark data. Here’s how enterprises can begin to achieve both goals.
- Analyze more data. The value of a data point often boils down to its correlation with other data points. Decision makers can better understand their financial standing, for example, by reviewing not just revenue by country, but revenue by customer and by sales rep within those countries, along with product mix and overall averages for each. This type of structured data typically resides in data warehouses, where the data points are easily correlated for insights. Other data, such as unstructured and semi-structured data, might sit dark because it is not as easy to correlate and analyze. For example, records about customer-service interactions might be dark. But if the business intelligence or data science team applies semantic analysis to those records and correlates them with external social media trends in a flexible platform such as Hadoop, the team can extract new insights and, for example, make better decisions about customer-service policies and up-selling opportunities. Efforts like this can bring dark data into the light.
- Reconsider data storage architectures with an eye toward cost savings. Not all data holds immediate value. Old customer records or operational reports often grow dusty but still consume space in premium data warehouses in order to satisfy regulatory retention requirements. These files can reside far more cost-effectively in Hadoop or the cloud.
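The revenue drill-down described above (by country, then by customer and by sales rep within each country) can be illustrated with a minimal Python sketch. The records, field order, and the `revenue_by` helper are hypothetical and exist only for illustration. A real warehouse would typically run this kind of grouping in SQL or a BI tool.

```python
from collections import defaultdict

# Hypothetical sales records: (country, customer, sales_rep, revenue)
SALES = [
    ("US", "Acme", "Lee", 1200.0),
    ("US", "Acme", "Kim", 800.0),
    ("US", "Globex", "Lee", 500.0),
    ("DE", "Initech", "Roth", 900.0),
]


def revenue_by(records, *key_indexes):
    """Sum revenue (the last field) grouped by the chosen field positions."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[i] for i in key_indexes)
        totals[key] += record[-1]
    return dict(totals)


# Coarse view: revenue by country only.
by_country = revenue_by(SALES, 0)

# Finer views: the same data correlated with customer or sales rep.
by_country_customer = revenue_by(SALES, 0, 1)
by_country_rep = revenue_by(SALES, 0, 2)
```

Each added grouping key correlates the same revenue figures with another dimension, which is where the extra insight comes from.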
We find that enterprises can best reduce the amount and cost of dark data by adopting three basic best practices.
- Automate. IT organizations can lose valuable time and energy to manual, error-prone ETL processes. Replacing this drudgery with intuitive, automated software enables IT to deliver more analytics-ready data to the business faster. Zurich Insurance, for example, has used data warehouse automation solutions to reduce ETL coding time from 45 days to two, and to accelerate EDW updates from twice annually to a monthly pace. As a result, the company has freed up resources for analytics and has lit up more of its dark data.
- Try new technologies and platforms. Apache Spark and Apache Kafka are just two emerging technologies for analyzing and acting upon data streams in real time. Kafka, for example, can stream real-time transaction updates from customer databases to big data platforms such as Hadoop, where those transactions can be correlated with individual smartphones and physical store sensors to make location-based retail offers to repeat customers. Without the Kafka real-time feed, that transaction update might have become dark data. Instead, it creates a cross-selling opportunity.
- Track data usage. Enterprises across industries can realize significant savings by identifying unused tables and databases in their data warehouses and re-balancing them to economical platforms such as Hadoop or the cloud. This frees up premium data warehouse resources, improves query performance, and postpones costly hardware upgrades.
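As a rough illustration of the usage-tracking practice, the Python sketch below flags tables that have not been queried recently as candidates for moving to a cheaper platform. It assumes a last-queried date per table is available, for example from the warehouse's query logs. The table names, the 180-day threshold, and the `find_unused_tables` helper are all hypothetical.

```python
from datetime import date, timedelta


def find_unused_tables(last_queried, today, max_idle_days=180):
    """Return, sorted by name, tables idle longer than max_idle_days.

    A value of None means no query was ever recorded for the table.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, last in last_queried.items()
        if last is None or last < cutoff
    )


# Hypothetical usage data: table name -> date of most recent query.
LAST_QUERIED = {
    "sales_2019_archive": date(2023, 1, 15),
    "customer_current": date(2024, 5, 30),
    "ops_report_legacy": None,
}

candidates = find_unused_tables(LAST_QUERIED, today=date(2024, 6, 1))
```

The idle threshold is a policy choice. Retention rules or seasonal reporting cycles may call for a longer window before a table is treated as unused.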
IT organizations understand that dark data is unused data, and that unused data assets are liabilities. By extracting new value from once-dark data and reducing management costs associated with dark data, they can improve the economics of their analytics initiatives.