Sanjeev Mohan is principal analyst at SanjMo. Contact him on LinkedIn.
Data mesh is exciting because it evolves our thinking so that older approaches that may not have worked in practice can work today. The biggest change is how we think about data: as a product to be managed with users and their desired outcomes in mind. Organizations want to adopt product management practices to make their data assets consumable. The purpose of a data product is to make greater use of ‘trusted data’ by facilitating its analysis by a diverse group of consumers. This, in turn, enhances an organization’s ability to quickly extract intelligence and insights from their data assets with little friction.
The data management space has steadily adopted well-tested software development lifecycle methodologies such as DevOps and observability. Now the focus has shifted to applying agile development practices and product management to data and analytics.
What is a data product?
You can think of a data product as a standalone data container that directly solves a business problem or generates revenue. Data products are built for internal or external users at different levels of maturity. Some practical examples include:
• A good old table or a view with a published data model, such as a star schema or a business-friendly semantic layer. An example is a denormalized (flattened) table or materialized view that aggregates employee data from different data sources, such as HR, learning management, and survey Excel files.
• A report, dashboard, or application with its own user interface (UI), an API, or SQL command-line access. An example is a customer 360 dashboard that unifies sales, marketing, and service data.
• An ML model or a metric that can be embedded in users’ workflows. An example is a model that predicts customer churn or performs sentiment analysis. It may be available as a user-defined function for easy use by citizen data scientists or partners outside the organization.
How do data products differ?
You might be thinking, what’s the problem and what’s new? Isn’t this what we’ve been doing for a long time?
What makes data products unique is that they focus on the people and process side. In the past, our work was done once we had built and delivered the technical artifacts mentioned above. Now, however, we focus on the entire data lifecycle – from requirements to creation, use, and finally to end-of-life. This requires a different way of thinking, where we prioritize business use over technology. Essentially, we bring “product thinking” to data.
What are some of the key features of data products?
If we want to treat data as a product, then we need to create a data team led by a data product owner. The team should consist of analysts, data (or analytics) engineers, user experience designers, and architects who develop data products that have the following characteristics:
One goal of data products should be reusability. For example, if an organization has invested in developing a multifunctional customer 360 data product, then different departments should be able to use it. For this to happen, products must be stored in a registry with adequate metadata descriptions so that users can easily find them.
Data catalogs have been used to link technical and business metadata, while providing capabilities such as lineage and integration with data quality, security, and BI tools. Since data catalogs provide a single window for discovering data, they should also be extended to cover data products.
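To make the registry idea concrete, here is a minimal sketch of a searchable data product registry. It is an in-memory stand-in for a real data catalog, and every name in it (DataProduct, Registry, the field names, and the two sample products) is a hypothetical chosen for illustration, not the API of any catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Metadata describing one data product (field names are illustrative)."""
    name: str
    owner: str
    description: str
    tags: set[str] = field(default_factory=set)


class Registry:
    """In-memory stand-in for a data catalog that also lists data products."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Later registrations under the same name overwrite earlier ones.
        self._products[product.name] = product

    def search(self, term: str) -> list[str]:
        """Return sorted names of products whose metadata mentions the term."""
        needle = term.lower()
        hits = []
        for p in self._products.values():
            haystack = " ".join([p.name, p.description, *p.tags]).lower()
            if needle in haystack:
                hits.append(p.name)
        return sorted(hits)


registry = Registry()
registry.register(DataProduct(
    name="customer_360",
    owner="sales-data-team",
    description="Unified sales, marketing, and service data per customer",
    tags={"customer", "sales", "marketing"},
))
registry.register(DataProduct(
    name="employee_profile",
    owner="hr-data-team",
    description="Flattened employee data from HR, learning, and survey sources",
    tags={"hr", "employee"},
))

marketing_hits = registry.search("marketing")
```

A real catalog would add lineage, ownership workflows, and access control on top of this, but the point is the same: a product without searchable metadata is unlikely to be reused.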
There is no surer death knell for the adoption of data products than a loss of confidence in the veracity of the information. Since a data product combines data from various sources to provide added value, domain-driven, decentralized data quality is rising as a major consideration for data products.
The data team must invest in modern data quality approaches to detect and resolve anomalies before producing data products. Data quality should be treated as a business initiative with the primary focus on context, rather than technical dimensions.
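As a rough illustration of checking quality before a product is published, the sketch below expresses a few business-context rules as named checks and runs them over candidate rows. The rules, column names, and sample rows are all assumptions made up for this example; a real team would derive them from the domain that owns the data.

```python
from typing import Any, Callable

Row = dict[str, Any]
Rule = tuple[str, Callable[[Row], bool]]

# Hypothetical business rules for an employee data product.
RULES: list[Rule] = [
    ("employee_id is present",
     lambda r: bool(r.get("employee_id"))),
    ("training_hours is non-negative",
     lambda r: isinstance(r.get("training_hours"), (int, float))
     and r["training_hours"] >= 0),
    ("survey_score is between 1 and 5",
     lambda r: r.get("survey_score") is None or 1 <= r["survey_score"] <= 5),
]


def validate(rows: list[Row]) -> list[tuple[int, str]]:
    """Return (row index, failed rule name) for every rule violation."""
    failures = []
    for i, row in enumerate(rows):
        for name, check in RULES:
            if not check(row):
                failures.append((i, name))
    return failures


rows = [
    {"employee_id": "E1", "training_hours": 12, "survey_score": 4},
    {"employee_id": "", "training_hours": -3, "survey_score": 9},
]
failures = validate(rows)
publishable = not failures  # publish only when no rule is violated
```

Naming each rule in business terms keeps the focus on context: a failed check reads as a statement a domain owner can act on, not just a technical error code.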
The adoption of self-service analytics requires security in two dimensions: dynamic access control and authorization, so that only the right people see the data, and compliance with data privacy regulations such as HIPAA and GDPR for sensitive, personally identifiable information (PII).
The principles I described in a previous article on data security modernization also apply to data products. Data security products control access and allow different consumers to see different results from the same data product, as they enforce specific security policies to protect sensitive data and comply with data sovereignty laws.
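The idea that different consumers see different results from the same data product can be sketched as a policy that masks columns per role. This is a simplified illustration, not how any particular data security product works; the roles, column names, and mask value are hypothetical.

```python
# Hypothetical policy: the columns each role may see unmasked.
POLICY: dict[str, set[str]] = {
    "hr_admin": {"employee_id", "name", "email", "region"},
    "analyst": {"employee_id", "region"},
}
MASK = "***"


def apply_policy(rows: list[dict], role: str) -> list[dict]:
    """Return a copy of the rows with columns masked for the given role."""
    # Unknown roles get an empty allow-list, so every column is masked.
    allowed = POLICY.get(role, set())
    return [
        {col: (val if col in allowed else MASK) for col, val in row.items()}
        for row in rows
    ]


employees = [
    {"employee_id": "E1", "name": "Ada", "email": "ada@example.com", "region": "EU"},
]

admin_view = apply_policy(employees, "hr_admin")
analyst_view = apply_policy(employees, "analyst")
guest_view = apply_policy(employees, "guest")
```

Defaulting unknown roles to a fully masked view is a deliberate choice in this sketch: access has to be granted explicitly instead of being removed after the fact.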
Unlike software applications, data is constantly changing. These changes arrive without warning from the various sources and SaaS applications used to build the data products. These “deviations” can relate to schedule changes, late and out-of-order data, or data entry errors. In addition, failures in pipelines and infrastructure can cause some tasks to fail and go undetected for a long time.
Therefore, it can be beneficial to invest in data observability tools. Their capabilities can include automated and proactive anomaly discovery, root cause analysis, monitoring, notifications, and recommendations to resolve anomalies. The end result is higher reliability of data products and faster recovery from errors.
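A very simple form of automated anomaly discovery is to compare each day's row count with the days just before it. The sketch below flags a day when its count lies more than a chosen number of standard deviations from the trailing mean. The window size, threshold, and sample counts are assumptions for illustration; commercial observability tools use considerably richer models.

```python
from statistics import mean, stdev


def find_anomalies(counts: list[int], window: int = 5,
                   threshold: float = 3.0) -> list[int]:
    """Return indexes whose value deviates strongly from the trailing window."""
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if counts[i] != mu:
                anomalies.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily row counts for one pipeline; day 6 loads far fewer rows.
daily_rows = [1000, 1010, 990, 1005, 995, 1002, 120, 1001]
anomalous_days = find_anomalies(daily_rows)
```

Note that once an anomalous value enters the trailing window it inflates the standard deviation, so a production check would usually exclude flagged points from the history.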
Good data skills are hard to come by and architectures are becoming more complex. Mature organizations should use a factory-style assembly line to build and deploy data products to increase decision-making flexibility.
DataOps has evolved as the necessary capability to deliver efficient, flexible data engineering. Its many features include automation, low/no-code development, continuous integration, testing, and deployment. The end goal of DataOps tools should be to accelerate the development of reliable data products.
These key features should help organizations begin their journey of developing data products. I’ve found that the companies leading the way in this space measure the effectiveness of their data products through increased usage of their data, which translates into improved data-driven decisions. Some organizations also successfully monetize their data products.