8 Feb 2023
Democratization of data
Part 1: Democratization of data requires a common language and shared definitions
Deep within the labyrinthine corridors of a sprawling corporation, a battle was being waged. On one side, the Chief Data Officer, Sarah, armed with her knowledge of data governance and her unyielding determination to bring order to the chaos of conflicting data definitions. On the other, the departmental silos, each guarding their own unique terminology and metrics.
As Sarah delved deeper into the problem, she realized that the root of the issue was a lack of a common language for data and information. Without a common reference library, collaboration between teams was nearly impossible and trust in the data was non-existent.
Determined to crack the code and bring unity to the organization, Sarah thought back to her university studies and remembered the Rosetta Stone. What if she could create a Rosetta Stone for her organization’s data, providing a clear and consistent meaning for all terms and metrics – in the language of each department?
But the true test of her solution came when she presented it to the departmental leaders, the guardians of the silos. Would they accept this outsider’s attempt to impose a new order on their cherished data?
The meeting was tense, each departmental representative fiercely defending their own terminology and methods. But as Sarah presented her findings, a sense of understanding began to dawn on the group. With a common language, collaboration between teams would improve, data-driven decisions would be more accurate and the company’s bottom line would receive a much-needed boost.
Fast-forward to the end: the departmental leaders were won over and the silos began to crumble. The organization was now united in its data management efforts, and the results spoke for themselves: increased revenue and improved efficiency.
But Sarah knew that the true victory was not in the numbers, but in the unity that had been achieved. She had cracked the code and brought order to the chaos. And just like the symbologist in one of her favorite novels, she had uncovered the hidden meaning in the data and revealed a path to a brighter future for the organization.
The Dan Brown-esque narrative might not be exactly what you face in your daily work, but the facts are clear: your data scientists spend at least 60% of their time cleaning and organizing data, according to a survey by CrowdFlower published in Forbes.[1] Furthermore, a McKinsey survey[2] found that more than half of an analytics team’s time, including that of high-earning data scientists, is often spent on data processing and cleansing, which hinders scalability and frustrates employees. The damage reaches well beyond the analytics team: participants in that survey stated that approximately 30% of their overall enterprise time was wasted on unproductive tasks because of inadequate data quality and accessibility.
Where does it hurt?
Change initiatives aren’t performing as well as they could, because data and information are not easily available and understood. This is not solely a data issue: today, the democratization of data is hindered by People, Process and Technology. This paradigm must change to accelerate Digital Transformation.
In some organizations, People guard their department’s data to keep a knowledge edge over other departments. This is even more common in times of turmoil and cost cutting – exactly when an organization most needs to work together.
Technology and software aren’t optimized to share data; they are optimized to perform a task or a workflow. Data sharing is therefore rarely a key consideration when most software is developed. To solve this issue, organizations bring in additional technology such as Data Lakes and Data Warehouses, which help store data but struggle to provide a human-centric layer to interact with.[3]
The democratization of data is a powerful tool that can help organizations make better decisions, improve efficiency, and drive growth. However, for data to be truly valuable, it must be accessible to all members of an organization, regardless of their role or level. This is where the concept of a common language and shared definitions becomes critical.
Without a common language for data and information, different departments, teams, and individuals may have different interpretations of the same data. This can lead to confusion, misunderstandings, and ultimately, a lack of trust in the data. For example, if the sales team is using one set of metrics to measure performance while the finance team is using another, it will be difficult for them to collaborate and make data-driven decisions.
A common language for data and information is essential for democratization to succeed. It ensures that a term such as “customer” or “revenue” means the same thing to everyone in the organization, whichever department or system the data comes from.
This is particularly important when it comes to data governance and management. Without a common language, it becomes difficult for organizations to establish and maintain data governance policies, procedures, and standards.
So, a Rosetta stone?
A Rosetta Stone for data is not just a set of synonyms and translations between departmental terms. It also captures the structure of data and information: how each piece relates to the others.
Let’s establish some terms so we can explain what a Rosetta Stone for data would mean:
In short, a Rosetta stone for data is a comprehensive system that would enable the organization to have a single source of truth and a common understanding of data across all systems, departments, and teams. It’s a way to ensure that everyone in the organization is speaking the same “language” when it comes to data, which makes it easier to share, analyze, and use data effectively.
The Rosetta Stone approach needs four things: Data; its context, which turns it into Information; an information model, which structures and stores that information so it is easy to understand and use; and a common reference model, i.e., an information model shared across many systems, on which the individual information models are built. From now on, we will call this common reference model the Operational Reference Model.
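To make the “translation” part of this idea more concrete, here is a minimal Python sketch, assuming a very simple design: each canonical concept in the Operational Reference Model carries one shared definition plus the local term each department uses for it. The class names (`Concept`, `RosettaStone`), the departments, the terms, and the definition of “Customer” are all hypothetical and only illustrate the principle; the sketch does not cover relationships between concepts.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A canonical concept in a hypothetical Operational Reference Model."""
    name: str
    definition: str
    # Department -> the local term that department uses for this concept.
    local_terms: dict[str, str] = field(default_factory=dict)


class RosettaStone:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        # (department, lower-cased local term) -> canonical concept name.
        self._index: dict[tuple[str, str], str] = {}

    def add(self, concept: Concept) -> None:
        self.concepts[concept.name] = concept
        for dept, term in concept.local_terms.items():
            self._index[(dept, term.lower())] = concept.name

    def canonical(self, dept: str, term: str) -> str:
        """Resolve a department's local term to the shared concept name."""
        return self._index[(dept, term.lower())]

    def translate(self, term: str, from_dept: str, to_dept: str) -> str:
        """Express a term from one department's vocabulary in another's."""
        concept = self.concepts[self.canonical(from_dept, term)]
        return concept.local_terms[to_dept]


stone = RosettaStone()
stone.add(Concept(
    name="Customer",
    definition="A legal entity with at least one signed contract.",  # illustrative only
    local_terms={"Sales": "Client", "Finance": "Account", "Support": "End user"},
))
print(stone.canonical("Finance", "Account"))          # Customer
print(stone.translate("client", "Sales", "Finance"))  # Account
```

Routing every translation through the canonical concept, rather than maintaining pairwise mappings between departments, means that adding a new department adds one term per concept instead of one mapping per department pair.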
Part 2: Sounds great, now what?
How do organizations establish a common language for data and information?
One way to start is to build the first part of your Operational Reference Model, defining the types of data and the relationships between them. This gives you the foundation for establishing your governance framework, which should include a set of policies, procedures, and standards that define how data is collected, stored, and shared.
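A first increment of this kind (a few data types, the relationships between them, and one governance rule) could look roughly like the Python sketch below. It is an illustration under stated assumptions, not a prescribed implementation: `EntityType`, `Relationship`, `OperationalReferenceModel`, the `owner` and `sharing` fields, and the Customer/Order/Invoice example are all hypothetical. The only governance rule modelled is “every data type needs an accountable owner”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityType:
    name: str
    attributes: tuple[str, ...]
    owner: Optional[str] = None   # hypothetical governance field: accountable data owner
    sharing: str = "internal"     # hypothetical sharing policy: "public", "internal" or "restricted"


@dataclass(frozen=True)
class Relationship:
    source: str
    verb: str
    target: str
    cardinality: str              # e.g. "1:N"


class OperationalReferenceModel:
    """A deliberately small, hypothetical Operational Reference Model."""

    def __init__(self) -> None:
        self.entities: dict[str, EntityType] = {}
        self.relationships: list[Relationship] = []

    def add_entity(self, entity: EntityType) -> None:
        self.entities[entity.name] = entity

    def relate(self, source: str, verb: str, target: str, cardinality: str) -> None:
        # Relationships may only connect data types already defined in the model.
        for name in (source, target):
            if name not in self.entities:
                raise KeyError(f"unknown entity type: {name}")
        self.relationships.append(Relationship(source, verb, target, cardinality))

    def neighbours(self, name: str) -> list[str]:
        """Data types directly related to `name`, in either direction."""
        related = []
        for r in self.relationships:
            if r.source == name:
                related.append(r.target)
            elif r.target == name:
                related.append(r.source)
        return related

    def governance_gaps(self) -> list[str]:
        """Data types that still lack an accountable owner."""
        return [e.name for e in self.entities.values() if e.owner is None]


# Start small: three data types around one pressing problem.
orm = OperationalReferenceModel()
orm.add_entity(EntityType("Customer", ("customer_id", "legal_name"), owner="Sales Ops"))
orm.add_entity(EntityType("Order", ("order_id", "customer_id", "order_date"), owner="Sales Ops"))
orm.add_entity(EntityType("Invoice", ("invoice_id", "order_id", "amount"), sharing="restricted"))
orm.relate("Customer", "places", "Order", "1:N")
orm.relate("Order", "is billed on", "Invoice", "1:N")

print(orm.neighbours("Order"))   # ['Customer', 'Invoice']
print(orm.governance_gaps())     # ['Invoice']
```

Because `relate` refuses relationships to undefined types, the model can only grow by first agreeing on a new data type, which keeps the reference model and the governance framework in step.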
Never try to establish a full common data model from the start; always start with a burning platform, an urgent business problem that the model must help solve first.
As you build out the Operational Reference Model, it will become your organization’s single source of truth for data and information, enabling users to discover, understand, and use data.
What about definitions?
In addition to a common language, shared definitions are also essential for democratizing data. Shared definitions ensure that everyone within an organization understands the meaning and context of the data. They also ensure that data is used consistently and in a reliable manner.
Definitions can be stored in either a data catalog or a data dictionary; in both cases, the Operational Reference Model defines the structure of the data and the relations between data. The data catalog should include data definitions, data lineage, data quality scores, data usage policies, and data access control policies. The data dictionary is a central repository of all data definitions within an organization, giving users a single place to look up the meaning and context of the data.
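The split between catalog and dictionary can be sketched in Python as follows, under simplifying assumptions. The data dictionary is modelled as a plain mapping from a reference-model concept to its single shared definition, and each catalog entry points at a concept and carries the other fields listed above (lineage, quality score, usage policy, access control). Real catalog products work differently; `CatalogEntry`, `DataCatalog`, the dataset names, roles, scores, and the definition of “Revenue” are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    dataset: str
    concept: str                  # reference-model concept this dataset represents
    lineage: list[str]            # upstream sources, in order
    quality_score: float          # assumed 0.0-1.0 score from some quality check
    usage_policy: str
    allowed_roles: set[str] = field(default_factory=set)  # access control policy


class DataCatalog:
    def __init__(self, dictionary: dict[str, str]) -> None:
        # The data dictionary: exactly one shared definition per concept.
        self.dictionary = dictionary
        self._entries: list[CatalogEntry] = []

    def register(self, entry: CatalogEntry) -> None:
        # Reject datasets whose concept has no shared definition yet.
        if entry.concept not in self.dictionary:
            raise ValueError(f"no shared definition for concept {entry.concept!r}")
        self._entries.append(entry)

    def find(self, concept: str, role: str) -> list[CatalogEntry]:
        """Datasets for `concept` that `role` may access, highest quality first."""
        hits = [e for e in self._entries
                if e.concept == concept and role in e.allowed_roles]
        return sorted(hits, key=lambda e: e.quality_score, reverse=True)

    def describe(self, dataset: str) -> str:
        """Show a dataset with its shared definition and lineage."""
        entry = next(e for e in self._entries if e.dataset == dataset)
        lineage = " -> ".join(entry.lineage)
        return f"{entry.dataset}: {self.dictionary[entry.concept]} (lineage: {lineage})"


catalog = DataCatalog({"Revenue": "Invoiced amount excluding VAT."})
catalog.register(CatalogEntry("finance.revenue_ledger", "Revenue", ["ERP"], 0.97,
                              "Financial reporting", {"finance"}))
catalog.register(CatalogEntry("bi.revenue_by_region", "Revenue", ["ERP", "data_warehouse"],
                              0.90, "Internal analysis", {"finance", "sales"}))

print([e.dataset for e in catalog.find("Revenue", "finance")])
# ['finance.revenue_ledger', 'bi.revenue_by_region']
print([e.dataset for e in catalog.find("Revenue", "sales")])
# ['bi.revenue_by_region']
print(catalog.describe("bi.revenue_by_region"))
# bi.revenue_by_region: Invoiced amount excluding VAT. (lineage: ERP -> data_warehouse)
```

Keeping definitions in the dictionary and only a concept reference in each catalog entry means two datasets about “Revenue” cannot silently carry two different definitions.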
How did we get here in the first place? Didn’t we think this through?
The reasons are:
- Technical debt arises from the use of different specialized business languages by various departments, making data sharing difficult.
- Departments use data models and databases to capture and lock in their own business language, leading to multiple, disparate department-level databases that don’t communicate well with each other.
- Accommodations and workarounds are made to connect the disparate systems, adding complexity and contributing to the growth of technical debt.
To resolve this, you need to assemble a team of professionals to start the work. Typically, you would create a group consisting of:[4]
- Senior members of your organization who understand your business and how things are connected in the enterprise.
- Skilled communicators who can explain the concepts clearly across the organization.
- Change leaders who can lead and drive change and adoption in the organization.
- Architects and technicians who can instantiate the resulting language into systems and the overall architecture.
The challenges organizations face with conflicting data definitions and a lack of a common language for data and information are bigger than most organizations would like to admit. We’ve introduced the concept of an Operational Reference Model, which is the foundation for a “Rosetta Stone for data,” i.e., a comprehensive system that would enable an organization to have a single source of truth and a common understanding of data across all systems, departments, and teams. The Rosetta Stone approach includes data, information, an information model, and an Operational Reference Model. To establish a common language for data and information, we recommend building the first part of an Operational Reference Model, defining the types of data and the relationships between them, and using it as the foundation for a governance framework that includes policies, procedures, and standards for data collection, storage, and sharing. We have also highlighted the importance of shared definitions for data and suggested storing them in a data catalog or data dictionary.
In conclusion, the democratization of data requires a common language and shared definitions built on an Operational Reference Model. These concepts are essential for ensuring that everyone within an organization can understand and use data, regardless of their role or level. These tools and practices can help organizations establish a common language, shared definitions, and a single source of truth for data and information, which will ultimately lead to better data-driven decision making, improved efficiency, and growth.
/Daniel Lundin, Head of Product & Services