In the previous section, we explored a theoretical framework for categorizing arguments related to data quality, which gave us a foundation for understanding the various perspectives in this discussion. With this broader perspective in mind, we now turn to the practical aspects of data quality: what is most relevant, and how we can achieve it.
The Empirical Approach
In the 1990s, Richard Wang and Diane Strong conducted an empirical study of what data quality means to data consumers. In the first step, they asked data consumers to list all attributes that came to mind when thinking about data quality. In the second step, these attributes were rated by importance. A factor analysis then consolidated the initial 179 attributes into a smaller set of data quality dimensions, grouped into four major categories.
Intrinsic Data Quality
Intrinsic Data Quality includes “Accuracy” and “Objectivity”, meaning the data needs to be correct and impartial. While these two dimensions are largely self-explanatory, “Believability” and “Reputation” are less obvious. Interestingly, they are not about the data itself but about its source, either the respondents or the fieldwork provider: respondents need to be real and authentic, while the fieldwork provider should be trustworthy and reputable.
Contextual Data Quality
Contextual Data Quality means that some aspects of data quality can only be assessed in light of the task at hand. As this context can vary a lot, attaining high contextual data quality is not always easy. Most of the contextual dimensions (Value-added, Relevancy, Timeliness, Completeness, Appropriate amount of data) require thorough planning before setting up and conducting the research. Conversely, it is really hard to improve contextual data quality once the data has been collected; sending reminders to improve completeness is one of the few options left at that stage.
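To make the Completeness dimension more tangible, here is a minimal Python sketch that computes, per question, the share of respondents who gave a non-missing answer. The function name `completeness`, the question keys, and the sample responses are hypothetical and only serve as illustration; what counts as "missing" is itself an assumption that depends on the task at hand.

```python
def completeness(responses, questions):
    """Return the share of non-missing answers for each question.

    Assumption: an answer counts as missing if the key is absent,
    the value is None, or the value is an empty string.
    """
    total = len(responses)
    shares = {}
    for question in questions:
        answered = sum(
            1 for response in responses
            if response.get(question) not in (None, "")
        )
        shares[question] = answered / total if total else 0.0
    return shares


# Hypothetical survey responses, one dict per respondent.
sample_responses = [
    {"age": 34, "income": None},
    {"age": 51, "income": "40k"},
    {"age": "", "income": "55k"},
    {"age": 28},
]

shares = completeness(sample_responses, ["age", "income"])
print(shares)
```

A low share for a question that matters for the analysis would be a signal to send reminders while fieldwork is still running, rather than after the fact.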
Representational Data Quality
Representational Data Quality refers to the way data is formatted (concise and consistent representation) and the degree to which you can derive meaning from it (interpretability and ease of understanding). Consider the data validation routines for an online survey. When asking for the respondents’ age, for example, you would make sure everyone enters it in the same way (consistency), in whole years (conciseness), or even by choosing among the age groups you’re particularly interested in (ease of understanding). In any case, the respondent is prevented from submitting erroneous or extreme values (interpretability).
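The age example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `validate_age` and `age_group`, the accepted range of 18 to 99, and the age group boundaries are all hypothetical and would be set by the actual study design.

```python
import re
from typing import Optional

# Hypothetical age groups; the boundaries are illustrative assumptions.
AGE_GROUPS = [(18, 29), (30, 44), (45, 59), (60, 99)]


def validate_age(raw: str, minimum: int = 18, maximum: int = 99) -> Optional[int]:
    """Return the age in whole years, or None if the entry is rejected."""
    text = raw.strip()
    # Accept only plain digits, so "34.5" or "thirty" are rejected.
    if not re.fullmatch(r"[0-9]{1,3}", text):
        return None
    age = int(text)
    # Reject values outside the assumed plausible range.
    if age < minimum or age > maximum:
        return None
    return age


def age_group(age: int) -> Optional[str]:
    """Map a validated age to the label of its group, if any."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


for entry in [" 34 ", "34.5", "250", "thirty"]:
    print(repr(entry), "->", validate_age(entry))
```

Rejecting an entry at input time, instead of cleaning it afterwards, is what keeps the resulting variable consistent and easy to interpret.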
Accessibility Data Quality
The two dimensions within this category can pull in opposite directions and therefore need to be balanced. Accessibility is about how easily and effortlessly data can be retrieved, while Access Security is about how access can be limited and controlled. Both aspects have received increasing attention in recent years, for example with the spread of online dashboards and data warehouses.