What is the difference between a data owner and a data steward?

By: Bart Baesens, Seppe vanden Broucke

This QA first appeared in Data Science Briefings, the DataMiningApps newsletter.

You asked: What is the difference between a data owner and a data steward?

Our answer:

Every data field in every database in the organization should be owned by a data owner, who has the authority to ultimately decide on the access to, and usage of, the data. The data owner could be the original producer of the data, one of its consumers, or a third party. The data owner should be able to fill in or update the field's value, which implies that the data owner knows what the field means and has access to the current correct value (e.g., by contacting a customer or by looking into a file). Data stewards (see below) can ask data owners to check or complete the value of a field, thereby correcting a data quality issue.

Data stewards are the data quality (DQ) experts in charge of ensuring the quality of both the actual business data and the corresponding metadata. They assess DQ by performing extensive and regular data quality checks. These checks involve, among other evaluation steps, calculating data quality indicators and metrics for the most relevant DQ dimensions. Data stewards are also responsible for taking the initiative and acting on the results of these assessments.

A first type of action is the application of corrective measures. However, data stewards are not in charge of correcting data themselves, as this is typically the responsibility of the data owner.

The second type of action involves a deeper investigation into the root causes of the data quality issues that were detected. Understanding these causes may make it possible to design preventive measures that aim to eradicate data quality problems. Preventive measures may include modifications to the operational information systems from which the data originate (e.g., making fields mandatory, providing drop-down lists of possible values, or rationalizing the interface). Values entered into the system may also be checked immediately for validity against predefined integrity rules, and the user may be asked to correct the data if these rules are violated. For instance, a corporate tax portal may require employees to be identified by their social security number, which can be checked in real time against the social security number database. Implementing such preventive measures requires the close involvement of the IT department in charge of the application. Overall, preventing erroneous data from entering the system is often more cost-efficient than correcting errors afterwards. However, care should be taken not to slow down critical processes because of non-essential data quality issues in the input data.
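
To make the division of labor more concrete, here is a minimal Python sketch of how a data steward's assessment could look. It computes two common DQ metrics (completeness and validity) and turns each detected issue into a correction request addressed to the field's data owner, rather than fixing the value itself. Everything in it is an assumption made for illustration: the sample records, the owner names in FIELD_OWNERS, the integrity rules in RULES, and the CorrectionRequest structure do not come from any particular tool or standard.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical sample records; None marks a missing value.
CUSTOMERS = [
    {"id": 1, "email": "ann@example.com", "birth_year": 1980},
    {"id": 2, "email": None, "birth_year": 1975},
    {"id": 3, "email": "not-an-email", "birth_year": 2190},
    {"id": 4, "email": "bob@example.com", "birth_year": None},
]

# Hypothetical ownership registry: every field has exactly one data owner.
FIELD_OWNERS = {"email": "sales-team", "birth_year": "customer-service"}

# Hypothetical integrity rules: one predicate per field (deliberately simple).
RULES = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "birth_year": lambda v: 1900 <= v <= date.today().year,
}


def completeness(records, field):
    """Share of records in which the field is filled in."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def validity(records, field, rule):
    """Share of filled-in values that satisfy the integrity rule."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 1.0
    return sum(1 for v in values if rule(v)) / len(values)


@dataclass
class CorrectionRequest:
    owner: str       # data owner who is asked to fix the value
    record_id: int
    field: str
    reason: str      # "missing" or "invalid"


def correction_requests(records):
    """The steward does not correct data; each issue is routed to the owner."""
    requests = []
    for record in records:
        for field, rule in RULES.items():
            value = record.get(field)
            if value is None:
                reason = "missing"
            elif not rule(value):
                reason = "invalid"
            else:
                continue
            requests.append(
                CorrectionRequest(FIELD_OWNERS[field], record["id"], field, reason)
            )
    return requests


if __name__ == "__main__":
    for field, rule in RULES.items():
        print(field, completeness(CUSTOMERS, field), validity(CUSTOMERS, field, rule))
    for request in correction_requests(CUSTOMERS):
        print(request)
```

The same predicates in RULES could also be applied at data entry time as a preventive measure, rejecting or flagging a value before it is stored. Whether a violation should block the user or only raise a warning is a design choice that depends on how critical the process is.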