PNSQC April Meetup - Measuring Data Quality and Data Stewardship
Measuring Data Quality and Data Stewardship
Storage is cheap, processing is flexible, and data is a hidden trove of value increasingly used to drive business decisions and priorities. We generate it by the terabyte. We copy it from database to database and transform it from NoSQL to relational data to graphs and charts. We ingest it from customers and reflect it back to them with augmentations and additions our Sales departments have promised are just what is needed to drive them to the next level. Yet whether it’s an unstructured data lake or a decades-old dusty schema, everyone eventually comes to a point where they realize that a lot of this data is possibly wrong, missing, or maybe just useless junk.
How do you measure the quality of your data? What are the actual metrics that can be used to measure data quality to ensure confidence in your decision-making process? What does data quality even mean?
Data Quality is a comparison of the actual state of a particular set of data to a desired state.
For data quality to be measured, you need a standard of comparison. Within a given data set, you can compare that data with itself or statistically with similar data sets. Between copies of data, you can compare the source data to the target data while compensating for business rules that transform or selectively filter data between the two.
Each of those methods has a number of data quality measurements that can be used as components of total data quality such as:
Consistency
Completeness
Validity
Exactness
Aptronymity
Uniqueness
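As a rough illustration of how a few of these measurements might be turned into numbers, here is a minimal Python sketch. The sample records, field names, email pattern, and the three function names are all hypothetical and chosen only for this example; the formulas are one simple interpretation of completeness, uniqueness, and validity, not a standard definition.

```python
import re

# Hypothetical sample records; fields and values are made up for illustration.
records = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "not-an-email"},
    {"id": 3, "email": "bo@example.com"},
]

# Deliberately simple pattern, assumed for this sketch only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows where the field is populated (assumes rows is non-empty)."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, field):
    """Share of distinct values among all values of the field (assumes rows is non-empty)."""
    values = [row.get(field) for row in rows]
    return len(set(values)) / len(values)


def validity(rows, field, pattern):
    """Share of populated values that match the expected pattern."""
    present = [row[field] for row in rows if row.get(field) not in (None, "")]
    if not present:
        return 0.0  # nothing to validate; treating this as 0.0 is an arbitrary choice
    return sum(1 for value in present if pattern.match(value)) / len(present)


print("email completeness:", completeness(records, "email"))
print("id uniqueness:", uniqueness(records, "id"))
print("email validity:", validity(records, "email", EMAIL_PATTERN))
```

Each function returns a ratio between 0 and 1, so the results can be tracked over time or combined into an overall score, with the weighting left to whoever owns the data.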
Further, we can elaborate on practices and cultures that encourage data quality or that enable poor quality to sneak in. Data Stewardship is a great start for creating ownership of data and a connection with its purpose and uses. Data standards enable development teams to build data quality in from the get-go with practices like rigorous input validation or periodic data surveys.
If your business and customers depend on data, you need data you can depend on.
Nick Bonnichsen is a Software QA Engineer with over 20 years of experience, some of it actually worthwhile. He has focused on data and data-centric testing for the last 8 years or so, having worked in various parts of testing and deciding to stay the heck away from UIs whenever possible. In his free time Nick enjoys live music, overly complicated video games, poor attempts at woodworking, international travel, and the occasional backpacking trip.
This will be a hybrid meetup - both in person in Portland and on Zoom. PNSQC will provide some drinks and food for those who can attend in person.