
Last updated: April 09, 2024

Data quality monitoring checks for data observability

This guide shows how the data quality monitoring checks in DQOps observe data sources and track data quality with data quality KPIs.

What are data monitoring checks?

The monitoring checks in DQOps are responsible for continuously monitoring the data quality of data sources. The data quality results generated by monitoring checks capture the end-of-day or end-of-month data quality status of the monitored data.

Capturing an end-of-day (or end-of-month) status of the last execution of a data quality check is important for:

  • storing an audit log of executed data quality checks, especially when auditing is required for regulatory reasons
  • measuring data quality KPIs to prove the trustworthiness of data sources
  • tracking data quality improvement day by day, and presenting the progress of data cleansing projects to stakeholders and business sponsors of the data quality initiative
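A data quality KPI is, in its simplest form, the share of executed checks that did not raise an issue. The sketch below is a simplified illustration, not DQOps's exact formula: the severity labels, the `CheckResult` class and the `data_quality_kpi` function are hypothetical, and which severities count as "passing" is left as a parameter because that choice is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    check_name: str
    severity: str  # hypothetical labels: "passed", "warning", "error", "fatal"


def data_quality_kpi(results: list, passing: frozenset = frozenset({"passed"})) -> float:
    """Return the percentage of check results whose severity counts as passing.

    Which severities count as passing is an assumption of this sketch,
    not the exact DQOps KPI formula.
    """
    if not results:
        raise ValueError("at least one check result is required")
    passed = sum(1 for r in results if r.severity in passing)
    return 100.0 * passed / len(results)


# Sample end-of-day results for two checks over two days.
results = [
    CheckResult("daily_row_count", "passed"),
    CheckResult("daily_row_count", "warning"),
    CheckResult("daily_nulls_count", "passed"),
    CheckResult("daily_nulls_count", "error"),
]
print(data_quality_kpi(results))  # 50.0
print(data_quality_kpi(results, passing=frozenset({"passed", "warning"})))  # 75.0
```

Because monitoring checks keep only one result per day or month, each day or month contributes at most one result per check to such a KPI.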

Before activating a data quality monitoring check, you should test the profiling version of the same check. Every monitoring and partition data quality check has a profiling version, named profiling_*.
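Finding the profiling variant of a check is a matter of swapping the name prefix. The helper below is a hypothetical sketch; it assumes partition checks use the daily_partition_* and monthly_partition_* prefixes, which this article does not state.

```python
def profiling_variant(check_name: str) -> str:
    """Map a monitoring or partition check name to its profiling_* counterpart.

    The partition prefixes are an assumption of this sketch.
    """
    # Try the longer partition prefixes first, so that "daily_partition_"
    # is not mistaken for the plain "daily_" prefix.
    for prefix in ("daily_partition_", "monthly_partition_", "daily_", "monthly_"):
        if check_name.startswith(prefix):
            return "profiling_" + check_name[len(prefix):]
    raise ValueError(f"not a monitoring or partition check: {check_name}")


print(profiling_variant("daily_row_count"))  # profiling_row_count
```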

Time scale

Monitoring checks are divided into two groups that contain almost the same data quality checks:

  • daily monitoring checks track the end-of-day data quality status
  • monthly monitoring checks track the end-of-month data quality status, but do not support anomaly detection checks, because one data quality result per month is not enough historical data for prediction

Summary

The following table summarizes the key concepts of monitoring data quality checks in DQOps, divided by daily monitoring and monthly monitoring checks.

| Check type | Time scale | Purpose | Time period truncation | Check name prefix |
|---|---|---|---|---|
| monitoring | daily | The preferred type of checks to detect data quality issues. Daily monitoring checks store the end-of-day data quality status for measuring the data quality KPIs. | One data quality monitoring result captured per day. When a daily monitoring check is run again during the same day, the previous result is replaced. | daily_* |
| monitoring | monthly | Capture the last known end-of-month data quality status. Monthly monitoring checks store the end-of-month data quality status for measuring the data quality KPIs. | One data quality monitoring result captured per month. When a monthly monitoring check is run again during the same month, the previous result is replaced. | monthly_* |

Monitoring checks in DQOps user interface

Daily monitoring checks

The following screen shows the data quality results of the daily_row_count data quality check that measures the number of rows in a table using a SELECT COUNT(*) FROM <monitored_table> SQL query.

The data quality check error severity rule has a parameter min_count: 1, which raises an error severity issue if the table is empty. The warning severity rule uses a second threshold that verifies whether the table has at least 500,000 rows, raising a warning severity issue when the table is smaller. The data quality check details panel in the check editor shows that all recent data quality check runs failed with a warning severity issue, because the table had fewer than 500,000 rows on each of the last 11 days when the check was run. The highest detected row count was 488,478 rows.
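The interplay between the sensor query and the two min_count thresholds can be sketched as follows. This is an illustration, not DQOps code: `row_count` and `evaluate_min_count` are hypothetical names, an in-memory SQLite table stands in for the monitored table, and the "passed" label is an assumption. The more severe rule is evaluated first, so an empty table reports an error rather than a warning.

```python
import sqlite3
from typing import Optional


def row_count(conn: sqlite3.Connection, table: str) -> int:
    """Sensor sketch: run SELECT COUNT(*) against the monitored table."""
    # The table name is interpolated as a quoted identifier, so it must come
    # from trusted configuration, never from user input.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def evaluate_min_count(actual_value: int,
                       warning_min_count: Optional[int] = 500_000,
                       error_min_count: Optional[int] = 1) -> str:
    """Return the most severe issue raised by two min_count thresholds."""
    if error_min_count is not None and actual_value < error_min_count:
        return "error"
    if warning_min_count is not None and actual_value < warning_min_count:
        return "warning"
    return "passed"


# Usage: a three-row table is non-empty but far below 500,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])
print(evaluate_min_count(row_count(conn, "orders")))  # warning
```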

The Executed At column shows when the data quality check was run, and the Checkpoint date column shows the time_period value from the check_results Parquet table that DQOps uses to store the data quality results.

Because daily monitoring checks store the end-of-day status (only one result per check and day), the values of the Checkpoint date (the time_period Parquet column) are truncated to the beginning of the day when the check was run.
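The truncation rule can be illustrated in a few lines. The function name `daily_time_period` is hypothetical, and the sketch uses naive timestamps; how DQOps handles time zones is not covered here.

```python
from datetime import datetime


def daily_time_period(executed_at: datetime) -> datetime:
    """Truncate an execution timestamp to midnight of the same day."""
    return executed_at.replace(hour=0, minute=0, second=0, microsecond=0)


morning_run = datetime(2024, 1, 20, 8, 15, 0)
evening_run = datetime(2024, 1, 20, 16, 56, 17)
# Both runs map to the same checkpoint date, so they share a single result slot.
print(daily_time_period(morning_run) == daily_time_period(evening_run))  # True
print(daily_time_period(evening_run))  # 2024-01-20 00:00:00
```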

daily monitoring data quality checks editor

The same data quality results are also shown in the chart view. All captured data quality metrics, called data quality sensor readouts in DQOps, are presented as an Actual value time series on the chart. Because all recent row counts were below the required minimum of 500,000 rows, all the results are shown within the yellow zone for warning severity data quality issues.

daily monitoring of data volume

Monthly monitoring checks

The monthly monitoring checks store the end-of-month data quality status, replacing previously captured results for the same month. The following screen shows the result of running the monthly_row_count check at 2024-01-20 16:56:17 (January 20th, 2024).

DQOps stored one result for January 2024, truncating the value of the Checkpoint date (time_period parquet column) to the beginning of the month.
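Monthly truncation works the same way, but resets the day of the month as well. `monthly_time_period` is a hypothetical name used only for this sketch, which again assumes naive timestamps.

```python
from datetime import datetime


def monthly_time_period(executed_at: datetime) -> datetime:
    """Truncate an execution timestamp to midnight on the first day of its month."""
    return executed_at.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


print(monthly_time_period(datetime(2024, 1, 20, 16, 56, 17)))  # 2024-01-01 00:00:00
```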

monthly monitoring data quality checks editor

Monitoring checks pros and cons

When to use monitoring checks

Use the data monitoring checks to:

history of data quality KPI per day and data quality dimension

You can run monitoring checks multiple times during the day

It is safe to run monitoring checks every time new data is loaded into a monitored table, even multiple times during the day. DQOps replaces the last known data quality result captured during the same day (for daily monitoring checks) or during the same month (for monthly monitoring checks).
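The replace-on-rerun behavior amounts to an upsert keyed by the check name and the truncated time period. The in-memory store below is a hypothetical stand-in for the Parquet files DQOps actually writes; the class names and the key layout are assumptions that mirror the behavior described above. Partition checks are rejected because they are not monitoring checks.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MonitoringResult:
    executed_at: datetime
    actual_value: float
    severity: str


class MonitoringResultStore:
    """In-memory sketch: keeps at most one result per check and time period."""

    def __init__(self) -> None:
        self._results: dict = {}

    @staticmethod
    def time_period(check_name: str, executed_at: datetime) -> datetime:
        if check_name.startswith(("daily_partition_", "monthly_partition_")):
            raise ValueError(f"partition checks are out of scope: {check_name}")
        start_of_day = executed_at.replace(hour=0, minute=0, second=0, microsecond=0)
        if check_name.startswith("daily_"):
            return start_of_day
        if check_name.startswith("monthly_"):
            return start_of_day.replace(day=1)
        raise ValueError(f"not a monitoring check: {check_name}")

    def save(self, check_name: str, result: MonitoringResult) -> None:
        key = (check_name, self.time_period(check_name, result.executed_at))
        # Same check and same period: the earlier result is overwritten.
        self._results[key] = result

    def get(self, check_name: str, period: datetime) -> MonitoringResult:
        return self._results[(check_name, period)]

    def __len__(self) -> int:
        return len(self._results)


store = MonitoringResultStore()
store.save("daily_row_count", MonitoringResult(datetime(2024, 1, 20, 9, 0), 480_000, "warning"))
store.save("daily_row_count", MonitoringResult(datetime(2024, 1, 20, 16, 56, 17), 488_478, "warning"))
store.save("daily_row_count", MonitoringResult(datetime(2024, 1, 21, 9, 0), 501_000, "passed"))
print(len(store))  # 2
print(store.get("daily_row_count", datetime(2024, 1, 20)).actual_value)  # 488478
```

Keying on the truncated period rather than on the execution timestamp is what makes repeated runs idempotent within a day or month.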

Limitations of monitoring checks

The results of monitoring data quality checks are used to evaluate the data quality KPI and compliance with data contracts.

  • Do not run a monitoring check for the first time before experimenting with its profiling variant. The configuration of accepted profiling checks can be easily converted to monitoring checks. If a misconfigured monitoring check runs and fails, raising data quality issues, those issues will decrease the data quality KPI score. You will then have to use the delete data quality results screens to remove these data quality results.

  • Monthly monitoring checks do not support anomaly detection data quality checks, because when only one result per data quality check is stored each month, there is not enough historical data for prediction.

Monitoring check configuration in DQOps YAML files

The configuration of active data quality monitoring checks is stored in the .dqotable.yaml files. Please review the samples in the configuring table metadata article to learn more.

What's next