Last updated: February 14, 2024
Categories of data quality checks
Read this guide to learn how the types of popular data quality checks are divided into categories, and what categories of checks are supported by DQOps.
Categories of data quality checks
Data quality checks in DQOps are divided into categories of similar checks that detect the same group of data quality issues. The following table shows the most common data quality issues that are detected by DQOps, divided into categories.
Data quality problem | YAML node name | Description of the data quality problem | Check reference |
---|---|---|---|
Accepted values | accepted_values |
Verify that text or numeric columns contain only well-known values from a set of accepted values. | column-level |
Anomaly detection | anomaly |
Monitor anomalies in numeric values by applying a time-series analysis. Get notified when new values appear to differ from regular values. | column-level |
Blanks and whitespaces | whitespace |
Analyze text columns to detect values that should be null, but instead a programming language specific text is stored in the column: '', ' ', 'undefined', 'None', 'n/a', etc. | column-level |
Boolean values | bool |
Measure the percentage of true and false values in the dataset. | column-level |
Comparing tables | comparisons |
Compare (reconcile) data across different data sources. | table-level column-level |
Custom SQL and multi column checks |
custom_sql |
Validate data using custom SQL expressions, also comparing multiple columns in the same table. | table-level column-level |
Data accuracy | accuracy |
Compare the tested table with another (reference) table by joining tables in the same database. On a column level, compare the sum, min, max, not null count of values. | table-level column-level |
Data type conversions | conversions |
Analyze text columns and verify if the values can be parsed and converted to numeric, boolean, float and date types. | column-level |
Data type detection | datatype |
Analyze text columns that are supposed to store only numeric, boolean, date, timestamp, or text values. Detect if values of a different data type are found instead. | column-level |
Data volume | volume |
Monitor the volume (row count) in tables and partitions. Get notified when unexpected changes in the volume is detected. | table-level |
Empty or incomplete columns | nulls |
Detect null values in columns, or detect columns that are empty. Also detect anomalies in the percentage of null values. | column-level |
Invalid dates | datetime |
Detect common data quality issues in date or datetime columns, such as dates in the future our dates out of a valid range. | column-level |
Numeric statistics | numeric |
Analyze numeric columns to detect if the values are in the expected ranges. | column-level |
PII values | pii |
Find PII values (Personally Identifiable Information) in text columns. | column-level |
Referential integrity | integrity |
Detect data integrity issues, such as missing rows when doing a lookup by a primary key. | column-level |
Table availability | availability |
Monitor whether the table is accessible and available for use. Detect problems with the data source availability and physical reliability. | table-level |
Table schema drifts | schema |
Detect table schema changes and drifts. DQOps detects also changes to the list of columns, order of columns, and types of columns. | table-level column-level |
Text patterns | patterns |
Validate text columns with built-in patterns such as an email address, or using custom regular expressions. | column-level |
Text statistics | text |
Analyze text columns to detect values shorter or longer than the maximum accepted length. Also verify if text values can be safely conveerted to numbers, booleans, dates, etc. | column-level |
Timeliness and freshness | timeliness |
Measure the freshness and staleness of tables. Detect outdated tables with no new records. | table-level |
Uniqueness and duplicates | uniqueness |
Detect duplicate values in columns. Ensure that values in columns are unique. | column-level |
Check categories in the user interface
The data quality check editor shows data quality checks categories as collapsible sections.
Table-level check editor
The categories of table-level checks are shown when opening the data quality check editor for a table.
Column-level check editor
The categories of column-level checks are shown when opening the data quality check editor for a column.
Referencing check categories in YAML files
If you want to configure data quality checks directly in the DQOps YAML files, the categories are defined as YAML nodes below respective elements for types of data quality checks.
The following example of a .dqotable.yaml shows the category nodes highlighted.
What's next
- Learn how to run data quality checks for one category of checks.
- Review the reference of data quality checks supported by DQOps, that shows SQL queries generated for each data source, and YAML code samples for activating the checks.