Skip to content

Last updated: February 14, 2024

Categories of data quality checks

Read this guide to learn how the types of popular data quality checks are divided into categories, and what categories of checks are supported by DQOps.

Categories of data quality checks

Data quality checks in DQOps are divided into categories of similar checks that detect the same group of data quality issues. The following table shows the most common data quality issues that are detected by DQOps, divided into categories.

Data quality problem YAML node name Description of the data quality problem Check reference
Accepted values accepted_values Verify that text or numeric columns contain only well-known values from a set of accepted values. column-level
Anomaly detection anomaly Monitor anomalies in numeric values by applying a time-series analysis. Get notified when new values appear to differ from regular values. column-level
Blanks and whitespaces whitespace Analyze text columns to detect values that should be null, but instead a programming language specific text is stored in the column: '', ' ', 'undefined', 'None', 'n/a', etc. column-level
Boolean values bool Measure the percentage of true and false values in the dataset. column-level
Comparing tables comparisons Compare (reconcile) data across different data sources. table-level
column-level
Custom SQL
and multi column checks
custom_sql Validate data using custom SQL expressions, also comparing multiple columns in the same table. table-level
column-level
Data accuracy accuracy Compare the tested table with another (reference) table by joining tables in the same database. On a column level, compare the sum, min, max, not null count of values. table-level
column-level
Data type conversions conversions Analyze text columns and verify if the values can be parsed and converted to numeric, boolean, float and date types. column-level
Data type detection datatype Analyze text columns that are supposed to store only numeric, boolean, date, timestamp, or text values. Detect if values of a different data type are found instead. column-level
Data volume volume Monitor the volume (row count) in tables and partitions. Get notified when unexpected changes in the volume is detected. table-level
Empty or incomplete columns nulls Detect null values in columns, or detect columns that are empty. Also detect anomalies in the percentage of null values. column-level
Invalid dates datetime Detect common data quality issues in date or datetime columns, such as dates in the future our dates out of a valid range. column-level
Numeric statistics numeric Analyze numeric columns to detect if the values are in the expected ranges. column-level
PII values pii Find PII values (Personally Identifiable Information) in text columns. column-level
Referential integrity integrity Detect data integrity issues, such as missing rows when doing a lookup by a primary key. column-level
Table availability availability Monitor whether the table is accessible and available for use. Detect problems with the data source availability and physical reliability. table-level
Table schema drifts schema Detect table schema changes and drifts. DQOps detects also changes to the list of columns, order of columns, and types of columns. table-level
column-level
Text patterns patterns Validate text columns with built-in patterns such as an email address, or using custom regular expressions. column-level
Text statistics text Analyze text columns to detect values shorter or longer than the maximum accepted length. Also verify if text values can be safely conveerted to numbers, booleans, dates, etc. column-level
Timeliness and freshness timeliness Measure the freshness and staleness of tables. Detect outdated tables with no new records. table-level
Uniqueness and duplicates uniqueness Detect duplicate values in columns. Ensure that values in columns are unique. column-level

Check categories in the user interface

The data quality check editor shows data quality checks categories as collapsible sections.

Table-level check editor

The categories of table-level checks are shown when opening the data quality check editor for a table.

table-level data quality check categories

Column-level check editor

The categories of column-level checks are shown when opening the data quality check editor for a column.

column-level data quality check categories

Referencing check categories in YAML files

If you want to configure data quality checks directly in the DQOps YAML files, the categories are defined as YAML nodes below respective elements for types of data quality checks.

The following example of a .dqotable.yaml shows the category nodes highlighted.

# yaml-language-server: $schema=https://cloud.dqops.com/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  columns:
    cumulative_confirmed:
      type_snapshot:
        column_type: INT64
        nullable: true
      monitoring_checks:
        daily:
          schema:
            daily_column_exists:
              warning:
                expected_value: 1
    date:
      type_snapshot:
        column_type: DATE
        nullable: true
      profiling_checks:
        schema:
         profile_column_exists:
            warning:
              expected_value: 1
      monitoring_checks:
        daily:
          schema:
            daily_column_exists:
              warning:
                expected_value: 1

What's next