Last updated: April 09, 2024

Definition of a data quality rule

Read this guide to understand what a data quality rule is in DQOps and how it evaluates data quality measures to detect data quality issues.

What are the rules in DQOps?

In DQOps, the data quality rule and the data quality sensor form the data quality check.

A rule is a set of conditions, described by a list of thresholds, against which sensor readouts are verified. A basic rule simply checks whether the most recent sensor readout is above or below a particular value, or within an expected range.
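
The comparison shapes mentioned above (above a value, below a value, within a range) can be sketched with a single optional lower and upper bound. This is only an illustration: `RangeThreshold` and `readout_passes` are hypothetical names and are not part of DQOps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangeThreshold:
    """Hypothetical threshold holder; either bound may be left unset."""
    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None


def readout_passes(sensor_readout: float, threshold: RangeThreshold) -> bool:
    """Return True when the readout satisfies every bound that is set."""
    if threshold.lower_bound is not None and sensor_readout < threshold.lower_bound:
        return False
    if threshold.upper_bound is not None and sensor_readout > threshold.upper_bound:
        return False
    return True


# "Above a particular value": only a lower bound is set.
at_least_10 = RangeThreshold(lower_bound=10)
# "Within the expected range": both bounds are set.
between_0_and_5 = RangeThreshold(lower_bound=0, upper_bound=5)

print(readout_passes(12, at_least_10))      # True
print(readout_passes(7, between_0_and_5))   # False
```

Leaving a bound unset corresponds to a one-sided rule, while setting both gives a range rule.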

Rules evaluate sensor readouts and assign severity levels to them. DQOps has three severity levels: warning, error, and fatal.
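
As a rough illustration of how one readout could map to a severity level, the sketch below checks a row count against a separate minimum threshold per severity and reports the most severe level that fails. The function name, the dictionary layout, and the "most severe failure wins" policy are assumptions made for this example, not a description of the DQOps engine.

```python
from typing import Dict, Optional

# Severity names come from the text above; ordering them from least to
# most severe in a list is an illustrative assumption.
SEVERITY_ORDER = ["warning", "error", "fatal"]


def highest_failed_severity(row_count: float, min_counts: Dict[str, int]) -> Optional[str]:
    """Return the most severe level whose minimum threshold is not met, or None."""
    failed = None
    for severity in SEVERITY_ORDER:
        threshold = min_counts.get(severity)
        if threshold is not None and row_count < threshold:
            failed = severity  # later entries are more severe, so they overwrite
    return failed


# Hypothetical thresholds: stricter expectations at lower severities.
thresholds = {"warning": 100, "error": 10, "fatal": 1}

print(highest_failed_severity(250, thresholds))  # None
print(highest_failed_severity(50, thresholds))   # warning
print(highest_failed_severity(5, thresholds))    # error
print(highest_failed_severity(0, thresholds))    # fatal
```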

Example of a rule

A standard data quality check that counts the number of rows in a table uses a simple "min_count" rule. For example, when the min_count threshold for the error severity level is set to 10 and the table has fewer than 10 rows, a data quality error is raised.

Below is an example of a Python script that defines the classes and methods for the min_count threshold rule.

min_count.py
from datetime import datetime
from typing import Sequence

# rule specific parameters object, contains values received from the quality check threshold configuration
class MinCountRuleParametersSpec:
    min_count: int

class HistoricDataPoint:
    timestamp_utc: datetime
    local_datetime: datetime
    back_periods_index: int
    sensor_readout: float

class RuleTimeWindowSettingsSpec:
    prediction_time_window: int
    min_periods_with_readouts: int

# rule execution parameters, contains the sensor value (actual_value) and the rule parameters
class RuleExecutionRunParameters:
    actual_value: float
    parameters: MinCountRuleParametersSpec
    time_period_local: datetime
    previous_readouts: Sequence[HistoricDataPoint]
    time_window: RuleTimeWindowSettingsSpec

# default object that should be returned to the dqo.io engine, specifies if the rule was passed or failed,
# what is the expected value for the rule and what are the upper and lower boundaries of accepted values (optional)
class RuleExecutionResult:
    passed: bool
    expected_value: float
    lower_bound: float
    upper_bound: float

    def __init__(self, passed=None, expected_value=None, lower_bound=None, upper_bound=None):
        self.passed = passed
        self.expected_value = expected_value
        self.lower_bound = lower_bound
        self.upper_bound = upper_bound

# rule evaluation method that should be modified for each type of rule
def evaluate_rule(rule_parameters: RuleExecutionRunParameters) -> RuleExecutionResult:
    if not hasattr(rule_parameters, 'actual_value'):
        return RuleExecutionResult(True, None, None, None)

    expected_value = rule_parameters.parameters.min_count
    lower_bound = rule_parameters.parameters.min_count
    upper_bound = None
    passed = rule_parameters.actual_value >= lower_bound

    return RuleExecutionResult(passed, expected_value, lower_bound, upper_bound)
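
To try the rule logic outside of DQOps, a minimal sketch can call `evaluate_rule` with stand-in objects. The block repeats a condensed copy of the function and result class so that it runs on its own. `SimpleNamespace` and the `run_min_count` helper are assumptions used only for this illustration; in practice the engine builds and passes the parameter objects.

```python
from types import SimpleNamespace


class RuleExecutionResult:
    def __init__(self, passed=None, expected_value=None, lower_bound=None, upper_bound=None):
        self.passed = passed
        self.expected_value = expected_value
        self.lower_bound = lower_bound
        self.upper_bound = upper_bound


def evaluate_rule(rule_parameters) -> RuleExecutionResult:
    # No sensor readout available: treat the rule as passed.
    if not hasattr(rule_parameters, 'actual_value'):
        return RuleExecutionResult(True, None, None, None)

    lower_bound = rule_parameters.parameters.min_count
    passed = rule_parameters.actual_value >= lower_bound
    return RuleExecutionResult(passed, lower_bound, lower_bound, None)


def run_min_count(actual_value: float, min_count: int) -> RuleExecutionResult:
    """Hypothetical helper that builds stand-in parameter objects."""
    params = SimpleNamespace(
        actual_value=actual_value,
        parameters=SimpleNamespace(min_count=min_count),
    )
    return evaluate_rule(params)


# A table with 5 rows checked against min_count = 10 fails the rule.
result = run_min_count(5, 10)
print(result.passed, result.lower_bound)  # False 10

# A table with exactly 10 rows passes, because the comparison is >=.
print(run_min_count(10, 10).passed)  # True
```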

Rule categories

Rules are divided into the following categories. A full description of each category and subcategory of rules is available at the link.

Configure rules in UI

You can easily configure rules in DQOps using the Configuration section of the user interface.

Rule specification screen

Below is an example of the screen with the rule definition for the percent moving average rule.

This screen edits the specification files for a custom data quality rule, which are stored in the $DQO_USER_HOME/rules/**/*.dqorule.yaml files.

Rule definition configuration

Python code editor

The Python source code of the data quality rule is defined on the Python code tab.

Rule Python code configuration

What's next