Skip to content

Column datetime difference percent

Description

The column_datetime_differnece_percent check calculates a time difference between records in corresponding columns, compares those values to a threshold, and shows the percentage of rows where time difference is greater than the threshold.

User has to configure three parameters: column1, column2 check calculates datetime differences between records in these columns , max_difference which is a threshold value. The time_scale corresponds to the time scale of the datetime difference. This parameter is optional, and the default time scale is a DAY.

Info

Although this sensor is designed to work on DATETIME columns, it is possible to provide DATE, TIMESTSMP or STRING columns, only they will be cast as DATETIME.

When to use

This check aims at informing the user about the amount of data that is considered timely.

Used template

Column datetime difference percent

Accepted rules

Min count

Parameters

  • column1: str
    the first column to calculate the time difference
  • column2: str
    the second column to calculate the time difference
  • time_scale: str (optional)
    a datetime difference timescale, accepted values: DAY,HOUR, MINUTE, SECOND (default: DAY)
  • max_diffference: int
    threshold to compare time differences, anything above this threshold is considered as delayed

How to use

The default configuration of column validity check column_datetime_differnece_percent on columns reference_datetime and real_datetime with min_count rule and default time scale.

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: test_data_timeliness_sensors_1287831065858823464
  time_series:
    mode: current_time
    time_gradient: day
  checks:
    timeliness:
      column_datetime_difference_percent:
        parameters:
          column1: "reference_datetime"
          column2: "real_datetime"
          max_difference: 10
        rules:
          min_count:
            low:
              min_value: 90.0
            medium:
              min_value: 80.0
            high:
              min_value: 70.0
  columns:
    id:
      type_snapshot:
        column_type: INT64
        nullable: true
    reference_datetime:
      type_snapshot:
        column_type: DATETIME
        nullable: true
    real_datetime:
      type_snapshot:
        column_type: DATETIME
        nullable: true
The query is rendered with default time scale: DAY.
SELECT
    100.0*SUM(
        CASE
            WHEN
                ABS(DATETIME_DIFF(analyzed_table.reference_datetime, analyzed_table.real_datetime, DAY)) < 10 THEN 1
            ELSE 0 END
            )/COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`test_data_timeliness_sensors_1287831065858823464` AS analyzed_table

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: test_data_timeliness_sensors_1287831065858823464
  time_series:
    mode: current_time
    time_gradient: day
  checks:
    timeliness:
      column_datetime_difference_percent:
        parameters:
          column1: "reference_datetime"
          column2: "real_datetime"
          max_difference: 12
          time_scale: "HOUR"
        rules:
          min_count:
            low:
              min_value: 90.0
            medium:
              min_value: 80.0
            high:
              min_value: 70.0
  columns:
    id:
      type_snapshot:
        column_type: INT64
        nullable: true
    reference_datetime:
      type_snapshot:
        column_type: DATETIME
        nullable: true
    real_datetime:
      type_snapshot:
        column_type: DATETIME
        nullable: true
SELECT
    100.0*SUM(
        CASE
            WHEN
                ABS(DATETIME_DIFF(analyzed_table.reference_datetime, analyzed_table.real_datetime, HOUR)) < 12 THEN 1
            ELSE 0 END
            )/COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`test_data_timeliness_sensors_1287831065858823464` AS analyzed_table