Skip to content

Value in range date percent

Description

The valuein_range_date_percent calculates the percentage of records that are within a certain date range. Date range is configurable by the user by defining min_value and max_value check parameters. It is also possible to decide whether to include upper and lower bounds of the range, they are included by default.

Warning

There are four acceptable data types: DATE, DATETIME, TIMESTAMP, STRING. This check is set to validate date values, so in the other three cases CAST() is performed, so keep in mind that casting is not always possible. The safest option is to provide the data in ISO 8601 format, in other cases formatting might be necessary.

When to use

This sensor is used when you need to confirm that the event took place in a certain time period.

Used sensor

Value in range date percent


Accepted rules

Min count

Count equals


Parameters

  • min_value: str
    minimal value range variable, should be provided in ISO 8601 format
  • max_value: str
    maximal value range variable, should be provided in ISO 8601 format
  • include_min_value: bool (Optional)
    a variable deciding whether to include the lower limit of the range (default: true)
  • include_min_value: bool (Optional)
    a variable deciding whether to include the upper limit of the range (default: true)

Warning

The default date format is ISO 8601, if your data set contains date values it should be formatted correctly, otherwise you might not get the results.

How to use

Default configuration

Table configuration in YAML file for column check value_in_range_date_percent on a column real_datetime in string_datestable in test_data dataset in dqo-ai project on BigQuery, min_count rule and passed parameters min_value = "2022-04-01", max_value = "2022-04-10"

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: test_data_timeliness_sensors_1287831065858823464
  time_series:
    mode: timestamp_column
    time_gradient: day
    timestamp_column: real_datetime
  columns:
    id:
      type_snapshot:
        column_type: INT64
        nullable: true
    reference_datetime:
      type_snapshot:
        column_type: DATETIME
        nullable: true
      checks:
        validity:
          value_in_range_date_percent:
            parameters:
              min_value:  "2022-04-01"
              max_value:  "2022-04-10"
            rules:
              min_count:
                low:
                  min_value: 90.0
                medium:
                  min_value: 90.0
                high:
                  min_value: 90.0
    real_datetime:
      type_snapshot:
        column_type: DATETIME
        nullable: true
SELECT
    100.0 * SUM(
        CASE
            WHEN SAFE_CAST(analyzed_table.`reference_datetime` AS DATE) >= '2022-04-01' AND SAFE_CAST(analyzed_table.`reference_datetime` AS DATE) <= '2022-04-10' THEN 1
            ELSE 0
        END
    ) / COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`test_data_timeliness_sensors_1287831065858823464` AS analyzed_table

Exclude min and max values

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: test_data_time_series_1901211650245440619
  time_series:
    mode: current_time
    time_gradient: day
  columns:
    id:
      type_snapshot:
        column_type: INT64
        nullable: true
    date_type:
      type_snapshot:
        column_type: DATE
        nullable: true
      checks:
        validity:
          value_in_range_date_percent:
            parameters:
              min_value:  "2019-12-23"
              max_value:  "2022-01-15"
              include_min_value: false
              include_max_value: false
            rules:
              min_count:
                low:
                  min_value: 90.0
                medium:
                  min_value: 90.0
                high:
                  min_value: 90.0
    mixed_mail:
      type_snapshot:
        column_type: STRING
        nullable: true
SELECT
    100.0 * SUM(
        CASE
            WHEN analyzed_table.`date_type` > '2019-12-23' AND analyzed_table.`date_type` < '2022-01-15' THEN 1
            ELSE 0
        END
    ) / COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`test_data_time_series_1901211650245440619` AS analyzed_table