Skip to content

Value in range numerical percent

Description

The value_in_range_numerical_percent check calculates the percent of numerical values that are within a range provided by the user (with min_value and max_value).

Range bounds are required parameters. There are two more parameters: include_min_value and include_max_value, that you can use to (as the name suggests) include min and\or max value. By default, range bounds are included.

When to use

This can be used on numerical columns, where you want to validate values to be in a certain range. For example a customer's age should be greater or equal to 18, and lesser than e.g. 100 (see the example).

Warning

Running this check on a non-numeric column may result with an error.


Used sensor

Value in range numerical percent


Accepted rules

Min count

Count equals


Parameters

This checks has four parameters, two of which are optional:

  • min_value: float
    minimal value of a range
  • max_value: float
    maximal value of a range
  • include_min_value: bool (Optional)
    a variable deciding whether to include the minimal value of the range (default: True)
  • include_max_value: bool (Optional)
    a variable deciding whether to include the maximal value of the range (default: True)

How to use

Default configuration

The default configuration of column validity check value_in_range_numerical_percent with min_count rule. If the parameters include_min_value and include_max_value are not chosen, the default value for these parameters is true.

Table configuration in YAML file for column check value_in_range_numerical_percent on a column id in table table in test_data dataset in dqo-ai project on BigQuery and min_count rule, looks like this

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: table
  time_series:
    mode: current_time
    time_gradient: day
  columns:
    id:
      type_snapshot:
        column_type: INT64
        nullable: true
      checks:
        validity:
          value_in_range_numerical_percent:
            parameters:
              min_value: 1.0
              max_value: 2.0
            rules:
              min_count:
                low:
                  min_value: 90.0
                medium:
                  min_value: 80.0
                high:
                  min_value: 70.0
For default check configuration the rendered query is
SELECT
    100.0 * SUM(
        CASE
            WHEN analyzed_table.`id` >= 1.0 AND analyzed_table.`id` <= 2.0 THEN 1
            ELSE 0
        END
    ) / COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`table` AS analyzed_table

Using all optional parameters

Table configuration in YAML file for column check value_in_range_numerical_percent on a column dates in string_dates table in test_data dataset in dqo-ai project on BigQuery and min_count rule, looks like this

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: table
  time_series:
    mode: current_time
    time_gradient: day
  columns:
    id:
      type_snapshot:
        column_type: INT64
        nullable: true
      checks:
        validity:
          value_in_range_numerical_percent:
            parameters:
              min_value: 0.0
              max_value: 25.0
              include_min_value: false
              include_max_value: false
            rules:
              min_count:
                low:
                  min_value: 90.0
                medium:
                  min_value: 80.0
                high:
                  min_value: 70.0
In this case, we set two optional parameters include_min_value: false and include_max_value: false. The rendered query is
SELECT
    100.0 * SUM(
        CASE
            WHEN analyzed_table.`id` > 0.0 AND analyzed_table.`id` < 25.0 THEN 1
            ELSE 0
        END
    ) / COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`table` AS analyzed_table

Using only one optional parameter

Table configuration in YAML file for column check value_in_range_numerical_percent on a column customer_age in customers table in test_data dataset in dqo-ai project on BigQuery and min_count rule, looks like this

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: customers
  time_series:
    mode: current_time
    time_gradient: day
  columns:
    customer_age:
      type_snapshot:
        column_type: INT64
        nullable: true
      checks:
        validity:
          value_in_range_numerical_percent:
            parameters:
              min_value: 18.0
              max_value: 100.0
              include_max_value: false
            rules:
              min_count:
                low:
                  min_value: 90.0
                medium:
                  min_value: 80.0
                high:
                  min_value: 70.0
In this case, we set only one optional parameter include_max_value: false. The rendered query is
SELECT
    100.0 * SUM(
        CASE
            WHEN analyzed_table.`customer_age` >= 18.0 AND analyzed_table.`customer_age` < 100.0 THEN 1
            ELSE 0
        END
    ) / COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`customers` AS analyzed_table
The same is for the upper bound.