Value in range numerical percent
Description
The value_in_range_numerical_percent
check calculates the percent of numerical values that are within a range provided
by the user (with min_value
and max_value
).
Range bounds are required parameters. There are two more parameters: include_min_value
and include_max_value
, that
you can use to (as the name suggests) include min and\or max value. By default, range bounds are included.
When to use
This can be used on numerical columns, where you want to validate values to be in a certain range. For example a customer's age should be greater or equal to 18, and lesser than e.g. 100 (see the example).
Warning
Running this check on a non-numeric column may result with an error.
Used sensor
Value in range numerical percent
Accepted rules
Parameters
This checks has four parameters, two of which are optional:
min_value
: float
minimal value of a rangemax_value
: float
maximal value of a rangeinclude_min_value
: bool (Optional)
a variable deciding whether to include the minimal value of the range (default: True)include_max_value
: bool (Optional)
a variable deciding whether to include the maximal value of the range (default: True)
How to use
Default configuration
The default configuration of column validity check value_in_range_numerical_percent
with min_count
rule.
If the parameters include_min_value
and include_max_value
are not chosen, the
default value for these parameters is true
.
Table configuration in YAML file for column check value_in_range_numerical_percent
on a column id
in table
table in test_data
dataset in dqo-ai
project on BigQuery and min_count
rule, looks like this
SELECT
100.0 * SUM(
CASE
WHEN analyzed_table.`id` >= 1.0 AND analyzed_table.`id` <= 2.0 THEN 1
ELSE 0
END
) / COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`table` AS analyzed_table
Using all optional parameters
Table configuration in YAML file for column check value_in_range_numerical_percent
on a column dates
in string_dates
table in test_data
dataset in dqo-ai
project on BigQuery and min_count
rule, looks like this
include_min_value: false
and include_max_value: false
.
The rendered query is
SELECT
100.0 * SUM(
CASE
WHEN analyzed_table.`id` > 0.0 AND analyzed_table.`id` < 25.0 THEN 1
ELSE 0
END
) / COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`table` AS analyzed_table
Using only one optional parameter
Table configuration in YAML file for column check value_in_range_numerical_percent
on a column customer_age
in
customers
table in test_data
dataset in dqo-ai
project on BigQuery and min_count
rule, looks like this
include_max_value: false
.
The rendered query is
The same is for the upper bound.