Sensors
Sensor are a predefined and customizable Jinja templates whose goal is to render SQL query in accordance with provider's SQL dialect.
Dialects (Jinja2 macros) are variables obtained from the table metadata and macros performing various functions, written in Jinja2.
User can configure basic statements as where
, group by
, order by
, time series mode,
and different parameters
characteristic for the individual sensors (see the examples).
Tip
When changing configuration in YAML file in your editor, you can use code completion and check possible fields. We highly recommend using Visual Studio Code with installed Better Jinja by Samuel Colvin and YAML by Red Hat plugins.
Sensor types and categories
Sensors are divided into types:
- column,
- table,
and dimensions:
- validity,
- timeliness,
- completeness,
- relevance,
- uniqueness,
- consistency.
Each check adds value to the level indicated by the type. All timeliness checks, for example, are table type since they correspond to the various time aspects of the whole table, whereas column level checks only provide value to the data inside solitary columns.
Each check category relates to a distinct set of business requirements. Validity tests, for example, ensure that the data follows particular guidelines, such as numerical numbers being inside a certain range.
Choosing sensor for a check
Check configuration is very simple, in fact it is almost the same both for column and table type.
To configure quality checks on a table level, you have to add field checks
under table configuration, choose the
category and desired sensor. In case of column level checks, the configuration is the same, only done under chosen
column configuration.
Examples
Configuration of a column level validity check values_in_range_date_percent
with count_equals
rule.
YAML file:
# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
target:
schema_name: dqo_ai_test_data
table_name: continuous_days_one_row_per_day_7839946914895804974
time_series:
mode: current_time
time_gradient: day
columns:
id:
type_snapshot:
column_type: INT64
nullable: true
date:
type_snapshot:
column_type: DATE
nullable: true
checks:
validity:
value_in_range_date_percent:
parameters:
min_value: "2022-01-01"
max_value: "2022-02-13"
include_max_value: false
rules:
count_equals:
low:
expected_value: 90.0
error_margin: 5.0
medium:
expected_value: 80.0
error_margin: 5.0
high:
expected_value: 70.0
error_margin: 5.0
value:
type_snapshot:
column_type: STRING
nullable: true