Skip to content

Sensors

Sensor are a predefined and customizable Jinja templates whose goal is to render SQL query in accordance with provider's SQL dialect.

Dialects (Jinja2 macros) are variables obtained from the table metadata and macros performing various functions, written in Jinja2.

User can configure basic statements as where, group by, order by, time series mode, and different parameters characteristic for the individual sensors (see the examples).

Tip

When changing configuration in YAML file in your editor, you can use code completion and check possible fields. We highly recommend using Visual Studio Code with installed Better Jinja by Samuel Colvin and YAML by Red Hat plugins.

Sensor types and categories

Sensors are divided into types:

  • column,
  • table,

and dimensions:

  • validity,
  • timeliness,
  • completeness,
  • relevance,
  • uniqueness,
  • consistency.

Each check adds value to the level indicated by the type. All timeliness checks, for example, are table type since they correspond to the various time aspects of the whole table, whereas column level checks only provide value to the data inside solitary columns.

Each check category relates to a distinct set of business requirements. Validity tests, for example, ensure that the data follows particular guidelines, such as numerical numbers being inside a certain range.

Choosing sensor for a check

Check configuration is very simple, in fact it is almost the same both for column and table type. To configure quality checks on a table level, you have to add field checks under table configuration, choose the category and desired sensor. In case of column level checks, the configuration is the same, only done under chosen column configuration.

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: continuous_days_one_row_per_day_7839946914895804974
  time_series:
    mode: current_time
    time_gradient: day
  checks:
    consistency:
      row_count:
        rules:
          min_count:
            low:
              min_value: 90.0
            medium:
              min_value: 80.0
            high:
              min_value: 70.0
  columns:
    id:
      type_snapshot:
        column_type: INT64
        nullable: true
    date:
      type_snapshot:
        column_type: DATE
        nullable: true
    value:
      type_snapshot:
        column_type: STRING
        nullable: true
# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: continuous_days_one_row_per_day_7839946914895804974
  time_series:
    mode: current_time
    time_gradient: day
  columns:
    id:
      type_snapshot:
        column_type: INT64
        nullable: true
    date:
      type_snapshot:
        column_type: DATE
        nullable: true
      checks:
        validity:
          value_in_range_date_percent:
            parameters:
              min_value: "2022-01-01"
              max_value: "2022-02-13"
              include_max_value: false
            rules:
              count_equals:
                low:
                  expected_value: 90.0
                  error_margin: 5.0
                medium:
                  expected_value: 80.0
                  error_margin: 5.0
                high:
                  expected_value: 70.0
                  error_margin: 5.0
    value:
      type_snapshot:
        column_type: STRING
        nullable: true

Examples

Configuration of a column level validity check values_in_range_date_percent with count_equals rule.

YAML file:

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: continuous_days_one_row_per_day_7839946914895804974
  time_series:
    mode: current_time
    time_gradient: day
  columns:
    id:
      type_snapshot:
        column_type: INT64
        nullable: true
    date:
      type_snapshot:
        column_type: DATE
        nullable: true
      checks:
        validity:
          value_in_range_date_percent:
            parameters:
              min_value: "2022-01-01"
              max_value: "2022-02-13"
              include_max_value: false
            rules:
              count_equals:
                low:
                  expected_value: 90.0
                  error_margin: 5.0
                medium:
                  expected_value: 80.0
                  error_margin: 5.0
                high:
                  expected_value: 70.0
                  error_margin: 5.0
    value:
      type_snapshot:
        column_type: STRING
        nullable: true
SELECT
    100.0 * SUM(
        CASE
            WHEN analyzed_table.`date` >= '2022-01-01' AND analyzed_table.`date` < '2022-02-13' THEN 1
            ELSE 0
        END
    ) / COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`continuous_days_one_row_per_day_7839946914895804974` AS analyzed_table