Skip to content

String length in range percent

Description

The string_length_in_range_percent check calculates the percent of string values whose length is within a certain range provided by the user. If a provided data type is not STRING, the column is cast as one.

When to use

This check is useful when there is a need to validate value's length, e.g. social security number, postal code.

Info

To find value's length we use built in STRING function LENGTH. Keep in mind that white spaces are included into the count.


Used sensor

String length in range percent


Accepted rules

Min count

Count equals


Parameters

This checks has two optional parameters that configure date format to parse:

  • min_length: Integer
    minimal string's length
  • max_length: Integer
    maximal string's length

How to use

On the string column

Table configuration in YAML file for column check string_length_in_range_percent on a column string_length in table table in test_data dataset in dqo-ai project on BigQuery and min_count rule ,looks like this

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: table
  time_series:
    mode: current_time
    time_gradient: day
  columns:
    string_length:
      type_snapshot:
        column_type: STRING
        nullable: true
      checks:
        validity:
          string_length_in_range_percent:
            parameters:
              min_length: 6
              max_length: 9
            rules:
              min_count:
                low:
                  min_value: 90.0
                medium:
                  min_value: 80.0
                high:
                  min_value: 70.0
For default check configuration and passed min_length: 6, max_length: 9. The rendered query is
SELECT
    CASE
        WHEN COUNT(analyzed_table.`strings_length`) = 0 THEN NULL
        ELSE
            100.0 * SUM(
                        CASE WHEN LENGTH( analyzed_table.`strings_length` ) BETWEEN 6 AND 9 THEN 1
                                ELSE 0
                            END
    ) / COUNT(analyzed_table.`strings_length`) END AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`table` AS analyzed_table

On the numeric column

Table configuration in YAML file for column check string_length_in_range_percent on a column int_length in table table in test_data dataset in dqo-ai project on BigQuery and min_count rule ,looks like this

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: table
  time_series:
    mode: current_time
    time_gradient: day
  columns:
    int_length:
      type_snapshot:
        column_type: STRING
        nullable: true
      checks:
        validity:
          string_length_in_range_percent:
            parameters:
              min_length: 6
              max_length: 9
            rules:
              min_count:
                low:
                  min_value: 90.0
                medium:
                  min_value: 80.0
                high:
                  min_value: 70.0
In this case the check working on numeric column. The rendered query is
SELECT
    CASE
        WHEN COUNT(analyzed_table.`int_length`) = 0 THEN NULL
        ELSE
            100.0 * SUM(
                        CASE WHEN LENGTH( SAFE_CAST(analyzed_table.`int_length` AS STRING) ) BETWEEN 6 AND 9 THEN 1
                                ELSE 0
                            END
    ) / COUNT(analyzed_table.`int_length`) END AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`table` AS analyzed_table