Skip to content

Date type percent

Description

The date_type_percent check validates STRING or VARCHAR as DATE. The check runs three forms of safe casts on the column and calculates the percent of values that can be interpreted as dates.

Safe cast (try cast) works like the usual cast function, when a given value is castable as selected data type. When cast is not possible, it returns NULL value, so the risk of getting an error is significantly lower, although it is still possible.

The three mentioned forms of safe casts are:

  • built-in safe cast function that casts value as DATE,
  • built-in safe cast function that casts value as FLOAT (for UNIX time),
  • safe parse date for a given date format.

The last one comes in handy when we have to deal with non-standard date formats, e.g. Jan 1, 2022. For such situations there are built in date formats. In other cases, you can provide your own date format with custom date format parameter.

Info

custom_date_format is prioritized, meaning that if named_date_format is default or chosen, and custom_date_format is provided, the query will run with date format provided by custom_date_format.

When to use

This sensor comes in handy when we receive a dataset where a column associated with date is STRING type, and we would like to validate those records as actual dates.

Warning

Running this check on date or numeric column might result with an error. This is because safe casting date as numeric and vice versa is impossible. That is why we recommend using this query on string data types, where such errors do not occur.


Used sensor

Date type percent


Accepted rules

Min count

Count equals


Parameters

This checks has two optional parameters that configure date format to parse:

  • named_date_format: str
    predefined date format used for parsing string, formats are listed in BuiltInDateFormats
  • custom_date_format: str
    custom date format used for parsing string

The default format is ISO8601, which is one of the values of named_date_format. In case of custom_date_format user has to provide a string with desired date format (see the examples).

If custom_date_format is provided with a non-empty string, named_date_format is neglected, even if configured.

List of built in date formats

Name Format
ISO8601 '%Y-%m-%d'
MonthDayYear '%m/%d/%Y
DayMonthYear '%d/%m/%Y'
YearMonthDay '%Y/%m/%d'
MonthNameDayYear '%b %d, %Y'
Name Format
ISO8601 'YYYY-MM-DD'
MonthDayYear 'MM/DD/YYYY'
DayMonthYear 'DD/MM/YYYY'
YearMonthDay 'YYYY/MM/DD'
MonthNameDayYear 'MON DD, YYYY'

How to use

The following examples picture when and how to use the date_type_percent check. Here we provide a description of the usage, the whole example is ready to run here.

Let's have a look at the first ten rows from the table used in the example - bigquery-public-data.labeled_patents.extracted_data

Default configuration

The default configuration of column validity check date_type_percent with min_count rule. If the parameter named_date_format is not chosen or custom_date_format is not specified, the default parsing date format is ISO8601.

Table configuration in YAML file for column check date_type_percent on a column dates in string_dates table in test_data dataset in dqo-ai project on BigQuery and min_count rule looks like this

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: continuous_days_different_time_data_types_5060904924218909170
  time_series:
    mode: current_time
    time_gradient: day
  columns:
    id:
      type_snapshot:
        column_type: INT64
        nullable: true
    date_type:
      type_snapshot:
        column_type: STRING
        nullable: true
    date_type_Mon_d_Y:
      type_snapshot:
        column_type: STRING
        nullable: true
      checks:
        validity:
          date_type_percent:
            rules:
              min_count:
                low:
                  min_value: 90.0
                medium:
                  min_value: 80.0
                high:
                  min_value: 70.0

For default check configuration the rendered query is

SELECT
    100.0 * SUM(
        CASE
            WHEN SAFE_CAST(analyzed_table.`valid_string_100` AS FLOAT64) IS NOT NULL
            OR SAFE_CAST(analyzed_table.`valid_string_100` AS DATE) IS NOT NULL
            OR SAFE.PARSE_DATE('%Y-%m-%d', analyzed_table.`valid_string_100`) IS NOT NULL THEN 1
            ELSE 0
        END
    ) / COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`continuous_days_date_and_string_formats_7998702180845642887` AS analyzed_table

Using named date formats

Table configuration in YAML file for column check date_type_percent on a column dates in string_dates table in test_data dataset in dqo-ai project on BigQuery, min_count rule and passed parameter named_date_format: 'MonthNameDayYear' looks like this

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: continuous_days_different_time_data_types_5060904924218909170
  time_series:
    mode: current_time
    time_gradient: day
  columns:
    id:
      type_snapshot:
        column_type: INT64
        nullable: true
    date_type:
      type_snapshot:
        column_type: STRING
        nullable: true
    date_type_Mon_d_Y:
      type_snapshot:
        column_type: STRING
        nullable: true
      checks:
        validity:
          date_type_percent:
            parameters:
              named_date_format: MonthNameDayYear
            rules:
              min_count:
                low:
                  min_value: 90.0
                medium:
                  min_value: 80.0
                high:
                  min_value: 70.0
The rendered query is

SELECT
    100.0 * SUM(
        CASE
            WHEN SAFE_CAST(analyzed_table.`date_type_Mon_d_Y` AS FLOAT64) IS NOT NULL
            OR SAFE_CAST(analyzed_table.`date_type_Mon_d_Y` AS DATE) IS NOT NULL
            OR SAFE.PARSE_DATE('%b %d, %Y', analyzed_table.`date_type_Mon_d_Y`) IS NOT NULL THEN 1
            ELSE 0
        END
    ) / COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`continuous_days_different_time_data_types_5060904924218909170` AS analyzed_table

Using custom date formats

Table configuration in YAML file for column check date_type_percent on a column dates in string_dates table in test_data dataset in dqo-ai project on BigQuery, count_equals rule and passed parameter custom_date_format: '%Y, %b, %d' looks like this

# yaml-language-server: $schema=https://cloud.dqo.ai/dqo-yaml-schema/TableYaml-schema.json
apiVersion: dqo/v1
kind: table
spec:
  target:
    schema_name: dqo_ai_test_data
    table_name: continuous_days_different_time_data_types_5060904924218909170
  time_series:
    mode: current_time
    time_gradient: day
  columns:
    id:
      type_snapshot:
        column_type: INT64
        nullable: true
    date_type:
      type_snapshot:
        column_type: STRING
        nullable: true
    date_type_Mon_d_Y:
      type_snapshot:
        column_type: STRING
        nullable: true
      checks:
        validity:
          date_type_percent:
            parameters:
              custom_date_format: "%Y, %b, %d"
            rules:
              min_count:
                low:
                  min_value: 90.0
                medium:
                  min_value: 80.0
                high:
                  min_value: 70.0

The rendered query is

SELECT
    100.0 * SUM(
        CASE
            WHEN SAFE_CAST(analyzed_table.`date_type_Mon_d_Y` AS FLOAT64) IS NOT NULL
            OR SAFE_CAST(analyzed_table.`date_type_Mon_d_Y` AS DATE) IS NOT NULL
            OR SAFE.PARSE_DATE('%Y, %b, %d', analyzed_table.`date_type_Mon_d_Y`) IS NOT NULL THEN 1
            ELSE 0
        END
    ) / COUNT(*) AS actual_value
FROM `dqo-ai-testing`.`dqo_ai_test_data`.`continuous_days_different_time_data_types_5060904924218909170` AS analyzed_table
+----------------------------------------------------------+--------+----------+------------------+---------------------+----------+--------------------+-------------+------------------+----------------------------------+---------------------------+------------------------+--------------------------------------------------------------------------+-----------------+
|                         gcs_path                         | issuer | language | publication_date | class_international | class_us | application_number | filing_date | priority_date_eu |     representative_line_1_eu     |     applicant_line_1      |    inventor_line_1     |                               title_line_1                               |     number      |
+----------------------------------------------------------+--------+----------+------------------+---------------------+----------+--------------------+-------------+------------------+----------------------------------+---------------------------+------------------------+--------------------------------------------------------------------------+-----------------+
| gs://gcs-public-data--labeled-patents/espacenet_de2.pdf  | EU     | DE       |        29.08.018 | E04H 6/12           |          |         18157874.1 |  21.02.2018 |       22.02.2017 | Liedtke & Partner Patentanwälte | SHB Hebezeugbau GmbH      | "VOLGER, Alexander"    | STEUERUNGSSYSTEM FÜR AUTOMATISCHE PARKHÄUSER                           | EP 3 366 869 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de5.pdf  | EU     | DE       |       03.10.2018 | G06F 11/30          |          |         18157347.8 |  19.02.2018 |       31.03.2017 | Hoffmann Eitle                   | FUJITSU LIMITED           | "Kukihara, Kensuke"    | "METHOD EXECUTED BY A COMPUTER, INFORMATION PROCESSING APPARATUS AND"    | EP 3 382 553 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de56.pdf | EU     | DE       |       03.10.2018 | A01K 31/00          |          |         18171005.4 |  05.02.2015 |       05.02.2014 | Stork Bamberger Patentanwälte   | Linco Food Systems A/S    | "Thrane, Uffe"         | MASTHÄHNCHENCONTAINER ALS BESTANDTEIL EINER EINHEIT UND EINER ANORDNUNG | EP 3 381 276 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de70.pdf | EU     | DE       |       03.10.2018 | H01L 21/20          |          |         18166536.5 |  16.02.2016 |                  | "Scheider, Sascha et al"         | EV Group E. Thallner GmbH | "Kurz, Florian"        | VORRICHTUNG ZUM BONDEN VON SUBSTRATEN                                    | EP 3 382 744 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de73.pdf | EU     | DE       |       03.10.2018 | H05B 6/12           |          |         18165514.3 |  03.04.2018 |       30.03.2017 |                                  | BSH Hausger√§te GmbH      | "Acero Acero, Jesus"   | VORRICHTUNG ZUR INDUKTIVEN ENERGIE√úBERTRAGUNG                           | EP 3 383 141 A2 |
| gs://gcs-public-data--labeled-patents/espacenet_de74.pdf | EU     | DE       |       03.10.2018 | H05B 6/12           |          |         18165513.5 |  03.04.2018 |       30.03.2017 |                                  | BSH Hausger√§te GmbH      | "Acero Acero, Jesus"   | VORRICHTUNG ZUR INDUKTIVEN ENERGIE√úBERTRAGUNG                           | EP 3 383 140 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de75.pdf | EU     | DE       |       03.10.2018 | G06Q 10/06          |          |         18165403.9 |  03.04.2018 |       30.03.2017 | "Wallinger, Michael"             | AVL List GmbH             | "Martin, Piffl"        | ROBUSTHEITSANALYSE BEI FAHRZEUGEN                                        | EP 3 382 619 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de76.pdf | EU     | DE       |       03.10.2018 | A01C 7/00           |          |         18164868.4 |  29.03.2018 |       30.03.2017 | Hauck Patentanwaltspartnerschaft | "Schmidt, Robert"         | "Schmidt, Robert"      | VORRICHTUNG ZUM AUSSÄHEN VON WALDSAMEN MIT EINER BEWEGBAREN             | EP 3 381 254 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de77.pdf | EU     | DE       |       03.10.2018 | H01Q 1/22           |          |         18164695.1 |  28.03.2018 |       31.03.2017 | "Kruspig, Volkmar et al"         | Antennentechnik ABB Bad   | "Harz, Thomas"         | ELEKTROTECHNISCHE EINRICHTUNG                                            | EP 3 382 793 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de78.pdf | EU     | DE       |       03.10.2018 | A61B 1/00           |          |         18164602.7 |  28.03.2018 |       28.03.2017 | "Gramm, Line, & Partner"         | "Emmanouilidis, Nikos"    | "Emmanouilidis, Nikos" | MEDIZINISCHE VORRICHTUNG                                                 | EP 3 381 348 A1 |
+----------------------------------------------------------+--------+----------+------------------+---------------------+----------+--------------------+-------------+------------------+----------------------------------+---------------------------+------------------------+--------------------------------------------------------------------------+-----------------+