Date type percent
Description
The date_type_percent
check validates STRING
or VARCHAR
as DATE
.
The check runs three forms of safe casts on the column and calculates the percent of values that can be interpreted as dates.
Safe cast (try cast) works like the usual cast function, when a given value is castable as selected data type. When cast is not possible, it returns NULL
value,
so the risk of getting an error is significantly lower, although it is still possible.
The three mentioned forms of safe casts are:
- built-in safe cast function that casts value as
DATE
, - built-in safe cast function that casts value as
FLOAT
(for UNIX time), - safe parse date for a given date format.
The last one comes in handy when we have to deal with non-standard date formats, e.g. Jan 1, 2022
. For such situations there are built in date formats. In other cases, you can provide your own date format
with custom date format parameter.
Info
custom_date_format
is prioritized, meaning that
if named_date_format
is default or chosen, and custom_date_format
is provided, the query
will run with date format provided by custom_date_format
.
When to use
This sensor comes in handy when we receive a dataset where a column associated with
date is STRING
type, and we would like to validate those records as actual dates.
Warning
Running this check on date
or numeric
column might result with an error.
This is because safe casting date as numeric and vice versa is impossible. That is why we
recommend using this query on string
data types, where such errors do not occur.
Used sensor
Accepted rules
Parameters
This checks has two optional parameters that configure date format to parse:
named_date_format
: str
predefined date format used for parsing string, formats are listed in BuiltInDateFormatscustom_date_format
: str
custom date format used for parsing string
The default format is ISO8601
, which is one of the values of named_date_format
.
In case of custom_date_format
user has to provide a string with desired date format (see the examples).
If custom_date_format
is provided with a non-empty string, named_date_format
is neglected, even if configured.
List of built in date formats
Name | Format |
---|---|
ISO8601 |
'%Y-%m-%d' |
MonthDayYear |
'%m/%d/%Y |
DayMonthYear |
'%d/%m/%Y' |
YearMonthDay |
'%Y/%m/%d' |
MonthNameDayYear |
'%b %d, %Y' |
Name | Format |
---|---|
ISO8601 |
'YYYY-MM-DD' |
MonthDayYear |
'MM/DD/YYYY' |
DayMonthYear |
'DD/MM/YYYY' |
YearMonthDay |
'YYYY/MM/DD' |
MonthNameDayYear |
'MON DD, YYYY' |
How to use
The following examples picture when and how to use the date_type_percent
check. Here we provide a description
of the usage, the whole example is ready to run here.
Let's have a look at the first ten rows from the table used in the example -
bigquery-public-data.labeled_patents.extracted_data
Default configuration
The default configuration of column validity check date_type_percent
with min_count
rule.
If the parameter named_date_format
is not chosen or custom_date_format
is not specified, the
default parsing date format is ISO8601
.
Table configuration in YAML file for column check date_type_percent
on a column dates
in string_dates
table in test_data
dataset in dqo-ai
project on BigQuery and min_count
rule looks like this
For default check configuration the rendered query is
Using named date formats
Table configuration in YAML file for column check date_type_percent
on a column dates
in string_dates
table in
test_data
dataset in dqo-ai
project on BigQuery, min_count
rule and passed parameter named_date_format: 'MonthNameDayYear'
looks like this
Using custom date formats
Table configuration in YAML file for column check date_type_percent
on a column dates
in string_dates
table in
test_data
dataset in dqo-ai
project on BigQuery, count_equals
rule and passed parameter custom_date_format: '%Y, %b, %d'
looks like this
The rendered query is
+----------------------------------------------------------+--------+----------+------------------+---------------------+----------+--------------------+-------------+------------------+----------------------------------+---------------------------+------------------------+--------------------------------------------------------------------------+-----------------+
| gcs_path | issuer | language | publication_date | class_international | class_us | application_number | filing_date | priority_date_eu | representative_line_1_eu | applicant_line_1 | inventor_line_1 | title_line_1 | number |
+----------------------------------------------------------+--------+----------+------------------+---------------------+----------+--------------------+-------------+------------------+----------------------------------+---------------------------+------------------------+--------------------------------------------------------------------------+-----------------+
| gs://gcs-public-data--labeled-patents/espacenet_de2.pdf | EU | DE | 29.08.018 | E04H 6/12 | | 18157874.1 | 21.02.2018 | 22.02.2017 | Liedtke & Partner Patentanwälte | SHB Hebezeugbau GmbH | "VOLGER, Alexander" | STEUERUNGSSYSTEM FÜR AUTOMATISCHE PARKHÄUSER | EP 3 366 869 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de5.pdf | EU | DE | 03.10.2018 | G06F 11/30 | | 18157347.8 | 19.02.2018 | 31.03.2017 | Hoffmann Eitle | FUJITSU LIMITED | "Kukihara, Kensuke" | "METHOD EXECUTED BY A COMPUTER, INFORMATION PROCESSING APPARATUS AND" | EP 3 382 553 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de56.pdf | EU | DE | 03.10.2018 | A01K 31/00 | | 18171005.4 | 05.02.2015 | 05.02.2014 | Stork Bamberger Patentanwälte | Linco Food Systems A/S | "Thrane, Uffe" | MASTHÄHNCHENCONTAINER ALS BESTANDTEIL EINER EINHEIT UND EINER ANORDNUNG | EP 3 381 276 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de70.pdf | EU | DE | 03.10.2018 | H01L 21/20 | | 18166536.5 | 16.02.2016 | | "Scheider, Sascha et al" | EV Group E. Thallner GmbH | "Kurz, Florian" | VORRICHTUNG ZUM BONDEN VON SUBSTRATEN | EP 3 382 744 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de73.pdf | EU | DE | 03.10.2018 | H05B 6/12 | | 18165514.3 | 03.04.2018 | 30.03.2017 | | BSH Hausger√§te GmbH | "Acero Acero, Jesus" | VORRICHTUNG ZUR INDUKTIVEN ENERGIE√úBERTRAGUNG | EP 3 383 141 A2 |
| gs://gcs-public-data--labeled-patents/espacenet_de74.pdf | EU | DE | 03.10.2018 | H05B 6/12 | | 18165513.5 | 03.04.2018 | 30.03.2017 | | BSH Hausger√§te GmbH | "Acero Acero, Jesus" | VORRICHTUNG ZUR INDUKTIVEN ENERGIE√úBERTRAGUNG | EP 3 383 140 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de75.pdf | EU | DE | 03.10.2018 | G06Q 10/06 | | 18165403.9 | 03.04.2018 | 30.03.2017 | "Wallinger, Michael" | AVL List GmbH | "Martin, Piffl" | ROBUSTHEITSANALYSE BEI FAHRZEUGEN | EP 3 382 619 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de76.pdf | EU | DE | 03.10.2018 | A01C 7/00 | | 18164868.4 | 29.03.2018 | 30.03.2017 | Hauck Patentanwaltspartnerschaft | "Schmidt, Robert" | "Schmidt, Robert" | VORRICHTUNG ZUM AUSSÄHEN VON WALDSAMEN MIT EINER BEWEGBAREN | EP 3 381 254 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de77.pdf | EU | DE | 03.10.2018 | H01Q 1/22 | | 18164695.1 | 28.03.2018 | 31.03.2017 | "Kruspig, Volkmar et al" | Antennentechnik ABB Bad | "Harz, Thomas" | ELEKTROTECHNISCHE EINRICHTUNG | EP 3 382 793 A1 |
| gs://gcs-public-data--labeled-patents/espacenet_de78.pdf | EU | DE | 03.10.2018 | A61B 1/00 | | 18164602.7 | 28.03.2018 | 28.03.2017 | "Gramm, Line, & Partner" | "Emmanouilidis, Nikos" | "Emmanouilidis, Nikos" | MEDIZINISCHE VORRICHTUNG | EP 3 381 348 A1 |
+----------------------------------------------------------+--------+----------+------------------+---------------------+----------+--------------------+-------------+------------------+----------------------------------+---------------------------+------------------------+--------------------------------------------------------------------------+-----------------+