
Distinct count percent

This example checks the data in bigquery-public-data.covid_italy.national_trends using the distinct_count_percent check. The data is updated daily. The goal is to set up a uniqueness check and verify what percentage of the values in a column are distinct.
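As a rough illustration of what the check measures, the sketch below computes the percentage of distinct values among the non-null values of a column. It is a minimal sketch in plain Python, not DQO's actual sensor (which runs as SQL against BigQuery). The function name and the sample values are hypothetical, and ignoring nulls is an assumption.

```python
def distinct_count_percent(values):
    """Return the percentage of distinct values among non-null values.

    Assumption: nulls (None) are ignored, and an empty column yields None.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    return 100.0 * len(set(non_null)) / len(non_null)


# Hypothetical sample column: 4 non-null values, 3 of them distinct.
sample = ["2020-03-01", "2020-03-02", "2020-03-02", "2020-03-03", None]
print(distinct_count_percent(sample))  # prints 75.0
```

A result of 100.0 would mean every non-null value is unique; lower values indicate duplicates.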

Adding connection

GCP

Download and install the Google Cloud CLI. After installing it, log in to your GCP account (you can start one for free) by running:

gcloud auth application-default login

After setting up the GCP account, create a GCP project. That will be the GCP billing project used to run SQL sensors on the public datasets provided by Google.

The examples read the name of the GCP billing project from the GCP_PROJECT environment variable. Set and export this variable before starting the DQO shell.

Windows:
set GCP_PROJECT={here is your GCP billing project}

MacOS / Linux:
export GCP_PROJECT={here is your GCP billing project}
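
Because the examples depend on GCP_PROJECT, it can help to confirm that the variable is actually set in the session that will start the DQO shell. The following is a minimal sketch, assuming Python is available on the machine; the helper name require_env is hypothetical and is not part of DQO.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable, or raise if it is unset or blank."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; set and export it before starting the DQO shell"
        )
    return value


if __name__ == "__main__":
    try:
        print("GCP billing project:", require_env("GCP_PROJECT"))
    except RuntimeError as err:
        print(err)
```

Run it from the same terminal session in which the variable was set, since a variable set in one session is not visible in another.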

Navigate to the example directory and run the check

Windows:
cd examples\bigquery-column-distinct-count-percent
..\..\dqo.cmd

MacOS / Linux:
cd examples/bigquery-column-distinct-count-percent
../../dqo

After starting the example, run the following commands in the DQO shell:

cloud login
This command lets you log in or sign up for a cloud.dqo.ai account.

check run
The data quality checks will be executed.

cloud sync data
The result files will be pushed to cloud.dqo.ai.

Now, you can open the browser and navigate to https://cloud.dqo.ai/ and review the sensor results on the dashboards.