Google BigQuery
BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data.
omniload supports BigQuery as both a source and destination.
URI format
The URI format for BigQuery is as follows:
bigquery://<project-name>?credentials_path=/path/to/service/account.json&location=<location>
URI parameters:
project-name: the name of the project in which the dataset resides
credentials_path: optional, the path to the service account JSON file. If not provided, omniload will use Application Default Credentials
credentials_base64: optional, base64-encoded service account JSON credentials
location: optional, the location of the dataset
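As a minimal sketch, the URI can be assembled in the shell from its parts. The project name my-project and the location EU are placeholder values chosen for illustration, not defaults. The URI is double-quoted because ? and & are special characters in most shells; left unquoted, & would end the command early and the location parameter would be lost.

```shell
# Placeholder values; replace with your own project, key file, and location.
PROJECT_NAME="my-project"
CREDENTIALS_PATH="/path/to/service/account.json"
LOCATION="EU"

# Double quotes keep ? and & from being interpreted by the shell.
BIGQUERY_URI="bigquery://${PROJECT_NAME}?credentials_path=${CREDENTIALS_PATH}&location=${LOCATION}"

printf '%s\n' "$BIGQUERY_URI"
```

The resulting variable can then be passed, quoted, as the value of --source-uri or --dest-uri.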
Authentication
omniload supports multiple authentication methods for BigQuery:
Explicit credentials (via credentials_path or credentials_base64 in the URI):
bigquery://my-project?credentials_path=/path/to/service-account.json
Application Default Credentials (recommended for local development and GCP environments):
bigquery://my-project
When no credentials are provided in the URI, omniload uses the Google authentication library, which automatically discovers credentials from:
The GOOGLE_APPLICATION_CREDENTIALS environment variable
User credentials set via gcloud auth application-default login
Service account credentials when running on Google Cloud (Compute Engine, App Engine, Cloud Run, etc.)
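The Application Default Credentials route might be set up as in the sketch below. The key file path and the project name my-project are placeholders, and the gcloud step is left commented out because it is interactive and only applies to local development.

```shell
# Option 1: point Application Default Credentials at a service account
# key file (placeholder path).
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# Option 2 (interactive, local development): use your own user credentials.
# gcloud auth application-default login

# With either option in place, the URI needs no credential parameters.
BIGQUERY_URI="bigquery://my-project"

printf '%s\n' "$BIGQUERY_URI"
```

On Google Cloud runtimes such as Compute Engine or Cloud Run, neither option should be needed, since the attached service account is discovered automatically.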
The same URI structure can be used for both sources and destinations. For more details, see the documentation for SQLAlchemy’s BigQuery dialect.
Using GCS as a staging area
omniload can use Google Cloud Storage (GCS) as a staging area for BigQuery. To enable this, pass the --staging-bucket flag when running the command:
omniload ingest \
    --source-uri "$SOURCE_URI" \
    --dest-uri "$BIGQUERY_URI" \
    --source-table raw.input \
    --dest-table raw.output \
    --staging-bucket "gs://your-bucket-name"