Google BigQuery

BigQuery is a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data.

omniload supports BigQuery as both a source and destination.

URI format

The URI format for BigQuery is as follows:

bigquery://<project-name>?credentials_path=/path/to/service/account.json&location=<location>

URI parameters:

  • project-name: the name of the project in which the dataset resides

  • credentials_path: optional, the path to the service account JSON file. If neither credentials_path nor credentials_base64 is provided, omniload will use Application Default Credentials.

  • credentials_base64: optional, the base64-encoded contents of the service account JSON file, as an alternative to credentials_path

  • location: optional, the location of the dataset
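As an illustration of the credentials_base64 parameter, the sketch below builds a URI from a service account key file. The helper name bigquery_uri_from_key is hypothetical and not part of omniload. It also assumes that omniload parses the query string in the standard way, so the three base64 characters that are special in URLs (+, / and =) are percent-encoded; if your version expects the raw base64 value, drop the sed step.

```shell
# Sketch: build a BigQuery URI that embeds base64-encoded credentials.
# bigquery_uri_from_key is a hypothetical helper, not an omniload command.
# Arguments: <project-name> <path-to-key-file> [location]
bigquery_uri_from_key() {
    project="$1"
    key_file="$2"
    location="${3:-}"

    # Encode the key file and strip line wrapping, which differs between
    # GNU and BSD versions of base64.
    encoded=$(base64 < "$key_file" | tr -d '\n')

    # Assumption: the value is percent-decoded by the URI parser, so escape
    # the base64 characters that have a special meaning in query strings.
    encoded=$(printf '%s' "$encoded" | sed -e 's/+/%2B/g' -e 's|/|%2F|g' -e 's/=/%3D/g')

    uri="bigquery://${project}?credentials_base64=${encoded}"
    if [ -n "$location" ]; then
        uri="${uri}&location=${location}"
    fi
    printf '%s\n' "$uri"
}
```

A typical call would be BIGQUERY_URI=$(bigquery_uri_from_key my-project /path/to/service-account.json), which keeps the key contents out of the command line itself.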

Authentication

omniload supports multiple authentication methods for BigQuery:

  1. Explicit credentials (via credentials_path or credentials_base64 in URI):

    bigquery://my-project?credentials_path=/path/to/service-account.json
    
  2. Application Default Credentials (recommended for local development and GCP environments):

    bigquery://my-project
    

    When no credentials are provided in the URI, omniload uses the Google authentication library, which automatically discovers credentials from:

    • The GOOGLE_APPLICATION_CREDENTIALS environment variable

    • User credentials set via gcloud auth application-default login

    • Service account credentials when running on Google Cloud (Compute Engine, App Engine, Cloud Run, etc.)
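To see which of the local sources above is likely to be picked up before running omniload, a small check such as the following can help. This is only a sketch: adc_source is a hypothetical helper, the actual lookup is done by the Google authentication library, and the default path shown applies to Linux and macOS (gcloud stores the file under %APPDATA%\gcloud on Windows).

```shell
# Sketch: report which local Application Default Credentials source is configured.
# adc_source is a hypothetical helper, not part of omniload or gcloud.
adc_source() {
    # Default gcloud config directory on Linux and macOS;
    # the CLOUDSDK_CONFIG environment variable overrides it.
    config_dir="${CLOUDSDK_CONFIG:-$HOME/.config/gcloud}"

    if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
        # Checked first: an explicit path to a credentials file.
        echo "GOOGLE_APPLICATION_CREDENTIALS"
    elif [ -f "$config_dir/application_default_credentials.json" ]; then
        # Written by: gcloud auth application-default login
        echo "gcloud"
    else
        # Nothing local; on Google Cloud the attached service account may still apply.
        echo "none"
    fi
}
```

If the function prints none on a workstation, running gcloud auth application-default login or setting GOOGLE_APPLICATION_CREDENTIALS is the usual next step.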

The same URI structure can be used for both sources and destinations. For more detail on the underlying connection, see the documentation for SQLAlchemy’s BigQuery dialect.

Using GCS as a staging area

omniload can use Google Cloud Storage (GCS) as a staging area when loading data into BigQuery. To enable this, pass the --staging-bucket flag to the ingest command:

omniload ingest \
    --source-uri "$SOURCE_URI" \
    --dest-uri "$BIGQUERY_URI" \
    --source-table raw.input \
    --dest-table raw.output \
    --staging-bucket "gs://your-bucket-name"