Local files

omniload supports local files as a data source. The file:// source reads local files (CSV, JSONL, Parquet) through the same readers used by the S3, GCS and SFTP sources, so every file format those sources support works here too, along with globbing, gzip decompression and #format hints.

URI Format

Everything after file:// is treated as a filesystem path. Relative paths resolve against the current working directory; a third slash (file:///...) makes the path absolute.

file://<path>

| Form | Example | Resolves to |
|---|---|---|
| Relative path | file://data/users.csv | <cwd>/data/users.csv |
| Absolute path (POSIX) | file:///srv/data/users.jsonl | /srv/data/users.jsonl |
| Windows drive | file:///C:/data/users.csv (or file://C:/data/users.csv) | C:\data\users.csv |
| Windows UNC | file:////server/share/users.csv | \\server\share\users.csv |
| Path via --source-table | --source-uri file:// --source-table data/users.parquet | <cwd>/data/users.parquet |
| Glob | file://data/*.csv | all matching files in <cwd>/data |
| Format hint | file://feed.dat#csv | feed.dat read as CSV |

The file format is inferred from the extension (.csv, .jsonl, .parquet, optionally .gz) or from an explicit format hint.
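The extension rule can be pictured with a small Python sketch. It is illustrative only: the function name infer_format and the mapping table are assumptions made for this example, not omniload's internals.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Hypothetical mapping from extension to reader name (assumed for this sketch).
EXTENSION_FORMATS = {".csv": "csv", ".jsonl": "jsonl", ".parquet": "parquet"}


def infer_format(path: str) -> Tuple[Optional[str], bool]:
    """Return (format, is_gzipped) based only on the file extension."""
    suffixes = PurePosixPath(path).suffixes
    gzipped = bool(suffixes) and suffixes[-1].lower() == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]  # look at the extension underneath .gz
    fmt = EXTENSION_FORMATS.get(suffixes[-1].lower()) if suffixes else None
    return fmt, gzipped
```

A path such as feed.dat yields no format in this sketch, which is the situation an explicit format hint is meant to cover.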

Tip

file:// intentionally treats the first path segment as part of the path, not as an RFC 8089 host. This is what makes the two-slash form file://data/x.csv (relative to the working directory) work, matching how csv:// already behaves. Use the three-slash form file:///abs/x.csv for absolute paths.

Note

Windows paths are supported: file:///C:/data/x.csv (or file://C:/data/x.csv) reads the drive path C:\data\x.csv, and file:////server/share/x.csv reads the UNC path \\server\share\x.csv. Backslash input (file://\\server\share\x.csv) is accepted as well.
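The resolution rules above can be sketched in Python as follows. This is a rough model under stated assumptions, not omniload's actual code: resolve_file_uri and its windows flag are hypothetical names, and globs and format hints are ignored here.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

SCHEME = "file://"


def resolve_file_uri(uri: str, cwd: str, windows: bool = False) -> str:
    """Map a file:// URI to a filesystem path (illustrative sketch only)."""
    if not uri.startswith(SCHEME):
        raise ValueError(f"not a file:// URI: {uri!r}")
    path = uri[len(SCHEME):]  # everything after file:// is the path, no host part

    if windows:
        path = path.replace("\\", "/")  # accept backslash input
        drive = re.match(r"^/?([A-Za-z]:/.*)$", path)
        if drive:  # file:///C:/... or file://C:/...
            return str(PureWindowsPath(drive.group(1)))
        if path.startswith("//"):  # file:////server/share/... (UNC)
            return str(PureWindowsPath(path))
        return str(PureWindowsPath(cwd) / path)

    if path.startswith("/"):  # three-slash form: absolute path
        return str(PurePosixPath(path))
    return str(PurePosixPath(cwd) / path)  # two-slash form: relative to cwd
```

For example, resolve_file_uri("file://data/users.csv", cwd="/home/me") gives /home/me/data/users.csv, while the three-slash form leaves /srv/data/users.jsonl untouched.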

Example: Loading a local CSV into DuckDB

omniload ingest \
    --source-uri 'file://data/users.csv' \
    --source-table 'users' \
    --dest-uri duckdb:///local.duckdb \
    --dest-table 'public.users'

The --source-table value is used as the path only when the URI path is empty (the split form: --source-uri file:// with the path given in --source-table). Otherwise it is ignored, as in the example above, and the destination table is controlled by --dest-table.
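As a minimal sketch of that precedence rule (effective_path is a hypothetical helper written for illustration, not part of omniload):

```python
SCHEME = "file://"


def effective_path(source_uri: str, source_table: str) -> str:
    """Pick the path to read: the URI path wins unless it is empty."""
    if not source_uri.startswith(SCHEME):
        raise ValueError(f"not a file:// URI: {source_uri!r}")
    uri_path = source_uri[len(SCHEME):]
    # Fall back to --source-table only for the split form (bare file://).
    return uri_path if uri_path else source_table
```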

Supported formats

file:// supports the same formats as the blob sources. Each can be selected explicitly with the matching hint:

  • #csv - comma-separated values with a header row

  • #csv_headless - CSV without a header row (see below)

  • #jsonl - line-delimited JSON

  • #parquet - Parquet

File glob patterns

The path may contain a glob pattern to load multiple files at once. The split into directory and pattern happens at the first segment containing a glob character (*, ?, [), so recursive patterns work:

| Pattern | Description |
|---|---|
| file://data/*.csv | All CSV files at the top level of <cwd>/data. |
| file://data/**/*.jsonl | All JSONL files under <cwd>/data, recursively. |
| file:///srv/logs/**/*.csv.gz | All gzipped CSV files under /srv/logs, recursively. |
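The directory/pattern split described above could look roughly like this in Python. The helper names split_glob and expand are assumptions for this sketch, and pathlib's glob stands in for whatever matcher omniload actually uses.

```python
from pathlib import Path
from typing import List, Tuple

GLOB_CHARS = set("*?[")


def split_glob(path: str) -> Tuple[str, str]:
    """Split at the first segment containing a glob character."""
    parts = path.split("/")
    for i, segment in enumerate(parts):
        if any(ch in GLOB_CHARS for ch in segment):
            base = "/".join(parts[:i]) or ("/" if path.startswith("/") else ".")
            return base, "/".join(parts[i:])
    return path, ""  # no glob characters: a single plain path


def expand(path: str) -> List[Path]:
    """List the files a path or pattern refers to."""
    base, pattern = split_glob(path)
    if not pattern:
        return [Path(base)]
    # "**" in the pattern makes pathlib descend into subdirectories.
    return sorted(p for p in Path(base).glob(pattern) if p.is_file())
```

Because everything from the first glob segment onward is kept as the pattern, data/**/*.jsonl splits into the base data and the pattern **/*.jsonl, which is why recursive matching works.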

Compressed files

Gzipped files (.gz) are detected and decompressed automatically, so file://data/events.csv.gz loads without any extra configuration.
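Transparent decompression can be approximated with the Python standard library. The sketch below assumes detection by the .gz extension and UTF-8 text; open_text is a hypothetical helper, not an omniload function.

```python
import gzip
from typing import IO


def open_text(path: str) -> IO[str]:
    """Open a file as text, decompressing on the fly when it ends in .gz."""
    if path.lower().endswith(".gz"):
        return gzip.open(path, mode="rt", encoding="utf-8", newline="")
    return open(path, mode="r", encoding="utf-8", newline="")
```

The caller reads events.csv.gz and events.csv through the same file-like interface, so the downstream CSV or JSONL reader does not need to know about compression.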

File type hinting

If a file's contents are in a supported format but its extension is missing or non-standard, append a #format fragment to tell omniload how to read it:

omniload ingest \
    --source-uri 'file://data/event-data#jsonl' \
    --source-table 'events' \
    --dest-uri duckdb:///local.duckdb \
    --dest-table 'public.events'

A literal # in a path is preserved when the trailing segment is not one of the known formats, so file://data/vendor#1/report.csv reads the file at data/vendor#1/report.csv as CSV.
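One way to picture this rule is the following Python sketch. The set of known formats is taken from the list of supported formats; the function name split_format_hint is an assumption for illustration.

```python
from typing import Optional, Tuple

KNOWN_FORMATS = {"csv", "csv_headless", "jsonl", "parquet"}


def split_format_hint(path: str) -> Tuple[str, Optional[str]]:
    """Separate a trailing #format hint from the path, if there is one."""
    head, sep, tail = path.rpartition("#")
    if sep and tail in KNOWN_FORMATS:
        return head, tail
    # The text after the last '#' is not a known format: keep '#' as a literal.
    return path, None
```

Only the text after the last # is examined, so a path can contain a literal # and still carry a hint at the end.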

CSV files without headers

For CSV files without a header row, use the #csv_headless hint and optionally supply column names with --columns:

omniload ingest \
    --source-uri 'file://data/raw-data.csv#csv_headless' \
    --source-table 'raw' \
    --columns "id:bigint,name:text,value:double" \
    --dest-uri duckdb:///local.duckdb \
    --dest-table 'public.raw_data'

Without column names, columns are auto-named unknown_col_0, unknown_col_1, and so on.
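The naming behaviour can be modelled with Python's csv module. This is a simplified sketch: read_headless and column_names are hypothetical helpers, the column spec is parsed for names only, and the declared types are ignored.

```python
import csv
import io
from typing import Dict, List, Optional


def column_names(spec: Optional[str], width: int) -> List[str]:
    """Names from a spec like "id:bigint,name:text", else auto-generated ones."""
    if spec:
        # Keep only the part before ':'; types are ignored in this sketch.
        return [item.split(":", 1)[0].strip() for item in spec.split(",")]
    return [f"unknown_col_{i}" for i in range(width)]


def read_headless(text: str, columns: Optional[str] = None) -> List[Dict[str, str]]:
    """Read CSV text that has no header row; every line is treated as data."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    names = column_names(columns, len(rows[0]))
    return [dict(zip(names, row)) for row in rows]
```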

Relationship to csv://

The csv:// scheme still exists and is unchanged: it reads a single local CSV file and also works as a destination. file:// is the broader local read path, covering JSONL and Parquet as well as CSV, with globbing and format hints. Use csv:// when you specifically want the standalone CSV reader or a CSV destination; use file:// for everything else local.