Local files¶
omniload supports local files as a data source. The file:// source reads local
files (CSV, JSONL, Parquet) through the same readers used by the S3, GCS and
SFTP sources, so any file format those sources support is supported here too,
along with globbing, gzip decompression and #format hints.
URI Format¶
Everything after file:// is treated as a filesystem path. Relative paths
resolve against the current working directory; an extra leading slash gives an
absolute path.
file://<path>
| Form | Example | Resolves to |
|---|---|---|
| Relative path | | |
| Absolute path (POSIX) | | |
| Windows drive | | |
| Windows UNC | | |
| Path via | | |
| Glob | | all matching files in |
| Format hint | | |
The file format is inferred from the extension (.csv, .jsonl, .parquet,
optionally .gz) or from an explicit format hint.
Tip
file:// intentionally treats the first path segment as part of the path, not
as an RFC-8089 host. This is what makes the two-slash form file://data/x.csv
(relative to the working directory) work, matching how csv:// already behaves.
Use the three-slash form file:///abs/x.csv for absolute paths.
Note
Windows paths are supported: file:///C:/data/x.csv (or file://C:/data/x.csv)
reads the drive path C:\data\x.csv, and file:////server/share/x.csv reads the
UNC path \\server\share\x.csv. Backslash input (file://\\server\share\x.csv)
is accepted as well.
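The path rules above can be sketched in a few lines of Python. This is an illustrative approximation, not omniload's actual implementation: the function name `file_uri_to_path` is hypothetical, and the sketch ignores #format fragments and does not resolve relative paths against the working directory.

```python
import re

def file_uri_to_path(uri: str) -> str:
    """Hypothetical helper: map a file:// URI to a path string.

    Mirrors the documented rules only; omniload's real parser may differ.
    """
    prefix = "file://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a file:// URI: {uri!r}")
    rest = uri[len(prefix):]

    # Backslash UNC input (file://\\server\share\x.csv) is taken as-is.
    if rest.startswith("\\\\"):
        return rest

    # Four-slash form: file:////server/share/x.csv is a UNC path.
    if rest.startswith("//"):
        return "\\\\" + rest[2:].replace("/", "\\")

    # Windows drive, with or without the extra leading slash.
    match = re.match(r"^/?([A-Za-z]:)[/\\](.*)$", rest)
    if match:
        return match.group(1) + "\\" + match.group(2).replace("/", "\\")

    # Everything else is a POSIX path: a leading slash means absolute,
    # otherwise the path is relative to the working directory.
    return rest


for example in ("file://data/x.csv", "file:///abs/x.csv", "file:///C:/data/x.csv"):
    print(example, "->", file_uri_to_path(example))
```

The first path segment is never interpreted as a host, which is why the two-slash form stays relative.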
Example: Loading a local CSV into DuckDB¶
omniload ingest \
--source-uri 'file://data/users.csv' \
--source-table 'users' \
--dest-uri duckdb:///local.duckdb \
--dest-table 'public.users'
The --source-table value is used as the path only when the URI path is empty
(the split form, where the URI is a bare file:// and the path is passed
separately); otherwise it is ignored. In both cases the destination table is
controlled by --dest-table.
Supported formats¶
The same set the blob sources support:
- #csv: comma-separated values with a header row
- #csv_headless: CSV without a header row (see below)
- #jsonl: line-delimited JSON
- #parquet: Parquet
File glob patterns¶
The path may contain a glob pattern to load multiple files at once. The split
into directory and pattern happens at the first segment containing a glob
character (*, ?, [), so recursive patterns work:
| Pattern | Description |
|---|---|
| | All CSV files at the top level of |
| | All JSONL files under |
| | All gzipped CSV files under |
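To make the directory/pattern split concrete, here is a minimal Python sketch of the rule described above: walk the path segments and cut at the first one that contains a glob character. The function name `split_glob` and the sample paths are hypothetical, chosen only for illustration; omniload's own code may handle edge cases differently.

```python
GLOB_CHARS = set("*?[")

def split_glob(path: str) -> tuple[str, str]:
    """Hypothetical helper: split a path into (directory, pattern).

    The split happens at the first segment containing *, ? or [.
    A path without glob characters yields an empty pattern.
    """
    segments = path.split("/")
    for i, segment in enumerate(segments):
        if any(ch in GLOB_CHARS for ch in segment):
            directory = "/".join(segments[:i])
            if not directory:
                # Glob in the very first segment: root or working directory.
                directory = "/" if path.startswith("/") else "."
            return directory, "/".join(segments[i:])
    return path, ""


print(split_glob("data/*.csv"))
print(split_glob("data/**/*.jsonl"))
```

Because everything from the first glob segment onward stays in the pattern, a recursive pattern such as `**/*.jsonl` is kept whole and applied beneath the fixed directory part.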
Compressed files¶
Gzipped files (.gz) are detected and decompressed automatically, so
file://data/events.csv.gz loads without any extra configuration.
File type hinting¶
If a file is correctly encoded but has a non-standard extension, append a
#format fragment to tell omniload how to read it:
omniload ingest \
--source-uri 'file://data/event-data#jsonl' \
--source-table 'events' \
--dest-uri duckdb:///local.duckdb \
--dest-table 'public.events'
A literal # in a path is preserved when the trailing segment is not one of the
known formats, so file://data/vendor#1/report.csv reads the file at
data/vendor#1/report.csv as CSV.
CSV files without headers¶
For CSV files without a header row, use the #csv_headless hint and optionally
supply column names with --columns:
omniload ingest \
--source-uri 'file://data/raw-data.csv#csv_headless' \
--source-table 'raw' \
--columns "id:bigint,name:text,value:double" \
--dest-uri duckdb:///local.duckdb \
--dest-table 'public.raw_data'
Without column names, columns are auto-named unknown_col_0, unknown_col_1,
and so on.
Relationship to csv://¶
The csv:// scheme still exists and is unchanged: it reads a single
local CSV file and also works as a destination. file:// is the broader local
read path, covering JSONL and Parquet as well as CSV, with globbing and format
hints. Use csv:// when you specifically want the standalone CSV reader or a CSV
destination; use file:// for everything else local.