# Local files The `file://` source reads local files (CSV, JSONL, Parquet) through the same readers used by the S3, GCS and SFTP sources. Any file format those sources support is supported here too, along with globbing, gzip decompression and `#format` hints. `omniload` supports local files as a data source. ## URI Format Everything after `file://` is treated as a filesystem path. Relative paths resolve against the current working directory; an extra leading slash gives an absolute path. ```text file:// ``` | Form | Example | Resolves to | | :--- | :--- | :--- | | Relative path | `file://data/users.csv` | `/data/users.csv` | | Absolute path (POSIX) | `file:///srv/data/users.jsonl` | `/srv/data/users.jsonl` | | Windows drive | `file:///C:/data/users.csv` (or `file://C:/data/users.csv`) | `C:\data\users.csv` | | Windows UNC | `file:////server/share/users.csv` | `\\server\share\users.csv` | | Path via `--source-table` | `--source-uri file:// --source-table data/users.parquet` | `/data/users.parquet` | | Glob | `file://data/*.csv` | all matching files in `/data` | | Format hint | `file://feed.dat#csv` | `feed.dat` read as CSV | The file format is inferred from the extension (`.csv`, `.jsonl`, `.parquet`, optionally `.gz`) or from an explicit [format hint](#file-type-hinting). :::{tip} `file://` intentionally treats the first path segment as part of the path, not as an RFC-8089 host. This is what makes the two-slash form `file://data/x.csv` (relative to the working directory) work, matching how `csv://` already behaves. Use the three-slash form `file:///abs/x.csv` for absolute paths. ::: :::{note} Windows paths are supported: `file:///C:/data/x.csv` (or `file://C:/data/x.csv`) reads the drive path `C:\data\x.csv`, and `file:////server/share/x.csv` reads the UNC path `\\server\share\x.csv`. Backslash input (`file://\\server\share\x.csv`) is accepted as well. ::: ## Example: Loading a local CSV into DuckDB ```sh omniload ingest \ --source-uri 'file://data/users.csv' \ --source-table 'users' \ --dest-uri duckdb:///local.duckdb \ --dest-table 'public.users' ``` The `--source-table` value is only used as the path when the URI path is empty (the split form above); otherwise it is ignored, and the destination table is controlled by `--dest-table`. ## Supported formats The same set the blob sources support: - `#csv` - comma-separated values with a header row - `#csv_headless` - CSV without a header row (see below) - `#jsonl` - line-delimited JSON - `#parquet` - Parquet ## File glob patterns The path may contain a glob pattern to load multiple files at once. The split into directory and pattern happens at the first segment containing a glob character (`*`, `?`, `[`), so recursive patterns work: | Pattern | Description | | :--- | :--- | | `file://data/*.csv` | All CSV files at the top level of `/data`. | | `file://data/**/*.jsonl` | All JSONL files under `/data`, recursively. | | `file:///srv/logs/**/*.csv.gz` | All gzipped CSV files under `/srv/logs`, recursively. | ## Compressed files Gzipped files (`.gz`) are detected and decompressed automatically, so `file://data/events.csv.gz` loads without any extra configuration. ## File type hinting If a file is correctly encoded but has a non-standard extension, append a `#format` fragment to tell `omniload` how to read it: ```sh omniload ingest \ --source-uri 'file://data/event-data#jsonl' \ --source-table 'events' \ --dest-uri duckdb:///local.duckdb \ --dest-table 'public.events' ``` A literal `#` in a path is preserved when the trailing segment is not one of the known formats, so `file://data/vendor#1/report.csv` reads the file at `data/vendor#1/report.csv` as CSV. ### CSV files without headers For CSV files without a header row, use the `#csv_headless` hint and optionally supply column names with `--columns`: ```sh omniload ingest \ --source-uri 'file://data/raw-data.csv#csv_headless' \ --source-table 'raw' \ --columns "id:bigint,name:text,value:double" \ --dest-uri duckdb:///local.duckdb \ --dest-table 'public.raw_data' ``` Without column names, columns are auto-named `unknown_col_0`, `unknown_col_1`, and so on. ## Relationship to `csv://` The [`csv://`](csv.md) scheme still exists and is unchanged: it reads a single local CSV file and also works as a destination. `file://` is the broader local read path, covering JSONL and Parquet as well as CSV, with globbing and format hints. Use `csv://` when you specifically want the standalone CSV reader or a CSV destination; use `file://` for everything else local.