# HTTP

omniload can read CSV, JSON, and Parquet files from public HTTP/HTTPS URLs, so you can ingest a publicly accessible file directly into your database.

## URI format

The source URI is the plain `http://` or `https://` URL of the file:

```text
http://example.com/path/to/file.csv
https://example.com/path/to/file.json
https://example.com/path/to/file.parquet
```

## Supported file formats

The HTTP source supports the following file formats:

- **CSV** (`.csv`) - Comma-separated values files with headers
- **CSV Headless** - CSV files without headers (use `#csv_headless` suffix)
- **JSON** (`.json`, `.jsonl`) - JSON objects and JSON Lines format
- **Parquet** (`.parquet`) - Apache Parquet columnar format

The file format is automatically inferred from the URL extension. You can also explicitly specify the format using the `#format` suffix in the `--source-table` parameter.
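The resolution rule described above can be sketched as a small shell function. This is an illustration only, not omniload's implementation: `infer_format` is a hypothetical helper, and it assumes that an explicit `#format` suffix takes precedence over the URL extension and that any query string is ignored when reading the extension.

```bash
# Hypothetical helper, not part of omniload.
# Assumption: an explicit #suffix in the table name wins over the extension.
infer_format() {
    url="$1"
    table="$2"

    # Explicit suffix, e.g. "data#csv_headless" -> "csv_headless"
    case "$table" in
        *"#"*) printf '%s\n' "${table##*#}"; return 0 ;;
    esac

    # Assumption: drop a query string before looking at the extension
    path="${url%%[?]*}"
    case "$path" in
        *.csv)     echo csv ;;
        *.json)    echo json ;;
        *.jsonl)   echo jsonl ;;
        *.parquet) echo parquet ;;
        *)         echo unknown; return 1 ;;
    esac
}

infer_format "https://example.com/data.parquet" "data"      # prints: parquet
infer_format "https://example.com/api/export" "data#csv"    # prints: csv
```

A URL with no recognizable extension and no suffix falls through to `unknown` in this sketch, which is the case where specifying the suffix explicitly is needed.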

## Usage

### Basic example

```bash
omniload ingest \
    --source-uri "https://example.com/data.csv" \
    --source-table "data" \
    --dest-uri "duckdb:///local.duckdb" \
    --dest-table "my_table"
```

### Specifying file format explicitly

If the URL doesn't have a recognizable extension (e.g., an API endpoint), you can specify the format using the `#format` suffix in `--source-table`:

```bash
omniload ingest \
    --source-uri "https://example.com/api/export" \
    --source-table "data#csv" \
    --dest-uri "duckdb:///local.duckdb" \
    --dest-table "my_table"
```

### CSV without headers

For CSV files that don't have a header row, use `#csv_headless`. You can optionally provide column names using the `--columns` flag:

```bash
# With custom column names
omniload ingest \
    --source-uri "https://example.com/data.csv" \
    --source-table "data#csv_headless" \
    --columns "id:bigint,name:text,value:double" \
    --dest-uri "duckdb:///local.duckdb" \
    --dest-table "my_table"
```

If no column names are provided, columns will be automatically named `unknown_col_0`, `unknown_col_1`, etc.:

```bash
# Without column names (auto-generated)
omniload ingest \
    --source-uri "https://example.com/data.csv" \
    --source-table "data#csv_headless" \
    --dest-uri "duckdb:///local.duckdb" \
    --dest-table "my_table"
```
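To make the auto-generated naming concrete, the sketch below builds the placeholder names for a sample headless row. It is an illustration under simplifying assumptions, not omniload's code: the sample row is made up, and fields are counted by splitting on commas without handling quoted values.

```bash
# Illustration only: derive placeholder column names from the first row.
# Assumption: naive comma split, no quoted fields.
first_row="1,alice,3.14"

# Count the fields in the row
n=$(printf '%s\n' "$first_row" | awk -F',' '{ print NF }')

# Build unknown_col_0, unknown_col_1, ... for each field
i=0
names=""
while [ "$i" -lt "$n" ]; do
    names="${names:+$names,}unknown_col_$i"
    i=$((i + 1))
done

echo "$names"   # prints: unknown_col_0,unknown_col_1,unknown_col_2
```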

### Example with JSON file

```bash
omniload ingest \
    --source-uri "https://api.example.com/export/data.json" \
    --source-table "data" \
    --dest-uri "snowflake://user:pass@account/database/schema" \
    --dest-table "json_data"
```

### Example with Parquet file

```bash
omniload ingest \
    --source-uri "https://storage.example.com/data.parquet" \
    --source-table "data" \
    --dest-uri "bigquery://project/dataset" \
    --dest-table "parquet_data"
```

## Supported format suffixes

You can use these suffixes with `--source-table` to explicitly specify the file format:

| Suffix | Format |
|--------|--------|
| `#csv` | CSV with headers |
| `#csv_headless` | CSV without headers |
| `#json` | JSON |
| `#jsonl` | JSON Lines |
| `#parquet` | Parquet |

## Notes

- The `--source-table` parameter is required; use it to specify the format suffix if needed (e.g., `data#csv_headless`)
- The HTTP source downloads the entire file before processing, so ensure you have sufficient memory for large files
- Authentication is not currently supported: the file must be reachable without credentials, cookies, or custom headers
- For very large files, consider using a dedicated file storage source (e.g., S3, GCS)
- Once the file is downloaded, its data is processed in chunks internally:
  - CSV: 10,000 rows per chunk
  - JSON: 1,000 objects per chunk
  - Parquet: 10,000 rows per chunk
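As a rough guide to what the chunk sizes above mean for a given file, the number of chunks is the row count divided by the chunk size, rounded up. The helper below is a hypothetical sketch for that arithmetic only; the row counts are made-up examples.

```bash
# Hypothetical helper: number of chunks for a given row count and chunk size
# (ceiling division using integer arithmetic).
chunks_for() {
    rows="$1"
    chunk_size="$2"
    echo $(( (rows + chunk_size - 1) / chunk_size ))
}

chunks_for 25000 10000   # CSV or Parquet, 25,000 rows: prints 3
chunks_for 25000 1000    # JSON, 25,000 objects: prints 25
```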

## Limitations

- Only public URLs are supported (no authentication)
- The entire file is downloaded into memory before processing
- No support for incremental loading
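
Because the whole file is held in memory, it can help to check its size before ingesting. The sketch below is one possible way to do that, assuming `curl` is installed and that the server reports a `Content-Length` header (not all servers do). `content_length` is a hypothetical helper, not an omniload command.

```bash
# Hypothetical helper: read HTTP response headers on stdin and print the
# Content-Length value in bytes (the last one seen, in case of redirects).
content_length() {
    tr -d '\r' \
        | awk -F': *' 'tolower($1) == "content-length" { print $2 }' \
        | tail -n 1
}

# Offline demonstration with sample headers
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/csv\r\nContent-Length: 1048576\r\n\r\n' \
    | content_length   # prints: 1048576

# Against a real URL (needs network access):
#   curl -sIL "https://example.com/data.csv" | content_length
```

If the helper prints nothing, the server did not report a size and the check is inconclusive.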