COPY FROM
COPY FROM with FORMAT
Overview
This version of a COPY FROM
statement, allows you to specify the imported file format and currently three types are supported.
With each file type there will be differences in performance and behavior
Here’s a list of supported formats:
- CSV (comma-separated values): simple columnar text format
- ORC (optimized row columnar): columnar storage format developed by Apache
- Parquet (Apache Parquet): columnar data storage used in Apache Hadoop ecosystem
Each query without a specified file format assumes to be importing a CSV file (There is no format detection in place)
Syntax
In order to sepcify the file format using the COPY FROM
statement you can use the following syntax:
Format name is case insensitive
Example
When copying from the CSV, ORC or Parquet files as the first step, you need to create a destination table:
Once that is done, you can copy the file content into the table, in the following way:
- ORC
- Parquet
Copying from parquet files is memory consuming. Files bigger than a few gigabytes might result in out of memory error
- CSV
- CSV with Specified Delimiter
Differences in Behavior
-
Ignored Options
HEADER
,DELIMITER
,NULL
options are ignored not affecting the execution of the queries for formats different than CSV.
-
Null Values Handling
- All ORC files have nullable columns. In order to import a nullable column to an Oxla column, which are described as
NOT NULL
, the column in the ORC file cannot contain a null value, otherwise the request will be terminated. - For Parquet files, inserting a nullable column to a non nullable column is allowed as long as there are no null values in the source column.
- All ORC files have nullable columns. In order to import a nullable column to an Oxla column, which are described as
-
Column Matching
- ORC and Parquet files are only being matched based on column index, while CSVs can be matched both with names or indexes. For more information regarding that, please refer to our COPY FROM with HEADER doc