Overview

You can import data from a Parquet file into Oxla in several ways. This guide explains how to use the COPY FROM command to load a Parquet file stored in cloud storage into an existing table, which lets you migrate data from remote sources.

Syntax

The syntax for this command is as follows:

COPY table_name FROM 'cloud_storage_file_path' WITH (option);

Parameters

  • table_name: existing table where the data will be imported
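  • cloud_storage_file_path: complete path to the Parquet file in cloud storage
  • option: one or more of the following, separated by commas:
    • object storage credentials: provider-specific credentials (AWS_CRED, GCS_CRED or AZURE_CRED) used to access the file; for AWS S3 these also include the storage endpoint
    • FORMAT: file format name (e.g. parquet)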
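As a sketch of the full workflow, the statements below create a target table, load a Parquet file from S3 into it using the AWS credential syntax shown later in this guide, and then count the imported rows. The table name orders, its columns, the bucket path and the credential values are hypothetical placeholders, and the table's columns are assumed to match the columns stored in the Parquet file.

```sql
-- Hypothetical target table: COPY FROM expects the table to exist already.
-- Its columns are assumed to match the columns in the Parquet file.
CREATE TABLE orders (
    order_id BIGINT,
    customer TEXT,
    quantity INT
);

-- Load the Parquet file from S3 (placeholder bucket, region and keys).
COPY orders FROM 's3://your-bucket/orders.parquet'
WITH (
    AWS_CRED(
        AWS_REGION 'us-west-1',
        AWS_KEY_ID 'key_id',
        AWS_PRIVATE_KEY 'access_key',
        ENDPOINT 's3.us-west-1.amazonaws.com'
    ),
    FORMAT parquet
);

-- Quick sanity check that rows were imported.
SELECT COUNT(*) FROM orders;
```

Running the count right after the import is a cheap way to confirm that the file was read; comparing it with the row count you expect from the source file is left to you.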

Examples

Importing Data from Cloud Storage

To import data from object storage into a table in Oxla, use the COPY FROM command with object storage credentials. This command transfers data from cloud storage services such as AWS S3, Google Cloud Storage or Azure Blob Storage directly into your Oxla instance.

COPY table_name FROM 'cloud_storage_file_path' WITH (object_storage(object_storage_credentials), FORMAT parquet);
  • object_storage: AWS_CRED, AZURE_CRED or GCS_CRED, depending on your provider
  • object_storage_credentials: the provider-specific credentials for accessing your cloud storage

You need to provide provider-specific credentials to authenticate access to your files. Use the following authentication parameters for your cloud storage provider:

AWS S3 Bucket

  • AWS_REGION: AWS region of the bucket (e.g. us-west-1)
  • AWS_KEY_ID: access key ID used for authentication
  • AWS_PRIVATE_KEY: secret access key used for authentication
  • ENDPOINT: URL endpoint of the storage service
COPY table_name FROM 's3://your-bucket/file_name' WITH (AWS_CRED(AWS_REGION 'us-west-1', AWS_KEY_ID 'key_id', AWS_PRIVATE_KEY 'access_key', ENDPOINT 's3.us-west-1.amazonaws.com'), FORMAT parquet);

Google Cloud Storage

GCS_CRED accepts either:
  • <path_to_credentials>: path to the JSON credentials file
  • <json_credentials_string>: contents of the GCS credentials file, passed as a string
COPY table_name FROM 'gs://your-bucket/file_name' WITH (GCS_CRED('/path/to/credentials.json'), FORMAT parquet);
For Google Cloud Storage, HMAC keys are the recommended authentication method. For details, see the HMAC keys - Cloud Storage page.
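Because GCS_CRED also accepts the credentials file contents as a string, the credentials can be passed inline instead of as a file path. This is a sketch that assumes the JSON string goes in the same position as the file path; the JSON shown is a shortened placeholder with a few service account fields, and the project, email and key values are hypothetical.

```sql
-- Pass the contents of the GCS credentials file inline instead of a file path.
-- Shortened placeholder JSON: replace it with the full contents of your key file.
COPY table_name FROM 'gs://your-bucket/file_name'
WITH (
    GCS_CRED('{"type": "service_account", "project_id": "your-project-id", "client_email": "your-client-email", "private_key": "your-private-key"}'),
    FORMAT parquet
);
```

Passing a path keeps secrets out of the SQL text, while the inline string avoids having to place a file on the machine that reads it; which one fits depends on how you manage credentials.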

Azure Blob Storage

  • TENANT_ID: tenant identifier representing your organization's identity in Azure
  • CLIENT_ID: client identifier used for authentication
  • CLIENT_SECRET: client secret, which acts as a password for authentication
COPY table_name FROM 'wasbs://container-name/your_blob' WITH (AZURE_CRED(TENANT_ID 'your_tenant_id', CLIENT_ID 'your_client_id', CLIENT_SECRET 'your_client_secret'), FORMAT parquet);