Apache Parquet

Apache Parquet is an open-source columnar data format maintained by the Apache Software Foundation.

Because Parquet stores data column by column, values of the same type sit next to each other and compress well. It also has native support for several data types, which makes it far more robust than text-only formats such as CSV.
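
To illustrate the weakness of text-only formats, here is a minimal sketch using only the Python standard library. The sample row is hypothetical; the point is that a CSV round trip turns every value into a string, so the reader has to guess the original types.

```python
import csv
import io

# Hypothetical sample row: an integer, a float, a boolean and a missing value.
row = [42, 3.5, True, None]

# Write the row to an in-memory CSV file.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(row)

# Read it back.
buffer.seek(0)
restored = next(csv.reader(buffer))

# Every field comes back as a string, and None has become an empty string.
print(restored)
```

A typed format such as Parquet records the type of each column in the file itself, so this guessing step is not needed.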


Parquet is a good choice as an intermediate data lake storage format before bringing data into Big Data systems. Snappy compression is supported by the format and is a common default; Gzip can be chosen instead for a better compression ratio, typically at the cost of speed.

Always prefer Parquet over CSV for intermediate storage during data loads. The resulting files are far smaller, and BLOB data won't cause havoc as it can with CSV.
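
The kind of havoc binary data can cause in CSV can be sketched with the Python standard library alone. The binary value below is hypothetical (a PNG file signature followed by a comma byte and a non-ASCII byte), and the "naive export" stands in for any tool that writes bytes into a text file without escaping them.

```python
import base64
import csv
import io

# Hypothetical binary value: the 8-byte PNG signature (which contains CR and LF)
# followed by a comma byte and a non-ASCII byte.
blob = b"\x89PNG\r\n\x1a\n,\xff"

# Naive export: decode the bytes one-to-one and join the fields with commas.
naive_text = "1," + blob.decode("latin-1") + ",done\n"
naive_lines = [line for line in naive_text.split("\n") if line]

# The single record is now spread over several lines.
print(len(naive_lines))

# Workaround: Base64-encode the value so it survives as plain text.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow([1, base64.b64encode(blob).decode("ascii"), "done"])
buffer.seek(0)
record = next(csv.reader(buffer))
restored = base64.b64decode(record[1])

# The value round-trips, but only because both sides agreed on the encoding.
print(restored == blob)
```

The Base64 workaround inflates the data by roughly one third and requires every consumer to know which columns are encoded. A format with a native byte array type avoids both problems.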

Omni Loader

We support Apache Parquet directly. No additional drivers are required.

Apache Parquet data types we support

Integral: bigint, int

Decimal: decimal, double, float

Text: ntext

Date/Time: timestamp

Large objects: byte array, ntext

Other: boolean, timespan