Apache Parquet

Apache Parquet is an open-source columnar data format maintained by the Apache Software Foundation.

Because it stores data column by column, values of the same type sit next to each other, which makes Parquet files highly compressible. With native support for several data types, it is also far more robust than text-only formats such as CSV.

Parquet is a good choice as an intermediate storage format in a data lake, before the data is brought into Big Data systems. Snappy compression is supported by the format and is the default in many writers; Gzip can be chosen instead for a better compression ratio, at the cost of speed.

One should always prefer Parquet over CSV for intermediate storage during data loads. The resulting files are far smaller, and BLOB data does not cause the havoc it can with CSV, where binary values have to be escaped or encoded as text.
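
The CSV problems mentioned above can be illustrated with a short sketch that uses only the Python standard library. The sample row and the csv_round_trip helper are hypothetical. The sketch shows that CSV returns every field as text, that a raw binary value does not survive a naive round trip, and that a common workaround is to Base64-encode the bytes and restore the types by hand.

```python
import base64
import csv
import io

# Hypothetical row: an integer, a float, and a binary payload that contains
# a comma and a newline.
row = [42, 3.5, b"\x00\xff,\n"]


def csv_round_trip(values):
    """Write one row to an in-memory CSV and read it back."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(values)
    buffer.seek(0)
    return next(csv.reader(buffer))


# Naive round trip: every field comes back as a string, and the bytes value
# is replaced by its Python text representation rather than the raw data.
naive = csv_round_trip(row)
assert all(isinstance(field, str) for field in naive)
assert naive[2] == repr(row[2])

# Workaround: encode the bytes as Base64 text before writing, then convert
# every field back to its intended type after reading.
encoded = [row[0], row[1], base64.b64encode(row[2]).decode("ascii")]
safe = csv_round_trip(encoded)
restored = [int(safe[0]), float(safe[1]), base64.b64decode(safe[2])]
assert restored == row
```

With Parquet, the column types are stored in the file itself, so neither the encoding step nor the manual conversion is needed.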

© 2021 Spectral Core Limited. All rights reserved.