Writing Parquet Format Data to Regular Files (i.e., Not Hadoop HDFS)

By Roger Voss

May 22, 2018

DZone

The Apache Parquet format is a compressed, efficient columnar data representation. The existing Parquet Java libraries were developed for and within the Hadoop ecosystem. Hence there tends to be a near-automatic assumption that one is working with the Hadoop distributed filesystem, HDFS.

There are situations in which one might want to write Parquet-formatted data to a regular file system file, particularly when not working in a context that assumes Hadoop and HDFS are present. Some big data tools and runtime stacks that do not assume Hadoop can work directly with Parquet files.

