Last year, I implemented a data lake. As is typical, the pipeline began by ingesting data into the lake, followed by basic and then advanced processing.
We used bash scripts for parts of the data processing pipeline: they copied data from local Linux folders into HDFS and then ran a few transformations in Hive.
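The copy-then-transform step described above can be sketched as a small bash function. This is a minimal illustration, not the author's actual script: the directory paths, the `.csv` file pattern, and the `transform.hql` script name are all hypothetical, and the `hdfs dfs` and `hive` commands assume a working Hadoop/Hive installation. A `DRY_RUN` switch echoes the commands instead of executing them, which is handy when no cluster is available.

```shell
#!/usr/bin/env bash
set -euo pipefail

# With DRY_RUN=1, echo each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi
}

# Copy files from a local Linux folder into HDFS, then run a Hive
# script of transformations. All arguments are hypothetical examples.
ingest() {
  local local_dir="$1" hdfs_dir="$2" hive_script="$3"
  run hdfs dfs -mkdir -p "$hdfs_dir"                  # ensure HDFS target exists
  run hdfs dfs -put -f "$local_dir"/*.csv "$hdfs_dir/"  # copy local files in
  run hive -f "$hive_script"                          # apply Hive transformations
}

# Example invocation (dry run, so nothing actually touches HDFS):
DRY_RUN=1 ingest /data/landing /user/etl/raw transform.hql
```

In a real deployment such a script would typically be driven by cron or an orchestrator like Oozie or Airflow rather than run by hand.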
Source: DZone