Last year, I implemented a data lake. As is typical, the pipeline began by ingesting data into the lake, followed by basic and then advanced processing.
We used bash scripts for parts of the data processing pipeline: they copied data from local Linux folders into HDFS and then ran a few transformations in Hive.
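The copy-then-transform step described above can be sketched as a small bash function. This is a minimal illustration, not the author's actual script: the directory paths, the `.csv` file pattern, and the `transform.hql` script name are all hypothetical, and the `hdfs dfs` and `hive` commands assume a working Hadoop/Hive installation. A `DRY_RUN` switch echoes the commands instead of executing them, which is handy when no cluster is available.

```shell
#!/usr/bin/env bash
set -euo pipefail

# With DRY_RUN=1, echo each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi
}

# Copy files from a local Linux folder into HDFS, then run a Hive
# script of transformations. All arguments are hypothetical examples.
ingest() {
  local local_dir="$1" hdfs_dir="$2" hive_script="$3"
  run hdfs dfs -mkdir -p "$hdfs_dir"                  # ensure HDFS target exists
  run hdfs dfs -put -f "$local_dir"/*.csv "$hdfs_dir/"  # copy local files in
  run hive -f "$hive_script"                          # apply Hive transformations
}

# Example invocation (dry run, so nothing actually touches HDFS):
DRY_RUN=1 ingest /data/landing /user/etl/raw transform.hql
```

In a real deployment such a script would typically be driven by cron or an orchestrator like Oozie or Airflow rather than run by hand.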
Source: DZone