Starting From Hadoop

With the rise of big data in recent years, distributed computing engines have emerged one after another. Apache Hadoop is a collection of open-source software utilities widely used by large organizations to process massive datasets across clusters of commodity hardware. The core design of Hadoop comes from Google's MapReduce paper, which was in turn inspired by the map and reduce functions commonly used in functional programming. Map is a higher-order function that applies a given function to each element of a functor, e.g. a list, and returns a list of the results in the same order; in functional form it is often called apply-to-all. Reduce (also called fold) combines the elements of a collection into a single summary value. The MapReduce programming model applies these ideas at scale: the map phase transforms input records into intermediate key-value pairs, and the reduce phase aggregates the values associated with each key.
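As a minimal sketch of these ideas, here are the two higher-order functions in plain Python, followed by a toy single-machine imitation of the MapReduce word-count pattern (this is an illustration only, not Hadoop's actual API):

```python
from collections import defaultdict
from functools import reduce

# map: apply a function to each element, preserving order
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))  # [1, 4, 9, 16]

# reduce: combine elements into a single summary value
total = reduce(lambda acc, x: acc + x, squares, 0)  # 30

# A toy sketch of the MapReduce word-count pattern on one machine.
docs = ["to be or not to be"]

# map phase: emit a (word, 1) pair for each word
pairs = [(word, 1) for doc in docs for word in doc.split()]

# shuffle phase: group the emitted values by key
groups = defaultdict(list)
for word, count in pairs:
    groups[word].append(count)

# reduce phase: sum the counts for each key
counts = {word: reduce(lambda a, b: a + b, vals) for word, vals in groups.items()}
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```

In a real Hadoop job, the map and reduce phases run in parallel across many machines, and the framework handles the shuffle, fault tolerance, and data distribution.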

A Little About Apache Spark

Apache Spark is a lightning-fast, general-purpose cluster-computing framework. Originally developed at the AMPLab of UC Berkeley in 2009, Spark was open-sourced in 2010, and its codebase was donated to the Apache Software Foundation in 2013. Inspired by Hadoop, Spark inherits the advantages of distributed parallel computing and provides a rich set of operators.

Source: DZone