Data Processing Using Functions in Prosto: An Alternative to Map-Reduce and SQL

DZone

Why Prosto? Having Only Set Operations Is Not Enough

Typical data processing tasks have to access and analyze data stored in multiple tables. Different systems may call them relations, collections, or sets, but for simplicity we will refer to them as tables. The general task of data processing is to derive new data from these tables, and every sound data processing model must answer three important questions: how to compute new columns within one table, how to link tables, and how to aggregate data. Below, we briefly describe how these tasks are solved in a traditional set-oriented model and where these solutions have significant flaws.
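
Calculation is examined in detail next. The other two tasks, linking and aggregation, can be pictured with a minimal plain-Python sketch in which both produce new columns of existing tables rather than new tables. The table layout (a dict of column lists), the sample data, and the helpers `link` and `aggregate` are hypothetical illustrations chosen for this sketch; they are not Prosto's API.

```python
# Hypothetical toy model (not Prosto's API): a table is a dict that maps
# column names to equal-length lists of values.
orders = {"product": ["apple", "pear", "apple", "plum"], "quantity": [3, 1, 2, 5]}
products = {"name": ["apple", "pear", "plum"], "price": [0.5, 0.8, 0.3]}


def link(source, source_col, target, target_col):
    """Return a link column: for each source row, the index of the target
    row whose target_col value matches, or None if there is no match."""
    position = {value: row for row, value in enumerate(target[target_col])}
    return [position.get(value) for value in source[source_col]]


def aggregate(n_target_rows, link_values, measure, func=sum):
    """Return an aggregate column: for each target row, apply func to the
    measure values of all source rows that link to it."""
    groups = [[] for _ in range(n_target_rows)]
    for source_row, target_row in enumerate(link_values):
        if target_row is not None:
            groups[target_row].append(measure[source_row])
    return [func(group) for group in groups]


# Linking: 'orders' gains a column that references rows of 'products'.
orders["product_ref"] = link(orders, "product", products, "name")
# Aggregation: 'products' gains a column with the total quantity ordered.
products["total_quantity"] = aggregate(
    len(products["name"]), orders["product_ref"], orders["quantity"]
)

print(orders["product_ref"])       # [0, 1, 0, 2]
print(products["total_quantity"])  # [5, 1, 5]
```

Neither operation here creates a new table: linking adds a reference column to `orders`, and aggregation adds a derived column to `products`. In SQL, the same results would typically be obtained with a join and a GROUP BY, each of which produces a new relation.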

Calculation. Given a table, we frequently need to add a new column whose values are computed from other columns in the same table. Conceptually, the task is similar to defining a cell in a spreadsheet, for example, C1=A1+B1. Easy and natural? Yes. However, it is not so easy in traditional data processing frameworks. The main problem is that we have to define a new table, because adding a column to an existing table is not possible. The Prosto toolkit is intended to fix this flaw by providing a dedicated operation for adding a new column, as in this example: ColumnC=ColumnA+ColumnB.
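
The contrast between the two approaches can be sketched in plain Python. This is a minimal illustration under stated assumptions: a table is represented as a dict of column lists, and `add_calculated_column` is a hypothetical helper written for this sketch, not the actual Prosto operation.

```python
# Hypothetical table representation: a dict mapping column names to value lists.
table = {"A": [1, 3, 5], "B": [2, 4, 6]}

# Set-oriented style: derive a new table that reuses the existing columns
# and adds C, similar in spirit to SELECT *, A + B AS C FROM table.
new_table = {**table, "C": [a + b for a, b in zip(table["A"], table["B"])]}


def add_calculated_column(table, name, func, inputs):
    """Hypothetical helper (not Prosto's API): add column `name` to `table`,
    where each value is func applied to that row's values in `inputs`."""
    table[name] = [func(*values) for values in zip(*(table[col] for col in inputs))]


# Column-oriented style: keep the same table and attach a derived column,
# like the spreadsheet formula C1=A1+B1 applied to every row.
add_calculated_column(table, "C", lambda a, b: a + b, ["A", "B"])

print(table["C"])      # [3, 7, 11]
print(new_table["C"])  # [3, 7, 11]
```

Both variants compute the same values. The difference is where the result lives: `new_table` is a separate object that has to be named and managed, whereas `add_calculated_column` extends `table` itself, much like adding a formula column to a spreadsheet.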

By Alexandr Savinov