DZone

Why Prosto? Having Only Set Operations Is Not Enough

Typical data processing tasks access and analyze data stored in multiple tables. Different systems call them relations, collections, or sets, but we will refer to them as tables for simplicity. The general task of data processing is to derive new data from these tables, and any solid data (processing) model must answer three important questions: how to compute new columns within one table, how to link tables, and how to aggregate data. Below we briefly describe how these tasks are solved in a traditional set-oriented model and where these solutions have significant flaws.

Calculation. Given a table, we frequently need to add a new column whose values are computed from other columns in the same table. Conceptually, the task is similar to defining a cell in a spreadsheet, for example, C1=A1+B1. Easy and natural? Yes. However, it is not so easy in traditional data processing frameworks. The main problem is that we need to define a new table because adding a column to an existing table is not possible. The Prosto toolkit is intended to fix this flaw by providing a dedicated operation that adds a new column to an existing table, as in this example: ColumnC=ColumnA+ColumnB
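
To make the contrast concrete, here is a minimal sketch in plain Python. It is not Prosto's actual API: the `Table` type and the functions `select_with_new_column` and `calculate_column` are hypothetical names chosen only for illustration. The first function mimics the set-oriented style, where a derived column can only appear in a newly produced table. The second mimics a column-oriented operation that attaches the derived column to the table that already exists.

```python
# Illustrative sketch only: these names are hypothetical, not Prosto's API.
from typing import Callable, Dict, List

# A table is modeled as a mapping from column name to a list of row values.
Table = Dict[str, List[float]]


def select_with_new_column(
    table: Table, name: str, func: Callable[..., float], inputs: List[str]
) -> Table:
    """Set-oriented style: copy the table and return a new one with the extra column."""
    new_table = {col: list(values) for col, values in table.items()}
    rows = zip(*(table[col] for col in inputs))
    new_table[name] = [func(*row) for row in rows]
    return new_table


def calculate_column(
    table: Table, name: str, func: Callable[..., float], inputs: List[str]
) -> None:
    """Column-oriented style: add the derived column to the existing table."""
    rows = zip(*(table[col] for col in inputs))
    table[name] = [func(*row) for row in rows]


items: Table = {"ColumnA": [1.0, 2.0, 3.0], "ColumnB": [10.0, 20.0, 30.0]}

# Set-oriented: the result is a second table; `items` itself is unchanged.
derived = select_with_new_column(
    items, "ColumnC", lambda a, b: a + b, ["ColumnA", "ColumnB"]
)

# Column-oriented: ColumnC = ColumnA + ColumnB is added to `items` itself.
calculate_column(items, "ColumnC", lambda a, b: a + b, ["ColumnA", "ColumnB"])
```

In the first call the caller ends up managing two tables (`items` and `derived`), while in the second the table keeps its identity and simply gains a column, which is closer to the spreadsheet analogy above.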

Source: DZone