DZone
Overview of Problem
Tabular data is common in business and engineering problems. Machine learning can predict a column of interest in such a table, using the other columns as input features. As an example, we will use historical house sales data to predict sale prices for houses that come on the market in the future. The House Prices dataset from Kaggle contains such data for Ames, Iowa. It includes predictive columns such as house area, neighborhood name, type of building, house style, condition, and year sold, for a total of 79 predictive features. Some of these features are categorical and others are numerical. Our goal is to predict the sale price (the numeric SalePrice column) of houses from these features. A sample of the data, with a subset of the columns, is shown below.
| Id | LotArea | Neighborhood | BldgType | Style | Cond | YrBuilt | 1stFlrSF | 2ndFlrSF | Fireplaces | YrSold | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 8450 | CollgCr | 1Fam | 2Story | 5 | 2003 | 856 | 854 | 0 | 2008 | 208500 |
| 2 | 9600 | Veenker | 1Fam | 1Story | 8 | 1976 | 1262 | 0 | 1 | 2007 | 181500 |
| 3 | 11250 | CollgCr | 1Fam | 2Story | 5 | 2001 | 920 | 866 | 1 | 2008 | 223500 |
| 4 | 9550 | Crawfor | 1Fam | 2Story | 5 | 1915 | 961 | 756 | 1 | 2006 | 140000 |
| 5 | 14260 | NoRidge | 1Fam | 2Story | 5 | 2000 | 1145 | 1053 | 1 | 2008 | 250000 |
| 6 | 14115 | Mitchel | 1Fam | 1.5Fin | 5 | 1993 | 796 | 566 | 0 | 2009 | 143000 |
| 7 | 10084 | Somerst | 1Fam | 1Story | 5 | 2004 | 1694 | 0 | 1 | 2007 | 307000 |
| 8 | 10382 | NWAmes | 1Fam | 2Story | 6 | 1973 | 1107 | 983 | 2 | 2009 | 200000 |
| 9 | 6120 | OldTown | 1Fam | 1.5Fin | 5 | 1931 | 1022 | 752 | 2 | 2008 | 129900 |
| 10 | 7420 | BrkSide | 2fmCon | 1.5Unf | 6 | 1939 | 1077 | 0 | 2 | 2008 | 118000 |
| 11 | 11200 | Sawyer | 1Fam | 1Story | 5 | 1965 | 1040 | 0 | 0 | 2008 | 129500 |
| 12 | 11924 | NridgHt | 1Fam | 2Story | 5 | 2005 | 1182 | 1142 | 2 | 2006 | 345000 |
| 13 | 12968 | Sawyer | 1Fam | 1Story | 6 | 1962 | 912 | 0 | 0 | 2008 | 144000 |
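To make the distinction between categorical and numerical features concrete, here is a minimal sketch that parses a few of the sample rows and sorts the feature columns into the two groups. It is only an illustration: the pipe-separated `SAMPLE` string, the helper names `is_number` and `split_feature_types`, and the "parses as a number" rule are assumptions made for this example, not part of the Kaggle dataset or of any AutoML tooling.

```python
import csv
import io

# A few rows copied from the sample data (pipe-separated for this sketch).
SAMPLE = """\
Id|LotArea|Neighborhood|BldgType|Style|Cond|YrBuilt|1stFlrSF|2ndFlrSF|Fireplaces|YrSold|SalePrice
1|8450|CollgCr|1Fam|2Story|5|2003|856|854|0|2008|208500
2|9600|Veenker|1Fam|1Story|8|1976|1262|0|1|2007|181500
6|14115|Mitchel|1Fam|1.5Fin|5|1993|796|566|0|2009|143000
"""

TARGET = "SalePrice"   # the column we want to predict
ID_COLUMN = "Id"       # row identifier, not a predictive feature


def is_number(text):
    """Return True if the cell text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def split_feature_types(rows, target=TARGET, id_column=ID_COLUMN):
    """Split feature columns into (numeric, categorical) name lists.

    Hypothetical heuristic: a column is numeric only if every cell parses
    as a number; anything else is treated as categorical.
    """
    numeric, categorical = [], []
    for name in rows[0]:
        if name in (target, id_column):
            continue  # skip the target and the identifier
        if all(is_number(row[name]) for row in rows):
            numeric.append(name)
        else:
            categorical.append(name)
    return numeric, categorical


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="|"))
numeric, categorical = split_feature_types(rows)
print("numeric:", numeric)
print("categorical:", categorical)
```

Note that a purely syntactic rule like this one labels Cond as numeric even though it is a rating code, so in practice a column's type may need to be set by hand.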
Overview of Google AutoML Tables
Google AutoML Tables enables quick, high-accuracy training and subsequent hosting of ML models for this kind of problem. Users can import and visualize the data, train a model, evaluate it on a test set, iterate to improve model accuracy, and then host the best model for online or offline predictions. All of this functionality is available as a service: users need no ML expertise and do not have to install any hardware or software.
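AutoML Tables can train either a regression or a classification model, depending on the type of the column we are trying to predict: a numeric target such as SalePrice calls for regression, while a categorical target calls for classification.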
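The link between the target column's type and the kind of model can be sketched in a few lines. This is a simplified illustration, not how AutoML Tables decides internally: the function name `infer_problem_type` and the rule "all values parse as numbers" are assumptions made for this example.

```python
def infer_problem_type(target_values):
    """Guess the model type from the raw text values of the target column.

    Hypothetical rule for illustration: if every value parses as a number,
    treat the task as regression; otherwise treat it as classification.
    """
    for value in target_values:
        try:
            float(value)
        except ValueError:
            return "classification"
    return "regression"


# SalePrice values from the sample rows are numeric.
print(infer_problem_type(["208500", "181500", "223500"]))

# BldgType values from the sample rows are category labels.
print(infer_problem_type(["1Fam", "1Fam", "2fmCon"]))
```

With the sample data, predicting SalePrice is a regression task, whereas predicting a column such as BldgType would be a classification task.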
Source: DZone