Data Processing
This tutorial explains how tabular data can be handled and transformed with the `Table` class.
Note
All operations on a `Table` return a new `Table`. The original `Table` is not changed.
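The note above describes a common immutable-object pattern. The following is a minimal sketch of that pattern in plain Python; `MiniTable` is a hypothetical stand-in invented for illustration and is not part of the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniTable:
    """Hypothetical stand-in for an immutable table: a tuple of row dicts."""

    rows: tuple

    def slice_rows(self, end: int) -> "MiniTable":
        # Build and return a new object instead of modifying self.
        return MiniTable(self.rows[:end])


original = MiniTable(({"id": 0}, {"id": 1}, {"id": 2}))
first_two = original.slice_rows(end=2)

print(len(original.rows))   # the original still holds all 3 rows
print(len(first_two.rows))  # the new object holds 2 rows
```

In practice this means the result of every operation has to be assigned to a variable (or chained), otherwise it is lost.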
Create & Load data
- Load your data into a `Table`:
In [1]:
from safeds.data.tabular.containers import Table
titanic = Table.from_csv_file("data/titanic.csv")
- Create a `Table` containing only the first 10 rows:
In [2]:
titanic_slice = titanic.slice_rows(end=10)
titanic_slice # just to show the output
Out[2]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.5500 | NaN | Southampton | 0 |
1 | 1 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
2 | 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 1 |
4 | 4 | Abelseth, Miss. Karen Marie | female | 16.0 | 0 | 0 | 348125 | 3 | 7.6500 | NaN | Southampton | 1 |
5 | 5 | Abelseth, Mr. Olaus Jorgensen | male | 25.0 | 0 | 0 | 348122 | 3 | 7.6500 | F G63 | Southampton | 1 |
6 | 6 | Abelson, Mr. Samuel | male | 30.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 0 |
7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 1 |
8 | 8 | Abrahamsson, Mr. Abraham August Johannes | male | 20.0 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.9250 | NaN | Southampton | 1 |
9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | 18.0 | 0 | 0 | 2657 | 3 | 7.2292 | NaN | Cherbourg | 1 |
- Extract a `Row` from your `Table`:
In [3]:
titanic_slice.get_row(0)
Out[3]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.55 | NaN | Southampton | 0 |
- Extract a `Column` from your `Table`:
In [4]:
titanic_slice.get_column("name")
Out[4]:
| | name |
---|---|
0 | Abbing, Mr. Anthony |
1 | Abbott, Master. Eugene Joseph |
2 | Abbott, Mr. Rossmore Edward |
3 | Abbott, Mrs. Stanton (Rosa Hunt) |
4 | Abelseth, Miss. Karen Marie |
5 | Abelseth, Mr. Olaus Jorgensen |
6 | Abelson, Mr. Samuel |
7 | Abelson, Mrs. Samuel (Hannah Wizosky) |
8 | Abrahamsson, Mr. Abraham August Johannes |
9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) |
- Combine a list of `Row`s into a `Table` (make sure the `Row`s have the same columns):
In [5]:
Table.from_rows([
    titanic_slice.get_row(0),
    titanic_slice.get_row(1)
])
Out[5]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.55 | NaN | Southampton | 0 |
1 | 1 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.25 | NaN | Southampton | 0 |
- Combine a list of `Column`s into a `Table` (make sure the `Column`s have the same number of rows):
In [6]:
Table.from_columns([
    titanic_slice.get_column("name"),
    titanic_slice.get_column("age")
])
Out[6]:
| | name | age |
---|---|---|
0 | Abbing, Mr. Anthony | 42.0 |
1 | Abbott, Master. Eugene Joseph | 13.0 |
2 | Abbott, Mr. Rossmore Edward | 16.0 |
3 | Abbott, Mrs. Stanton (Rosa Hunt) | 35.0 |
4 | Abelseth, Miss. Karen Marie | 16.0 |
5 | Abelseth, Mr. Olaus Jorgensen | 25.0 |
6 | Abelson, Mr. Samuel | 30.0 |
7 | Abelson, Mrs. Samuel (Hannah Wizosky) | 28.0 |
8 | Abrahamsson, Mr. Abraham August Johannes | 20.0 |
9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 18.0 |
- Drop columns from a `Table`:
In [7]:
titanic_slice.remove_columns([
    "id",
    "name",
    "ticket",
    "cabin",
    "port_embarked",
    "survived"
])
Out[7]:
| | sex | age | siblings_spouses | parents_children | travel_class | fare |
---|---|---|---|---|---|---|
0 | male | 42.0 | 0 | 0 | 3 | 7.5500 |
1 | male | 13.0 | 0 | 2 | 3 | 20.2500 |
2 | male | 16.0 | 1 | 1 | 3 | 20.2500 |
3 | female | 35.0 | 1 | 1 | 3 | 20.2500 |
4 | female | 16.0 | 0 | 0 | 3 | 7.6500 |
5 | male | 25.0 | 0 | 0 | 3 | 7.6500 |
6 | male | 30.0 | 1 | 0 | 2 | 24.0000 |
7 | female | 28.0 | 1 | 0 | 2 | 24.0000 |
8 | male | 20.0 | 0 | 0 | 3 | 7.9250 |
9 | female | 18.0 | 0 | 0 | 3 | 7.2292 |
- Keep only specified columns of a `Table`:
In [8]:
titanic_slice.keep_only_columns(["name", "survived"])
Out[8]:
| | name | survived |
---|---|---|
0 | Abbing, Mr. Anthony | 0 |
1 | Abbott, Master. Eugene Joseph | 0 |
2 | Abbott, Mr. Rossmore Edward | 0 |
3 | Abbott, Mrs. Stanton (Rosa Hunt) | 1 |
4 | Abelseth, Miss. Karen Marie | 1 |
5 | Abelseth, Mr. Olaus Jorgensen | 1 |
6 | Abelson, Mr. Samuel | 0 |
7 | Abelson, Mrs. Samuel (Hannah Wizosky) | 1 |
8 | Abrahamsson, Mr. Abraham August Johannes | 1 |
9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 1 |
Process data
- Sort columns by their name:
In [9]:
titanic_slice.sort_columns()
Out[9]:
| | age | cabin | fare | id | name | parents_children | port_embarked | sex | siblings_spouses | survived | ticket | travel_class |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 42.0 | NaN | 7.5500 | 0 | Abbing, Mr. Anthony | 0 | Southampton | male | 0 | 0 | C.A. 5547 | 3 |
1 | 13.0 | NaN | 20.2500 | 1 | Abbott, Master. Eugene Joseph | 2 | Southampton | male | 0 | 0 | C.A. 2673 | 3 |
2 | 16.0 | NaN | 20.2500 | 2 | Abbott, Mr. Rossmore Edward | 1 | Southampton | male | 1 | 0 | C.A. 2673 | 3 |
3 | 35.0 | NaN | 20.2500 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | 1 | Southampton | female | 1 | 1 | C.A. 2673 | 3 |
4 | 16.0 | NaN | 7.6500 | 4 | Abelseth, Miss. Karen Marie | 0 | Southampton | female | 0 | 1 | 348125 | 3 |
5 | 25.0 | F G63 | 7.6500 | 5 | Abelseth, Mr. Olaus Jorgensen | 0 | Southampton | male | 0 | 1 | 348122 | 3 |
6 | 30.0 | NaN | 24.0000 | 6 | Abelson, Mr. Samuel | 0 | Cherbourg | male | 1 | 0 | P/PP 3381 | 2 |
7 | 28.0 | NaN | 24.0000 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | 0 | Cherbourg | female | 1 | 1 | P/PP 3381 | 2 |
8 | 20.0 | NaN | 7.9250 | 8 | Abrahamsson, Mr. Abraham August Johannes | 0 | Southampton | male | 0 | 1 | SOTON/O2 3101284 | 3 |
9 | 18.0 | NaN | 7.2292 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 0 | Cherbourg | female | 0 | 1 | 2657 | 3 |
- Sort columns with a custom comparator (here, by name in descending order):
In [10]:
titanic_slice.sort_columns(
    lambda column1, column2:
        (column1.name < column2.name) - (column1.name > column2.name)
)
Out[10]:
| | travel_class | ticket | survived | siblings_spouses | sex | port_embarked | parents_children | name | id | fare | cabin | age |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | C.A. 5547 | 0 | 0 | male | Southampton | 0 | Abbing, Mr. Anthony | 0 | 7.5500 | NaN | 42.0 |
1 | 3 | C.A. 2673 | 0 | 0 | male | Southampton | 2 | Abbott, Master. Eugene Joseph | 1 | 20.2500 | NaN | 13.0 |
2 | 3 | C.A. 2673 | 0 | 1 | male | Southampton | 1 | Abbott, Mr. Rossmore Edward | 2 | 20.2500 | NaN | 16.0 |
3 | 3 | C.A. 2673 | 1 | 1 | female | Southampton | 1 | Abbott, Mrs. Stanton (Rosa Hunt) | 3 | 20.2500 | NaN | 35.0 |
4 | 3 | 348125 | 1 | 0 | female | Southampton | 0 | Abelseth, Miss. Karen Marie | 4 | 7.6500 | NaN | 16.0 |
5 | 3 | 348122 | 1 | 0 | male | Southampton | 0 | Abelseth, Mr. Olaus Jorgensen | 5 | 7.6500 | F G63 | 25.0 |
6 | 2 | P/PP 3381 | 0 | 1 | male | Cherbourg | 0 | Abelson, Mr. Samuel | 6 | 24.0000 | NaN | 30.0 |
7 | 2 | P/PP 3381 | 1 | 1 | female | Cherbourg | 0 | Abelson, Mrs. Samuel (Hannah Wizosky) | 7 | 24.0000 | NaN | 28.0 |
8 | 3 | SOTON/O2 3101284 | 1 | 0 | male | Southampton | 0 | Abrahamsson, Mr. Abraham August Johannes | 8 | 7.9250 | NaN | 20.0 |
9 | 3 | 2657 | 1 | 0 | female | Cherbourg | 0 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 9 | 7.2292 | NaN | 18.0 |
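The comparator above can be tried out in isolation. The sketch below assumes the usual comparator convention (a negative result places the first argument first, a positive result places it second), which is consistent with the descending column order in the output above. It uses plain strings and Python's `functools.cmp_to_key` rather than `Column` objects; the list of names is just a small sample.

```python
from functools import cmp_to_key


def descending_by_name(name1: str, name2: str) -> int:
    # Positive when name1 < name2, so name1 is placed after name2.
    # Negative when name1 > name2, so name1 is placed before name2.
    return (name1 < name2) - (name1 > name2)


column_names = ["id", "name", "sex", "age", "fare"]
ordered = sorted(column_names, key=cmp_to_key(descending_by_name))
print(ordered)  # ['sex', 'name', 'id', 'fare', 'age']
```

Swapping the two terms, i.e. `(name1 > name2) - (name1 < name2)`, would give ascending order under the same convention.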
- Filter rows with a given query:
In [11]:
titanic.filter_rows(
    lambda row:
        "van" in row.get_value("name")
)
Out[11]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 105 | Beavan, Mr. William Thomas | male | 19.0 | 0 | 0 | 323951 | 3 | 8.0500 | NaN | Southampton | 0 |
1 | 191 | Canavan, Miss. Mary | female | 21.0 | 0 | 0 | 364846 | 3 | 7.7500 | NaN | Queenstown | 0 |
2 | 192 | Canavan, Mr. Patrick | male | 21.0 | 0 | 0 | 364858 | 3 | 7.7500 | NaN | Queenstown | 0 |
3 | 267 | Cor, Mr. Ivan | male | 27.0 | 0 | 0 | 349229 | 3 | 7.8958 | NaN | Southampton | 0 |
4 | 308 | Davies, Mr. Evan | male | 22.0 | 0 | 0 | SC/A4 23568 | 3 | 8.0500 | NaN | Southampton | 0 |
5 | 333 | Devaney, Miss. Margaret Delia | female | 19.0 | 0 | 0 | 330958 | 3 | 7.8792 | NaN | Queenstown | 1 |
6 | 338 | Dimic, Mr. Jovan | male | 42.0 | 0 | 0 | 315088 | 3 | 8.6625 | NaN | Southampton | 0 |
7 | 383 | Evans, Miss. Edith Corse | female | 36.0 | 0 | 0 | PC 17531 | 1 | 31.6792 | A29 | Cherbourg | 0 |
8 | 586 | Ivanoff, Mr. Kanio | male | NaN | 0 | 0 | 349201 | 3 | 7.8958 | NaN | Southampton | 0 |
9 | 589 | Jalsevac, Mr. Ivan | male | 29.0 | 0 | 0 | 349240 | 3 | 7.8958 | NaN | Cherbourg | 1 |
10 | 811 | Mineff, Mr. Ivan | male | 24.0 | 0 | 0 | 349233 | 3 | 7.8958 | NaN | Southampton | 0 |
11 | 916 | O'Sullivan, Miss. Bridget Mary | female | NaN | 0 | 0 | 330909 | 3 | 7.6292 | NaN | Queenstown | 0 |
12 | 918 | Ovies y Rodriguez, Mr. Servando | male | 28.5 | 0 | 0 | PC 17562 | 1 | 27.7208 | D43 | Cherbourg | 0 |
13 | 990 | Rasmussen, Mrs. (Lena Jacobsen Solvang) | female | NaN | 0 | 0 | 65305 | 3 | 8.1125 | NaN | Southampton | 0 |
14 | 1143 | Staneff, Mr. Ivan | male | NaN | 0 | 0 | 349208 | 3 | 7.8958 | NaN | Southampton | 0 |
15 | 1144 | Stankovic, Mr. Ivan | male | 33.0 | 0 | 0 | 349239 | 3 | 8.6625 | NaN | Cherbourg | 0 |
16 | 1161 | Strilic, Mr. Ivan | male | 27.0 | 0 | 0 | 315083 | 3 | 8.6625 | NaN | Southampton | 0 |
17 | 1215 | van Billiard, Master. James William | male | NaN | 1 | 1 | A/5. 851 | 3 | 14.5000 | NaN | Southampton | 0 |
18 | 1216 | van Billiard, Master. Walter John | male | 11.5 | 1 | 1 | A/5. 851 | 3 | 14.5000 | NaN | Southampton | 0 |
19 | 1217 | van Billiard, Mr. Austin Blyler | male | 40.5 | 0 | 2 | A/5. 851 | 3 | 14.5000 | NaN | Southampton | 0 |
20 | 1222 | van Melkebeke, Mr. Philemon | male | NaN | 0 | 0 | 345777 | 3 | 9.5000 | NaN | Southampton | 0 |
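The result above contains names such as "Beavan" and "Evans" because the predicate is a plain, case-sensitive substring test. The following sketch reproduces that behaviour on a plain list of strings. The last name in the list is hypothetical and was added only to show the effect of capitalisation.

```python
names = [
    "Abbing, Mr. Anthony",
    "Beavan, Mr. William Thomas",
    "van Melkebeke, Mr. Philemon",
    "Van Example, Mr. Test",  # hypothetical name, not taken from the dataset
]

# Same test as in the predicate above: case-sensitive substring match.
case_sensitive = [name for name in names if "van" in name]

# Variant that ignores case by lowercasing the name first.
case_insensitive = [name for name in names if "van" in name.lower()]

print(case_sensitive)    # matches "Beavan ..." and "van Melkebeke ..."
print(case_insensitive)  # additionally matches "Van Example ..."
```

To match only the surname prefix "van", a stricter predicate such as `name.startswith("van ")` would be one option.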
Transform table
- Transform the table using an `Imputer`. `Imputer`s replace missing values with other values (e.g. a constant, the mean, or the median of the column), depending on the chosen strategy. For example, the following `Imputer` replaces missing values in the given columns of the table with the constant 0:
In [12]:
from safeds.data.tabular.transformation import Imputer
imputer = Imputer(Imputer.Strategy.Constant(0)).fit(titanic, ["age", "fare", "cabin", "port_embarked"])
imputer.transform(titanic_slice)
Out[12]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.55 | 0 | Southampton | 0 |
1 | 1 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.25 | 0 | Southampton | 0 |
2 | 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.25 | 0 | Southampton | 0 |
3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.25 | 0 | Southampton | 1 |
4 | 4 | Abelseth, Miss. Karen Marie | female | 16.0 | 0 | 0 | 348125 | 3 | 7.65 | 0 | Southampton | 1 |
5 | 5 | Abelseth, Mr. Olaus Jorgensen | male | 25.0 | 0 | 0 | 348122 | 3 | 7.65 | F G63 | Southampton | 1 |
6 | 6 | Abelson, Mr. Samuel | male | 30.0 | 1 | 0 | P/PP 3381 | 2 | 24.0 | 0 | Cherbourg | 0 |
7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 2 | 24.0 | 0 | Cherbourg | 1 |
8 | 8 | Abrahamsson, Mr. Abraham August Johannes | male | 20.0 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.925 | 0 | Southampton | 1 |
9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | 18.0 | 0 | 0 | 2657 | 3 | 7.2292 | 0 | Cherbourg | 1 |
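The idea behind imputation, including the split into a fit step and a transform step, can be sketched in plain Python. This is an illustration of the concept, not the library's implementation. The helper names `fit_mean` and `transform` are hypothetical, the numbers are made up, and missing values are represented by `None`.

```python
from statistics import mean


def fit_mean(values):
    """Learn the replacement value from the non-missing training values."""
    return mean(v for v in values if v is not None)


def transform(values, replacement):
    """Return a new list in which missing values are replaced."""
    return [replacement if v is None else v for v in values]


training_ages = [42.0, None, 16.0, 35.0]  # made-up training data
new_ages = [None, 20.0]                   # made-up data to transform

constant_filled = transform(new_ages, 0)
mean_filled = transform(new_ages, fit_mean(training_ages))

print(constant_filled)  # [0, 20.0]
print(mean_filled)      # [31.0, 20.0]
```

With a constant strategy the fit step has nothing to learn, whereas a mean or median strategy takes its replacement value from the data it was fitted on, not from the data being transformed.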
- Transform the table using a `LabelEncoder`. This encodes categorical features in the chosen `Column`s as integers:
In [13]:
from safeds.data.tabular.transformation import LabelEncoder
encoder = LabelEncoder().fit(titanic, ["sex", "port_embarked"])
encoder.transform(titanic_slice)
Out[13]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Abbing, Mr. Anthony | 1.0 | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.5500 | NaN | 2.0 | 0 |
1 | 1 | Abbott, Master. Eugene Joseph | 1.0 | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.2500 | NaN | 2.0 | 0 |
2 | 2 | Abbott, Mr. Rossmore Edward | 1.0 | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | 2.0 | 0 |
3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | 0.0 | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | 2.0 | 1 |
4 | 4 | Abelseth, Miss. Karen Marie | 0.0 | 16.0 | 0 | 0 | 348125 | 3 | 7.6500 | NaN | 2.0 | 1 |
5 | 5 | Abelseth, Mr. Olaus Jorgensen | 1.0 | 25.0 | 0 | 0 | 348122 | 3 | 7.6500 | F G63 | 2.0 | 1 |
6 | 6 | Abelson, Mr. Samuel | 1.0 | 30.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | 0.0 | 0 |
7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | 0.0 | 28.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | 0.0 | 1 |
8 | 8 | Abrahamsson, Mr. Abraham August Johannes | 1.0 | 20.0 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.9250 | NaN | 2.0 | 1 |
9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 0.0 | 18.0 | 0 | 0 | 2657 | 3 | 7.2292 | NaN | 0.0 | 1 |
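Label encoding can be sketched in a few lines of plain Python. The sketch assumes that codes are assigned in sorted order of the distinct values. This matches the output above (female 0, male 1; Cherbourg 0, Southampton 2) but is an assumption about the ordering rule, and the helper names are hypothetical.

```python
def fit_labels(values):
    """Map each distinct category to an integer code (sorted order assumed)."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def transform_labels(values, mapping):
    """Replace each category by its code."""
    return [mapping[value] for value in values]


training_ports = ["Southampton", "Cherbourg", "Queenstown", "Southampton"]
mapping = fit_labels(training_ports)
encoded = transform_labels(["Southampton", "Cherbourg"], mapping)

print(mapping)  # {'Cherbourg': 0, 'Queenstown': 1, 'Southampton': 2}
print(encoded)  # [2, 0]
```

The codes carry no meaning beyond identity. A model that treats them as numbers will see an ordering between categories that may not exist, which is one reason to consider one-hot encoding instead.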
- Transform the table using a `OneHotEncoder`. This creates new `Column`s based on the unique values in each chosen `Column`:
In [14]:
from safeds.data.tabular.transformation import OneHotEncoder
encoder = OneHotEncoder().fit(titanic, ["sex", "port_embarked"])
encoder.transform(titanic_slice)
Out[14]:
| | id | name | sex__male | sex__female | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked__Southampton | port_embarked__Cherbourg | port_embarked__Queenstown | port_embarked__nan | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Abbing, Mr. Anthony | 1.0 | 0.0 | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.5500 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 0 |
1 | 1 | Abbott, Master. Eugene Joseph | 1.0 | 0.0 | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.2500 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 0 |
2 | 2 | Abbott, Mr. Rossmore Edward | 1.0 | 0.0 | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 0 |
3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | 0.0 | 1.0 | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 1 |
4 | 4 | Abelseth, Miss. Karen Marie | 0.0 | 1.0 | 16.0 | 0 | 0 | 348125 | 3 | 7.6500 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 1 |
5 | 5 | Abelseth, Mr. Olaus Jorgensen | 1.0 | 0.0 | 25.0 | 0 | 0 | 348122 | 3 | 7.6500 | F G63 | 1.0 | 0.0 | 0.0 | 0.0 | 1 |
6 | 6 | Abelson, Mr. Samuel | 1.0 | 0.0 | 30.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | 0.0 | 1.0 | 0.0 | 0.0 | 0 |
7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | 0.0 | 1.0 | 28.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | 0.0 | 1.0 | 0.0 | 0.0 | 1 |
8 | 8 | Abrahamsson, Mr. Abraham August Johannes | 1.0 | 0.0 | 20.0 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.9250 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 1 |
9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 0.0 | 1.0 | 18.0 | 0 | 0 | 2657 | 3 | 7.2292 | NaN | 0.0 | 1.0 | 0.0 | 0.0 | 1 |
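One-hot encoding can be sketched as follows. The `column__value` naming mirrors the column names in the output above. The helper names are hypothetical, and ordering the new columns by first appearance in the training data is an assumption made for this sketch.

```python
def fit_one_hot(values):
    """Collect the distinct categories in order of first appearance."""
    return list(dict.fromkeys(values))


def transform_one_hot(column_name, values, categories):
    """Create one indicator column per category seen during fitting."""
    return {
        f"{column_name}__{category}": [1.0 if v == category else 0.0 for v in values]
        for category in categories
    }


categories = fit_one_hot(["male", "female", "male"])
encoded = transform_one_hot("sex", ["male", "male", "female"], categories)

print(encoded)
# {'sex__male': [1.0, 1.0, 0.0], 'sex__female': [0.0, 0.0, 1.0]}
```

Because the categories come from the fit step, a column such as `port_embarked__Queenstown` appears in the output above even though no passenger in the first ten rows embarked there.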
- Transform the table using a `RangeScaler`. This scales the values in the chosen `Column`s to a given range:
In [15]:
from safeds.data.tabular.transformation import RangeScaler
scaler = RangeScaler(0.0, 1.0).fit(titanic, ["age"])
scaler.transform(titanic_slice)
Out[15]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Abbing, Mr. Anthony | male | 0.524008 | 0 | 0 | C.A. 5547 | 3 | 7.5500 | NaN | Southampton | 0 |
1 | 1 | Abbott, Master. Eugene Joseph | male | 0.160751 | 0 | 2 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
2 | 2 | Abbott, Mr. Rossmore Edward | male | 0.198330 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 0.436325 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 1 |
4 | 4 | Abelseth, Miss. Karen Marie | female | 0.198330 | 0 | 0 | 348125 | 3 | 7.6500 | NaN | Southampton | 1 |
5 | 5 | Abelseth, Mr. Olaus Jorgensen | male | 0.311064 | 0 | 0 | 348122 | 3 | 7.6500 | F G63 | Southampton | 1 |
6 | 6 | Abelson, Mr. Samuel | male | 0.373695 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 0 |
7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 0.348643 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 1 |
8 | 8 | Abrahamsson, Mr. Abraham August Johannes | male | 0.248434 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.9250 | NaN | Southampton | 1 |
9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | 0.223382 | 0 | 0 | 2657 | 3 | 7.2292 | NaN | Cherbourg | 1 |
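Scaling to a range is commonly done with min-max scaling: x' = lower + (x - min) / (max - min) * (upper - lower), where min and max come from the data used for fitting. The sketch below assumes this formula. The helper names and the numbers are made up for illustration.

```python
def fit_range(values):
    """Learn the minimum and maximum from the non-missing training values."""
    known = [v for v in values if v is not None]
    return min(known), max(known)


def scale_to_range(values, data_min, data_max, lower=0.0, upper=1.0):
    """Apply min-max scaling using the limits learned during fitting."""
    span = data_max - data_min
    return [lower + (v - data_min) / span * (upper - lower) for v in values]


training = [10.0, 20.0, 50.0]  # made-up training data
data_min, data_max = fit_range(training)
scaled = scale_to_range([10.0, 30.0, 50.0], data_min, data_max)

print(scaled)  # [0.0, 0.5, 1.0]
```

Since the scaler above is fitted on the full table but applied to a slice, the scaled ages in the slice do not have to include 0 or 1.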
- Transform the table using a `StandardScaler`. This standardizes the values of the chosen `Column`s:
In [16]:
from safeds.data.tabular.transformation import StandardScaler
scaler = StandardScaler().fit(titanic, ["age", "travel_class"])
scaler.transform(titanic_slice)
Out[16]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Abbing, Mr. Anthony | male | 0.841202 | 0 | 0 | C.A. 5547 | 0.841916 | 7.5500 | NaN | Southampton | 0 |
1 | 1 | Abbott, Master. Eugene Joseph | male | -1.171763 | 0 | 2 | C.A. 2673 | 0.841916 | 20.2500 | NaN | Southampton | 0 |
2 | 2 | Abbott, Mr. Rossmore Edward | male | -0.963526 | 1 | 1 | C.A. 2673 | 0.841916 | 20.2500 | NaN | Southampton | 0 |
3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 0.355314 | 1 | 1 | C.A. 2673 | 0.841916 | 20.2500 | NaN | Southampton | 1 |
4 | 4 | Abelseth, Miss. Karen Marie | female | -0.963526 | 0 | 0 | 348125 | 0.841916 | 7.6500 | NaN | Southampton | 1 |
5 | 5 | Abelseth, Mr. Olaus Jorgensen | male | -0.338812 | 0 | 0 | 348122 | 0.841916 | 7.6500 | F G63 | Southampton | 1 |
6 | 6 | Abelson, Mr. Samuel | male | 0.008251 | 1 | 0 | P/PP 3381 | -0.352091 | 24.0000 | NaN | Cherbourg | 0 |
7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | -0.130574 | 1 | 0 | P/PP 3381 | -0.352091 | 24.0000 | NaN | Cherbourg | 1 |
8 | 8 | Abrahamsson, Mr. Abraham August Johannes | male | -0.685875 | 0 | 0 | SOTON/O2 3101284 | 0.841916 | 7.9250 | NaN | Southampton | 1 |
9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | -0.824700 | 0 | 0 | 2657 | 0.841916 | 7.2292 | NaN | Cherbourg | 1 |
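Standardizing usually means subtracting the mean and dividing by the standard deviation, z = (x - mean) / std, with both statistics taken from the data used for fitting. The sketch below assumes this definition and uses the population standard deviation. Whether the library uses the population or the sample variant is not stated here, and the numbers are made up.

```python
from statistics import mean, pstdev

training = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up training data

# "Fit": learn mean and (population) standard deviation.
mu = mean(training)       # 5.0
sigma = pstdev(training)  # 2.0

# "Transform": apply the learned statistics to other values.
standardized = [(v - mu) / sigma for v in [3.0, 5.0, 9.0]]

print(standardized)  # [-1.0, 0.0, 2.0]
```

A standardized value states how many standard deviations the original value lies above or below the mean of the fitted data, which explains the negative numbers in the output above.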
Transform column
- Transform the "sex" `Column` by labeling female values with 1 and male values with 0:
In [17]:
titanic_slice.transform_column("sex", lambda row: 1 if row.get_value("sex") == "female" else 0)
Out[17]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Abbing, Mr. Anthony | 0 | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.5500 | NaN | Southampton | 0 |
1 | 1 | Abbott, Master. Eugene Joseph | 0 | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
2 | 2 | Abbott, Mr. Rossmore Edward | 0 | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | 1 | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 1 |
4 | 4 | Abelseth, Miss. Karen Marie | 1 | 16.0 | 0 | 0 | 348125 | 3 | 7.6500 | NaN | Southampton | 1 |
5 | 5 | Abelseth, Mr. Olaus Jorgensen | 0 | 25.0 | 0 | 0 | 348122 | 3 | 7.6500 | F G63 | Southampton | 1 |
6 | 6 | Abelson, Mr. Samuel | 0 | 30.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 0 |
7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | 1 | 28.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 1 |
8 | 8 | Abrahamsson, Mr. Abraham August Johannes | 0 | 20.0 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.9250 | NaN | Southampton | 1 |
9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 1 | 18.0 | 0 | 0 | 2657 | 3 | 7.2292 | NaN | Cherbourg | 1 |
- Transform the values of the "parents_children" `Column` into "Yes" or "No", depending on whether the passenger has parents or children aboard:
In [18]:
titanic_slice.transform_column("parents_children", lambda row: "No" if row.get_value("parents_children") == 0 else "Yes")
Out[18]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | No | C.A. 5547 | 3 | 7.5500 | NaN | Southampton | 0 |
1 | 1 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | Yes | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
2 | 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 1 | Yes | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 35.0 | 1 | Yes | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 1 |
4 | 4 | Abelseth, Miss. Karen Marie | female | 16.0 | 0 | No | 348125 | 3 | 7.6500 | NaN | Southampton | 1 |
5 | 5 | Abelseth, Mr. Olaus Jorgensen | male | 25.0 | 0 | No | 348122 | 3 | 7.6500 | F G63 | Southampton | 1 |
6 | 6 | Abelson, Mr. Samuel | male | 30.0 | 1 | No | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 0 |
7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | No | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 1 |
8 | 8 | Abrahamsson, Mr. Abraham August Johannes | male | 20.0 | 0 | No | SOTON/O2 3101284 | 3 | 7.9250 | NaN | Southampton | 1 |
9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | 18.0 | 0 | No | 2657 | 3 | 7.2292 | NaN | Cherbourg | 1 |
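In both calls above the transformer receives a whole row and returns the new value for one column. That pattern can be sketched in plain Python with rows as dictionaries. The function below is a hypothetical re-implementation for illustration only; the two rows reuse values from the table above.

```python
def transform_column(rows, column_name, transformer):
    """Return new rows in which one column is replaced by transformer(row)."""
    return [{**row, column_name: transformer(row)} for row in rows]


rows = [
    {"name": "Abbing, Mr. Anthony", "sex": "male", "parents_children": 0},
    {"name": "Abbott, Mrs. Stanton (Rosa Hunt)", "sex": "female", "parents_children": 1},
]

encoded = transform_column(
    rows, "sex", lambda row: 1 if row["sex"] == "female" else 0
)
labeled = transform_column(
    rows, "parents_children", lambda row: "No" if row["parents_children"] == 0 else "Yes"
)

print([row["sex"] for row in encoded])               # [0, 1]
print([row["parents_children"] for row in labeled])  # ['No', 'Yes']
print(rows[0]["sex"])                                # male (input rows are untouched)
```

Because the transformer sees the full row, the new value may also depend on other columns of that row.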