Data Processing
This tutorial explains how tabular data can be handled and transformed with the Table class.
Note
All operations on a Table return a new Table. The original Table will not be changed.
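The note above can be illustrated without Safe-DS. The following is a minimal sketch of the "return a new object" pattern; `MiniTable` is a hypothetical stand-in invented for this example and is not part of the library:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniTable:
    """Hypothetical stand-in for a table; not part of Safe-DS."""

    rows: tuple  # one dict per row

    def slice_rows(self, end: int) -> "MiniTable":
        # Build and return a new object instead of modifying self.
        return MiniTable(self.rows[:end])


original = MiniTable(({"id": 0}, {"id": 1}, {"id": 2}))
shortened = original.slice_rows(end=2)

print(len(original.rows))   # 3, the original is unchanged
print(len(shortened.rows))  # 2, only the returned object is shorter
```

In practice this means the result of every operation has to be assigned to a variable (or displayed directly), otherwise it is lost.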
Create & Load data
- Load your data into a Table:
In [1]:
from safeds.data.tabular.containers import Table
titanic = Table.from_csv_file("data/titanic.csv")
- Create a Table containing only the first 10 rows:
In [2]:
titanic_slice = titanic.slice_rows(end=10)
titanic_slice  # just to show the output
Out[2]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.5500 | NaN | Southampton | 0 |
| 1 | 1 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
| 2 | 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
| 3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 1 |
| 4 | 4 | Abelseth, Miss. Karen Marie | female | 16.0 | 0 | 0 | 348125 | 3 | 7.6500 | NaN | Southampton | 1 |
| 5 | 5 | Abelseth, Mr. Olaus Jorgensen | male | 25.0 | 0 | 0 | 348122 | 3 | 7.6500 | F G63 | Southampton | 1 |
| 6 | 6 | Abelson, Mr. Samuel | male | 30.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 0 |
| 7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 1 |
| 8 | 8 | Abrahamsson, Mr. Abraham August Johannes | male | 20.0 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.9250 | NaN | Southampton | 1 |
| 9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | 18.0 | 0 | 0 | 2657 | 3 | 7.2292 | NaN | Cherbourg | 1 |
- Extract a Row from your Table:
In [3]:
titanic_slice.get_row(0)
Out[3]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.55 | NaN | Southampton | 0 |
- Extract a Column from your Table:
In [4]:
titanic_slice.get_column("name")
Out[4]:
| | name |
|---|---|
| 0 | Abbing, Mr. Anthony |
| 1 | Abbott, Master. Eugene Joseph |
| 2 | Abbott, Mr. Rossmore Edward |
| 3 | Abbott, Mrs. Stanton (Rosa Hunt) |
| 4 | Abelseth, Miss. Karen Marie |
| 5 | Abelseth, Mr. Olaus Jorgensen |
| 6 | Abelson, Mr. Samuel |
| 7 | Abelson, Mrs. Samuel (Hannah Wizosky) |
| 8 | Abrahamsson, Mr. Abraham August Johannes |
| 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) |
- Combine a list of Rows into a Table (make sure the Rows have the same columns):
In [5]:
Table.from_rows([
    titanic_slice.get_row(0),
    titanic_slice.get_row(1)
])
Out[5]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.55 | NaN | Southampton | 0 |
| 1 | 1 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.25 | NaN | Southampton | 0 |
- Combine a list of Columns into a Table (make sure the Columns have the same number of rows):
In [6]:
Table.from_columns([
    titanic_slice.get_column("name"),
    titanic_slice.get_column("age")
])
Out[6]:
| | name | age |
|---|---|---|
| 0 | Abbing, Mr. Anthony | 42.0 |
| 1 | Abbott, Master. Eugene Joseph | 13.0 |
| 2 | Abbott, Mr. Rossmore Edward | 16.0 |
| 3 | Abbott, Mrs. Stanton (Rosa Hunt) | 35.0 |
| 4 | Abelseth, Miss. Karen Marie | 16.0 |
| 5 | Abelseth, Mr. Olaus Jorgensen | 25.0 |
| 6 | Abelson, Mr. Samuel | 30.0 |
| 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | 28.0 |
| 8 | Abrahamsson, Mr. Abraham August Johannes | 20.0 |
| 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 18.0 |
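The two constraints mentioned above (Rows need the same columns, Columns need the same number of rows) follow from how row-oriented and column-oriented data map onto each other. The sketch below shows this with plain dictionaries and lists; the helper names `rows_to_columns` and `columns_to_rows` are hypothetical and not part of Safe-DS:

```python
rows = [
    {"name": "Abbing, Mr. Anthony", "age": 42.0},
    {"name": "Abbott, Master. Eugene Joseph", "age": 13.0},
]


def rows_to_columns(rows):
    """Turn a list of row dicts into a dict of column lists."""
    column_names = list(rows[0].keys())
    for row in rows:
        # Rows can only be combined if they all have the same columns.
        if list(row.keys()) != column_names:
            raise ValueError("all rows must have the same columns")
    return {name: [row[name] for row in rows] for name in column_names}


def columns_to_rows(columns):
    """Turn a dict of column lists into a list of row dicts."""
    lengths = {len(values) for values in columns.values()}
    # Columns can only be combined if they all have the same number of rows.
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


columns = rows_to_columns(rows)
print(columns["age"])                    # [42.0, 13.0]
print(columns_to_rows(columns) == rows)  # True
```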
- Drop columns from a Table:
In [7]:
titanic_slice.remove_columns([
    "id",
    "name",
    "ticket",
    "cabin",
    "port_embarked",
    "survived"
])
Out[7]:
| | sex | age | siblings_spouses | parents_children | travel_class | fare |
|---|---|---|---|---|---|---|
| 0 | male | 42.0 | 0 | 0 | 3 | 7.5500 |
| 1 | male | 13.0 | 0 | 2 | 3 | 20.2500 |
| 2 | male | 16.0 | 1 | 1 | 3 | 20.2500 |
| 3 | female | 35.0 | 1 | 1 | 3 | 20.2500 |
| 4 | female | 16.0 | 0 | 0 | 3 | 7.6500 |
| 5 | male | 25.0 | 0 | 0 | 3 | 7.6500 |
| 6 | male | 30.0 | 1 | 0 | 2 | 24.0000 |
| 7 | female | 28.0 | 1 | 0 | 2 | 24.0000 |
| 8 | male | 20.0 | 0 | 0 | 3 | 7.9250 |
| 9 | female | 18.0 | 0 | 0 | 3 | 7.2292 |
- Keep only specified columns of a Table:
In [8]:
titanic_slice.keep_only_columns(["name", "survived"])
Out[8]:
| | name | survived |
|---|---|---|
| 0 | Abbing, Mr. Anthony | 0 |
| 1 | Abbott, Master. Eugene Joseph | 0 |
| 2 | Abbott, Mr. Rossmore Edward | 0 |
| 3 | Abbott, Mrs. Stanton (Rosa Hunt) | 1 |
| 4 | Abelseth, Miss. Karen Marie | 1 |
| 5 | Abelseth, Mr. Olaus Jorgensen | 1 |
| 6 | Abelson, Mr. Samuel | 0 |
| 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | 1 |
| 8 | Abrahamsson, Mr. Abraham August Johannes | 1 |
| 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 1 |
Process data
- Sort columns by their name:
In [9]:
titanic_slice.sort_columns()
Out[9]:
| | age | cabin | fare | id | name | parents_children | port_embarked | sex | siblings_spouses | survived | ticket | travel_class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42.0 | NaN | 7.5500 | 0 | Abbing, Mr. Anthony | 0 | Southampton | male | 0 | 0 | C.A. 5547 | 3 |
| 1 | 13.0 | NaN | 20.2500 | 1 | Abbott, Master. Eugene Joseph | 2 | Southampton | male | 0 | 0 | C.A. 2673 | 3 |
| 2 | 16.0 | NaN | 20.2500 | 2 | Abbott, Mr. Rossmore Edward | 1 | Southampton | male | 1 | 0 | C.A. 2673 | 3 |
| 3 | 35.0 | NaN | 20.2500 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | 1 | Southampton | female | 1 | 1 | C.A. 2673 | 3 |
| 4 | 16.0 | NaN | 7.6500 | 4 | Abelseth, Miss. Karen Marie | 0 | Southampton | female | 0 | 1 | 348125 | 3 |
| 5 | 25.0 | F G63 | 7.6500 | 5 | Abelseth, Mr. Olaus Jorgensen | 0 | Southampton | male | 0 | 1 | 348122 | 3 |
| 6 | 30.0 | NaN | 24.0000 | 6 | Abelson, Mr. Samuel | 0 | Cherbourg | male | 1 | 0 | P/PP 3381 | 2 |
| 7 | 28.0 | NaN | 24.0000 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | 0 | Cherbourg | female | 1 | 1 | P/PP 3381 | 2 |
| 8 | 20.0 | NaN | 7.9250 | 8 | Abrahamsson, Mr. Abraham August Johannes | 0 | Southampton | male | 0 | 1 | SOTON/O2 3101284 | 3 |
| 9 | 18.0 | NaN | 7.2292 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 0 | Cherbourg | female | 0 | 1 | 2657 | 3 |
- Sort columns with a custom comparator:
In [10]:
titanic_slice.sort_columns(
    lambda column1, column2:
        (column1.name < column2.name) - (column1.name > column2.name)
)
Out[10]:
| | travel_class | ticket | survived | siblings_spouses | sex | port_embarked | parents_children | name | id | fare | cabin | age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | C.A. 5547 | 0 | 0 | male | Southampton | 0 | Abbing, Mr. Anthony | 0 | 7.5500 | NaN | 42.0 |
| 1 | 3 | C.A. 2673 | 0 | 0 | male | Southampton | 2 | Abbott, Master. Eugene Joseph | 1 | 20.2500 | NaN | 13.0 |
| 2 | 3 | C.A. 2673 | 0 | 1 | male | Southampton | 1 | Abbott, Mr. Rossmore Edward | 2 | 20.2500 | NaN | 16.0 |
| 3 | 3 | C.A. 2673 | 1 | 1 | female | Southampton | 1 | Abbott, Mrs. Stanton (Rosa Hunt) | 3 | 20.2500 | NaN | 35.0 |
| 4 | 3 | 348125 | 1 | 0 | female | Southampton | 0 | Abelseth, Miss. Karen Marie | 4 | 7.6500 | NaN | 16.0 |
| 5 | 3 | 348122 | 1 | 0 | male | Southampton | 0 | Abelseth, Mr. Olaus Jorgensen | 5 | 7.6500 | F G63 | 25.0 |
| 6 | 2 | P/PP 3381 | 0 | 1 | male | Cherbourg | 0 | Abelson, Mr. Samuel | 6 | 24.0000 | NaN | 30.0 |
| 7 | 2 | P/PP 3381 | 1 | 1 | female | Cherbourg | 0 | Abelson, Mrs. Samuel (Hannah Wizosky) | 7 | 24.0000 | NaN | 28.0 |
| 8 | 3 | SOTON/O2 3101284 | 1 | 0 | male | Southampton | 0 | Abrahamsson, Mr. Abraham August Johannes | 8 | 7.9250 | NaN | 20.0 |
| 9 | 3 | 2657 | 1 | 0 | female | Cherbourg | 0 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 9 | 7.2292 | NaN | 18.0 |
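The comparator above yields reverse alphabetical order, which can be surprising at first glance. The sketch below reproduces the ordering with the standard library only, assuming the comparator follows the usual convention (a positive result places the first argument after the second); `reversed_by_name` is a hypothetical helper that works on plain strings instead of Columns:

```python
from functools import cmp_to_key

names = [
    "id", "name", "sex", "age", "siblings_spouses", "parents_children",
    "ticket", "travel_class", "fare", "cabin", "port_embarked", "survived",
]


def reversed_by_name(name1, name2):
    # Same arithmetic as the lambda above: booleans act as 0 and 1, so the
    # result is 1 if name1 < name2, -1 if name1 > name2, and 0 if equal.
    return (name1 < name2) - (name1 > name2)


ordered = sorted(names, key=cmp_to_key(reversed_by_name))
print(ordered[:3])   # ['travel_class', 'ticket', 'survived']
print(ordered[-1])   # age
```

Swapping the two comparisons, i.e. `(name1 > name2) - (name1 < name2)`, would give ascending order instead.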
- Filter rows with a given query:
In [11]:
titanic.filter_rows(
    lambda row:
        "van" in row.get_value("name")
)
/tmp/ipykernel_3461/1773701666.py:1: DeprecationWarning: This method is deprecated and will be removed in a future version. Use `Table.keep_only_rows` instead.
  titanic.filter_rows(
Out[11]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 105 | Beavan, Mr. William Thomas | male | 19.0 | 0 | 0 | 323951 | 3 | 8.0500 | NaN | Southampton | 0 |
| 1 | 191 | Canavan, Miss. Mary | female | 21.0 | 0 | 0 | 364846 | 3 | 7.7500 | NaN | Queenstown | 0 |
| 2 | 192 | Canavan, Mr. Patrick | male | 21.0 | 0 | 0 | 364858 | 3 | 7.7500 | NaN | Queenstown | 0 |
| 3 | 267 | Cor, Mr. Ivan | male | 27.0 | 0 | 0 | 349229 | 3 | 7.8958 | NaN | Southampton | 0 |
| 4 | 308 | Davies, Mr. Evan | male | 22.0 | 0 | 0 | SC/A4 23568 | 3 | 8.0500 | NaN | Southampton | 0 |
| 5 | 333 | Devaney, Miss. Margaret Delia | female | 19.0 | 0 | 0 | 330958 | 3 | 7.8792 | NaN | Queenstown | 1 |
| 6 | 338 | Dimic, Mr. Jovan | male | 42.0 | 0 | 0 | 315088 | 3 | 8.6625 | NaN | Southampton | 0 |
| 7 | 383 | Evans, Miss. Edith Corse | female | 36.0 | 0 | 0 | PC 17531 | 1 | 31.6792 | A29 | Cherbourg | 0 |
| 8 | 586 | Ivanoff, Mr. Kanio | male | NaN | 0 | 0 | 349201 | 3 | 7.8958 | NaN | Southampton | 0 |
| 9 | 589 | Jalsevac, Mr. Ivan | male | 29.0 | 0 | 0 | 349240 | 3 | 7.8958 | NaN | Cherbourg | 1 |
| 10 | 811 | Mineff, Mr. Ivan | male | 24.0 | 0 | 0 | 349233 | 3 | 7.8958 | NaN | Southampton | 0 |
| 11 | 916 | O'Sullivan, Miss. Bridget Mary | female | NaN | 0 | 0 | 330909 | 3 | 7.6292 | NaN | Queenstown | 0 |
| 12 | 918 | Ovies y Rodriguez, Mr. Servando | male | 28.5 | 0 | 0 | PC 17562 | 1 | 27.7208 | D43 | Cherbourg | 0 |
| 13 | 990 | Rasmussen, Mrs. (Lena Jacobsen Solvang) | female | NaN | 0 | 0 | 65305 | 3 | 8.1125 | NaN | Southampton | 0 |
| 14 | 1143 | Staneff, Mr. Ivan | male | NaN | 0 | 0 | 349208 | 3 | 7.8958 | NaN | Southampton | 0 |
| 15 | 1144 | Stankovic, Mr. Ivan | male | 33.0 | 0 | 0 | 349239 | 3 | 8.6625 | NaN | Cherbourg | 0 |
| 16 | 1161 | Strilic, Mr. Ivan | male | 27.0 | 0 | 0 | 315083 | 3 | 8.6625 | NaN | Southampton | 0 |
| 17 | 1215 | van Billiard, Master. James William | male | NaN | 1 | 1 | A/5. 851 | 3 | 14.5000 | NaN | Southampton | 0 |
| 18 | 1216 | van Billiard, Master. Walter John | male | 11.5 | 1 | 1 | A/5. 851 | 3 | 14.5000 | NaN | Southampton | 0 |
| 19 | 1217 | van Billiard, Mr. Austin Blyler | male | 40.5 | 0 | 2 | A/5. 851 | 3 | 14.5000 | NaN | Southampton | 0 |
| 20 | 1222 | van Melkebeke, Mr. Philemon | male | NaN | 0 | 0 | 345777 | 3 | 9.5000 | NaN | Southampton | 0 |
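The query above is a case-sensitive substring test, which is why names such as "Beavan" and "Evans" appear in the result next to "van Billiard". The following sketch shows the same predicate on plain dictionaries; `keep_matching` is a hypothetical helper, and the rows are a small excerpt of the data shown in this tutorial:

```python
rows = [
    {"id": 105, "name": "Beavan, Mr. William Thomas"},
    {"id": 383, "name": "Evans, Miss. Edith Corse"},
    {"id": 1215, "name": "van Billiard, Master. James William"},
    {"id": 0, "name": "Abbing, Mr. Anthony"},
]


def keep_matching(rows, query):
    # Keep exactly those rows for which the query returns True.
    return [row for row in rows if query(row)]


substring_matches = keep_matching(rows, lambda row: "van" in row["name"])
print([row["id"] for row in substring_matches])  # [105, 383, 1215]

# A stricter query that only matches "van" as a separate word.
word_matches = keep_matching(rows, lambda row: "van" in row["name"].split())
print([row["id"] for row in word_matches])  # [1215]
```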
Transform table
- Transform the table using an Imputer. Imputers replace missing values with other values (e.g. a constant, the mean, or the median of the column), depending on the chosen strategy. For example, the following Imputer replaces missing values in the given columns of the table with the constant 0:
In [12]:
from safeds.data.tabular.transformation import Imputer
imputer = Imputer(Imputer.Strategy.Constant(0)).fit(titanic, ["age", "fare", "cabin", "port_embarked"])
imputer.transform(titanic_slice)
Out[12]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.55 | 0 | Southampton | 0 |
| 1 | 1 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.25 | 0 | Southampton | 0 |
| 2 | 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.25 | 0 | Southampton | 0 |
| 3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.25 | 0 | Southampton | 1 |
| 4 | 4 | Abelseth, Miss. Karen Marie | female | 16.0 | 0 | 0 | 348125 | 3 | 7.65 | 0 | Southampton | 1 |
| 5 | 5 | Abelseth, Mr. Olaus Jorgensen | male | 25.0 | 0 | 0 | 348122 | 3 | 7.65 | F G63 | Southampton | 1 |
| 6 | 6 | Abelson, Mr. Samuel | male | 30.0 | 1 | 0 | P/PP 3381 | 2 | 24.0 | 0 | Cherbourg | 0 |
| 7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 2 | 24.0 | 0 | Cherbourg | 1 |
| 8 | 8 | Abrahamsson, Mr. Abraham August Johannes | male | 20.0 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.925 | 0 | Southampton | 1 |
| 9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | 18.0 | 0 | 0 | 2657 | 3 | 7.2292 | 0 | Cherbourg | 1 |
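To make the fit/transform split more concrete, here is a minimal sketch of constant imputation on plain dictionaries. `ConstantImputerSketch` and `is_missing` are hypothetical names for this example and do not reflect how Safe-DS implements its Imputer:

```python
import math


def is_missing(value):
    # Treat None and NaN as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


class ConstantImputerSketch:
    """Hypothetical, simplified stand-in; not the Safe-DS implementation."""

    def __init__(self, constant):
        self.constant = constant
        self.column_names = None

    def fit(self, rows, column_names):
        # A constant strategy only has to remember which columns to handle.
        # A mean or median strategy would compute its statistics here.
        self.column_names = list(column_names)
        return self

    def transform(self, rows):
        # Return new rows; the input rows are left untouched.
        return [
            {
                name: self.constant
                if name in self.column_names and is_missing(value)
                else value
                for name, value in row.items()
            }
            for row in rows
        ]


rows = [
    {"name": "Abbing, Mr. Anthony", "cabin": float("nan")},
    {"name": "Abelseth, Mr. Olaus Jorgensen", "cabin": "F G63"},
]
filled = ConstantImputerSketch(0).fit(rows, ["cabin"]).transform(rows)
print([row["cabin"] for row in filled])  # [0, 'F G63']
```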
- Transform the table using a LabelEncoder, which encodes categorical features in the chosen Columns as integers:
In [13]:
from safeds.data.tabular.transformation import LabelEncoder
encoder = LabelEncoder().fit(titanic, ["sex", "port_embarked"])
encoder.transform(titanic_slice)
Out[13]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Abbing, Mr. Anthony | 1.0 | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.5500 | NaN | 2.0 | 0 |
| 1 | 1 | Abbott, Master. Eugene Joseph | 1.0 | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.2500 | NaN | 2.0 | 0 |
| 2 | 2 | Abbott, Mr. Rossmore Edward | 1.0 | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | 2.0 | 0 |
| 3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | 0.0 | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | 2.0 | 1 |
| 4 | 4 | Abelseth, Miss. Karen Marie | 0.0 | 16.0 | 0 | 0 | 348125 | 3 | 7.6500 | NaN | 2.0 | 1 |
| 5 | 5 | Abelseth, Mr. Olaus Jorgensen | 1.0 | 25.0 | 0 | 0 | 348122 | 3 | 7.6500 | F G63 | 2.0 | 1 |
| 6 | 6 | Abelson, Mr. Samuel | 1.0 | 30.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | 0.0 | 0 |
| 7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | 0.0 | 28.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | 0.0 | 1 |
| 8 | 8 | Abrahamsson, Mr. Abraham August Johannes | 1.0 | 20.0 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.9250 | NaN | 2.0 | 1 |
| 9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 0.0 | 18.0 | 0 | 0 | 2657 | 3 | 7.2292 | NaN | 0.0 | 1 |
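The idea behind label encoding can be sketched in a few lines. This example assumes that codes are assigned in the sorted order of the distinct values, which is consistent with the output above (female and Cherbourg map to 0) but is not confirmed by this tutorial; `fit_label_encoding` is a hypothetical helper:

```python
def fit_label_encoding(values):
    # Assumption: codes follow the sorted order of the distinct values.
    return {label: code for code, label in enumerate(sorted(set(values)))}


ports = ["Southampton", "Cherbourg", "Queenstown", "Southampton"]
port_codes = fit_label_encoding(ports)
print(port_codes)  # {'Cherbourg': 0, 'Queenstown': 1, 'Southampton': 2}

# Applying the fitted mapping to other data uses the same codes.
encoded = [port_codes[port] for port in ["Southampton", "Cherbourg"]]
print(encoded)  # [2, 0]
```

Note that the resulting integers imply an order between the categories, which may or may not be meaningful for the data at hand.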
- Transform the table using a OneHotEncoder, which creates new Columns based on the unique values in each chosen Column:
In [14]:
from safeds.data.tabular.transformation import OneHotEncoder
encoder = OneHotEncoder().fit(titanic, ["sex", "port_embarked"])
encoder.transform(titanic_slice)
Out[14]:
| | id | name | sex__male | sex__female | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked__Southampton | port_embarked__Cherbourg | port_embarked__Queenstown | port_embarked__nan | survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Abbing, Mr. Anthony | 1.0 | 0.0 | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.5500 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 0 |
| 1 | 1 | Abbott, Master. Eugene Joseph | 1.0 | 0.0 | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.2500 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 0 |
| 2 | 2 | Abbott, Mr. Rossmore Edward | 1.0 | 0.0 | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 0 |
| 3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | 0.0 | 1.0 | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 1 |
| 4 | 4 | Abelseth, Miss. Karen Marie | 0.0 | 1.0 | 16.0 | 0 | 0 | 348125 | 3 | 7.6500 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 1 |
| 5 | 5 | Abelseth, Mr. Olaus Jorgensen | 1.0 | 0.0 | 25.0 | 0 | 0 | 348122 | 3 | 7.6500 | F G63 | 1.0 | 0.0 | 0.0 | 0.0 | 1 |
| 6 | 6 | Abelson, Mr. Samuel | 1.0 | 0.0 | 30.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | 0.0 | 1.0 | 0.0 | 0.0 | 0 |
| 7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | 0.0 | 1.0 | 28.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | 0.0 | 1.0 | 0.0 | 0.0 | 1 |
| 8 | 8 | Abrahamsson, Mr. Abraham August Johannes | 1.0 | 0.0 | 20.0 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.9250 | NaN | 1.0 | 0.0 | 0.0 | 0.0 | 1 |
| 9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 0.0 | 1.0 | 18.0 | 0 | 0 | 2657 | 3 | 7.2292 | NaN | 0.0 | 1.0 | 0.0 | 0.0 | 1 |
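One-hot encoding replaces one categorical column with one indicator column per distinct value. The sketch below mirrors the `column__value` naming visible in the output above and assumes that the new columns are ordered by first appearance of each value; both helper functions are hypothetical:

```python
def fit_one_hot(values):
    # Remember the distinct values in order of first appearance.
    return list(dict.fromkeys(values))


def transform_one_hot(column_name, values, categories):
    # One new column per category: 1.0 where the value matches, else 0.0.
    return {
        f"{column_name}__{category}": [
            1.0 if value == category else 0.0 for value in values
        ]
        for category in categories
    }


categories = fit_one_hot(["male", "male", "female", "male"])
one_hot = transform_one_hot("sex", ["male", "female"], categories)
print(one_hot)
# {'sex__male': [1.0, 0.0], 'sex__female': [0.0, 1.0]}
```

Because the categories are learned during fitting, a column such as `port_embarked__Queenstown` can exist even if the transformed slice contains no such value.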
- Transform the table using a RangeScaler, which scales the values in the chosen Columns to a given range:
In [15]:
from safeds.data.tabular.transformation import RangeScaler
scaler = RangeScaler(0.0, 1.0).fit(titanic, ["age"])
scaler.transform(titanic_slice)
Out[15]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Abbing, Mr. Anthony | male | 0.524008 | 0 | 0 | C.A. 5547 | 3 | 7.5500 | NaN | Southampton | 0 |
| 1 | 1 | Abbott, Master. Eugene Joseph | male | 0.160751 | 0 | 2 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
| 2 | 2 | Abbott, Mr. Rossmore Edward | male | 0.198330 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
| 3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 0.436325 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 1 |
| 4 | 4 | Abelseth, Miss. Karen Marie | female | 0.198330 | 0 | 0 | 348125 | 3 | 7.6500 | NaN | Southampton | 1 |
| 5 | 5 | Abelseth, Mr. Olaus Jorgensen | male | 0.311064 | 0 | 0 | 348122 | 3 | 7.6500 | F G63 | Southampton | 1 |
| 6 | 6 | Abelson, Mr. Samuel | male | 0.373695 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 0 |
| 7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 0.348643 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 1 |
| 8 | 8 | Abrahamsson, Mr. Abraham August Johannes | male | 0.248434 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.9250 | NaN | Southampton | 1 |
| 9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | 0.223382 | 0 | 0 | 2657 | 3 | 7.2292 | NaN | Cherbourg | 1 |
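Scaling to a range is commonly done with min-max scaling. The sketch below assumes that formula and uses hypothetical ages; it also shows why the scaled slice above contains neither 0 nor 1: the minimum and maximum come from the data used for fitting (the full table), not from the slice being transformed.

```python
def fit_range(values):
    # Learn the minimum and maximum from the data used for fitting.
    return min(values), max(values)


def scale_to_range(value, fitted_min, fitted_max, lower=0.0, upper=1.0):
    # Min-max scaling: map [fitted_min, fitted_max] linearly to [lower, upper].
    return (value - fitted_min) / (fitted_max - fitted_min) * (upper - lower) + lower


full_ages = [2.0, 13.0, 42.0, 80.0]  # hypothetical data used for fitting
slice_ages = [13.0, 42.0]            # hypothetical slice to transform

fitted_min, fitted_max = fit_range(full_ages)
scaled = [scale_to_range(age, fitted_min, fitted_max) for age in slice_ages]
print(scaled)  # both values lie strictly between 0 and 1
```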
- Transform the table using a StandardScaler, which standardizes the values of the chosen Columns:
In [16]:
from safeds.data.tabular.transformation import StandardScaler
scaler = StandardScaler().fit(titanic, ["age", "travel_class"])
scaler.transform(titanic_slice)
Out[16]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Abbing, Mr. Anthony | male | 0.841202 | 0 | 0 | C.A. 5547 | 0.841916 | 7.5500 | NaN | Southampton | 0 |
| 1 | 1 | Abbott, Master. Eugene Joseph | male | -1.171763 | 0 | 2 | C.A. 2673 | 0.841916 | 20.2500 | NaN | Southampton | 0 |
| 2 | 2 | Abbott, Mr. Rossmore Edward | male | -0.963526 | 1 | 1 | C.A. 2673 | 0.841916 | 20.2500 | NaN | Southampton | 0 |
| 3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 0.355314 | 1 | 1 | C.A. 2673 | 0.841916 | 20.2500 | NaN | Southampton | 1 |
| 4 | 4 | Abelseth, Miss. Karen Marie | female | -0.963526 | 0 | 0 | 348125 | 0.841916 | 7.6500 | NaN | Southampton | 1 |
| 5 | 5 | Abelseth, Mr. Olaus Jorgensen | male | -0.338812 | 0 | 0 | 348122 | 0.841916 | 7.6500 | F G63 | Southampton | 1 |
| 6 | 6 | Abelson, Mr. Samuel | male | 0.008251 | 1 | 0 | P/PP 3381 | -0.352091 | 24.0000 | NaN | Cherbourg | 0 |
| 7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | -0.130574 | 1 | 0 | P/PP 3381 | -0.352091 | 24.0000 | NaN | Cherbourg | 1 |
| 8 | 8 | Abrahamsson, Mr. Abraham August Johannes | male | -0.685875 | 0 | 0 | SOTON/O2 3101284 | 0.841916 | 7.9250 | NaN | Southampton | 1 |
| 9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | -0.824700 | 0 | 0 | 2657 | 0.841916 | 7.2292 | NaN | Cherbourg | 1 |
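Standardization subtracts the mean and divides by the standard deviation, so that the fitted data ends up with mean 0 and standard deviation 1. The sketch below uses the population standard deviation, which is an assumption about the scaler, and hypothetical travel classes:

```python
from statistics import fmean, pstdev


def fit_standard(values):
    # Learn mean and (population) standard deviation from the fitting data.
    return fmean(values), pstdev(values)


def standardize(value, mean, std):
    return (value - mean) / std


travel_classes = [1, 2, 3, 3]  # hypothetical data used for fitting
mean, std = fit_standard(travel_classes)
standardized = [standardize(value, mean, std) for value in travel_classes]

print(mean)  # 2.25
print(standardized)  # values above the mean are positive, below are negative
```

As with the other transformers, a slice transformed with statistics fitted on the full table will generally not have mean 0 itself.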
Transform column
- Transform the "sex" Column by labeling female values with 1 and male values with 0:
In [17]:
titanic_slice.transform_column("sex", lambda row: 1 if row.get_value("sex") == "female" else 0)
Out[17]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Abbing, Mr. Anthony | 0 | 42.0 | 0 | 0 | C.A. 5547 | 3 | 7.5500 | NaN | Southampton | 0 |
| 1 | 1 | Abbott, Master. Eugene Joseph | 0 | 13.0 | 0 | 2 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
| 2 | 2 | Abbott, Mr. Rossmore Edward | 0 | 16.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
| 3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | 1 | 35.0 | 1 | 1 | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 1 |
| 4 | 4 | Abelseth, Miss. Karen Marie | 1 | 16.0 | 0 | 0 | 348125 | 3 | 7.6500 | NaN | Southampton | 1 |
| 5 | 5 | Abelseth, Mr. Olaus Jorgensen | 0 | 25.0 | 0 | 0 | 348122 | 3 | 7.6500 | F G63 | Southampton | 1 |
| 6 | 6 | Abelson, Mr. Samuel | 0 | 30.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 0 |
| 7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | 1 | 28.0 | 1 | 0 | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 1 |
| 8 | 8 | Abrahamsson, Mr. Abraham August Johannes | 0 | 20.0 | 0 | 0 | SOTON/O2 3101284 | 3 | 7.9250 | NaN | Southampton | 1 |
| 9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | 1 | 18.0 | 0 | 0 | 2657 | 3 | 7.2292 | NaN | Cherbourg | 1 |
- Transform the values of the "parents_children" Column into "Yes" or "No", depending on whether the passenger travelled with parents or children:
In [18]:
titanic_slice.transform_column("parents_children", lambda row: "No" if row.get_value("parents_children") == 0 else "Yes")
Out[18]:
| | id | name | sex | age | siblings_spouses | parents_children | ticket | travel_class | fare | cabin | port_embarked | survived |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Abbing, Mr. Anthony | male | 42.0 | 0 | No | C.A. 5547 | 3 | 7.5500 | NaN | Southampton | 0 |
| 1 | 1 | Abbott, Master. Eugene Joseph | male | 13.0 | 0 | Yes | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
| 2 | 2 | Abbott, Mr. Rossmore Edward | male | 16.0 | 1 | Yes | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 0 |
| 3 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 35.0 | 1 | Yes | C.A. 2673 | 3 | 20.2500 | NaN | Southampton | 1 |
| 4 | 4 | Abelseth, Miss. Karen Marie | female | 16.0 | 0 | No | 348125 | 3 | 7.6500 | NaN | Southampton | 1 |
| 5 | 5 | Abelseth, Mr. Olaus Jorgensen | male | 25.0 | 0 | No | 348122 | 3 | 7.6500 | F G63 | Southampton | 1 |
| 6 | 6 | Abelson, Mr. Samuel | male | 30.0 | 1 | No | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 0 |
| 7 | 7 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | No | P/PP 3381 | 2 | 24.0000 | NaN | Cherbourg | 1 |
| 8 | 8 | Abrahamsson, Mr. Abraham August Johannes | male | 20.0 | 0 | No | SOTON/O2 3101284 | 3 | 7.9250 | NaN | Southampton | 1 |
| 9 | 9 | Abrahim, Mrs. Joseph (Sophie Halaut Easu) | female | 18.0 | 0 | No | 2657 | 3 | 7.2292 | NaN | Cherbourg | 1 |
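In both examples above the transformer receives a whole row and its return value replaces the value of one column. A minimal sketch of that idea on plain dictionaries follows; this `transform_column` function is a hypothetical stand-in and not the Safe-DS method:

```python
def transform_column(rows, column_name, transformer):
    # The transformer sees the whole row; its result replaces one column.
    # New row dicts are created, so the input rows stay unchanged.
    return [{**row, column_name: transformer(row)} for row in rows]


rows = [
    {"name": "Abbing, Mr. Anthony", "sex": "male", "parents_children": 0},
    {"name": "Abbott, Master. Eugene Joseph", "sex": "male", "parents_children": 2},
    {"name": "Abbott, Mrs. Stanton (Rosa Hunt)", "sex": "female", "parents_children": 1},
]

encoded = transform_column(rows, "sex", lambda row: 1 if row["sex"] == "female" else 0)
print([row["sex"] for row in encoded])  # [0, 0, 1]
print([row["sex"] for row in rows])     # ['male', 'male', 'female']
```

Since the transformer has access to the full row, the new value may also depend on other columns, for example combining "siblings_spouses" and "parents_children" into a single family indicator.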