Machine Learning¶

This tutorial explains how to train a machine learning model in Safe-DS and use it to make predictions.

Create a `TabularDataset`¶

First, we need to create a TabularDataset from the training data. TabularDatasets are used to train supervised machine learning models, because they keep track of the target column. A TabularDataset can be created from a Table by calling the to_tabular_dataset method:

In [1]:

Copied!





from safeds.data.tabular.containers import Table

training_set = Table(
    {
        "a": [3, 4, 8, 6, 5],
        "b": [2, 2, 1, 6, 3],
        "c": [1, 1, 1, 1, 1],
        "result": [6, 7, 10, 13, 9],
    },
)

tabular_dataset = training_set.to_tabular_dataset(
    "result",
)
from safeds.data.tabular.containers import Table

training_set = Table(
    {
        "a": [3, 4, 8, 6, 5],
        "b": [2, 2, 1, 6, 3],
        "c": [1, 1, 1, 1, 1],
        "result": [6, 7, 10, 13, 9],
    },
)

tabular_dataset = training_set.to_tabular_dataset(
    "result",
)

Create and train model¶

In this example, we want to predict the column result, which is the sum of a, b, and c. We will train a linear regression model with this training data. In Safe-DS, machine learning models are modeled as classes. First, their constructor must be called to configure hyperparameters, which returns a model object. Then, training is started by calling the fit method on the model object and passing the training data:

In [2]:

Copied!

from safeds.ml.classical.regression import LinearRegressor

model = LinearRegressor()
fitted_model = model.fit(tabular_dataset)
from safeds.ml.classical.regression import LinearRegressor

model = LinearRegressor()
fitted_model = model.fit(tabular_dataset)

Predicting new values¶

The fit method returns the fitted model, the original model is not changed. Predictions are made by calling the predict method on the fitted model. The predict method takes a Table as input and returns a Table with the predictions:

In [3]:

Copied!

test_set = Table({"a": [1, 1, 0, 2, 4], "b": [2, 0, 5, 2, 7], "c": [1, 4, 3, 2, 1]})

fitted_model.predict(dataset=test_set)
test_set = Table({"a": [1, 1, 0, 2, 4], "b": [2, 0, 5, 2, 7], "c": [1, 4, 3, 2, 1]})

fitted_model.predict(dataset=test_set)

Out[3]:

shape: (5, 4)

a	b	c	result
i64	i64	i64	f64
1	2	1	4.0
1	0	4	2.0
0	5	3	6.0
2	2	2	5.0
4	7	1	12.0

Metrics¶

A machine learning metric, also known as an evaluation metric, is a measure used to assess the performance of a machine learning model on a test set or is used during cross-validation to gain insights about performance and compare different models or parameter settings. In Safe-DS, the available metrics are: Accuracy, Confusion Matrix, F1-Score, Precision, and Recall. Before we go through each of these in detail, we need an understanding of the different components of evaluation metrics.

Components of evaluation metrics¶

These are distinct elements or parts that contribute to the overall assessment of an evaluation measure.

True positives TP: the positive tuples that the classifier correctly labeled.
False positives FP : the negative tuples that were falsely labeled as positive.
True negatives TN: the negative tuples that the classifier correctly labeled.
False negatives FN: the positive tuples that were falsely labeled as negative.

The confusion matrix is divided into four cells, representing different combinations of predicted and true labels:

The top-left cell represents the True Negatives. The value 1 in this cell indicates that one instance was correctly predicted as the negative class (0).
The top-right cell represents the False Positives. The value 1 in this cell indicates that one instance was incorrectly predicted as the positive class (1) while the true label is negative (0).
The bottom-left cell represents the False Negatives. The value 1 in this cell indicates that one instance was incorrectly predicted as the negative class (0) while the true label is positive (1).
The bottom-right cell represents the True Positives. The value 2 in this cell indicates that two instances were correctly predicted as the positive class (1).

Accuracy¶

Accuracy, also known as classification rate, can be defined as the proportion of correctly classified instances out of the total number of instances. That is, it provides a measure of how well a classification model performs overall.

Formula: $ Accuracy = \frac{TP+TN}{TP+FP+TN+FN} $

Let's consider the same dataset used for deriving the confusion matrix to get the accuracy:

In [4]:

Copied!

from safeds.data.tabular.containers import Column
from safeds.ml.metrics import ClassificationMetrics

predicted = Column("predicted", [0, 1, 1, 1, 0])
expected = Column("predicted", [0, 1, 1, 0, 1])

ClassificationMetrics.accuracy(predicted, expected)
from safeds.data.tabular.containers import Column
from safeds.ml.metrics import ClassificationMetrics

predicted = Column("predicted", [0, 1, 1, 1, 0])
expected = Column("predicted", [0, 1, 1, 0, 1])

ClassificationMetrics.accuracy(predicted, expected)

Out[4]:

0.6

The accuracy score 0.6 is calculated as the ratio of the number of correct predictions (3) to the total number of instances (5).

Accuracy is suitable when the classes are balanced and there is no significant class imbalance. However, accuracy alone may not always provide a complete picture of a model's performance, especially when dealing with imbalanced datasets. In such cases, other metrics like precision and recall may provide a more comprehensive evaluation of the model's performance.

F1-Score¶

The F1-Score is the harmonic mean of precision and recall. That is, it combines precision and recall into a single value.

Formula: $ F1-Score = \frac{2PR}{P+R} $

Let's consider the following dataset to get a better understanding:

In [5]:

Copied!

from safeds.data.tabular.containers import Column
from safeds.ml.metrics import ClassificationMetrics

predicted = Column("predicted", [0, 1, 1, 1, 0])
expected = Column("predicted", [0, 1, 1, 0, 1])

ClassificationMetrics.f1_score(predicted, expected, positive_class=1)
from safeds.data.tabular.containers import Column
from safeds.ml.metrics import ClassificationMetrics

predicted = Column("predicted", [0, 1, 1, 1, 0])
expected = Column("predicted", [0, 1, 1, 0, 1])

ClassificationMetrics.f1_score(predicted, expected, positive_class=1)

Out[5]:

0.6666666666666666

Out of the 5 instances in the dataset, 3 have been correctly predicted. However, there is one false positive (4th instance) and one false negative (5th instance). The harmonic mean of precision and recall in this dataset has an f1-score of approximately 0.67.

The F1-score is suitable when there is an imbalance between the classes, especially when the values of the false positives and false negatives differs.

Precision¶

The ability of a classification model to identify only the relevant data points. It measures the proportion of correctly predicted positive instances (true positives) out of all instances predicted as positive (both true positives and false positives).

Formula: $ P = \frac{TP}{TP+FP} $

Let's consider the following dataset to get a clearer picture of Precision:

In [6]:

Copied!

from safeds.data.tabular.containers import Column
from safeds.ml.metrics import ClassificationMetrics

predicted = Column("predicted", [0, 1, 1, 1, 0])
expected = Column("predicted", [0, 1, 1, 0, 1])

ClassificationMetrics.precision(predicted, expected, positive_class=1)
from safeds.data.tabular.containers import Column
from safeds.ml.metrics import ClassificationMetrics

predicted = Column("predicted", [0, 1, 1, 1, 0])
expected = Column("predicted", [0, 1, 1, 0, 1])

ClassificationMetrics.precision(predicted, expected, positive_class=1)

Out[6]:

0.6666666666666666

The classifier correctly predicts 2 true positive and 1 true negative. However, it also has one false positive, predicting a negative instance as positive. Using the above precision formular, we get a precision score of approximately 0.67. A precision score of 1 indicates that all positive predictions made by the model are correct, while a lower score suggests a higher proportion of false positives.

Precision is useful when the focus is on minimizing the negative tuples that were falsely labeled as positive.

Recall¶

Also known as sensitivity or true positive rate, is the ability of a classification model to identify all the relevant data points.

Formula: $ R = \frac{TP}{TP + FN} $

Considering the same dataset used so far, let's calculate the recall score to understand it better:

In [7]:

Copied!

from safeds.data.tabular.containers import Column
from safeds.ml.metrics import ClassificationMetrics

predicted = Column("predicted", [0, 1, 1, 1, 0])
expected = Column("predicted", [0, 1, 1, 0, 1])

ClassificationMetrics.recall(predicted, expected, positive_class=1)
from safeds.data.tabular.containers import Column
from safeds.ml.metrics import ClassificationMetrics

predicted = Column("predicted", [0, 1, 1, 1, 0])
expected = Column("predicted", [0, 1, 1, 0, 1])

ClassificationMetrics.recall(predicted, expected, positive_class=1)

Out[7]:

0.6666666666666666

The classifier misses one positive instance, resulting in one false negative, where a positive instance is incorrectly classified as negative. Using the above recall formular, we get a recall score of approximately 0.67. A recall score of 1 indicates that all positive instances in the dataset are correctly identified by the model, while a lower score suggests a higher proportion of false negatives.

Recall is useful when the focus is on minimizing the positive tuples that were falsely labeled as negative.