
TimeSeriesDataset

Bases: Dataset[Table, Column]

A time series dataset maps feature columns to a target column. It can be used to train machine learning models.

Data can be segmented into windows when loading it into the models.

Parameters:

Name Type Description Default
data Table | Mapping[str, Sequence[Any]]

The data.

required
target_name str

The name of the target column.

required
window_size int

The number of consecutive samples to use as input for prediction.

required
extra_names list[str] | None

Names of the columns that are neither features nor target. If None, no extra columns are used, i.e. all but the target column are used as features.

None
forecast_horizon int

The number of time steps to predict into the future.

1
continuous bool

Whether to also predict the time steps before the forecast horizon, yielding forecast_horizon values per window instead of a single value.

False
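To make the interplay of window_size, forecast_horizon and continuous concrete, here is a minimal plain-Python sketch of how training windows and labels could be cut from a target sequence. It mirrors the index arithmetic of the private _into_dataloader_with_window method in the source listing below, but it uses lists instead of tensors and ignores feature columns. The helper name make_windows is hypothetical and not part of the Safe-DS API.

```python
def make_windows(target, window_size, forecast_horizon, continuous=False):
    """Cut (window, label) pairs from a target sequence (hypothetical helper)."""
    if window_size < 1 or forecast_horizon < 1:
        raise ValueError("window_size and forecast_horizon must be at least 1")
    size = len(target)
    # Same requirement as the source: strictly more rows than window + horizon.
    if size <= window_size + forecast_horizon:
        raise ValueError("the series must have more rows than window_size + forecast_horizon")
    pairs = []
    for i in range(size - (window_size + forecast_horizon)):
        window = target[i : i + window_size]
        if continuous:
            # All forecast_horizon steps directly after the window.
            label = target[i + window_size : i + window_size + forecast_horizon]
        else:
            # A single value, taken at the same offset the source code uses.
            label = [target[i + window_size + forecast_horizon]]
        pairs.append((window, label))
    return pairs


series = [10, 20, 30, 40, 50, 60]
print(make_windows(series, window_size=2, forecast_horizon=2))
# [([10, 20], [50]), ([20, 30], [60])]
print(make_windows(series, window_size=2, forecast_horizon=2, continuous=True))
# [([10, 20], [30, 40]), ([20, 30], [40, 50])]
```

With these offsets, the continuous label covers the forecast_horizon steps directly after the window, while the non-continuous label is the single value one step beyond that range.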

Raises:

Type Description
ColumnLengthMismatchError

If columns have different lengths.

ValueError

If the target column is also an extra column.

Examples:

>>> from safeds.data.labeled.containers import TimeSeriesDataset
>>> dataset = TimeSeriesDataset(
...     {"id": [1, 2, 3], "feature": [4, 5, 6], "target": [1, 2, 3], "error": [0, 0, 1]},
...     "target",
...     window_size=1,
...     extra_names=["error"],
... )
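The constructor treats every column that is neither the target nor an extra column as a feature. A minimal plain-Python sketch of that rule (the function split_columns is hypothetical and only mirrors the constructor in the source listing below), applied to the column names from the example above, shows that id ends up as a feature because it is not listed in extra_names:

```python
def split_columns(column_names, target_name, extra_names=None):
    """Split column names into features, target and extras (hypothetical helper)."""
    extra_names = [] if extra_names is None else list(extra_names)
    if target_name in extra_names:
        raise ValueError(f"Column '{target_name}' cannot be both target and extra.")
    non_feature_names = {target_name, *extra_names}
    # Every remaining column, in its original order, becomes a feature.
    features = [name for name in column_names if name not in non_feature_names]
    return features, target_name, extra_names


features, target, extras = split_columns(
    ["id", "feature", "target", "error"], "target", extra_names=["error"]
)
print(features)  # ['id', 'feature']
print(extras)  # ['error']
```

To keep id out of the model input, it would have to be passed in extra_names as well.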

Methods:

Name Description
to_table

Return a new Table containing the feature columns, the target column and the extra columns.

Attributes:

Name Type Description
continuous bool

True if the time series will make a continuous prediction.

extras Table

Additional columns of the time series dataset that are neither features nor target.

features Table

The feature columns of the time series dataset.

forecast_horizon int

The number of time steps to predict into the future.

target Column

The target column of the time series dataset.

window_size int

The number of consecutive samples to use as input for prediction.

Source code in src/safeds/data/labeled/containers/_time_series_dataset.py
class TimeSeriesDataset(Dataset[Table, Column]):
    """
    A time series dataset maps feature columns to a target column. It can be used to train machine learning models.

    Data can be segmented into windows when loading it into the models.

    Parameters
    ----------
    data:
        The data.
    target_name:
        The name of the target column.
    window_size:
        The number of consecutive samples to use as input for prediction.
    extra_names:
        Names of the columns that are neither features nor target. If None, no extra columns are used, i.e. all but
        the target column are used as features.
    forecast_horizon:
        The number of time steps to predict into the future.
    continuous:
        Whether to also predict the time steps before the forecast horizon, yielding forecast_horizon values per
        window instead of a single value.

    Raises
    ------
    ColumnLengthMismatchError
        If columns have different lengths.
    ValueError
        If the target column is also an extra column.

    Examples
    --------
    >>> from safeds.data.labeled.containers import TimeSeriesDataset
    >>> dataset = TimeSeriesDataset(
    ...     {"id": [1, 2, 3], "feature": [4, 5, 6], "target": [1, 2, 3], "error": [0, 0, 1]},
    ...     "target",
    ...     window_size=1,
    ...     extra_names=["error"],
    ... )
    """

    # ------------------------------------------------------------------------------------------------------------------
    # Dunder methods
    # ------------------------------------------------------------------------------------------------------------------
    def __init__(
        self,
        data: Table | Mapping[str, Sequence[Any]],
        target_name: str,
        window_size: int,
        *,
        extra_names: list[str] | None = None,
        forecast_horizon: int = 1,
        continuous: bool = False,
    ):
        from safeds.data.tabular.containers import Table

        # Preprocess inputs
        if not isinstance(data, Table):
            data = Table(data)
        if extra_names is None:
            extra_names = []

        # Derive feature names (build the set once, since comprehensions evaluate their condition every iteration)
        non_feature_names = {target_name, *extra_names}
        feature_names = [name for name in data.column_names if name not in non_feature_names]

        # Validate inputs
        if target_name in extra_names:
            raise ValueError(f"Column '{target_name}' cannot be both target and extra.")
        # No feature column is required: the target column itself is windowed as model input.

        # Set attributes
        self._table: Table = data
        self._features: Table = data.select_columns(feature_names)
        self._target: Column = data.get_column(target_name)
        self._window_size: int = window_size
        self._forecast_horizon: int = forecast_horizon
        self._extras: Table = data.select_columns(extra_names)
        self._continuous: bool = continuous

    def __eq__(self, other: object) -> bool:
        """
        Compare two time series datasets.

        Returns
        -------
        equals:
            `True` if target, features, extras, window size, forecast horizon and continuous flag are equal, `False`
            otherwise.
        """
        if not isinstance(other, TimeSeriesDataset):
            return NotImplemented
        return (self is other) or (
            self._window_size == other._window_size
            and self._forecast_horizon == other._forecast_horizon
            and self._continuous == other._continuous
            and self.target == other.target
            and self.features == other.features
            and self.extras == other.extras
        )

    def __hash__(self) -> int:
        """
        Return a deterministic hash value for this time series dataset.

        Returns
        -------
        hash:
            The hash value.
        """
        return _structural_hash(
            self.target,
            self.features,
            self.extras,
            self._window_size,
            self._forecast_horizon,
            self._continuous,
        )

    def __sizeof__(self) -> int:
        """
        Return the complete size of this object.

        Returns
        -------
        size:
            Size of this object in bytes.
        """
        return (
            sys.getsizeof(self._target)
            + sys.getsizeof(self._features)
            + sys.getsizeof(self._extras)
            + sys.getsizeof(self._window_size)
            + sys.getsizeof(self._forecast_horizon)
            + sys.getsizeof(self._continuous)
        )

    # ------------------------------------------------------------------------------------------------------------------
    # Properties
    # ------------------------------------------------------------------------------------------------------------------

    @property
    def features(self) -> Table:
        """The feature columns of the time series dataset."""
        return self._features

    @property
    def target(self) -> Column:
        """The target column of the time series dataset."""
        return self._target

    @property
    def window_size(self) -> int:
        """The number of consecutive sample to use as input for prediction."""
        return self._window_size

    @property
    def forecast_horizon(self) -> int:
        """The number of time steps to predict into the future."""
        return self._forecast_horizon

    @property
    def continuous(self) -> bool:
        """True if the time series will make a continuous prediction."""
        return self._continuous

    @property
    def extras(self) -> Table:
        """
        Additional columns of the time series dataset that are neither features nor target.

        These can be used to store additional information about instances, such as IDs.
        """
        return self._extras

    # ------------------------------------------------------------------------------------------------------------------
    # Conversion
    # ------------------------------------------------------------------------------------------------------------------

    def to_table(self) -> Table:
        """
        Return a new `Table` containing the feature columns, the target column and the extra columns.

        The original `TimeSeriesDataset` is not modified.

        Returns
        -------
        table:
            A table containing the feature columns, the target column and the extra columns.
        """
        return self._table

    def _into_dataloader_with_window(
        self,
        window_size: int,
        forecast_horizon: int,
        batch_size: int,
        continuous: bool = False,
    ) -> DataLoader:
        """
        Return a DataLoader for the data stored in this time series, used for training neural networks.

        It splits the target column and every feature column into windows of window_size consecutive samples, which
        serve as model input, and derives the corresponding labels from the target column, shifted by the forecast
        horizon. The original time series dataset is not modified.

        Parameters
        ----------
        window_size:
            The size of the created windows.
        forecast_horizon:
            The number of time steps to predict into the future.
        batch_size:
            The size of data batches that should be loaded at one time.
        continuous:
            Whether to also predict the time steps before the forecast horizon, yielding forecast_horizon values per
            window instead of a single value.

        Raises
        ------
        OutOfBoundsError
            If window_size or forecast_horizon is below 1.
        ValueError
            If the number of rows is less than or equal to forecast_horizon + window_size.

        Returns
        -------
        result:
            The DataLoader.
        """
        import torch
        from torch.utils.data import DataLoader

        _init_default_device()

        target_tensor = torch.tensor(self.target._series.to_numpy(), dtype=torch.float32)

        x_s = []
        y_s = []

        size = target_tensor.size(0)
        _check_bounds("window_size", window_size, lower_bound=_ClosedBound(1))
        _check_bounds("forecast_horizon", forecast_horizon, lower_bound=_ClosedBound(1))
        if size <= forecast_horizon + window_size:
            raise ValueError(
                "Cannot create windows: the time series must have more rows than window_size + forecast_horizon.",
            )
        # Create input windows from the target and from every feature column, and labels from the target that are
        # shifted by the forecast horizon.
        # -> input: [window_size * (1 + number of feature columns)], label: [forecast_horizon] or [1]
        # Convert each feature column to a tensor once instead of once per window.
        feature_tensors = [
            torch.tensor(col._series.to_numpy(), dtype=torch.float32) for col in self.features.to_columns()
        ]
        for i in range(size - (forecast_horizon + window_size)):
            window = target_tensor[i : i + window_size]
            if continuous:
                label = target_tensor[i + window_size : i + window_size + forecast_horizon]
            else:
                label = target_tensor[i + window_size + forecast_horizon].unsqueeze(0)
            for data in feature_tensors:
                window = torch.cat((window, data[i : i + window_size]), dim=0)
            x_s.append(window)
            y_s.append(label)
        x_s_tensor = torch.stack(x_s)
        y_s_tensor = torch.stack(y_s)
        dataset = _create_dataset(x_s_tensor, y_s_tensor)
        return DataLoader(dataset=dataset, batch_size=batch_size)

    def _into_dataloader_with_window_predict(
        self,
        window_size: int,
        forecast_horizon: int,
        batch_size: int,
    ) -> DataLoader:
        """
        Return a DataLoader for the data stored in this time series, used for making predictions with neural networks.

        It splits the target column and every feature column into windows of window_size consecutive samples, which
        serve as model input. No labels are created. The original time series dataset is not modified.

        Parameters
        ----------
        window_size:
            The size of the created windows.
        forecast_horizon:
            The number of time steps to predict into the future.
        batch_size:
            The size of data batches that should be loaded at one time.

        Raises
        ------
        OutOfBoundsError
            If window_size or forecast_horizon is below 1.
        ValueError
            If the number of rows is less than or equal to forecast_horizon + window_size.

        Returns
        -------
        result:
            The DataLoader.
        """
        import torch
        from torch.utils.data import DataLoader

        _init_default_device()

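        # Use float32, as in the training DataLoader, so the target windows match the feature windows and the model.
        target_tensor = self.target._series.to_torch().to(_get_device(), dtype=torch.float32)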
        x_s = []

        size = target_tensor.size(0)
        _check_bounds("window_size", window_size, lower_bound=_ClosedBound(1))
        _check_bounds("forecast_horizon", forecast_horizon, lower_bound=_ClosedBound(1))
        if size <= forecast_horizon + window_size:
            raise ValueError(
                "Cannot create windows: the time series must have more rows than window_size + forecast_horizon.",
            )

        # Create input windows from the target and from every feature column.
        # Convert each feature column to a tensor once instead of once per window.
        feature_tensors = [
            torch.tensor(col._series.to_numpy(), dtype=torch.float32) for col in self.features.to_columns()
        ]
        for i in range(size - (forecast_horizon + window_size)):
            window = target_tensor[i : i + window_size]
            for data in feature_tensors:
                window = torch.cat((window, data[i : i + window_size]), dim=-1)
            x_s.append(window)

        x_s_tensor = torch.stack(x_s)

        dataset = _create_dataset_predict(x_s_tensor)
        return DataLoader(dataset=dataset, batch_size=batch_size)

    # ------------------------------------------------------------------------------------------------------------------
    # IPython integration
    # ------------------------------------------------------------------------------------------------------------------

    def _repr_html_(self) -> str:
        """
        Return an HTML representation of the time series dataset.

        Returns
        -------
        output:
            The generated HTML.
        """
        return self._table._repr_html_()
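In both DataLoader methods above, each input sample is built by concatenating the target window with the window of every feature column, so one sample holds window_size * (1 + number of feature columns) values. The following plain-Python sketch reproduces that layout with lists instead of tensors; build_input is a hypothetical name used only for illustration.

```python
def build_input(target, feature_columns, i, window_size):
    """Concatenate the target window and each feature window starting at row i."""
    sample = list(target[i : i + window_size])
    for column in feature_columns:
        # Feature windows are appended after the target window, in column order.
        sample.extend(column[i : i + window_size])
    return sample


target = [1, 2, 3, 4, 5]
features = [[10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
print(build_input(target, features, i=1, window_size=2))  # [2, 3, 20, 30, 200, 300]
```

Because the windows are flattened into one vector rather than stacked per column, a model consuming these samples sees the target window and each feature window as consecutive blocks of length window_size.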

continuous

True if the time series will make a continuous prediction.

extras

Additional columns of the time series dataset that are neither features nor target.

These can be used to store additional information about instances, such as IDs.

features

The feature columns of the time series dataset.

forecast_horizon

The number of time steps to predict into the future.

target

The target column of the time series dataset.

window_size

The number of consecutive samples to use as input for prediction.

to_table

Return a new Table containing the feature columns, the target column and the extra columns.

The original TimeSeriesDataset is not modified.

Returns:

Name Type Description
table Table

A table containing the feature columns, the target column and the extra columns.
