TimeSeriesDataset

Bases: Dataset[Table, Column]

A time series dataset maps features to a target column. It can be used to train machine learning models.

Data can be segmented into windows when loading it into the models.
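
The windowing can be illustrated with a small, dependency-free sketch. It mirrors the index arithmetic of `_into_dataloader_with_window` from the source listing on this page, but only for the target column and with plain lists instead of tensors. The helper name `make_windows` is hypothetical and not part of the library.

```python
def make_windows(target, window_size, forecast_horizon, continuous=False):
    """Hypothetical helper: split a sequence into (window, label) pairs."""
    pairs = []
    # Same loop bound as in the source listing: size - (forecast_horizon + window_size)
    for i in range(len(target) - (forecast_horizon + window_size)):
        window = target[i : i + window_size]
        if continuous:
            # All values directly after the window, forecast_horizon of them
            label = target[i + window_size : i + window_size + forecast_horizon]
        else:
            # A single value at index i + window_size + forecast_horizon
            label = [target[i + window_size + forecast_horizon]]
        pairs.append((window, label))
    return pairs


series = [10, 20, 30, 40, 50, 60]
single = make_windows(series, window_size=2, forecast_horizon=1)
multi = make_windows(series, window_size=2, forecast_horizon=1, continuous=True)
print(single)  # [([10, 20], [40]), ([20, 30], [50]), ([30, 40], [60])]
print(multi)   # [([10, 20], [30]), ([20, 30], [40]), ([30, 40], [50])]
```

A series of length `n` yields `n - (forecast_horizon + window_size)` windows under this scheme, which is why the source rejects inputs with `n <= forecast_horizon + window_size`.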

Parameters:

Name	Type	Description	Default
`data`	`Table | Mapping[str, Sequence[Any]]`	The data.	required
`target_name`	`str`	The name of the target column.	required
`window_size`	`int`	The number of consecutive samples to use as input for prediction.	required
`extra_names`	`list[str] | None`	Names of the columns that are neither features nor target. If None, no extra columns are used, i.e. all but the target column are used as features.	`None`
`forecast_horizon`	`int`	The number of time steps to predict into the future.	`1`
`continuous`	`bool`	Whether to predict every time step up to the forecast horizon instead of only the value at the forecast horizon.	`False`

Raises:

Type	Description
`ColumnLengthMismatchError`	If columns have different lengths.
`ValueError`	If the target column is also an extra column.
`ValueError`	If no feature column remains.
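
How the columns are divided into features, target, and extras can be sketched in plain Python. This is a minimal illustration based on the constructor in the source listing on this page, not the library's implementation; `split_columns` is a hypothetical name, and only the check for a target that is also listed as an extra column is reproduced.

```python
def split_columns(column_names, target_name, extra_names=None):
    """Hypothetical helper: partition column names into features, target and extras."""
    extra_names = [] if extra_names is None else list(extra_names)

    # The target must not also be listed as an extra column
    if target_name in extra_names:
        raise ValueError(f"Column '{target_name}' cannot be both target and extra.")

    # Every remaining column is treated as a feature, in its original order
    non_feature_names = {target_name, *extra_names}
    feature_names = [name for name in column_names if name not in non_feature_names]
    return feature_names, target_name, extra_names


features, target, extras = split_columns(
    ["id", "feature", "target", "error"],
    "target",
    extra_names=["error"],
)
print(features)  # ['id', 'feature']
print(extras)    # ['error']
```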

Examples:

>>> from safeds.data.labeled.containers import TimeSeriesDataset
>>> dataset = TimeSeriesDataset(
...     {"id": [1, 2, 3], "feature": [4, 5, 6], "target": [1, 2, 3], "error": [0, 0, 1]},
...     "target",
...     window_size=1,
...     extra_names=["error"],
... )

Methods:

Name	Description
`to_table`	Return a new `Table` containing the feature columns, the target column and the extra columns.

Attributes:

Name	Type	Description
`continuous`	`bool`	True if the time series will make a continuous prediction.
`extras`	`Table`	Additional columns of the time series dataset that are neither features nor target.
`features`	`Table`	The feature columns of the time series dataset.
`forecast_horizon`	`int`	The number of time steps to predict into the future.
`target`	`Column`	The target column of the time series dataset.
`window_size`	`int`	The number of consecutive samples to use as input for prediction.
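
The way `features`, `target`, and `window_size` combine into one model input can be sketched as follows. In the source listing on this page, the target window is concatenated with the matching window of every feature column, so a single input has `window_size * (1 + number of feature columns)` values. The helper `build_input` is hypothetical and uses plain lists rather than tensors.

```python
def build_input(target, features, start, window_size):
    """Hypothetical helper: flatten the target window and all feature windows into one row."""
    # The target window comes first
    row = list(target[start : start + window_size])
    # Then the same slice of every feature column, in column order
    for column in features.values():
        row.extend(column[start : start + window_size])
    return row


target = [1, 2, 3, 4]
features = {"id": [1, 2, 3, 4], "feature": [4, 5, 6, 7]}
row = build_input(target, features, start=0, window_size=2)
print(row)       # [1, 2, 1, 2, 4, 5]
print(len(row))  # 6, i.e. window_size * (1 + number of feature columns)
```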

Source code in src/safeds/data/labeled/containers/_time_series_dataset.py

class TimeSeriesDataset(Dataset[Table, Column]):
    """
    A time series dataset maps features to a target column. It can be used to train machine learning models.

    Data can be segmented into windows when loading it into the models.


    Parameters
    ----------
    data:
        The data.
    target_name:
        The name of the target column.
    window_size:
        The number of consecutive samples to use as input for prediction.
    extra_names:
        Names of the columns that are neither features nor target. If None, no extra columns are used, i.e. all but
        the target column are used as features.
    forecast_horizon:
        The number of time steps to predict into the future.
    continuous:
        Whether to predict every time step up to the forecast horizon instead of only the value at the forecast
        horizon.

    Raises
    ------
    ColumnLengthMismatchError
        If columns have different lengths.
    ValueError
        If the target column is also an extra column.
    ValueError
        If no feature column remains.

    Examples
    --------
    >>> from safeds.data.labeled.containers import TimeSeriesDataset
    >>> dataset = TimeSeriesDataset(
    ...     {"id": [1, 2, 3], "feature": [4, 5, 6], "target": [1, 2, 3], "error": [0, 0, 1]},
    ...     "target",
    ...     window_size=1,
    ...     extra_names=["error"],
    ... )
    """

    # ------------------------------------------------------------------------------------------------------------------
    # Dunder methods
    # ------------------------------------------------------------------------------------------------------------------
    def __init__(
        self,
        data: Table | Mapping[str, Sequence[Any]],
        target_name: str,
        window_size: int,
        *,
        extra_names: list[str] | None = None,
        forecast_horizon: int = 1,
        continuous: bool = False,
    ):
        from safeds.data.tabular.containers import Table

        # Preprocess inputs
        if not isinstance(data, Table):
            data = Table(data)
        if extra_names is None:
            extra_names = []

        # Derive feature names (build the set once, since comprehensions evaluate their condition every iteration)
        non_feature_names = {target_name, *extra_names}
        feature_names = [name for name in data.column_names if name not in non_feature_names]

        # Validate inputs
        if target_name in extra_names:
            raise ValueError(f"Column '{target_name}' cannot be both target and extra.")
        if len(feature_names) == 0:
            feature_names = []

        # Set attributes
        self._table: Table = data
        self._features: Table = data.select_columns(feature_names)
        self._target: Column = data.get_column(target_name)
        self._window_size: int = window_size
        self._forecast_horizon: int = forecast_horizon
        self._extras: Table = data.select_columns(extra_names)
        self._continuous: bool = continuous

    def __eq__(self, other: object) -> bool:
        """
        Compare two time series datasets.

        Returns
        -------
        equals:
            `True` if the window size, forecast horizon, target, features and extras are equal, `False` otherwise.
        """
        if not isinstance(other, TimeSeriesDataset):
            return NotImplemented
        return (self is other) or (
            self._window_size == other._window_size
            and self._forecast_horizon == other._forecast_horizon
            and self.target == other.target
            and self.features == other.features
            and self.extras == other.extras
        )

    def __hash__(self) -> int:
        """
        Return a deterministic hash value for this time series dataset.

        Returns
        -------
        hash:
            The hash value.
        """
        return _structural_hash(
            self.target,
            self.features,
            self.extras,
            self._window_size,
            self._forecast_horizon,
        )

    def __sizeof__(self) -> int:
        """
        Return the complete size of this object.

        Returns
        -------
        size:
            Size of this object in bytes.
        """
        return (
            sys.getsizeof(self._target)
            + sys.getsizeof(self._features)
            + sys.getsizeof(self.extras)
            + sys.getsizeof(self._window_size)
            + sys.getsizeof(self._forecast_horizon)
        )

    # ------------------------------------------------------------------------------------------------------------------
    # Properties
    # ------------------------------------------------------------------------------------------------------------------

    @property
    def features(self) -> Table:
        """The feature columns of the time series dataset."""
        return self._features

    @property
    def target(self) -> Column:
        """The target column of the time series dataset."""
        return self._target

    @property
    def window_size(self) -> int:
        """The number of consecutive samples to use as input for prediction."""
        return self._window_size

    @property
    def forecast_horizon(self) -> int:
        """The number of time steps to predict into the future."""
        return self._forecast_horizon

    @property
    def continuous(self) -> bool:
        """True if the time series will make a continuous prediction."""
        return self._continuous

    @property
    def extras(self) -> Table:
        """
        Additional columns of the time series dataset that are neither features nor target.

        These can be used to store additional information about instances, such as IDs.
        """
        return self._extras

    # ------------------------------------------------------------------------------------------------------------------
    # Conversion
    # ------------------------------------------------------------------------------------------------------------------

    def to_table(self) -> Table:
        """
        Return a new `Table` containing the feature columns, the target column and the extra columns.

        The original `TimeSeriesDataset` is not modified.

        Returns
        -------
        table:
            A table containing the feature columns, the target column and the extra columns.
        """
        return self._table

    def _into_dataloader_with_window(
        self,
        window_size: int,
        forecast_horizon: int,
        batch_size: int,
        continuous: bool = False,
    ) -> DataLoader:
        """
        Return a Dataloader for the data stored in this time series, used for training neural networks.

        It splits the target column into windows, uses them as features, and creates labels for the time series based
        on the forecast horizon. The original time series dataset is not modified.

        Parameters
        ----------
        window_size:
            The size of the created windows.
        forecast_horizon:
            The number of time steps to predict into the future.
        batch_size:
            The size of data batches that should be loaded at one time.
        continuous:
            Whether to predict every time step up to the forecast horizon instead of only the value at the forecast
            horizon.

        Raises
        ------
        OutOfBoundsError
            If window_size or forecast_horizon is below 1.
        ValueError
            If the number of rows is less than or equal to forecast_horizon + window_size.

        Returns
        -------
        result:
            The DataLoader.
        """
        import torch
        from torch.utils.data import DataLoader

        _init_default_device()

        target_tensor = torch.tensor(self.target._series.to_numpy(), dtype=torch.float32)

        x_s = []
        y_s = []

        size = target_tensor.size(0)
        _check_bounds("window_size", window_size, lower_bound=_ClosedBound(1))
        _check_bounds("forecast_horizon", forecast_horizon, lower_bound=_ClosedBound(1))
        if size <= forecast_horizon + window_size:
            raise ValueError("Cannot create windows: the number of rows must be greater than forecast_horizon + window_size")
        # Create feature windows and, for each of them, a label lagged by the forecast horizon.
        # Every feature column is windowed as well.
        # -> [i, win_size],[target]
        feature_cols = self.features.to_columns()
        for i in range(size - (forecast_horizon + window_size)):
            window = target_tensor[i : i + window_size]
            if continuous:
                label = target_tensor[i + window_size : i + window_size + forecast_horizon]

            else:
                label = target_tensor[i + window_size + forecast_horizon].unsqueeze(0)
            for col in feature_cols:
                data = torch.tensor(col._series.to_numpy(), dtype=torch.float32)
                window = torch.cat((window, data[i : i + window_size]), dim=0)
            x_s.append(window)
            y_s.append(label)
        x_s_tensor = torch.stack(x_s)
        y_s_tensor = torch.stack(y_s)
        dataset = _create_dataset(x_s_tensor, y_s_tensor)
        return DataLoader(dataset=dataset, batch_size=batch_size)

    def _into_dataloader_with_window_predict(
        self,
        window_size: int,
        forecast_horizon: int,
        batch_size: int,
    ) -> DataLoader:
        """
        Return a Dataloader for the data stored in this time series, used for predicting with neural networks.

        It splits the target column into windows and uses them, together with the windows of each feature column, as
        input. No labels are created. The original time series dataset is not modified.

        Parameters
        ----------
        window_size:
            The size of the created windows.
        forecast_horizon:
            The number of time steps to predict into the future.
        batch_size:
            The size of data batches that should be loaded at one time.

        Raises
        ------
        OutOfBoundsError
            If window_size or forecast_horizon is below 1.
        ValueError
            If the number of rows is less than or equal to forecast_horizon + window_size.

        Returns
        -------
        result:
            The DataLoader.
        """
        import torch
        from torch.utils.data import DataLoader

        _init_default_device()

        target_tensor = self.target._series.to_torch().to(_get_device())
        x_s = []

        size = target_tensor.size(0)
        _check_bounds("window_size", window_size, lower_bound=_ClosedBound(1))
        _check_bounds("forecast_horizon", forecast_horizon, lower_bound=_ClosedBound(1))
        if size <= forecast_horizon + window_size:
            raise ValueError("Cannot create windows: the number of rows must be greater than forecast_horizon + window_size")

        feature_cols = self.features.to_columns()
        for i in range(size - (forecast_horizon + window_size)):
            window = target_tensor[i : i + window_size]
            for col in feature_cols:
                data = torch.tensor(col._series.to_numpy(), dtype=torch.float32)
                window = torch.cat((window, data[i : i + window_size]), dim=-1)
            x_s.append(window)

        x_s_tensor = torch.stack(x_s)

        dataset = _create_dataset_predict(x_s_tensor)
        return DataLoader(dataset=dataset, batch_size=batch_size)

    # ------------------------------------------------------------------------------------------------------------------
    # IPython integration
    # ------------------------------------------------------------------------------------------------------------------

    def _repr_html_(self) -> str:
        """
        Return an HTML representation of the time series dataset.

        Returns
        -------
        output:
            The generated HTML.
        """
        return self._table._repr_html_()

`continuous`

True if the time series will make a continuous prediction.

`extras`

Additional columns of the time series dataset that are neither features nor target.

These can be used to store additional information about instances, such as IDs.

`features`

The feature columns of the time series dataset.

`forecast_horizon`

The number of time steps to predict into the future.

`target`

The target column of the time series dataset.

`window_size`

The number of consecutive samples to use as input for prediction.

`to_table`

Return a new Table containing the feature columns, the target column and the extra columns.

The original TimeSeriesDataset is not modified.

Returns:

Name	Type	Description
`table`	`Table`	A table containing the feature columns, the target column and the extra columns.

Source code in src/safeds/data/labeled/containers/_time_series_dataset.py

def to_table(self) -> Table:
    """
    Return a new `Table` containing the feature columns, the target column and the extra columns.

    The original `TimeSeriesDataset` is not modified.

    Returns
    -------
    table:
        A table containing the feature columns, the target column and the extra columns.
    """
    return self._table
