AI for Time Series and Forecasting
How to read this page: This article maps the topic from beginner to expert across six levels: Remembering, Understanding, Applying, Analyzing, Evaluating, and Creating. Scan the headings to see the full scope, then read from wherever your knowledge starts to feel uncertain.
AI for time series and forecasting applies machine learning and deep learning techniques to sequential, time-indexed data to predict future values, detect anomalies, and extract patterns. Time series data is ubiquitous: stock prices, electricity demand, web traffic, sensor readings, weather measurements, and patient vital signs all evolve over time. Traditional forecasting relied on statistical models like ARIMA; modern AI-driven approaches — including LSTMs, Temporal Fusion Transformers, and foundation models for time series — now achieve state-of-the-art performance across domains.
Remembering[edit]
- Time series — A sequence of data points indexed in time order, typically at regular intervals.
- Forecasting — Predicting future values of a time series based on its historical patterns.
- Univariate time series — A single variable measured over time (e.g., daily sales).
- Multivariate time series — Multiple variables measured simultaneously over time (e.g., temperature, humidity, and pressure together).
- Trend — The long-term direction of a time series (upward, downward, or flat).
- Seasonality — Regular, periodic patterns that repeat at known intervals (daily, weekly, yearly).
- Residuals — The component remaining after removing trend and seasonality; ideally random noise.
- Stationarity — A time series is stationary if its statistical properties (mean, variance) do not change over time. Many models require stationarity.
- Autocorrelation — The correlation of a time series with its own past values (lags).
- Lag — A prior time step. Lag-1 is yesterday's value; lag-7 is last week's value.
- ARIMA — AutoRegressive Integrated Moving Average; a classical statistical model for univariate forecasting.
- LSTM (Long Short-Term Memory) — A type of RNN with gating mechanisms that captures long-range dependencies in sequences.
- Temporal Fusion Transformer (TFT) — A transformer-based model for multi-horizon time series forecasting, incorporating attention across time.
- Anomaly detection — Identifying data points, intervals, or patterns that deviate significantly from expected behavior.
- Horizon — The number of future time steps to forecast (1-step-ahead vs. multi-step/multi-horizon).
- Rolling forecast — Re-fitting or updating the model as new data arrives, maintaining accuracy over time.
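To make the autocorrelation and lag entries concrete, here is a minimal NumPy sketch (the function name and the weekly example series are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def autocorr(y, lag):
    """Sample autocorrelation: correlation of a series with itself `lag` steps back."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    return np.dot(y[:-lag], y[lag:]) / np.dot(y, y)

# A signal with weekly period correlates strongly with its own lag-7 values
t = np.arange(700)
y = np.sin(2 * np.pi * t / 7)
</syntaxhighlight>

For this series, autocorr(y, 7) is close to 1, while mismatched lags give low or negative values.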
Understanding[edit]
Time series forecasting is inherently a sequential problem: the order of observations matters, and the past contains information about the future. This distinguishes it from tabular classification, where rows are exchangeable.
The decomposition framework is key to understanding time series: <syntaxhighlight lang="text">
Observed = Trend × Seasonal × Residual   (multiplicative)
         = Trend + Seasonal + Residual   (additive)
</syntaxhighlight> Decomposing a series into these components enables targeted modeling: model the trend with regression, the seasonality with Fourier features or indicator variables, and the residual with a neural network or ARIMA.
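As a concrete sketch of the additive variant, the trend can be estimated with a centered moving average and the seasonal component from per-phase averages of the detrended series (a hand-rolled illustration; production code would more likely use statsmodels' seasonal_decompose or STL):

<syntaxhighlight lang="python">
import numpy as np
import pandas as pd

def decompose_additive(y, period):
    """Classical additive decomposition: Observed = Trend + Seasonal + Residual."""
    trend = y.rolling(period, center=True).mean()
    detrended = y - trend
    # average the detrended series within each seasonal phase
    seasonal = detrended.groupby(np.arange(len(y)) % period).transform("mean")
    seasonal = seasonal - seasonal.mean()  # center the seasonal component
    residual = y - trend - seasonal
    return trend, seasonal, residual

# Synthetic series: linear trend + weekly seasonality + noise
rng = np.random.default_rng(0)
t = np.arange(140)
y = pd.Series(0.5 * t + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 1, 140))
trend, seasonal, residual = decompose_additive(y, period=7)
</syntaxhighlight>

After removing trend and seasonality, the residual is close to the injected noise, matching the definition above.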
Why deep learning? Classical models like ARIMA excel at capturing simple autocorrelation but struggle with:
- Non-linear relationships between variables
- Multiple interacting series (multivariate)
- Complex, multi-scale seasonality
- Incorporating exogenous variables (weather, holidays, promotions)
LSTMs can capture non-linear temporal dependencies and handle arbitrary-length sequences. Transformers add the ability to attend to any past time step directly, avoiding the vanishing gradient problem over long sequences. Foundation models for time series (TimeGPT, MOIRAI, Chronos) pre-trained on billions of time points can zero-shot forecast on new series.
Evaluation discipline: A critical mistake in time series is using random train/test splits. This causes data leakage — future data leaks into the training set. Always use chronological splits: train on the first 70–80%, validate on the next 10–15%, test on the most recent 10–15%.
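A minimal sketch of that split (the fractions are the ones suggested above; the function name is illustrative):

<syntaxhighlight lang="python">
import numpy as np

def chronological_split(n, train_frac=0.7, val_frac=0.15):
    """Leakage-free split: train on the earliest data, test on the most recent."""
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    idx = np.arange(n)
    return idx[:train_end], idx[train_end:val_end], idx[val_end:]

train_idx, val_idx, test_idx = chronological_split(1000)
</syntaxhighlight>

Every training index precedes every validation index, which precedes every test index, so no future information leaks backward.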
Applying[edit]
Multi-horizon forecasting with Temporal Fusion Transformer (PyTorch Forecasting):
<syntaxhighlight lang="python">
import pandas as pd
import lightning.pytorch as pl
from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer
from pytorch_forecasting.metrics import QuantileLoss

# Load data: each row = one time step for one series
df = pd.read_csv("sales_data.csv", parse_dates=["date"])
df["time_idx"] = (df["date"] - df["date"].min()).dt.days  # integer time index

max_encoder_length = 60     # use 60 past days as context
max_prediction_length = 14  # forecast 14 days ahead

# Training dataset: hold out the final horizon for validation
training = TimeSeriesDataSet(
    df[lambda x: x.time_idx <= x.time_idx.max() - max_prediction_length],
    time_idx="time_idx",
    target="sales",
    group_ids=["store_id", "product_id"],  # multiple series
    min_encoder_length=30,
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
    static_categoricals=["store_id", "product_id"],
    time_varying_known_reals=["time_idx", "price", "day_of_week", "is_holiday"],
    time_varying_unknown_reals=["sales"],  # only the target is "unknown" in the future
    target_normalizer="auto",
)
validation = TimeSeriesDataSet.from_dataset(training, df, predict=True, stop_randomization=True)
train_dl = training.to_dataloader(train=True, batch_size=64)
val_dl = validation.to_dataloader(train=False, batch_size=64)

# TFT model
tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.03,
    hidden_size=64,
    attention_head_size=4,
    dropout=0.1,
    hidden_continuous_size=32,
    output_size=7,  # 7 quantile predictions (p10 to p90)
    loss=QuantileLoss(),
    log_interval=10,
)

trainer = pl.Trainer(max_epochs=30, accelerator="gpu", gradient_clip_val=0.1)
trainer.fit(tft, train_dl, val_dl)
</syntaxhighlight>
Model selection guide by forecasting scenario:
- Simple univariate, clean seasonality → SARIMA, Prophet (Meta), ETS
- Univariate with complex patterns → N-BEATS, N-HiTS, PatchTST
- Multivariate with known future covariates → Temporal Fusion Transformer, DeepAR
- Very short series or irregular intervals → Gaussian Processes, ARIMA
- Many series, zero-shot → TimeGPT, Chronos, MOIRAI (foundation models)
- Anomaly detection → Isolation Forest (tabular features), LSTM-AD, Anomaly Transformer
Analyzing[edit]
| Model | Type | Strengths | Weaknesses |
|---|---|---|---|
| ARIMA/SARIMA | Statistical | Interpretable, fast, works on small data | Assumes linearity, one series at a time |
| Prophet | Statistical | Handles holidays, trend changepoints | Single series at a time; only simple additive extra regressors |
| DeepAR | Deep Learning (LSTM) | Probabilistic, many series | Needs lots of data, slow training |
| TFT | Transformer | Multi-horizon, covariate-rich, interpretable | Complex, high data requirement |
| N-BEATS | Deep Learning (MLP) | Fast, competitive, no feature engineering | Limited covariate support |
| Chronos (foundation) | LLM-style | Zero-shot, no training needed | No covariate support yet; large model |
Failure modes:
- Chronological leakage — Random train/test splits allow future data to inform past predictions, producing falsely optimistic results. Always split chronologically.
- Ignoring non-stationarity — Many models assume stationarity. Differencing (ARIMA) or normalization per-series is required.
- Ignoring distributional shift — Retail models trained pre-COVID performed terribly during COVID. Extreme events cause structural breaks that no model trained on historical data anticipates.
- Point forecast overconfidence — Reporting only mean forecasts without uncertainty intervals. Downstream planning needs to understand the range of outcomes, not just the median.
- Evaluation on last segment only — Evaluating only on the final test period may not represent the model's general quality. Use rolling window backtesting across multiple historical windows.
Evaluating[edit]
Expert time series evaluation uses multiple metrics and rigorous experimental design:
Regression metrics: MAE (Mean Absolute Error), RMSE, MAPE (Mean Absolute Percentage Error), sMAPE. MAPE is undefined when actual=0 and is skewed by near-zero values; sMAPE or MAE are more robust.
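These metrics are a few lines each; the toy values below illustrate why plain MAPE breaks at zero while sMAPE stays finite (numbers are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def smape(y, yhat):
    """Symmetric MAPE: finite when an actual is 0 (unless the forecast is 0 too)."""
    denom = (np.abs(y) + np.abs(yhat)) / 2
    return 100 * np.mean(np.abs(y - yhat) / denom)

y = np.array([0.0, 2.0, 4.0])    # first actual is zero: plain MAPE is undefined here
yhat = np.array([1.0, 2.0, 5.0])
</syntaxhighlight>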
Probabilistic metrics: For probabilistic forecasts (quantile or interval), use CRPS (Continuous Ranked Probability Score) or Winkler score. These reward well-calibrated uncertainty.
Rolling window backtesting: Instead of one train/test split, slide a window across history — train on windows [0:T], [0:T+1], … and evaluate on each subsequent step. This tests the model across many historical regimes and avoids cherry-picking a favorable test period.
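The procedure can be sketched with an expanding window (the window sizes and the seasonal-naive stand-in model are illustrative assumptions):

<syntaxhighlight lang="python">
import numpy as np

def expanding_window_backtest(y, fit_predict, initial=100, horizon=7, step=7):
    """Train on y[:end] for growing `end`, forecast `horizon` steps ahead,
    and record the MAE of each window."""
    errors = []
    for end in range(initial, len(y) - horizon + 1, step):
        forecast = fit_predict(y[:end])
        actual = y[end:end + horizon]
        errors.append(np.mean(np.abs(actual - forecast)))
    return np.array(errors)

# Seasonal naive (repeat the last week) on a perfectly weekly signal
t = np.arange(400)
y = np.sin(2 * np.pi * t / 7)
maes = expanding_window_backtest(y, lambda history: history[-7:])
</syntaxhighlight>

On a perfectly periodic series the seasonal-naive forecaster is exact, so every window's MAE is essentially zero; on real data, the spread of maes across windows is what gets reported.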
Naive benchmarks: Always compare to: naive (last value), seasonal naive (same period last cycle), and exponential smoothing. If a complex deep learning model cannot beat seasonal naive, it's not adding value.
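Both baselines are one-liners (a minimal sketch):

<syntaxhighlight lang="python">
import numpy as np

def naive_forecast(history, horizon):
    """Repeat the last observed value."""
    return np.full(horizon, history[-1])

def seasonal_naive_forecast(history, horizon, period):
    """Repeat the most recent full seasonal cycle."""
    cycle = np.asarray(history)[-period:]
    return np.tile(cycle, int(np.ceil(horizon / period)))[:horizon]
</syntaxhighlight>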
Expert practitioners report backtesting results as distributions (mean ± std across windows) rather than a single number, and explicitly test for robustness during unusual periods (holidays, pandemics, market crashes).
Creating[edit]
Designing a production time series forecasting system:
1. Data architecture <syntaxhighlight lang="text"> Raw time series sources (databases, IoT, APIs)
↓
[Time-indexed storage: InfluxDB, TimescaleDB, or Parquet partitioned by date]
↓
[Feature engineering pipeline]
  ├── Temporal features: hour, day of week, month, quarter
  ├── Lag features: lag-1, lag-7, lag-28, rolling mean/std
  ├── Fourier features for seasonality
  └── External covariates: weather, holidays, promotions
↓
[Stationarity tests + differencing if needed]
↓
[Train/val/test split: chronological] </syntaxhighlight>
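The lag, rolling, temporal, and Fourier steps of that pipeline can be sketched in pandas (the column names date and sales are assumptions; rolling statistics are shifted by one step so each feature uses only past values):

<syntaxhighlight lang="python">
import numpy as np
import pandas as pd

def add_time_features(df):
    out = df.copy()
    # Temporal features
    out["day_of_week"] = out["date"].dt.dayofweek
    out["month"] = out["date"].dt.month
    # Lag features (past values only)
    for lag in (1, 7, 28):
        out[f"lag_{lag}"] = out["sales"].shift(lag)
    # Rolling statistics over the previous 7 days (shifted to avoid leakage)
    out["rolling_mean_7"] = out["sales"].shift(1).rolling(7).mean()
    out["rolling_std_7"] = out["sales"].shift(1).rolling(7).std()
    # Fourier features for weekly seasonality
    t = np.arange(len(out))
    out["sin_7"] = np.sin(2 * np.pi * t / 7)
    out["cos_7"] = np.cos(2 * np.pi * t / 7)
    return out

# Illustrative frame
df = pd.DataFrame({"date": pd.date_range("2024-01-01", periods=30),
                   "sales": np.arange(30.0)})
out = add_time_features(df)
</syntaxhighlight>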
2. Model training and selection <syntaxhighlight lang="text"> Train multiple models (baseline naive, SARIMA, TFT, foundation model)
↓
Evaluate each with rolling window backtesting
↓
Select winner by held-out test MAPE and CRPS
↓
Train ensemble: weighted average of top-3 models (often beats any single model)
↓
Register in model registry with evaluation metrics </syntaxhighlight>
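The weighted-average step admits many weighting rules; weights inversely proportional to each model's validation error are one common simple choice (an assumption here, not prescribed by the pipeline):

<syntaxhighlight lang="python">
import numpy as np

def weighted_ensemble(forecasts, val_errors):
    """Combine per-model forecasts using weights inversely
    proportional to each model's validation error."""
    w = 1.0 / np.asarray(val_errors)
    w = w / w.sum()
    return np.average(np.asarray(forecasts), axis=0, weights=w)
</syntaxhighlight>

A model with much larger validation error receives near-zero weight, so the ensemble leans on its stronger members.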
3. Production serving and retraining
- Serve forecasts via API with caching (same-day forecast is rarely regenerated)
- Nightly retrain on latest data window (rolling retrain strategy)
- Monitor forecast accuracy vs. actuals in real-time; alert on anomalies
- Detect distribution shift: plot forecast distribution vs. actuals weekly
- Trigger manual review when MAE exceeds historical 95th percentile
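The last trigger reduces to a one-line check (names are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def should_alert(recent_mae, historical_maes):
    """Flag for manual review when recent MAE exceeds the historical 95th percentile."""
    return bool(recent_mae > np.percentile(historical_maes, 95))
</syntaxhighlight>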