Complete Code Explanation: sktime-mcp

📋 Table of Contents

Project Overview
Architecture
File-by-File Breakdown
How It All Works Together
Key Concepts

Project Overview

sktime-mcp is a Model Context Protocol (MCP) server that exposes the sktime time series library to Large Language Models (LLMs). It allows LLMs to:

Discover time series estimators from sktime’s registry
Reason about their capabilities using tags
Compose estimators into pipelines
Execute real forecasting workflows on datasets

What Problem Does It Solve?

LLMs can’t directly interact with Python libraries. This MCP server acts as a semantic bridge, translating between:

LLM world: JSON-RPC requests with simple arguments
Python world: Complex object instantiation, method calls, and data manipulation

Architecture

The codebase is organized into 5 main layers:

```
┌─────────────────────────────────────────┐
│         MCP Server (server.py)          │  ← Entry point, handles JSON-RPC
├─────────────────────────────────────────┤
│         Tools Layer (tools/)            │  ← MCP tool implementations
├─────────────────────────────────────────┤
│  Registry (registry/)  │  Composition   │  ← Discovery & Validation
│                        │  (composition/)│
├─────────────────────────────────────────┤
│         Runtime (runtime/)              │  ← Execution & Handle Management
├─────────────────────────────────────────┤
│            sktime Library               │  ← Actual ML library
└─────────────────────────────────────────┘
```

File-by-File Breakdown

📁 Root Level Files

`README.md`

Purpose: Project documentation and quick start guide
Key Sections:
- Installation instructions
- Available MCP tools overview
- Example LLM workflow
- Project structure

`pyproject.toml`

Purpose: Python project configuration (PEP 518 build settings and PEP 621 project metadata)
Key Contents:
- Package metadata (name, version, description)
- Dependencies: mcp, sktime, pandas, numpy, scikit-learn
- Optional dependencies for dev and extended features
- Entry point: sktime-mcp command → sktime_mcp.server:main
- Tool configurations (ruff, pytest)

📁 `src/sktime_mcp/` - Core Source Code

`server.py` - MCP Server Entry Point

Purpose: Main MCP server that handles all tool calls

Key Components:

sanitize_for_json(obj): Converts Python objects to JSON-serializable format
- Handles numpy arrays, pandas objects, special types
@server.list_tools(): Registers all available MCP tools
- Returns tool schemas (name, description, input schema)
- Tools span discovery, instantiation, execution, data, export, persistence, validation, and job management (e.g., list_estimators, instantiate_pipeline, fit_predict_async, load_data_source, save_model, check_job_status).
@server.call_tool(name, arguments): Routes tool calls to implementations
- Validates arguments
- Calls appropriate tool function
- Sanitizes and returns results
main(): Entry point that starts the MCP server
- Uses stdio transport (reads from stdin, writes to stdout)
- Compatible with Claude Desktop and other MCP clients

Flow:

LLM → JSON-RPC request → server.call_tool() → tool function → sanitize → JSON response → LLM
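The routing step inside `call_tool` can be sketched in plain Python. This is a hypothetical illustration of the dispatch pattern only. It does not use the `mcp` SDK's decorators, and both the `TOOLS` table and the placeholder body of `list_estimators_tool` are assumptions, not the project's code.

```python
import json
from typing import Any, Callable, Dict, Optional


# Hypothetical stand-in for a tool implementation in tools/*.py
def list_estimators_tool(task: Optional[str] = None, limit: int = 10) -> Dict[str, Any]:
    names = ["ARIMA", "NaiveForecaster"]  # placeholder data; `task` is ignored here
    return {"success": True, "estimators": names[:limit], "total": len(names)}


# Routing table: tool name -> implementation
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "list_estimators": list_estimators_tool,
}


def call_tool(name: str, arguments: Dict[str, Any]) -> str:
    """Look up a tool by name, call it with keyword arguments, return JSON text."""
    tool = TOOLS.get(name)
    if tool is None:
        return json.dumps({"success": False, "error": f"Unknown tool: {name}"})
    try:
        result = tool(**arguments)
    except TypeError as exc:
        # Raised for unexpected or missing arguments (a crude stand-in for schema validation)
        return json.dumps({"success": False, "error": str(exc)})
    # A real server would pass `result` through sanitize_for_json before encoding
    return json.dumps(result)
```

Errors come back as JSON with `"success": False` instead of raising. That keeps every response in the same shape the LLM already expects.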

📁 `src/sktime_mcp/registry/` - Estimator Discovery

`interface.py` - Registry Interface

Purpose: Wraps sktime’s all_estimators() function and provides structured access

Key Classes:

EstimatorNode (dataclass)
- Represents a single estimator with all its metadata
- Fields:
  - name: Class name (e.g., “ARIMA”)
  - task: Task type (e.g., “forecasting”)
  - module: Python module path
  - class_ref: Actual Python class
  - tags: Capability tags (e.g., {"capability:pred_int": True})
  - hyperparameters: Constructor parameters with defaults
  - docstring: Class documentation
- Methods:
  - to_dict(): JSON serialization
  - to_summary(): Minimal info for list operations
RegistryInterface (singleton)
- Purpose: Lazy-loads and caches all sktime estimators
- Key Methods:
  - get_all_estimators(task, tags): Filter estimators by task and tags
  - get_estimator_by_name(name): Lookup specific estimator
  - list_estimators(query=...): Text search in names/docstrings
  - get_available_tasks(): List all task types
  - get_available_tags(): List all capability tags
- Internal Methods:
  - _load_registry(): Calls sktime’s all_estimators() for each task
  - _create_node(): Extracts metadata from estimator class
  - _get_tags(): Calls cls.get_class_tags()
  - _get_hyperparameters(): Inspects __init__ signature

How It Works:

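```python
import inspect

from sktime.registry import all_estimators

# The first call to get_registry() triggers lazy loading via _load_registry()
registry = get_registry()

# Inside _load_registry(): all_estimators returns (name, class) pairs per scitype
# (sktime's scitype string for forecasters is "forecaster")
estimators = all_estimators(estimator_types="forecaster")

# An EstimatorNode is created for each estimator
for name, cls in estimators:
    node = EstimatorNode(
        name=name,
        task="forecasting",
        module=cls.__module__,
        class_ref=cls,
        tags=cls.get_class_tags(),
        hyperparameters=inspect.signature(cls.__init__).parameters,
        docstring=cls.__doc__,
    )
```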
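The lazy, cached lookup described above can be illustrated with a self-contained sketch. It is hypothetical. The `loader` callable stands in for the calls to sktime's `all_estimators()`, the module-level `get_registry(loader)` is just one way to get singleton behaviour, and only a subset of the documented fields and methods is reproduced.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class EstimatorNode:
    name: str
    task: str
    class_ref: type
    tags: Dict[str, Any] = field(default_factory=dict)
    hyperparameters: Dict[str, Any] = field(default_factory=dict)
    docstring: str = ""


# A loader returns (name, task, class) triples (assumed interface)
Loader = Callable[[], List[Tuple[str, str, type]]]


class RegistryInterface:
    def __init__(self, loader: Loader) -> None:
        self._loader = loader
        self._nodes: Optional[Dict[str, EstimatorNode]] = None  # nothing loaded yet

    def _ensure_loaded(self) -> Dict[str, EstimatorNode]:
        # Lazy loading: the loader runs on the first query only, then results are cached
        if self._nodes is None:
            self._nodes = {name: self._create_node(name, task, cls)
                           for name, task, cls in self._loader()}
        return self._nodes

    @staticmethod
    def _create_node(name: str, task: str, cls: type) -> EstimatorNode:
        # Constructor parameters (minus self, *args, **kwargs) mapped to their defaults
        skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
        hyper = {
            p.name: (None if p.default is inspect.Parameter.empty else p.default)
            for p in inspect.signature(cls.__init__).parameters.values()
            if p.name != "self" and p.kind not in skip
        }
        tags = cls.get_class_tags() if hasattr(cls, "get_class_tags") else {}
        return EstimatorNode(name, task, cls, dict(tags), hyper, cls.__doc__ or "")

    def get_all_estimators(self, task: Optional[str] = None,
                           tags: Optional[Dict[str, Any]] = None) -> List[EstimatorNode]:
        tags = tags or {}
        return [
            node for node in self._ensure_loaded().values()
            if (task is None or node.task == task)
            and all(node.tags.get(k) == v for k, v in tags.items())
        ]

    def get_estimator_by_name(self, name: str) -> Optional[EstimatorNode]:
        return self._ensure_loaded().get(name)


_registry: Optional[RegistryInterface] = None


def get_registry(loader: Loader) -> RegistryInterface:
    """Return the process-wide registry, creating it on first call."""
    global _registry
    if _registry is None:
        _registry = RegistryInterface(loader)
    return _registry
```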

`tag_resolver.py` - Tag Resolution

Purpose: Handles tag-based filtering and compatibility checking

Key Functions:

Resolves tag queries (e.g., {"capability:pred_int": True})
Checks if estimator tags match requirements
Used by registry filtering and composition validation
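A tag query such as `{"capability:pred_int": True}` can be resolved with a small matching function. The sketch below is hypothetical. In particular, the rule that a list-valued tag matches when it *contains* the requested value is an assumption, not documented behaviour of `tag_resolver.py`.

```python
from typing import Any, Dict, List


def tag_mismatches(estimator_tags: Dict[str, Any], required: Dict[str, Any]) -> List[str]:
    """Return the required tag names that the estimator does not satisfy."""
    mismatches = []
    for key, wanted in required.items():
        actual = estimator_tags.get(key)
        if isinstance(actual, (list, tuple, set)):
            # Assumed rule: a list-valued tag matches if it contains the wanted value
            ok = wanted in actual
        else:
            ok = actual == wanted
        if not ok:
            mismatches.append(key)
    return mismatches


def matches_tags(estimator_tags: Dict[str, Any], required: Dict[str, Any]) -> bool:
    return not tag_mismatches(estimator_tags, required)
```

Returning the list of mismatched keys, rather than a bare boolean, lets the same helper feed the human-readable errors that composition validation reports.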

📁 `src/sktime_mcp/composition/` - Pipeline Validation

`validator.py` - Composition Validator

Purpose: Validates that estimator compositions are valid before instantiation

Key Classes:

CompositionType (Enum)
- Types of compositions: PIPELINE, TRANSFORMER_PIPELINE, FORECASTING_PIPELINE, MULTIPLEXER, ENSEMBLE, REDUCTION
CompositionRule (dataclass)
- Defines valid composition patterns
- Example: Transformers can precede forecasters
ValidationResult (dataclass)
- Fields: valid, errors, warnings, suggestions
- Method: to_dict() for JSON serialization
CompositionValidator (singleton)
- Key Methods:
  - validate_pipeline(components): Check if pipeline is valid
  - _check_pair_compatibility(first, second): Validate two estimators can be composed
  - _check_tag_compatibility(first, second): Check tag requirements
  - get_valid_compositions(estimator_name): What can precede/follow this estimator
  - suggest_pipeline(task, requirements): Suggest a valid pipeline

Validation Rules:

```python
# Valid: Transformer → Forecaster
["Detrender", "ARIMA"]  # ✅

# Invalid: Forecaster → Forecaster
["ARIMA", "NaiveForecaster"]  # ❌

# Valid: Multiple Transformers → Forecaster
["ConditionalDeseasonalizer", "Detrender", "ARIMA"]  # ✅
```
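The three rules above can be checked pairwise. This sketch covers only the "transformers first, one forecaster last" pattern from the examples, not the other `CompositionType` values. The hard-coded `KIND` lookup is a hypothetical stand-in for scitype information that a real validator would read from the registry.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

# Hypothetical name -> kind lookup; a real validator would derive this from the registry
KIND: Dict[str, str] = {
    "Detrender": "transformer",
    "ConditionalDeseasonalizer": "transformer",
    "ARIMA": "forecaster",
    "NaiveForecaster": "forecaster",
}


@dataclass
class ValidationResult:
    valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


def validate_pipeline(components: List[str]) -> ValidationResult:
    """Accept zero or more transformers followed by exactly one final forecaster."""
    if not components:
        return ValidationResult(False, ["A pipeline needs at least one component"])
    unknown = [name for name in components if name not in KIND]
    if unknown:
        return ValidationResult(False, [f"Unknown estimator: {name}" for name in unknown])
    errors: List[str] = []
    # Pairwise check: only a transformer may be followed by another step
    for first, second in zip(components, components[1:]):
        if KIND[first] != "transformer":
            errors.append(f"{first} ({KIND[first]}) cannot precede {second}")
    if KIND[components[-1]] != "forecaster":
        errors.append("The last component must be a forecaster")
    suggestions = ["Place transformers first and a single forecaster last"] if errors else []
    return ValidationResult(not errors, errors, suggestions=suggestions)
```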

📁 `src/sktime_mcp/runtime/` - Execution Engine

`handles.py` - Handle Manager

Purpose: Manages references to instantiated estimator objects

Why Needed?

LLMs can’t hold Python object references
Solution: Create string handles (e.g., "est_abc123") that map to objects

Key Classes:

HandleInfo (dataclass)
- Stores metadata about a handle
- Fields: handle_id, estimator_name, instance, params, created_at, fitted, metadata
HandleManager (singleton)
- Key Methods:
  - create_handle(estimator_name, instance, params): Create new handle → returns "est_xyz"
  - get_instance(handle_id): Retrieve actual Python object
  - get_info(handle_id): Get handle metadata
  - mark_fitted(handle_id): Mark estimator as fitted
  - is_fitted(handle_id): Check if fitted
  - release_handle(handle_id): Free memory
  - list_handles(): List all active handles
  - _cleanup_oldest(): Auto-cleanup when max_handles reached

Flow:

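```python
from sktime.datasets import load_airline
from sktime.forecasting.arima import ARIMA

# Instantiation (manager is the HandleManager singleton)
instance = ARIMA(order=[1, 1, 1])
handle = manager.create_handle("ARIMA", instance, {"order": [1, 1, 1]})
# Returns: "est_a1b2c3d4e5f6"

# Later retrieval
y = load_airline()
instance = manager.get_instance("est_a1b2c3d4e5f6")
instance.fit(y)
```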
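A minimal in-memory sketch of the manager follows. It assumes handles live in an `OrderedDict`, so `_cleanup_oldest()` can evict the earliest-created entry. The `max_handles=100` default and the `est_` + 12 hex-character id format (chosen to match the example id above) are assumptions.

```python
import uuid
from collections import OrderedDict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class HandleInfo:
    handle_id: str
    estimator_name: str
    instance: Any
    params: Dict[str, Any] = field(default_factory=dict)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    fitted: bool = False
    metadata: Dict[str, Any] = field(default_factory=dict)


class HandleManager:
    def __init__(self, max_handles: int = 100) -> None:
        self.max_handles = max_handles
        # Insertion order doubles as age order: the first entry is the oldest
        self._handles: "OrderedDict[str, HandleInfo]" = OrderedDict()

    def create_handle(self, estimator_name: str, instance: Any,
                      params: Optional[Dict[str, Any]] = None) -> str:
        if len(self._handles) >= self.max_handles:
            self._cleanup_oldest()
        handle_id = f"est_{uuid.uuid4().hex[:12]}"  # e.g. "est_a1b2c3d4e5f6"
        self._handles[handle_id] = HandleInfo(handle_id, estimator_name, instance, dict(params or {}))
        return handle_id

    def get_info(self, handle_id: str) -> HandleInfo:
        try:
            return self._handles[handle_id]
        except KeyError:
            raise KeyError(f"Unknown handle: {handle_id}") from None

    def get_instance(self, handle_id: str) -> Any:
        return self.get_info(handle_id).instance

    def mark_fitted(self, handle_id: str) -> None:
        self.get_info(handle_id).fitted = True

    def is_fitted(self, handle_id: str) -> bool:
        return self.get_info(handle_id).fitted

    def release_handle(self, handle_id: str) -> bool:
        # Dropping the reference lets Python free the estimator
        return self._handles.pop(handle_id, None) is not None

    def list_handles(self) -> List[str]:
        return list(self._handles)

    def _cleanup_oldest(self) -> None:
        self._handles.popitem(last=False)  # evict the oldest handle
```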

`executor.py` - Execution Runtime

Purpose: Orchestrates estimator instantiation, data loading, fitting, and prediction

Key Class: Executor (singleton)

Key Methods:

instantiate(estimator_name, params)
- Looks up estimator in registry
- Instantiates with parameters
- Creates handle
- Returns: {"success": True, "handle": "est_xyz", ...}
load_dataset(name)
- Loads demo datasets (airline, sunspots, etc.)
- Uses sktime’s dataset loaders
- Returns: pandas Series/DataFrame
fit(handle_id, y, X, fh)
- Retrieves instance from handle
- Calls instance.fit(y, X=X, fh=fh)
- Marks handle as fitted
predict(handle_id, fh, X)
- Retrieves fitted instance
- Calls instance.predict(fh=fh, X=X)
- Returns predictions
fit_predict(handle_id, dataset, horizon)
- Convenience method: load → fit → predict
- Returns: {"success": True, "predictions": {...}, "horizon": 12}
instantiate_pipeline(components, params_list) ⭐ Most Complex
- Purpose: Create complete pipelines from component names
- Steps:
  1. Validate pipeline composition
  2. Instantiate each component
  3. Build steps argument: [("name1", instance1), ("name2", instance2)]
  4. Determine pipeline type (TransformedTargetForecaster, Pipeline, etc.)
  5. Instantiate pipeline with steps
  6. Create handle
- Why Complex: Handles the “steps problem” - LLMs can’t pass Python objects, so we build them server-side

Example Flow:

```python
from sktime.forecasting.arima import ARIMA
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.transformations.series.detrend import Detrender

# LLM sends:
{"components": ["Detrender", "ARIMA"], "params_list": [{}, {"order": [1, 1, 1]}]}

# Executor does:
detrender = Detrender()
arima = ARIMA(order=[1, 1, 1])
steps = [("transformer", detrender), ("forecaster", arima)]
pipeline = TransformedTargetForecaster(steps=steps)
handle = handle_manager.create_handle("Pipeline", pipeline)

# Returns to LLM:
{"success": True, "handle": "est_xyz", "pipeline": "Detrender → ARIMA"}
```
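The fixed names `"transformer"` and `"forecaster"` in the flow above work for two components. A pipeline such as deseasonalize → detrend → forecast needs one name per step, and sktime's composite estimators expect step names to be unique. The helper below is a hypothetical way to build the `steps` argument. It assumes `classes` maps component names to classes, as a registry lookup would, and derives names from lower-cased class names with a numeric suffix for repeats.

```python
from typing import Any, Dict, List, Tuple


def build_steps(components: List[str], params_list: List[Dict[str, Any]],
                classes: Dict[str, type]) -> List[Tuple[str, Any]]:
    """Instantiate each component and pair it with a unique step name."""
    if len(params_list) != len(components):
        raise ValueError("params_list needs exactly one dict per component")
    steps: List[Tuple[str, Any]] = []
    counts: Dict[str, int] = {}
    for name, params in zip(components, params_list):
        instance = classes[name](**params)  # server-side construction from JSON params
        base = name.lower()
        counts[base] = counts.get(base, 0) + 1
        # Suffix repeats so that e.g. two Detrenders get distinct step names
        step_name = base if counts[base] == 1 else f"{base}_{counts[base]}"
        steps.append((step_name, instance))
    return steps
```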

📁 `src/sktime_mcp/tools/` - MCP Tool Implementations

Each file implements one or more MCP tools that LLMs can call.

`list_estimators.py`

Tools:

list_estimators_tool(task, tags, query, limit)
- Calls registry.get_all_estimators(task, tags)
- Returns: {"success": True, "estimators": [...], "total": 50}
get_available_tags()
- Returns all capability tags
- Example: ["capability:pred_int", "handles-missing-data", ...]

`describe_estimator.py`

Tool: describe_estimator_tool(estimator)

Looks up estimator in registry
Returns full EstimatorNode details
Includes: name, task, module, tags, hyperparameters, docstring

`instantiate.py`

Tools:

instantiate_estimator_tool(estimator, params)
- Calls executor.instantiate(estimator, params)
- Returns handle
instantiate_pipeline_tool(components, params_list) ⭐
- Calls executor.instantiate_pipeline(components, params_list)
- Solves the “steps problem”
- Returns single handle for entire pipeline
release_handle_tool(handle)
- Frees memory for a handle
list_handles_tool()
- Lists all active handles
load_model_tool(path)
- Loads a previously saved model via MLflow

`fit_predict.py`

Tools:

fit_predict_tool(estimator_handle, dataset, horizon)
- Calls executor.fit_predict(handle, dataset, horizon)
- Complete workflow in one call
fit_predict_async_tool(estimator_handle, dataset, horizon)
- Dispatches a background job for fit and predict.

`evaluate.py`

Tool: evaluate_estimator_tool(estimator_handle, dataset, cv_folds)

Runs cross-validation using an expanding window splitter
Returns error metrics such as MAE and RMSE
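The expanding-window idea can be shown without sktime. Each fold trains on everything up to a cut-off and scores the next `horizon` points. This plain-Python sketch uses a naive last-value forecaster purely for illustration. The tool itself works on real sktime estimators, and the `initial`/`horizon` parameter names here are assumptions.

```python
import math
from typing import Any, Dict, Iterator, List, Tuple


def expanding_window_splits(n: int, initial: int, horizon: int,
                            cv_folds: int) -> Iterator[Tuple[int, int]]:
    """Yield (train_end, test_end): train is y[:train_end], test is y[train_end:test_end]."""
    for k in range(cv_folds):
        train_end = initial + k * horizon  # the window grows; its start stays at 0
        test_end = train_end + horizon
        if test_end > n:
            break
        yield train_end, test_end


def evaluate_naive(y: List[float], initial: int, horizon: int,
                   cv_folds: int) -> Dict[str, Any]:
    """Score a last-value forecaster with MAE and RMSE across expanding-window folds."""
    if initial < 1:
        raise ValueError("initial window must contain at least one point")
    splits = list(expanding_window_splits(len(y), initial, horizon, cv_folds))
    if not splits:
        raise ValueError("series too short for the requested window and horizon")
    errors: List[float] = []
    for train_end, test_end in splits:
        forecast = y[train_end - 1]  # naive forecast: repeat the last training value
        errors.extend(actual - forecast for actual in y[train_end:test_end])
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"folds": len(splits), "mae": mae, "rmse": rmse}
```

For `y = [1, 2, 3, 4, 5, 6]` with `initial=2`, `horizon=2`, `cv_folds=2`, the folds are `(2, 4)` and `(4, 6)`. The pooled errors are `1, 2, 1, 2`, so MAE is 1.5 and RMSE is √2.5.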

`format_tools.py`

Tools:

format_time_series_tool(...)
- Auto-formats, infers frequency, drops duplicates, and fills missing values.
auto_format_on_load_tool(enabled)
- Toggles whether new data sources get auto-formatted on load.
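The formatting steps can be sketched with the standard library alone. This is a hypothetical illustration, not the tool's code, which likely operates on pandas objects. Keeping the last duplicate, taking the most common gap as the frequency, and forward-filling are all assumed choices.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Point = Tuple[datetime, Optional[float]]


def format_time_series(points: List[Point]) -> Tuple[timedelta, List[Point]]:
    """Sort, de-duplicate, infer a step, and forward-fill onto a regular grid."""
    # 1. Drop duplicate timestamps; the later row in the input wins (sort is stable)
    by_time = dict(sorted(points, key=lambda p: p[0]))
    if len(by_time) < 2:
        raise ValueError("need at least two distinct timestamps")
    stamps = sorted(by_time)
    # 2. Infer the frequency as the most common gap between consecutive timestamps
    step = Counter(b - a for a, b in zip(stamps, stamps[1:])).most_common(1)[0][0]
    # 3. Walk the regular grid, filling gaps and None values with the last known value.
    #    Timestamps that fall off the inferred grid are skipped in this sketch.
    filled: List[Point] = []
    last: Optional[float] = None
    t = stamps[0]
    while t <= stamps[-1]:
        value = by_time.get(t)
        if value is None:
            value = last  # stays None if nothing has been observed yet
        filled.append((t, value))
        last = value
        t += step
    return step, filled
```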

`job_tools.py`

Tools: check_job_status_tool, list_jobs_tool, cancel_job_tool, delete_job_tool, cleanup_old_jobs_tool

Interfaces with JobManager to control background training jobs.
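Background jobs can be modelled with `concurrent.futures` from the standard library. The class below is a hypothetical stand-in for the project's JobManager. The `job_` id prefix, the status strings, and the `wait` helper are assumptions, and only the status and cancel operations are shown.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import wait as wait_futures
from typing import Any, Callable, Dict, Optional


class JobManager:
    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> str:
        """Start fn in a worker thread and return a job id immediately."""
        job_id = f"job_{uuid.uuid4().hex[:8]}"
        self._jobs[job_id] = self._pool.submit(fn, *args, **kwargs)
        return job_id

    def check_job_status(self, job_id: str) -> Dict[str, Any]:
        future = self._jobs[job_id]
        if future.cancelled():
            return {"job_id": job_id, "status": "cancelled"}
        if not future.done():
            return {"job_id": job_id, "status": "running"}  # queued or running
        error = future.exception()
        if error is not None:
            return {"job_id": job_id, "status": "failed", "error": str(error)}
        return {"job_id": job_id, "status": "completed", "result": future.result()}

    def cancel_job(self, job_id: str) -> bool:
        # Future.cancel() only succeeds if the job has not started running yet
        return self._jobs[job_id].cancel()

    def wait(self, job_id: str, timeout: Optional[float] = None) -> None:
        wait_futures([self._jobs[job_id]], timeout=timeout)
```

Note the limit of thread-based cancellation. A fit that has already started cannot be interrupted this way, so `cancel_job` returns `False` for it.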

`save_model.py`

Tool: save_model_tool(estimator_handle, path, mlflow_params)

Persists fitted estimators using MLflow.

`list_available_data.py`

Tool: list_available_data_tool(is_demo)

Returns available demo datasets and/or active user data handles

📁 `examples/` - Usage Examples

`01_forecasting_workflow.py`

Purpose: Demonstrates all MCP capabilities end-to-end

Steps:

List datasets
Discover forecasting estimators
Filter by tags (probabilistic forecasters)
Describe an estimator
Validate pipeline compositions
Instantiate estimator
Fit and predict
List active handles
Show available tags

Run: python examples/01_forecasting_workflow.py

`02_llm_query_simulation.py`

Purpose: Simulates how an LLM would interact with the MCP

Scenario: User asks “Forecast airline passengers with a probabilistic model”

LLM Steps:

list_estimators(task="forecasting", tags={"capability:pred_int": True})
describe_estimator("ARIMA")
instantiate_estimator("ARIMA", {"order": [1,1,1]})
fit_predict(handle, "airline", 12)

`03_pipeline_instantiation.py`

Purpose: Demonstrates pipeline creation

Examples:

Simple 2-component pipeline
Complex 3-component pipeline (deseasonalize → detrend → forecast)
Pipeline with custom parameters
Invalid pipeline (shows validation errors)

`04_mcp_pipeline_demo.py`

Purpose: End-to-end pipeline workflow

Steps:

Validate pipeline
Instantiate pipeline → get handle
Fit and predict → get forecasts

Additional Examples

05_simple_deseasonalize_detrend_forecaster.py: Deseasonalize + detrend workflow
06_simple_naive_forecaster.py: Basic NaiveForecaster example
background_training_example.py: Demonstrates async background jobs
job_management_demo.py: Demonstrates checking and listing job status
pandas_example.py: Demonstrates loading from in-memory pandas objects
csv_example.py: Demonstrates loading from CSV/TSV files
sql_example.py: Demonstrates loading from SQL databases

📁 `docs/` - Documentation

`architecture.md`

Purpose: High-level block diagrams explaining the data flow and adapter registry.

`data-sources.md`

Purpose: Detailed guide on loading data from pandas, SQL, and various file formats.

`user-guide.md`

Purpose: Information for end-users on how to use the MCP tools.

`dev-guide.md`

Purpose: Guidelines for contributors on extending the server or adding new adapters.

📁 `tests/` - Test Suite

`test_core.py`

Purpose: Unit tests for core functionality

Test Classes:

TestRegistryInterface
- Tests registry loading, filtering, lookup
TestHandleManager
- Tests handle creation, retrieval, fitting, release
TestCompositionValidator
- Tests pipeline validation logic
TestTools
- Tests MCP tool functions

Run: pytest tests/

How It All Works Together

Example: LLM Forecasting Workflow

User Prompt: “Forecast airline passengers using ARIMA”

Step 1: Discovery

```
LLM → list_estimators(task="forecasting")
     → server.call_tool("list_estimators", {"task": "forecasting"})
     → list_estimators_tool(task="forecasting")
     → registry.get_all_estimators(task="forecasting")
     → Returns: [{"name": "ARIMA", ...}, {"name": "NaiveForecaster", ...}, ...]
```

Step 2: Description

```
LLM → describe_estimator("ARIMA")
     → describe_estimator_tool("ARIMA")
     → registry.get_estimator_by_name("ARIMA")
     → Returns: {"name": "ARIMA", "hyperparameters": {"order": ...}, ...}
```

Step 3: Instantiation

```
LLM → instantiate_estimator("ARIMA", {"order": [1,1,1]})
     → instantiate_estimator_tool("ARIMA", {"order": [1,1,1]})
     → executor.instantiate("ARIMA", {"order": [1,1,1]})
     → ARIMA_class = registry.get_estimator_by_name("ARIMA").class_ref
     → instance = ARIMA_class(order=[1,1,1])
     → handle = handle_manager.create_handle("ARIMA", instance)
     → Returns: {"success": True, "handle": "est_abc123"}
```

Step 4: Execution

```
LLM → fit_predict("est_abc123", "airline", 12)
     → fit_predict_tool("est_abc123", "airline", 12)
     → executor.fit_predict("est_abc123", "airline", 12)
     → y = executor.load_dataset("airline")
     → instance = handle_manager.get_instance("est_abc123")
     → instance.fit(y)
     → predictions = instance.predict(fh=[1,2,...,12])
     → Returns: {"success": True, "predictions": {...}, "horizon": 12}
```
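On the wire, each LLM step above is a JSON-RPC 2.0 `tools/call` request sent by the MCP client. The sketch below builds the four messages. The envelope (`jsonrpc`, `id`, `method`, `params.name`, `params.arguments`) follows the MCP specification. The argument keys are taken from the tool signatures listed in this document, and it is an assumption that the server accepts exactly these keys.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)


def tools_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP tools/call JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


workflow = [
    tools_call("list_estimators", {"task": "forecasting"}),
    tools_call("describe_estimator", {"estimator": "ARIMA"}),
    tools_call("instantiate_estimator", {"estimator": "ARIMA", "params": {"order": [1, 1, 1]}}),
    # In a live session the handle comes from the previous response; "est_abc123" is illustrative
    tools_call("fit_predict", {"estimator_handle": "est_abc123", "dataset": "airline", "horizon": 12}),
]

# Over stdio, each message is written as a single line of JSON
for message in workflow:
    print(json.dumps(message))
```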

Data Flow Diagram

```
┌─────────┐
│   LLM   │
└────┬────┘
     │ JSON-RPC request
     ▼
┌─────────────────┐
│  MCP Server     │ ← server.py
│  (stdio)        │
└────┬────────────┘
     │ Route to tool
     ▼
┌─────────────────┐
│  Tool Function  │ ← tools/*.py
└────┬────────────┘
     │ Call business logic
     ▼
┌──────────────────────────────────┐
│  Registry / Executor / Validator │ ← registry/, runtime/, composition/
└────┬─────────────────────────────┘
     │ Interact with sktime
     ▼
┌─────────────────┐
│  sktime Library │
└─────────────────┘
```

Key Concepts

1. Registry-First Design

Don’t parse code or docs
Use sktime’s all_estimators() as source of truth
Extract metadata from classes directly

2. Handle-Based References

LLMs can’t hold Python objects
Solution: String handles ("est_abc123") map to objects
Handle manager maintains the mapping

3. Lazy Loading

Registry loads on first access
Singleton pattern ensures one instance
Caches all estimators for fast lookups

4. Tag-Based Discovery

Estimators have capability tags
LLMs can filter by requirements
Example: {"capability:pred_int": True} finds probabilistic forecasters

5. Composition Validation

Check pipeline validity before instantiation
Catches invalid compositions before they surface as runtime errors
Provides helpful error messages

6. The Steps Problem

Problem: Pipelines need steps=[("name", instance), ...]
Solution: LLM sends component names, server builds instances
Benefit: LLM uses simple JSON, server handles complexity

7. JSON Sanitization

Convert numpy/pandas to JSON-serializable types
Handle special values (NaN, Infinity)
Ensure all responses are valid JSON
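Here is a stdlib-only sketch of what the sanitization step has to do. It is hypothetical, and the project's `sanitize_for_json` may differ. It relies on duck typing instead of importing numpy or pandas: objects with `to_dict()` are treated like pandas objects and objects with `tolist()` like numpy arrays or scalars. Mapping NaN and Infinity to `null` is one common choice and is assumed here.

```python
import math
from typing import Any


def sanitize_for_json(obj: Any) -> Any:
    """Recursively convert obj into types that json.dumps(..., allow_nan=False) accepts."""
    if obj is None or isinstance(obj, (bool, int, str)):
        return obj
    if isinstance(obj, float):
        # NaN and ±Infinity are not valid JSON numbers; map them to null
        return obj if math.isfinite(obj) else None
    if isinstance(obj, dict):
        # JSON object keys must be strings
        return {str(k): sanitize_for_json(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [sanitize_for_json(v) for v in obj]
    # Duck typing (assumption): pandas objects expose to_dict(), numpy arrays and
    # scalars expose tolist(), datetime-like values expose isoformat()
    if hasattr(obj, "to_dict"):
        return sanitize_for_json(obj.to_dict())
    if hasattr(obj, "tolist"):
        return sanitize_for_json(obj.tolist())
    if hasattr(obj, "isoformat"):
        return obj.isoformat()
    return str(obj)  # last resort: a readable string instead of a serialization error
```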

8. Singleton Pattern

Registry, Executor, HandleManager, Validator are singletons
Ensures shared state across tool calls
Avoids reloading the registry or duplicating state on every call

Summary

sktime-mcp is a well-architected MCP server that:

Exposes sktime’s 200+ estimators to LLMs
Validates compositions before execution
Manages object lifecycles via handles
Executes real ML workflows on real data
Translates between JSON (LLM) and Python (sktime)

Key Innovation: The instantiate_pipeline tool solves the “steps problem”, enabling LLMs to create complex pipelines with a single JSON-RPC call.

Architecture Highlights:

Clean separation of concerns (registry, composition, runtime, tools)
Singleton pattern for shared state
Handle-based object management
Comprehensive validation before execution
JSON-first API design

This enables LLMs to perform sophisticated time series forecasting workflows without writing any Python code! 🚀
