Cross-Validation Selection Guide

Choose the right CV method for your medical machine learning project. Use the decision tree below or consult the detailed method matrix.

Decision Tree

Start with the data-type question and follow the branches to the method that fits your use case.

What type of data structure do you have?

• Independent samples (each row = a different patient)
  • Small dataset (n < 100)? → LOOCV / Bootstrap .632
  • Imbalanced classes? → Stratified k-Fold
  • Need hyperparameter tuning? → Nested CV
  • Standard case? → 5-Fold or 10-Fold CV

• Time series data (temporal order matters)
  • Fixed training size? → Rolling Window CV
  • Use all historical data? → Expanding Window CV
  • Seasonal/periodic patterns? → Blocked Time Series CV
  • Standard forecasting? → Time Series Split

• Grouped data (multiple records per patient)
  • Generalize to new patients? → Leave-One-Group-Out
  • Need class balance? → Stratified Group k-Fold
  • Standard grouped case? → Group k-Fold

• Spatial / geographic data
  • Prevent spatial leakage? → Buffered Spatial CV
  • Standard spatial case? → Spatial Block CV
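
The same branching logic can be written down as code. A minimal sketch (the `suggest_cv` helper is hypothetical, not part of trustcv):

```python
def suggest_cv(structure, small=False, imbalanced=False, tuning=False,
               fixed_window=False, use_all_history=False, seasonal=False,
               new_groups=False, spatial_buffer=False):
    """Return the decision tree's recommendation for a data structure.

    `structure` is one of "iid", "temporal", "grouped", "spatial".
    """
    if structure == "iid":
        if small:
            return "LOOCV / Bootstrap .632"
        if imbalanced:
            return "Stratified k-Fold"
        if tuning:
            return "Nested CV"
        return "5-Fold or 10-Fold CV"
    if structure == "temporal":
        if fixed_window:
            return "Rolling Window CV"
        if use_all_history:
            return "Expanding Window CV"
        if seasonal:
            return "Blocked Time Series CV"
        return "Time Series Split"
    if structure == "grouped":
        if new_groups:
            return "Leave-One-Group-Out"
        if imbalanced:
            return "Stratified Group k-Fold"
        return "Group k-Fold"
    if structure == "spatial":
        return "Buffered Spatial CV" if spatial_buffer else "Spatial Block CV"
    raise ValueError(f"unknown structure: {structure!r}")

print(suggest_cv("grouped", new_groups=True))  # → Leave-One-Group-Out
```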

Detailed Method Selection Matrix

A comprehensive reference of all cross-validation methods, organized by data type and use case.

I.I.D. Data

| Data Characteristic | Sample Size | Primary Goal | Recommended Method | Why? |
|---|---|---|---|---|
| Independent samples | > 1000 | Quick evaluation | Hold-Out (70/30) | Fast, simple |
| Independent samples | 100 - 1000 | Robust evaluation | 5-Fold CV | Good bias-variance tradeoff |
| Independent samples | < 100 | Max data usage | LOOCV | Uses all data for training |
| Imbalanced classes | Any | Maintain class ratio | Stratified k-Fold | Preserves class distribution |
| Independent samples | Any | Confidence intervals | Bootstrap .632 | Provides CI estimates |
| Independent samples | Any | Model selection | Nested CV | Unbiased hyperparameter tuning |

Temporal Data

| Data Characteristic | Sample Size | Primary Goal | Recommended Method | Why? |
|---|---|---|---|---|
| Time series | Any | Forecasting | Time Series Split | Respects temporal order |
| Time series | Long series | Fixed window | Rolling Window | Constant training size |
| Time series | Growing data | Use all history | Expanding Window | Increasing training data |
| Time series | With patterns | Preserve patterns | Blocked Time Series | Maintains temporal blocks |
| Financial / trading | Any | Prevent leakage | Purged K-Fold | Adds temporal gaps |

Grouped Data

| Data Characteristic | Sample Size | Primary Goal | Recommended Method | Why? |
|---|---|---|---|---|
| Patient records | Many groups | Standard validation | Group k-Fold | No patient in multiple folds |
| Patient records | Few groups | Test generalization | Leave-One-Group-Out | Each group serves as a test set |
| Hierarchical | Multi-level | Respect hierarchy | Hierarchical Group CV | Maintains structure |
| Imbalanced groups | Any | Balance + grouping | Stratified Group k-Fold | Preserves both constraints |

Spatial Data

| Data Characteristic | Sample Size | Primary Goal | Recommended Method | Why? |
|---|---|---|---|---|
| Geographic | Grid-based | Prevent leakage | Buffered Spatial CV | Adds buffer zones |
| Geographic | Continuous | Standard spatial | Spatial Block CV | Creates spatial blocks |
| Spatiotemporal | Both dimensions | Complex patterns | Spatiotemporal Block | Handles both aspects |
| Environmental health | Geographic + health | Environmental factors | Environmental Health CV | Epidemiology studies |

Additional Methods

| Data Characteristic | Sample Size | Primary Goal | Recommended Method | Why? |
|---|---|---|---|---|
| Independent samples | < 50 | Exhaustive validation | LPOCV (Leave-p-Out) | Tests all p-combinations |
| Independent samples | Any | Multiple random splits | Monte Carlo CV | Flexible, confidence intervals |
| Time series | Complex patterns | Multiple test periods | Combinatorial Purged CV | Financial / trading data |
| Time + groups | Both constraints | Combined validation | Purged Group Time Series | Complex medical studies |
| Time series | Nested optimization | Temporal + hyperparam | Nested Temporal CV | Advanced forecasting |
| Grouped data | Test on p groups | Multiple group testing | Leave-p-Groups-Out | Multi-site validation |
| Grouped data | Multiple runs | Robust group validation | Repeated Group k-Fold | Stable estimates |
| Hierarchical | Multiple levels | Respect all levels | Multi-level CV | Hospital > Dept > Patient |
| Grouped data | Hyperparam tuning | Nested + grouped | Nested Grouped CV | Unbiased selection |
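
The matrix recommends Nested CV whenever hyperparameters are tuned, because tuning on the same folds used for evaluation inflates the score. For illustration, here is the pattern with scikit-learn (trustcv may expose its own nested splitter; this sketch only assumes scikit-learn is installed):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Inner loop: 3-fold grid search chooses C.
# Outer loop: 5-fold evaluation; the outer test folds never influence C.
inner = GridSearchCV(LogisticRegression(max_iter=1000),
                     param_grid={"C": [0.1, 1.0, 10.0]}, cv=3)
scores = cross_val_score(inner, X, y, cv=5)
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```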

Common Medical Scenarios

Real-world medical ML scenarios with recommended CV methods and ready-to-use code examples.

1. Clinical Trial with Multiple Sites

Data: Patients nested within hospitals
Goal: Test generalization to new sites

```python
from trustcv.splitters.grouped import GroupKFoldMedical

cv = GroupKFoldMedical(n_splits=5)
for train, test in cv.split(X, y, groups=site_ids):
    # No site appears in both train and test
    model.fit(X[train], y[train])
```

2. ICU Patient Monitoring

Data: Hourly vital signs over time
Goal: Predict future deterioration

```python
from trustcv.splitters.temporal import TimeSeriesSplit

cv = TimeSeriesSplit(n_splits=5)
for train, test in cv.split(X):
    # Always train on the past, test on the future
    model.fit(X[train], y[train])
```

3. Disease Diagnosis from Images

Data: One image per patient
Goal: Robust performance estimate

```python
from trustcv.splitters.iid import StratifiedKFoldMedical

cv = StratifiedKFoldMedical(n_splits=5)
for train, test in cv.split(X, y):
    # Maintains class balance across folds
    model.fit(X[train], y[train])
```

4. Longitudinal Patient Study

Data: Multiple visits per patient over years
Goal: Validate on new patients

```python
from trustcv.splitters.temporal import PurgedGroupTimeSeriesSplit

cv = PurgedGroupTimeSeriesSplit(n_splits=5, purge_gap=30)  # 30-day gap
for train, test in cv.split(X, y, groups=patient_ids, times=visit_dates):
    # Respects both patient grouping and temporal order
    model.fit(X[train], y[train])
```

5. Geographic Disease Spread

Data: Cases with GPS coordinates
Goal: Predict spread to new regions

```python
from trustcv.splitters.spatial import BufferedSpatialCV

cv = BufferedSpatialCV(n_splits=5, buffer_size=10)  # 10 km buffer
for train, test in cv.split(X, coordinates=gps_coords):
    # Buffer prevents spatial leakage
    model.fit(X[train], y[train])
```
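
The trustcv splitters in these scenarios appear to follow the familiar scikit-learn splitter interface (an assumption on our part). For the first three scenarios, scikit-learn itself ships close counterparts, which also makes the structural guarantees easy to verify on toy data:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, StratifiedKFold, TimeSeriesSplit

X = np.arange(40).reshape(20, 2)            # 20 samples, 2 features
y = np.array([0, 1] * 10)                   # perfectly balanced labels
sites = np.repeat(np.arange(5), 4)          # 5 sites, 4 patients each

# Scenario 1: no site appears in both train and test
for train, test in GroupKFold(n_splits=5).split(X, y, groups=sites):
    assert set(sites[train]).isdisjoint(sites[test])

# Scenario 2: always train on the past, test on the future
for train, test in TimeSeriesSplit(n_splits=4).split(X):
    assert train.max() < test.min()

# Scenario 3: class balance preserved in every test fold
for train, test in StratifiedKFold(n_splits=5).split(X, y):
    assert y[test].mean() == 0.5

print("all structure checks passed")
```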

Critical Warnings

Common pitfalls and best practices to ensure valid cross-validation results.

Never Do This
  • Random splits on time series data: future observations leak into training
  • Random splits on grouped patient data: the same patient lands in both train and test
  • Using the test set for ANY decisions: leads to overfitting to the test set
  • A single hold-out split for small datasets: results have high variance
  • Ignoring class imbalance: performance is biased toward the majority class

Always Do This
  • Check for data leakage after splitting
  • Preserve the data structure (temporal, grouped, spatial)
  • Use stratification for imbalanced datasets
  • Apply nested CV for hyperparameter tuning
  • Report confidence intervals, not just mean scores
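
The first check on the list is easy to automate. A minimal sketch, assuming splits are `(train, test)` index sequences and `groups` holds one group label per sample (the `assert_no_group_leakage` helper is illustrative, not a trustcv function):

```python
def assert_no_group_leakage(splits, groups):
    """Raise ValueError if any group appears on both sides of a split."""
    for fold, (train, test) in enumerate(splits):
        overlap = {groups[i] for i in train} & {groups[i] for i in test}
        if overlap:
            raise ValueError(
                f"fold {fold}: groups in both train and test: {sorted(overlap)}"
            )

# A leaky split: patient 'p2' appears on both sides of the fold
groups = ["p1", "p1", "p2", "p2", "p3", "p3"]
leaky = [([0, 1, 2], [3, 4, 5])]
try:
    assert_no_group_leakage(leaky, groups)
except ValueError as err:
    print(err)  # → fold 0: groups in both train and test: ['p2']
```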

Performance vs. Computational Cost

Compare trade-offs between variance, bias, and computational requirements.

| Method | Computational Cost | Variance | Bias | Use When |
|---|---|---|---|---|
| Hold-Out | Fastest | High | Low | Large datasets, quick tests |
| 5-Fold CV | Medium | Medium | Low | Standard choice |
| 10-Fold CV | Medium | Low | Low | Need lower variance |
| LOOCV | Slowest | Lowest | Low | Small datasets |
| Bootstrap | Medium | Low | Medium | Need confidence intervals |
| Nested CV | High | Low | Lowest | Hyperparameter tuning |
| Group k-Fold | Medium | Medium | Low | Grouped data |
| Time Series Split | Medium | Medium | Low | Temporal data |

Rules of Thumb

Quick reference guidelines for choosing folds, methods, and handling special cases.

Sample Size Rules

  • n < 100: Use LOOCV or Bootstrap
  • 100 < n < 1000: Use 10-Fold CV
  • n > 1000: Use 5-Fold CV or Hold-Out

Fold Number Selection

  • More folds = less bias, more variance, more computation
  • Fewer folds = more bias, less variance, less computation
  • Sweet spot: 5 - 10 folds for most cases
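
The computation side of this trade-off is easy to make concrete. A sketch (the `n_model_fits` helper is illustrative; the `+ 1` counts the refit on each outer training set after the inner grid search):

```python
def n_model_fits(k, repeats=1, inner_k=0, n_param_settings=1):
    """Total model fits for (repeated, optionally nested) k-fold CV."""
    outer = k * repeats
    if inner_k:  # nested CV: grid search inside every outer fold, then refit
        return outer * (inner_k * n_param_settings + 1)
    return outer

print(n_model_fits(k=5))                                 # plain 5-fold: 5
print(n_model_fits(k=1000))                              # LOOCV on n=1000: 1000
print(n_model_fits(k=5, inner_k=3, n_param_settings=4))  # nested: 5 * 13 = 65
```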

Special Cases

  • Rare disease (prevalence < 10%): Always use stratification
  • Multi-site data: Always use grouped CV
  • Time series: Never use random splits
  • Small test set: Consider repeated CV for stability
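
The rare-disease rule can be seen in a few lines of plain Python (illustrative only; real code should use a stratified splitter): with 5% prevalence and n = 200, a purely random fold assignment can leave a fold with no positive cases at all, while stratified assignment guarantees one per fold here.

```python
import random

random.seed(0)
n, prevalence = 200, 0.05
y = [1] * int(n * prevalence) + [0] * (n - int(n * prevalence))
random.shuffle(y)

# Purely random 10-fold assignment: positives per fold fluctuate,
# and a fold can easily end up with zero positive cases.
random_folds = [y[i::10] for i in range(10)]
print([sum(fold) for fold in random_folds])

# Stratified assignment: deal each class round-robin across folds,
# so every fold receives exactly one of the 10 positive cases.
pos = [i for i, label in enumerate(y) if label == 1]
neg = [i for i, label in enumerate(y) if label == 0]
strat_folds = [[] for _ in range(10)]
for j, i in enumerate(pos + neg):
    strat_folds[j % 10].append(i)
print([sum(y[i] for i in fold) for fold in strat_folds])  # ten folds, 1 positive each
```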