📚 Cross-Validation for Regulatory Compliance

Complete guide to meeting FDA, CE MDR, and TRIPOD+AI requirements for AI/ML medical device validation

Last updated: January 2025 | Based on latest regulatory guidance documents

Frameworks covered:

  • FDA AI/ML Guidance 2024
  • CE MDR 2017/745
  • MDCG 2021-24
  • TRIPOD+AI 2024


Overview & Key Principles

Fundamental requirements across all regulatory frameworks

🎯 Universal Principles for Medical AI Validation

  • Patient-Level Data Integrity: All samples from one patient must stay together
  • Independent Test Set: Final validation on completely held-out data
  • Pre-specification: Methods defined before seeing data
  • Transparency: Complete documentation of all decisions
  • Reproducibility: Random seeds, version control, environment specs
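The reproducibility principle can be made concrete by recording seeds, environment details, and a data fingerprint before any analysis begins. A minimal sketch (the manifest fields and function name are illustrative, not mandated by any framework):

```python
# Hypothetical sketch: capture the reproducibility details regulators ask for.
import hashlib
import json
import platform
import random

import numpy as np

def build_reproducibility_manifest(data_bytes: bytes, seed: int = 42) -> dict:
    """Record seed, environment, and a data fingerprint before any analysis."""
    random.seed(seed)
    np.random.seed(seed)
    return {
        "random_seed": seed,
        "python_version": platform.python_version(),
        "numpy_version": np.__version__,
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
    }

manifest = build_reproducibility_manifest(b"example-dataset-bytes")
print(json.dumps(manifest, indent=2))
```

Storing this manifest alongside the validation plan gives auditors a verifiable record that the analysis environment and data were fixed up front.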

✅ Three-Level Validation Architecture

Level 1: Development (60-70% of data)
  └── Training Set: Model fitting
  
Level 2: Validation (15-20% of data)  
  └── Cross-Validation: Hyperparameter tuning & method selection
  
Level 3: Testing (15-20% of data)
  └── Held-Out Test Set: Final unbiased performance estimate
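The three-level architecture above can be carved out with patient-aware splits so no patient spans levels. A minimal sketch using scikit-learn's GroupShuffleSplit (the synthetic data, 60/20/20 ratios, and variable names are illustrative):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
patient_ids = np.repeat(np.arange(100), 3)          # 3 samples per patient
y = np.repeat(rng.integers(0, 2, size=100), 3)

# Level 3: lock away 20% of PATIENTS (not samples) as the test set
gss = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=42)
dev_idx, test_idx = next(gss.split(X, y, groups=patient_ids))

# Level 2: split remaining patients into train (~60%) and validation (~20%)
gss_val = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_rel, val_rel = next(
    gss_val.split(X[dev_idx], y[dev_idx], groups=patient_ids[dev_idx])
)
train_idx, val_idx = dev_idx[train_rel], dev_idx[val_rel]

# No patient appears in more than one level
assert not set(patient_ids[train_idx]) & set(patient_ids[val_idx])
assert not set(patient_ids[dev_idx]) & set(patient_ids[test_idx])
```

Splitting on groups rather than samples is what makes the test set truly independent when patients contribute multiple samples.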
                

FDA Requirements (USA)

Food and Drug Administration guidance for AI/ML-based Software as Medical Device (SaMD)

📋 FDA Good Machine Learning Practice (GMLP) - 10 Principles

  1. Multi-disciplinary expertise throughout lifecycle
  2. Good software engineering and security practices
  3. Clinical study participants and data sets are representative of intended patient population
  4. Training/tuning/test sets are independent and selected appropriately
  5. Reference datasets are based on best available methods
  6. Model design is tailored to available data and reflects intended use
  7. Focus on performance of human-AI team
  8. Testing demonstrates device performance during clinically relevant conditions
  9. Users provided clear information on benefits, risks, and limitations
  10. Deployed models monitored for performance and re-training managed

FDA Cross-Validation Requirements

Development Phase Requirements

  • Data Partitioning:
    • Training: 60-70%
    • Validation/Tuning: 15-20%
    • Test (locked): 15-20%
  • Patient Grouping: Mandatory for multi-sample patient data
  • Site Diversity: If multi-site, must validate across sites
  • Temporal Validation: For time-dependent conditions
  • Demographic Analysis: Performance across age, sex, race/ethnicity
# FDA-Compliant Data Split Example
from sklearn.model_selection import train_test_split

# First: separate the test set (LOCK THIS AWAY).
# NOTE: train_test_split splits at the SAMPLE level; if patients contribute
# multiple samples, use a group-aware splitter (e.g. GroupShuffleSplit)
# so that no patient spans partitions.
X_dev, X_test, y_dev, y_test, patients_dev, patients_test = train_test_split(
    X, y, patient_ids,
    test_size=0.20,   # 20% for final test
    stratify=y,       # maintain class balance
    random_state=42   # pre-specified seed
)

# Second: split development data into train/validation
X_train, X_val, y_train, y_val = train_test_split(
    X_dev, y_dev,
    test_size=0.20,   # 20% of 80% = 16% of total for validation
    stratify=y_dev,
    random_state=42
)

# Use the validation set for hyperparameter tuning.
# NEVER touch X_test until the final evaluation.

510(k) Submission Requirements

Section 13: Software Validation & Verification

  • Complete description of validation methodology
  • Justification for data split ratios
  • Patient demographics table for each partition
  • Performance metrics with 95% confidence intervals
  • Subgroup analysis results
  • Failure mode analysis
  • Comparison to predicate device (if applicable)

⚠️ Common FDA Rejection Reasons

  • Test set contamination (used for ANY decision)
  • No pre-specified analysis plan
  • Missing confidence intervals
  • Inadequate sample size justification
  • No subgroup analysis for protected classes
  • Patient data in multiple partitions
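The last rejection reason (patient data in multiple partitions) can be caught automatically before submission. A small illustrative helper (function and partition names are assumptions for the example):

```python
import numpy as np

def audit_patient_leakage(partitions: dict) -> list:
    """Return every pair of partitions that share patient IDs (should be empty)."""
    leaks = []
    names = list(partitions)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = np.intersect1d(partitions[a], partitions[b])
            if len(shared):
                leaks.append((a, b, len(shared)))
    return leaks

# Example: patient 7 accidentally lands in both train and test
leaks = audit_patient_leakage({
    "train": np.array([1, 2, 3, 7]),
    "val": np.array([4, 5]),
    "test": np.array([6, 7]),
})
print(leaks)  # [('train', 'test', 1)]
```

Running such an audit on every partitioning, and logging the (empty) result, provides direct evidence against the contamination and leakage concerns listed above.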

De Novo Classification Requirements

  • External Validation: Often required on completely independent dataset
  • Prospective Study: May need real-world clinical validation
  • Special Controls: Must establish validation standards for device type
  • Labeling Requirements: Clear limitations and warnings

Predetermined Change Control Plan (PCCP)

For AI/ML devices that will be updated post-market:

  • Retraining Protocol: How will model be retrained?
  • Performance Monitoring: Metrics and thresholds
  • Update Validation: Cross-validation strategy for updates
  • Data Drift Detection: Methods to identify distribution shifts
  • Version Control: Track which validation for which version
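For the data-drift item above, one common lightweight approach is a per-feature two-sample test between the training distribution and incoming post-market data. A sketch using SciPy's Kolmogorov-Smirnov test (the 0.01 threshold and function name are illustrative choices, not regulatory numbers):

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, incoming: np.ndarray, alpha: float = 0.01):
    """Flag features whose incoming distribution differs from the reference."""
    drifted = []
    for j in range(reference.shape[1]):
        stat, p_value = ks_2samp(reference[:, j], incoming[:, j])
        if p_value < alpha:
            drifted.append({"feature": j, "ks_stat": stat, "p_value": p_value})
    return drifted

rng = np.random.default_rng(0)
reference = rng.normal(0, 1, size=(1000, 3))
incoming = reference.copy()
incoming[:, 2] += 1.5          # simulate a shifted feature
drifted = detect_drift(reference, incoming)
print(drifted)                 # only feature 2 is flagged
```

A PCCP would pair such a detector with pre-specified thresholds and the retraining protocol that fires when they are crossed.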

CE MDR & MDCG Guidelines (Europe)

Medical Device Regulation 2017/745 and Medical Device Coordination Group guidance

📋 MDR Annex I - Essential Requirements for Software Validation

Chapter II, Section 17.2: Software validation requirements

  • Software shall be validated according to state of the art
  • Validation shall demonstrate conformity with intended purpose
  • Lifecycle approach including verification and validation
  • Clinical evidence required for Class IIa and higher

MDCG 2021-24: AI/ML Specific Requirements

Classification Rules for AI/ML Software

| Class | Description | Validation Requirements |
|---|---|---|
| I | Low risk, no direct patient impact | Basic validation |
| IIa | Informs clinical decisions | Clinical evaluation required |
| IIb | Direct diagnosis/monitoring of vital functions | Extensive clinical validation |
| III | Life-supporting/life-saving | Highest level validation + clinical trials |

MDCG 2020-1: Clinical Evaluation Requirements

Clinical Evaluation Process for AI/ML

  1. Stage 0 - Scope: Define clinical claims and intended use
  2. Stage 1 - Data: Identify pertinent data
    • Clinical data from device under evaluation
    • Clinical data from equivalent device
    • Relevant scientific literature
  3. Stage 2 - Appraisal: Evaluate data quality and relevance
  4. Stage 3 - Analysis: Demonstrate:
    • Performance (analytical validation)
    • Clinical performance (clinical validation)
    • Clinical utility
  5. Stage 4 - Report: Clinical Evaluation Report (CER)
# CE MDR Compliant Validation Structure
# (sketch: the run_* methods and generate_cer() below are placeholders
#  to be implemented for the specific device)
class CEMDRValidator:
    def __init__(self, device_class='IIa'):
        self.device_class = device_class
        self.validation_levels = {
            'I': ['technical_validation'],
            'IIa': ['technical_validation', 'clinical_validation'],
            'IIb': ['technical_validation', 'clinical_validation', 'clinical_performance'],
            'III': ['technical_validation', 'clinical_validation', 'clinical_performance', 'clinical_trial']
        }
    
    def validate(self, X, y, patient_ids):
        validations = self.validation_levels[self.device_class]
        
        if 'technical_validation' in validations:
            # Analytical performance
            self.technical_metrics = self.run_technical_validation(X, y)
        
        if 'clinical_validation' in validations:
            # Clinical concordance
            self.clinical_metrics = self.run_clinical_validation(X, y, patient_ids)
        
        if 'clinical_performance' in validations:
            # Real-world performance
            self.performance_metrics = self.run_performance_study(X, y, patient_ids)
        
        return self.generate_cer()  # Clinical Evaluation Report
                

Key Differences from FDA

| Aspect | CE MDR | FDA |
|---|---|---|
| Classification system | I, IIa, IIb, III (risk-based) | Class I, II, III + De Novo |
| Clinical evidence | Clinical Evaluation Report (CER) | Clinical validation in 510(k) |
| Post-market | PMCF (Post-Market Clinical Follow-up) | Post-market surveillance + PCCP |
| AI/ML updates | Significant change requires new assessment | PCCP allows predetermined changes |
| Notified Body | Required for Class IIa and above | FDA direct review |

TRIPOD+AI Scientific Standards

Transparent Reporting of Multivariable Prediction Models for Individual Prognosis or Diagnosis + AI

📋 TRIPOD+AI Checklist Items for Cross-Validation

Key items specifically addressing validation methodology:

  • Item 10b - Data Partitioning: Describe how data was split (random, temporal, geographic, etc.)
  • Item 10c - Cross-validation: Report exact CV method, number of folds, stratification
  • Item 13b - Patient Clustering: How multiple measurements per patient were handled
  • Item 14b - Missing Data: How missing data was handled in each fold
  • Item 15b - Model Specification: Report if model selection used same data as evaluation
  • Item 16 - Performance Metrics: Report with confidence intervals for each fold
  • Item 18b - Model Updating: Describe any calibration or updating procedures

TRIPOD+AI Validation Requirements

✅ Best Practices from TRIPOD+AI

  1. Transparent Reporting:
    • Report performance for EACH fold, not just mean
    • Include distribution plots of fold performances
    • Document any folds that failed or had issues
  2. Data Dependencies:
    • Clearly state clustering (patient, site, time)
    • Justify independence assumptions
    • Report effective sample size after clustering
  3. Reproducibility:
    • Provide all random seeds
    • Document software versions
    • Share code when possible
  4. Uncertainty Quantification:
    • Bootstrap confidence intervals
    • Calibration plots for each fold
    • Decision curve analysis
# TRIPOD+AI Compliant Reporting Example
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.metrics import roc_auc_score, roc_curve
import matplotlib.pyplot as plt

def tripod_compliant_cv(X, y, patient_ids, model):
    """
    TRIPOD+AI compliant cross-validation with full reporting
    """
    # Item 10c: Specify exact CV method. StratifiedGroupKFold keeps all
    # samples from one patient in a single fold while approximately
    # preserving class balance (scikit-learn >= 1.0).
    cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
    
    # Store results for each fold (Item 16)
    fold_results = []
    
    for fold_idx, (train_idx, val_idx) in enumerate(
        cv.split(X, y, groups=patient_ids)
    ):
        # Item 13b: Verify patient-level splitting
        train_patients = patient_ids[train_idx]
        val_patients = patient_ids[val_idx]
        assert len(np.intersect1d(train_patients, val_patients)) == 0
        
        # Train model
        model.fit(X[train_idx], y[train_idx])
        
        # Predictions
        y_pred_proba = model.predict_proba(X[val_idx])[:, 1]
        
        # Calculate metrics for this fold
        fold_auc = roc_auc_score(y[val_idx], y_pred_proba)
        
        # Store detailed results
        fold_results.append({
            'fold': fold_idx + 1,
            'n_train': len(train_idx),
            'n_val': len(val_idx),
            'n_patients_train': len(np.unique(train_patients)),
            'n_patients_val': len(np.unique(val_patients)),
            'auc': fold_auc,
            'y_true': y[val_idx],
            'y_pred': y_pred_proba
        })
        
        print(f"Fold {fold_idx + 1}: AUC = {fold_auc:.3f}")
    
    # Item 16: Report aggregate metrics. NOTE: percentiles over only 5 fold
    # AUCs give a rough spread; bootstrapping (per TRIPOD+AI guidance) is
    # preferred for a proper 95% CI.
    aucs = [r['auc'] for r in fold_results]
    mean_auc = np.mean(aucs)
    std_auc = np.std(aucs)
    ci_lower = np.percentile(aucs, 2.5)
    ci_upper = np.percentile(aucs, 97.5)
    
    print(f"\nOverall Performance:")
    print(f"Mean AUC: {mean_auc:.3f} (SD: {std_auc:.3f})")
    print(f"95% CI: [{ci_lower:.3f}, {ci_upper:.3f}]")
    print(f"Individual fold AUCs: {aucs}")
    
    # Create visualization as required by TRIPOD+AI
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    
    # Plot ROC curve for each fold
    for idx, result in enumerate(fold_results):
        ax = axes[idx // 3, idx % 3]
        fpr, tpr, _ = roc_curve(result['y_true'], result['y_pred'])
        ax.plot(fpr, tpr, label=f"Fold {idx + 1} (AUC = {result['auc']:.3f})")
        ax.plot([0, 1], [0, 1], 'k--')
        ax.set_xlabel('False Positive Rate')
        ax.set_ylabel('True Positive Rate')
        ax.legend()
        ax.set_title(f'Fold {idx + 1}')
    
    # Summary plot in last subplot
    ax = axes[1, 2]
    ax.boxplot(aucs)
    ax.set_ylabel('AUC')
    ax.set_title('Distribution of Fold Performances')
    
    plt.tight_layout()
    plt.savefig('tripod_ai_validation_report.png', dpi=300)
    
    return fold_results
                

Comparative Requirements Table

Side-by-side comparison of validation requirements

| Requirement | FDA | CE MDR/MDCG | TRIPOD+AI |
|---|---|---|---|
| Test set | ✅ Mandatory locked test set (15-20%) | ✅ Required for Class IIa+ | ✅ Strongly recommended |
| Patient grouping | ✅ Mandatory | ✅ Required when applicable | ✅ Must report clustering |
| Cross-validation method | Pre-specified, justified | State of the art | Fully documented with code |
| Confidence intervals | ✅ Required (95% CI) | ✅ Required | ✅ Required with method specified |
| Subgroup analysis | ✅ Mandatory (age, sex, race) | Risk-based requirement | ✅ Recommended |
| External validation | Often for De Novo | Depends on class | ✅ Strongly recommended |
| Performance per fold | Summary statistics OK | Summary statistics OK | ✅ Must report each fold |
| Code sharing | Not required | Not required | ✅ Strongly encouraged |
| Update strategy | PCCP required | New assessment | Document update methods |
| Clinical evidence | Clinical validation study | Clinical Evaluation Report | Clinical utility analysis |

Implementation Examples

Code templates for regulatory-compliant validation

Complete Regulatory-Compliant Validation Pipeline

"""
Regulatory-Compliant Cross-Validation Pipeline
Meets FDA, CE MDR, and TRIPOD+AI requirements
"""

import numpy as np
import pandas as pd
from sklearn.base import clone  # needed by run_cross_validation()
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler
import json
from datetime import datetime
import hashlib

class RegulatoryCompliantValidator:
    """
    Implements validation meeting FDA, CE MDR, and TRIPOD+AI standards
    """
    
    def __init__(self, regulatory_standard='all', device_class='IIa'):
        self.regulatory_standard = regulatory_standard
        self.device_class = device_class
        self.validation_plan = self._create_validation_plan()
        self.audit_trail = []
        
    def _create_validation_plan(self):
        """Pre-specify validation plan (FDA requirement)"""
        plan = {
            'created_at': datetime.now().isoformat(),
            'test_set_ratio': 0.20,
            'validation_ratio': 0.20,
            'cv_method': 'stratified_patient_kfold',
            'n_folds': 5,
            'random_seed': 42,
            'stratification': 'outcome',
            'patient_grouping': True,
            'confidence_level': 0.95,
            'subgroups': ['age_group', 'sex', 'race', 'site'],
            'performance_metrics': [
                'sensitivity', 'specificity', 'ppv', 'npv',
                'accuracy', 'auc_roc', 'auc_pr', 'calibration'
            ]
        }
        
        # Lock the plan with hash (FDA: pre-specification)
        plan['hash'] = hashlib.sha256(
            json.dumps(plan, sort_keys=True).encode()
        ).hexdigest()
        
        return plan
    
    def split_data(self, X, y, patient_ids, demographics=None):
        """
        Three-way split meeting all regulatory requirements
        """
        # Log action for audit trail
        self._log("Data splitting initiated")
        
        # FDA: Ensure patient-level splitting
        unique_patients = np.unique(patient_ids)
        patient_labels = pd.DataFrame({
            'patient_id': unique_patients,
            'label': [y[patient_ids == pid][0] for pid in unique_patients]
        })
        
        # First split: Separate test set (FDA: locked test set)
        patients_dev, patients_test = train_test_split(
            patient_labels['patient_id'],
            test_size=self.validation_plan['test_set_ratio'],
            stratify=patient_labels['label'],
            random_state=self.validation_plan['random_seed']
        )
        
        # Create masks
        dev_mask = np.isin(patient_ids, patients_dev)
        test_mask = np.isin(patient_ids, patients_test)
        
        # Split data
        X_dev, X_test = X[dev_mask], X[test_mask]
        y_dev, y_test = y[dev_mask], y[test_mask]
        patient_ids_dev = patient_ids[dev_mask]
        patient_ids_test = patient_ids[test_mask]
        
        # Lock test set (FDA requirement)
        self.locked_test_set = {
            'X': X_test.copy(),
            'y': y_test.copy(),
            'patient_ids': patient_ids_test.copy(),
            'locked_at': datetime.now().isoformat(),
            'hash': hashlib.sha256(X_test.tobytes() + y_test.tobytes()).hexdigest()
        }
        
        self._log(f"Test set locked: {len(patients_test)} patients, {len(X_test)} samples")
        
        # TRIPOD+AI: Report data characteristics
        self.data_characteristics = {
            'total_patients': len(unique_patients),
            'total_samples': len(X),
            'samples_per_patient': {
                'mean': len(X) / len(unique_patients),
                'min': min([sum(patient_ids == pid) for pid in unique_patients]),
                'max': max([sum(patient_ids == pid) for pid in unique_patients])
            },
            'class_distribution': {
                'overall': pd.Series(y).value_counts().to_dict(),
                'train': pd.Series(y_dev).value_counts().to_dict(),
                'test': pd.Series(y_test).value_counts().to_dict()
            }
        }
        
        return X_dev, y_dev, patient_ids_dev
    
    def run_cross_validation(self, X_dev, y_dev, patient_ids_dev, model):
        """
        Run cross-validation meeting all standards
        """
        # Create patient-aware CV splitter
        unique_patients = np.unique(patient_ids_dev)
        patient_labels = np.array([
            y_dev[patient_ids_dev == pid][0] for pid in unique_patients
        ])
        
        cv = StratifiedKFold(
            n_splits=self.validation_plan['n_folds'],
            shuffle=True,
            random_state=self.validation_plan['random_seed']
        )
        
        fold_results = []
        
        for fold_idx, (train_patient_idx, val_patient_idx) in enumerate(
            cv.split(unique_patients, patient_labels)
        ):
            # Get patient IDs for this fold
            train_patients = unique_patients[train_patient_idx]
            val_patients = unique_patients[val_patient_idx]
            
            # FDA: Verify no patient leakage
            assert len(np.intersect1d(train_patients, val_patients)) == 0
            
            # Get sample indices
            train_mask = np.isin(patient_ids_dev, train_patients)
            val_mask = np.isin(patient_ids_dev, val_patients)
            
            X_train_fold = X_dev[train_mask]
            y_train_fold = y_dev[train_mask]
            X_val_fold = X_dev[val_mask]
            y_val_fold = y_dev[val_mask]
            
            # CE MDR: Preprocessing within fold
            scaler = StandardScaler()
            X_train_scaled = scaler.fit_transform(X_train_fold)
            X_val_scaled = scaler.transform(X_val_fold)
            
            # Train model
            model_fold = clone(model)
            model_fold.fit(X_train_scaled, y_train_fold)
            
            # Predictions
            y_pred_proba = model_fold.predict_proba(X_val_scaled)[:, 1]
            y_pred = (y_pred_proba >= 0.5).astype(int)
            
            # Calculate comprehensive metrics
            tn, fp, fn, tp = confusion_matrix(y_val_fold, y_pred).ravel()
            
            fold_metrics = {
                'fold': fold_idx + 1,
                'n_train_patients': len(train_patients),
                'n_val_patients': len(val_patients),
                'n_train_samples': sum(train_mask),
                'n_val_samples': sum(val_mask),
                'sensitivity': tp / (tp + fn) if (tp + fn) > 0 else 0,
                'specificity': tn / (tn + fp) if (tn + fp) > 0 else 0,
                'ppv': tp / (tp + fp) if (tp + fp) > 0 else 0,
                'npv': tn / (tn + fn) if (tn + fn) > 0 else 0,
                'accuracy': (tp + tn) / (tp + tn + fp + fn),
                'auc_roc': roc_auc_score(y_val_fold, y_pred_proba),
                'confusion_matrix': {'tn': int(tn), 'fp': int(fp), 
                                   'fn': int(fn), 'tp': int(tp)}
            }
            
            fold_results.append(fold_metrics)
            
            # TRIPOD+AI: Report each fold
            print(f"Fold {fold_idx + 1}: AUC={fold_metrics['auc_roc']:.3f}, "
                  f"Sens={fold_metrics['sensitivity']:.3f}, "
                  f"Spec={fold_metrics['specificity']:.3f}")
        
        # Calculate aggregate metrics with confidence intervals
        self.cv_results = self._calculate_aggregate_metrics(fold_results)
        
        return fold_results
    
    def _calculate_aggregate_metrics(self, fold_results):
        """Calculate mean, std, and CI for all metrics"""
        metrics = {}
        
        for metric_name in ['sensitivity', 'specificity', 'auc_roc', 'accuracy']:
            values = [f[metric_name] for f in fold_results]
            
            # Bootstrap for confidence intervals (TRIPOD+AI requirement)
            bootstrap_means = []
            for _ in range(1000):
                bootstrap_sample = np.random.choice(values, size=len(values), replace=True)
                bootstrap_means.append(np.mean(bootstrap_sample))
            
            metrics[metric_name] = {
                'mean': np.mean(values),
                'std': np.std(values),
                'ci_lower': np.percentile(bootstrap_means, 2.5),
                'ci_upper': np.percentile(bootstrap_means, 97.5),
                'fold_values': values  # TRIPOD+AI: Keep individual fold values
            }
        
        return metrics
    
    def evaluate_subgroups(self, X_dev, y_dev, patient_ids_dev, demographics, model):
        """
        FDA requirement: Subgroup analysis
        """
        subgroup_results = {}
        
        for subgroup in self.validation_plan['subgroups']:
            if subgroup not in demographics.columns:
                continue
                
            unique_values = demographics[subgroup].unique()
            subgroup_results[subgroup] = {}
            
            for value in unique_values:
                # Get samples for this subgroup
                subgroup_patients = demographics[
                    demographics[subgroup] == value
                ]['patient_id'].values
                
                mask = np.isin(patient_ids_dev, subgroup_patients)
                
                if sum(mask) < 20:  # Skip if too few samples
                    continue
                
                X_subgroup = X_dev[mask]
                y_subgroup = y_dev[mask]
                
                # Simple holdout validation for subgroup
                if len(np.unique(y_subgroup)) > 1:
                    # Calculate performance
                    y_pred_proba = model.predict_proba(X_subgroup)[:, 1]
                    auc = roc_auc_score(y_subgroup, y_pred_proba)
                    
                    subgroup_results[subgroup][value] = {
                        'n_samples': sum(mask),
                        'auc': auc,
                        'prevalence': np.mean(y_subgroup)
                    }
        
        self.subgroup_results = subgroup_results  # used by generate_regulatory_report()
        return subgroup_results
    
    def final_test_evaluation(self, model):
        """
        FDA: Final evaluation on locked test set
        """
        # Verify test set hasn't been modified
        current_hash = hashlib.sha256(
            self.locked_test_set['X'].tobytes() + 
            self.locked_test_set['y'].tobytes()
        ).hexdigest()
        
        assert current_hash == self.locked_test_set['hash'], "Test set has been modified!"
        
        # Evaluate on test set
        X_test = self.locked_test_set['X']
        y_test = self.locked_test_set['y']
        
        y_pred_proba = model.predict_proba(X_test)[:, 1]
        y_pred = (y_pred_proba >= 0.5).astype(int)
        
        # Comprehensive metrics
        tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
        
        test_results = {
            'sensitivity': tp / (tp + fn),
            'specificity': tn / (tn + fp),
            'ppv': tp / (tp + fp),
            'npv': tn / (tn + fn),
            'accuracy': (tp + tn) / (tp + tn + fp + fn),
            'auc_roc': roc_auc_score(y_test, y_pred_proba),
            'confusion_matrix': {'tn': int(tn), 'fp': int(fp), 
                               'fn': int(fn), 'tp': int(tp)},
            'n_samples': len(y_test),
            'evaluated_at': datetime.now().isoformat()
        }
        
        self.test_results = test_results  # used by generate_regulatory_report()
        return test_results
    
    def generate_regulatory_report(self):
        """
        Generate report meeting all regulatory requirements
        """
        report = {
            'metadata': {
                'generated_at': datetime.now().isoformat(),
                'validation_plan_hash': self.validation_plan['hash'],
                'regulatory_standards': ['FDA', 'CE MDR', 'TRIPOD+AI']
            },
            'data_description': self.data_characteristics,
            'validation_methodology': self.validation_plan,
            'cross_validation_results': self.cv_results,
            'test_set_results': self.test_results,
            'subgroup_analysis': self.subgroup_results,
            'audit_trail': self.audit_trail
        }
        
        # Save as JSON
        with open('regulatory_validation_report.json', 'w') as f:
            json.dump(report, f, indent=2)
        
        # Generate formatted report (HTML/PDF)
        self._generate_formatted_report(report)
        
        return report
    
    def _log(self, message):
        """Audit trail for regulatory compliance"""
        self.audit_trail.append({
            'timestamp': datetime.now().isoformat(),
            'message': message
        })
    
    def _generate_formatted_report(self, report):
        """Generate HTML report for regulatory submission"""
        # Implementation would generate formatted HTML/PDF report
        pass


# Example usage
if __name__ == "__main__":
    # Initialize validator
    validator = RegulatoryCompliantValidator(
        regulatory_standard='all',
        device_class='IIa'
    )
    
    # Load your data (load_medical_data is a placeholder for your own loader)
    X, y, patient_ids, demographics = load_medical_data()
    
    # Split data (test set is locked away)
    X_dev, y_dev, patient_ids_dev = validator.split_data(
        X, y, patient_ids, demographics
    )
    
    # Run cross-validation
    from sklearn.ensemble import RandomForestClassifier
    model = RandomForestClassifier(random_state=42)
    
    cv_results = validator.run_cross_validation(
        X_dev, y_dev, patient_ids_dev, model
    )
    
    # Final model training on all development data
    model.fit(X_dev, y_dev)
    
    # Subgroup analysis (FDA requirement), using the fitted model.
    # NOTE: evaluating on development data is optimistic; prefer
    # out-of-fold predictions for the submitted subgroup results.
    subgroup_results = validator.evaluate_subgroups(
        X_dev, y_dev, patient_ids_dev, demographics, model
    )
    
    # Final test set evaluation
    test_results = validator.final_test_evaluation(model)
    
    # Generate regulatory report
    report = validator.generate_regulatory_report()
    
    print("Validation complete. Report generated: regulatory_validation_report.json")
                
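The validation plan above lists "calibration" among its performance metrics, but the pipeline never computes it. A sketch of how a calibration check could be added, using scikit-learn's `calibration_curve` (the synthetic data and `n_bins=5` are illustrative assumptions):

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(600, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=600) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

# Fraction of observed positives vs. mean predicted probability per bin;
# a well-calibrated model tracks the diagonal.
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=5)
for fp, mp in zip(frac_pos, mean_pred):
    print(f"predicted {mp:.2f} -> observed {fp:.2f}")
```

TRIPOD+AI asks for a calibration plot per fold, so in practice this would be run inside `run_cross_validation` and the per-fold curves stored with the other fold metrics.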

Compliance Checklists

Ensure your validation meets all requirements

FDA 510(k) Submission Checklist

CE MDR Technical Documentation Checklist

TRIPOD+AI Reporting Checklist


Official Sources & References

Direct links to regulatory documents and guidelines

FDA Resources

CE MDR & MDCG Resources

TRIPOD+AI Resources

Additional Resources

Submission Readiness Checklist

Use this interactive checklist to evaluate your validation strategy against regulatory expectations (FDA/CE MDR/TRIPOD+AI).

🛡️ Data Integrity & Leakage

📊 Method Justification

📝 Documentation

📥 Download Resources

Get templates and checklists for your regulatory submission

  • Validation Plan Template
  • Compliance Checklist
  • Python Code Templates