[Check the website](https://ki-smile.github.io/healthprocessai/website/)
Developed at SMAILE (Stockholm Medical Artificial Intelligence and Learning Environments), Karolinska Institutet
HealthProcessAI is a comprehensive dual-language (Python & R) framework for applying process mining techniques to healthcare data, with integrated AI capabilities for generating clinical insights.
HealthProcessAI provides parallel implementations in both Python and R, allowing researchers and practitioners to choose their preferred environment while maintaining consistent methodology and results.
# Clone repository
git clone https://github.com/ki-smile/HealthProcessAI.git
cd HealthProcessAI
# Create conda environment (choose one):
# Option A: Latest compatible versions
conda env create -f environment.yml
conda activate healthprocessai
# Option B: Stable fixed versions (recommended if you encounter issues)
conda env create -f environment_stable.yml
conda activate healthprocessai_stable
# IMPORTANT: Always run from the repository root directory
# Use the -m flag to ensure proper module resolution (RECOMMENDED)
python -m examples.complete_pipeline_example
# Alternative: Run directly (may cause import errors)
# python examples/complete_pipeline_example.py
# Run modern R example (NEW: Full 5-step pipeline)
Rscript R/examples/complete_pipeline_example.R
pip install -r requirements.txt
# IMPORTANT: Always run from the repository root directory using -m flag
python -m examples.complete_pipeline_example
# Note: If you get import errors, see docs/IMPORT_FIX.md for solutions
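The import errors mentioned above usually come from how Python builds its module search path. With `python examples/complete_pipeline_example.py`, the first search-path entry is the `examples/` directory, so `import core...` cannot be resolved. With `python -m examples.complete_pipeline_example`, the current directory (the repository root) is searched instead. The sketch below shows one common workaround for scripts that have to be run directly. The helper name `ensure_repo_root_on_path` is hypothetical, and this is not necessarily the fix described in `docs/IMPORT_FIX.md`.

```python
import sys
from pathlib import Path


def ensure_repo_root_on_path(script_file: str, levels_up: int = 1) -> Path:
    """Put the repository root on sys.path so `import core...` resolves.

    Hypothetical helper (not part of HealthProcessAI). `levels_up=1` assumes
    the script sits one directory below the repository root, as
    examples/complete_pipeline_example.py does.
    """
    root = Path(script_file).resolve().parents[levels_up]
    if str(root) not in sys.path:
        # Insert at the front so the repository's packages take precedence.
        sys.path.insert(0, str(root))
    return root


# Typical use at the top of a script, before any `from core...` import:
#     ensure_repo_root_on_path(__file__)
```

Running with `-m` from the repository root remains the simpler option, since it needs no changes to the scripts.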
# Install all required packages
source("requirements.R")
# Run the modern R implementation
source("R/examples/complete_pipeline_example.R")
# Example with custom data
pipeline <- CompleteProcessMiningPipeline$new(
  data_path = "data/your_data.csv",
  api_key = Sys.getenv("OPENROUTER_API_KEY"),
  output_dir = "./results_R"
)
results <- pipeline$run_complete_analysis()
| Feature | Python | R | Notes |
|---|---|---|---|
| Process Discovery | PM4Py | bupaR | Both support full process mining |
| Data Size | Large (64-bit) | Medium-Large | Both handle healthcare datasets |
| Visualization | Graphviz | processmapR | R has interactive options |
| LLM Integration | ✅ | ✅ | Same OpenRouter API, multiple models |
| Report Generation | MD/HTML/PDF | MD/HTML/PDF/Word | R adds Word format |
| Report Orchestration | ✅ | ✅ | NEW: Both support multi-model synthesis |
| Advanced Analytics | ✅ | ✅ | Clustering, bottlenecks, predictions |
| Performance | Fast | Good | Python slightly faster on large data |
| Learning Curve | Moderate | Easier | R more intuitive for analysts |
healthprocessai/
├── 📂 core/ # Python core modules (5-step pipeline)
├── 📂 R/ # R implementation (NEW: Full pipeline)
│ ├── 📂 core/ # R core modules (5-step pipeline)
│ ├── 📂 examples/ # R example scripts
│ └── 📂 tests/ # R test suite
├── 📂 examples/ # Python example scripts
├── 📂 tutorials/ # Learning tutorials
├── 📂 notebooks/ # Jupyter & Colab notebooks
├── 📂 data/ # Sample datasets
├── 📂 docs/ # Technical documentation
├── 📂 tests/ # Python test suites
├── 📂 reports/ # Generated reports
└── 📂 legacy_original/ # Original R implementation
HealthProcessAI now includes an Orchestrator feature that consolidates insights from multiple LLMs into a single, unified report. Example usage:
from core.step5_orchestrator import ReportOrchestrator
# Initialize orchestrator
orchestrator = ReportOrchestrator(api_key=your_api_key)
# Consolidate multiple reports
orchestrated_report = orchestrator.consolidate_reports(
    reports={
        'anthropic': anthropic_report,
        'deepseek': deepseek_report,
        'google': gemini_report,
        'openai': gpt4_report,
        'xai': grok_report
    },
    case_info={
        'title': 'Sepsis Progression Analysis',
        'description': 'Multi-model synthesis of sepsis pathways'
    }
)
# Save orchestrated report
with open('reports/orchestrated/consolidated_analysis.md', 'w') as f:
    f.write(orchestrated_report)
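To make the consolidation step more concrete, here is a minimal sketch of how several per-model reports could be assembled into a single synthesis prompt. It is an illustration only: the function `build_consolidation_prompt` and the prompt wording are assumptions, and the actual `ReportOrchestrator` may work quite differently.

```python
def build_consolidation_prompt(reports: dict, case_info: dict) -> str:
    """Combine per-model reports into one prompt for a synthesis model.

    Hypothetical helper, not the HealthProcessAI implementation.
    `reports` maps a source label (e.g. 'anthropic') to its report text.
    """
    if not reports:
        raise ValueError("at least one report is required")

    header = [
        f"Case: {case_info.get('title', 'Untitled')}",
        f"Description: {case_info.get('description', 'n/a')}",
        "",
        "Consolidate the following model reports into one analysis. "
        "Note where the models agree and where they disagree.",
    ]
    # Sort by source label so the prompt is deterministic across runs.
    sections = [
        f"\n## Report from {source}\n{reports[source].strip()}"
        for source in sorted(reports)
    ]
    return "\n".join(header + sections)


prompt = build_consolidation_prompt(
    reports={"deepseek": "Report text B", "anthropic": "Report text A"},
    case_info={"title": "Sepsis Progression Analysis"},
)
```

Keeping each source under its own labelled section lets the synthesis model attribute claims to a specific report, which helps when the reports disagree.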
import os
from core.step1_data_loader import EventLogLoader
from core.step2_process_mining import ProcessMiner
from core.step3_llm_integration import LLMAnalyzer
from core.report_generator import ReportGenerator
from core.step5_orchestrator import ReportOrchestrator
# Read the OpenRouter API key from the environment
api_key = os.environ["OPENROUTER_API_KEY"]
# Load data
loader = EventLogLoader("data/sepsis_events.csv")
data = loader.load_and_prepare()
# Discover process
miner = ProcessMiner()
event_log = miner.create_event_log(data)
dfg = miner.discover_dfg()
# Generate insights
analyzer = LLMAnalyzer(api_key)
insights = analyzer.analyze(dfg)
# Create report (`results` stands for the analysis outputs to include)
generator = ReportGenerator()
generator.generate_report(results, formats=['pdf'])
# NEW: Orchestrate multiple model reports (Step 5)
orchestrator = ReportOrchestrator(api_key=api_key)
orchestrated = orchestrator.consolidate_reports(
    reports={
        'anthropic': insights1,
        'google': insights2,
        # ... reports from further models
    },
    case_info={'title': 'Sepsis Progression'}
)
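The `discover_dfg()` step above produces a directly-follows graph (DFG): for every pair of activities, a count of how often one is immediately followed by the other within the same case. The sketch below computes such counts in plain Python to show what the structure contains. It is independent of PM4Py and of `ProcessMiner`'s actual implementation, and the activity names are made-up sample data.

```python
from collections import Counter, defaultdict
from datetime import datetime


def directly_follows_counts(events):
    """Count directly-follows pairs in an event log.

    `events` is an iterable of (case_id, activity, timestamp) tuples.
    Returns a Counter mapping (source_activity, target_activity) to a count.
    """
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((timestamp, activity))

    dfg = Counter()
    for case_events in traces.values():
        # Order each case by timestamp before pairing consecutive events.
        case_events.sort(key=lambda event: event[0])
        activities = [activity for _, activity in case_events]
        for source, target in zip(activities, activities[1:]):
            dfg[(source, target)] += 1
    return dfg


# Made-up sample log; case "c2" is deliberately listed out of order.
sample_events = [
    ("c1", "ER Registration", datetime(2024, 1, 1, 8, 0)),
    ("c1", "Triage", datetime(2024, 1, 1, 8, 30)),
    ("c1", "IV Antibiotics", datetime(2024, 1, 1, 9, 15)),
    ("c2", "Triage", datetime(2024, 1, 2, 10, 20)),
    ("c2", "ER Registration", datetime(2024, 1, 2, 10, 0)),
]
sample_dfg = directly_follows_counts(sample_events)
```

Here `sample_dfg` holds two edges: `ER Registration` to `Triage` (seen in both cases) and `Triage` to `IV Antibiotics` (seen once).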
library(tidyverse)
library(bupaR)
library(R6)
# Source modern R modules
source("R/core/step1_data_loader.R")
source("R/core/step2_process_mining.R")
source("R/core/step3_llm_integration.R")
source("R/core/step4_advanced_analytics.R")
source("R/core/step5_orchestrator.R")
source("R/core/report_generator.R")
# Initialize pipeline
pipeline <- CompleteProcessMiningPipeline$new(
  data_path = "data/sepsis_events.csv",
  api_key = Sys.getenv("OPENROUTER_API_KEY"),
  output_dir = "./results"
)
# Run complete analysis with all steps
results <- pipeline$run_complete_analysis(
  sepsis_only = TRUE,
  use_llm = TRUE,
  llm_models = c("anthropic", "deepseek", "google")
)
# Results include:
# - Process discovery with bupaR
# - Advanced analytics (clustering, bottlenecks, KPIs)
# - Multi-model LLM insights
# - Orchestrated report synthesis
# - Multiple output formats (MD, HTML, PDF, Word)
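As a rough illustration of what a bottleneck analysis can look at, the sketch below computes the mean elapsed time for each activity-to-activity transition; transitions with unusually long means are candidate bottlenecks. This is a generic technique shown in Python under simple assumptions, not the method used by the framework's advanced-analytics step, and the sample activities are invented.

```python
from collections import defaultdict
from datetime import datetime


def mean_transition_hours(events):
    """Mean elapsed hours for each directly-follows transition.

    `events` is an iterable of (case_id, activity, timestamp) tuples.
    Returns a dict mapping (source_activity, target_activity) to mean hours.
    """
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((timestamp, activity))

    durations = defaultdict(list)
    for case_events in traces.values():
        case_events.sort(key=lambda event: event[0])
        for (t0, a0), (t1, a1) in zip(case_events, case_events[1:]):
            durations[(a0, a1)].append((t1 - t0).total_seconds() / 3600)

    return {pair: sum(hours) / len(hours) for pair, hours in durations.items()}


# Invented sample data: the same transition takes 2 h in one case, 4 h in another.
sample_log = [
    ("c1", "Triage", datetime(2024, 1, 1, 8, 0)),
    ("c1", "Admission", datetime(2024, 1, 1, 10, 0)),
    ("c2", "Triage", datetime(2024, 1, 2, 9, 0)),
    ("c2", "Admission", datetime(2024, 1, 2, 13, 0)),
]
transition_means = mean_transition_hours(sample_log)
slowest = max(transition_means, key=transition_means.get)
```

In practice a median or a high percentile is often more robust than the mean, since a few very long stays can dominate the average.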
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see LICENSE for details.
If you use HealthProcessAI in your research, please cite:
@software{healthprocessai2024,
title = {HealthProcessAI: Process Mining Framework for Healthcare},
author = {Abtahi, Farhad and Illueca Fernandez, Eduardo and Chen, Kaile},
organization = {SMAILE Lab, Karolinska Institutet},
year = {2024},
url = {https://github.com/ki-smile/HealthProcessAI}
}