# Risk Models Specification
This document outlines the requirements and specifications for implementing risk models in the Sentinel cancer risk assessment system.
## Overview
Risk models in Sentinel are designed to calculate cancer risk scores using structured user input data. All risk models must follow a consistent architecture, use the new `UserInput` structure, implement proper validation, and maintain comprehensive test coverage.
## Core Architecture
### Base Class
All risk models must inherit from `RiskModel` in `src/sentinel/risk_models/base.py`:
```python
from sentinel.risk_models.base import RiskModel


class YourRiskModel(RiskModel):
    def __init__(self):
        super().__init__("your_model_name")
```
### Required Methods
Every risk model must implement these abstract methods:
```python
def compute_score(self, user: UserInput) -> str:
    """Compute the risk score for a given user profile.

    Args:
        user: The user profile containing demographics, medical history, etc.

    Returns:
        str: Risk percentage as a string or an N/A message if inapplicable.

    Raises:
        ValueError: If required inputs are missing or invalid.
    """

def cancer_type(self) -> str:
    """Return the cancer type this model assesses."""
    return "breast"  # or "lung", "prostate", etc.

def description(self) -> str:
    """Return a detailed description of the model."""

def interpretation(self) -> str:
    """Return guidance on how to interpret the results."""

def references(self) -> list[str]:
    """Return list of reference citations."""
```
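Because `compute_score` returns a string in both the success and the inapplicable case, callers have to tell the two apart before using the value. The helper below is a minimal sketch of one way to do that; `parse_score` is a hypothetical name and is not part of Sentinel. It assumes that N/A messages always start with `N/A`, as in the examples in this document.

```python
from __future__ import annotations


def parse_score(score: str) -> float | None:
    """Return the numeric risk percentage, or None for an N/A message.

    Hypothetical helper: assumes inapplicable results start with "N/A".
    """
    if score.startswith("N/A"):
        return None
    return float(score)


numeric = parse_score("1.50")
inapplicable = parse_score("N/A: Model is only applicable to female patients.")
```

A caller can then branch on `None` instead of searching the string for `"N/A"` at every call site.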
## UserInput Structure
### Required Imports
```python
from typing import Annotated
from pydantic import Field
from sentinel.risk_models.base import RiskModel
from sentinel.user_input import (
    # Import specific enums and models you need
    CancerType,
    ChronicCondition,
    Demographics,
    Ethnicity,
    FamilyMemberCancer,
    FamilyRelation,
    FamilySide,
    RelationshipDegree,
    Sex,
    SymptomEntry,
    UserInput,
    # ... other specific imports
)
```
### UserInput Hierarchy
The `UserInput` class follows a hierarchical structure:
```
UserInput
├── demographics: Demographics
│   ├── age_years: int
│   ├── sex: Sex (enum)
│   ├── ethnicity: Ethnicity | None
│   └── anthropometrics: Anthropometrics
│       ├── height_cm: float | None
│       └── weight_kg: float | None
├── lifestyle: Lifestyle
│   ├── smoking: SmokingHistory
│   └── alcohol: AlcoholConsumption
├── personal_medical_history: PersonalMedicalHistory
│   ├── chronic_conditions: list[ChronicCondition]
│   ├── previous_cancers: list[CancerType]
│   ├── genetic_mutations: list[GeneticMutation]
│   ├── tyrer_cuzick_polygenic_risk_score: float | None
│   └── # ... other fields
├── female_specific: FemaleSpecific | None
│   ├── menstrual: MenstrualHistory
│   ├── parity: ParityHistory
│   └── breast_health: BreastHealthHistory
├── symptoms: list[SymptomEntry]
└── family_history: list[FamilyMemberCancer]
## REQUIRED_INPUTS Specification
### Structure
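Every risk model must define a `REQUIRED_INPUTS` class attribute that maps each dotted `UserInput` path to a `(type, required)` tuple. Use `typing.Annotated` with Pydantic `Field` constraints wherever a value needs range validation: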
```python
REQUIRED_INPUTS: dict[str, tuple[type, bool]] = {
    "demographics.age_years": (Annotated[int, Field(ge=18, le=100)], True),
    "demographics.sex": (Sex, True),
    "demographics.ethnicity": (Ethnicity | None, False),
    "demographics.anthropometrics.height_cm": (Annotated[float, Field(gt=0)], False),
    "demographics.anthropometrics.weight_kg": (Annotated[float, Field(gt=0)], False),
    "female_specific.menstrual.age_at_menarche": (Annotated[int, Field(ge=8, le=25)], False),
    "personal_medical_history.tyrer_cuzick_polygenic_risk_score": (Annotated[float, Field(gt=0)], False),
    "family_history": (list, False),  # list[FamilyMemberCancer]
    "symptoms": (list, False),  # list[SymptomEntry]
}
```
### Field Constraints
Use appropriate `Field` constraints for validation:
- `ge=X`: Greater than or equal to X
- `le=X`: Less than or equal to X
- `gt=X`: Greater than X
- `lt=X`: Less than X
### Required vs Optional
- `True`: Field is required for the model
- `False`: Field is optional but validated if present
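To make the dotted keys and the required flag concrete, here is a minimal, standard-library-only sketch of how a path such as `demographics.age_years` could be resolved and checked for presence. It is not the actual `validate_inputs` implementation in `RiskModel`, which this document does not show: `resolve_path` and `check_required` are hypothetical names, `SimpleNamespace` stands in for `UserInput`, and the sketch checks presence only, not the `Field` constraints.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def resolve_path(root: Any, dotted_path: str) -> Any:
    """Follow a dotted path; return None if any segment is missing or None."""
    current = root
    for segment in dotted_path.split("."):
        if current is None:
            return None
        current = getattr(current, segment, None)
    return current


def check_required(
    user: Any, required_inputs: dict[str, tuple[Any, bool]]
) -> tuple[bool, list[str]]:
    """Report required paths with no value, in the (is_valid, errors) shape."""
    errors = [
        f"{path}: required field is missing"
        for path, (_expected_type, required) in required_inputs.items()
        if required and resolve_path(user, path) is None
    ]
    return (not errors, errors)


# Stand-in for a UserInput instance (assumption: attribute access only).
user = SimpleNamespace(
    demographics=SimpleNamespace(age_years=52, sex=None),
    female_specific=None,
)
required = {
    "demographics.age_years": (int, True),
    "demographics.sex": (str, True),
    "female_specific.menstrual.age_at_menarche": (int, False),
}
is_valid, errors = check_required(user, required)
```

Here the optional `female_specific` path is absent without causing an error, while the required `demographics.sex` path is reported as missing.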
## Input Validation
### Validation in compute_score
Every `compute_score` method must start with input validation:
```python
def compute_score(self, user: UserInput) -> str:
    """Compute the risk score for a given user profile."""
    # Validate inputs first
    is_valid, errors = self.validate_inputs(user)
    if not is_valid:
        raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

    # Continue with model-specific logic...
```
### Model-Specific Validation
Add additional validation as needed:
```python
# Check sex applicability
if user.demographics.sex != Sex.FEMALE:
    return "N/A: Model is only applicable to female patients."

# Check age range
if not (35 <= user.demographics.age_years <= 85):
    return "N/A: Age is outside the validated range."

# Check required data availability
if user.female_specific is None:
    return "N/A: Missing female-specific information required for model."
```
## Extending UserInput
### When to Extend
If a risk model requires fields or enums that don't exist in `UserInput`, **do not** use replacement values or hacks. Instead, propose extending `UserInput`:
1. **Missing Enums**: Add new values to existing enums (e.g., `ChronicCondition`, `SymptomType`)
2. **Missing Fields**: Add new fields to appropriate sections (e.g., `PersonalMedicalHistory`, `BreastHealthHistory`)
3. **Missing Models**: Create new Pydantic models if needed
### Extension Process
1. **Identify Missing Elements**: Document what's needed for the model
2. **Propose Extension**: Suggest specific additions to `UserInput`
3. **Implement Extension**: Add the new fields/enums to `src/sentinel/user_input.py`
4. **Update Tests**: Add tests for new fields in `tests/test_user_input.py`
5. **Update Model**: Use the new fields in your risk model
6. **Run Tests**: Ensure all tests pass
### Example Extensions
```python
# Adding new ChronicCondition enum values
class ChronicCondition(str, Enum):
    # ... existing values
    ENDOMETRIAL_POLYPS = "endometrial_polyps"
    ANAEMIA = "anaemia"


# Adding new fields to PersonalMedicalHistory
class PersonalMedicalHistory(StrictBaseModel):
    # ... existing fields
    tyrer_cuzick_polygenic_risk_score: float | None = Field(
        None,
        gt=0,
        description="Tyrer-Cuzick polygenic risk score as relative risk multiplier",
    )


# Adding new fields to BreastHealthHistory
class BreastHealthHistory(StrictBaseModel):
    # ... existing fields
    lobular_carcinoma_in_situ: bool | None = Field(
        None,
        description="History of lobular carcinoma in situ (LCIS) diagnosis",
    )
```
## Data Access Patterns
### Demographics
```python
age = user.demographics.age_years
sex = user.demographics.sex
ethnicity = user.demographics.ethnicity
height_cm = user.demographics.anthropometrics.height_cm
weight_kg = user.demographics.anthropometrics.weight_kg
```
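Since `height_cm` and `weight_kg` are both `float | None`, any value derived from them needs a `None` guard. The sketch below uses body mass index purely as an illustration of that pattern; the helper name `bmi_from_anthropometrics` is hypothetical, and this specification does not require BMI. It assumes both inputs are positive when present, consistent with the `Field(gt=0)` constraints shown in `REQUIRED_INPUTS`.

```python
from __future__ import annotations


def bmi_from_anthropometrics(
    height_cm: float | None, weight_kg: float | None
) -> float | None:
    """Return BMI in kg/m^2, or None when either measurement is missing."""
    if height_cm is None or weight_kg is None:
        return None
    height_m = height_cm / 100.0  # centimetres to metres
    return weight_kg / (height_m * height_m)


bmi = bmi_from_anthropometrics(165.0, 65.0)
missing = bmi_from_anthropometrics(None, 65.0)
```

Returning `None` lets the model decide whether a missing measurement means an N/A result or a fallback to a default.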
### Female-Specific Data
```python
if user.female_specific is not None:
    fs = user.female_specific
    menarche_age = fs.menstrual.age_at_menarche
    menopause_age = fs.menstrual.age_at_menopause
    num_births = fs.parity.num_live_births
    first_birth_age = fs.parity.age_at_first_live_birth
    num_biopsies = fs.breast_health.num_biopsies
    atypical_hyperplasia = fs.breast_health.atypical_hyperplasia
    lcis = fs.breast_health.lobular_carcinoma_in_situ
```
### Medical History
```python
chronic_conditions = user.personal_medical_history.chronic_conditions
previous_cancers = user.personal_medical_history.previous_cancers
genetic_mutations = user.personal_medical_history.genetic_mutations
polygenic_score = user.personal_medical_history.tyrer_cuzick_polygenic_risk_score
```
### Family History
```python
for member in user.family_history:
    if member.cancer_type == CancerType.BREAST:
        relation = member.relation
        age_at_diagnosis = member.age_at_diagnosis
        degree = member.degree
        side = member.side
```
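Models usually reduce the family history list to a few summary values rather than using each entry directly. The following is a minimal sketch of that reduction using stand-in types so it runs on its own: the `CancerType`, `RelationshipDegree`, and `FamilyMemberCancer` definitions here are simplified assumptions (including the `LUNG` and `SECOND` members and the string values), not the real classes in `sentinel.user_input`, and `summarize_first_degree` is a hypothetical helper.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class CancerType(str, Enum):  # stand-in, values assumed
    BREAST = "breast"
    LUNG = "lung"


class RelationshipDegree(str, Enum):  # stand-in, values assumed
    FIRST = "first"
    SECOND = "second"


@dataclass(frozen=True)
class FamilyMemberCancer:  # stand-in with only the fields used below
    cancer_type: CancerType
    degree: RelationshipDegree
    age_at_diagnosis: int | None = None


def summarize_first_degree(
    family_history: list[FamilyMemberCancer], cancer_type: CancerType
) -> tuple[int, int | None]:
    """Return (count, youngest known age at diagnosis) for first-degree relatives."""
    ages = [
        member.age_at_diagnosis
        for member in family_history
        if member.cancer_type == cancer_type
        and member.degree == RelationshipDegree.FIRST
    ]
    known_ages = [age for age in ages if age is not None]
    return len(ages), (min(known_ages) if known_ages else None)


history = [
    FamilyMemberCancer(CancerType.BREAST, RelationshipDegree.FIRST, 55),
    FamilyMemberCancer(CancerType.BREAST, RelationshipDegree.SECOND, 40),
    FamilyMemberCancer(CancerType.LUNG, RelationshipDegree.FIRST, 60),
    FamilyMemberCancer(CancerType.BREAST, RelationshipDegree.FIRST),
]
count, youngest = summarize_first_degree(history, CancerType.BREAST)
```

Relatives with an unknown age at diagnosis still count toward the total but are ignored when computing the youngest age.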
### Symptoms
```python
for symptom in user.symptoms:
    symptom_type = symptom.symptom_type
    severity = symptom.severity
    duration_days = symptom.duration_days
```
## Enum Usage
### Always Use Enums
Never compare against string literals. Always use the appropriate enums:
```python
# ✅ Correct
if user.demographics.sex == Sex.FEMALE:
if member.cancer_type == CancerType.BREAST:
if member.relation == FamilyRelation.MOTHER:
if member.degree == RelationshipDegree.FIRST:
if member.side == FamilySide.MATERNAL:

# ❌ Incorrect
if user.demographics.sex == "female":
if member.cancer_type == "breast":
if member.relation == "mother":
```
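When raw strings do arrive from outside (for example from a legacy test fixture), one option is to convert them to enum members once at the boundary so the model code only ever sees enums. This is a minimal sketch with a stand-in `Sex` enum; the member values `"female"` and `"male"` and the helper `parse_sex` are assumptions for illustration, not the definitions in `sentinel.user_input`.

```python
from enum import Enum


class Sex(str, Enum):  # stand-in, values assumed
    FEMALE = "female"
    MALE = "male"


def parse_sex(raw: str) -> Sex:
    """Convert a raw string to a Sex member, raising ValueError if unknown."""
    return Sex(raw.strip().lower())


sex = parse_sex(" Female ")
is_female = sex is Sex.FEMALE
```

An unrecognised value raises `ValueError` from the enum lookup, so a typo fails at the boundary instead of silently comparing unequal deep inside a model.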
### Enum Mapping
When you need to map enums to model-specific codes:
```python
def _race_code_from_ethnicity(ethnicity: Ethnicity | None) -> int:
    """Map ethnicity enum to model-specific race code."""
    if not ethnicity:
        return 1  # Default
    if ethnicity == Ethnicity.BLACK:
        return 2
    if ethnicity in {Ethnicity.ASIAN, Ethnicity.PACIFIC_ISLANDER}:
        return 3
    if ethnicity == Ethnicity.HISPANIC:
        return 6
    return 1  # Default to White
```
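The same mapping can also be written as a lookup table, which keeps the codes in one place as the number of categories grows. This is an alternative sketch, not a required style: the `Ethnicity` enum below is a stand-in containing only the members named in this document (its string values are assumed), and `race_code` is a hypothetical name. The codes 1, 2, 3, and 6 are copied from the function above.

```python
from __future__ import annotations

from enum import Enum


class Ethnicity(str, Enum):  # stand-in, values assumed
    WHITE = "white"
    BLACK = "black"
    ASIAN = "asian"
    PACIFIC_ISLANDER = "pacific_islander"
    HISPANIC = "hispanic"


DEFAULT_RACE_CODE = 1  # default to White, as in the if-chain version

_RACE_CODES: dict[Ethnicity, int] = {
    Ethnicity.BLACK: 2,
    Ethnicity.ASIAN: 3,
    Ethnicity.PACIFIC_ISLANDER: 3,
    Ethnicity.HISPANIC: 6,
}


def race_code(ethnicity: Ethnicity | None) -> int:
    """Map an ethnicity to a model-specific race code via table lookup."""
    if ethnicity is None:
        return DEFAULT_RACE_CODE
    return _RACE_CODES.get(ethnicity, DEFAULT_RACE_CODE)
```

Any ethnicity not listed in the table, including `None`, falls back to the default code, which matches the behaviour of the if-chain.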
## Testing Requirements
### Test File Structure
Create comprehensive test files following this pattern:
```python
import pytest

from sentinel.user_input import (
    # Import all needed models and enums
    Anthropometrics,
    BreastHealthHistory,
    CancerType,
    Demographics,
    Ethnicity,
    FamilyMemberCancer,
    FamilyRelation,
    FamilySide,
    FemaleSpecific,
    Lifestyle,
    MenstrualHistory,
    ParityHistory,
    PersonalMedicalHistory,
    RelationshipDegree,
    Sex,
    SmokingHistory,
    SmokingStatus,
    UserInput,
)
from sentinel.risk_models import YourRiskModel

# Ground truth test cases
GROUND_TRUTH_CASES = [
    {
        "name": "test_case_name",
        "input": UserInput(
            demographics=Demographics(
                age_years=40,
                sex=Sex.FEMALE,
                ethnicity=Ethnicity.WHITE,
                anthropometrics=Anthropometrics(height_cm=165.0, weight_kg=65.0),
            ),
            lifestyle=Lifestyle(
                smoking=SmokingHistory(status=SmokingStatus.NEVER),
            ),
            personal_medical_history=PersonalMedicalHistory(),
            female_specific=FemaleSpecific(
                menstrual=MenstrualHistory(age_at_menarche=13),
                parity=ParityHistory(num_live_births=1, age_at_first_live_birth=25),
                breast_health=BreastHealthHistory(),
            ),
            family_history=[
                FamilyMemberCancer(
                    relation=FamilyRelation.MOTHER,
                    cancer_type=CancerType.BREAST,
                    age_at_diagnosis=55,
                    degree=RelationshipDegree.FIRST,
                    side=FamilySide.MATERNAL,
                )
            ],
        ),
        "expected": 1.5,  # Expected risk percentage
    },
    # ... more test cases
]


class TestYourRiskModel:
    """Test suite for YourRiskModel."""

    def setup_method(self):
        """Initialize model instance for testing."""
        self.model = YourRiskModel()

    @pytest.mark.parametrize("case", GROUND_TRUTH_CASES, ids=lambda x: x["name"])
    def test_ground_truth_validation(self, case):
        """Test against ground truth results."""
        user_input = case["input"]
        expected_risk = case["expected"]
        actual_risk_str = self.model.compute_score(user_input)
        if "N/A" in actual_risk_str:
            pytest.fail(f"Model returned N/A: {actual_risk_str}")
        actual_risk = float(actual_risk_str)
        assert actual_risk == pytest.approx(expected_risk, abs=0.01)

    def test_validation_errors(self):
        """Test that model raises ValueError for invalid inputs."""
        # Test invalid age
        user_input = UserInput(
            demographics=Demographics(
                # Below minimum; pick a value outside your model's REQUIRED_INPUTS range
                age_years=30,
                sex=Sex.FEMALE,
                anthropometrics=Anthropometrics(height_cm=165.0, weight_kg=65.0),
            ),
            # ... rest of input
        )
        with pytest.raises(ValueError, match=r"Invalid inputs for.*:"):
            self.model.compute_score(user_input)

    def test_inapplicable_cases(self):
        """Test cases where model returns N/A."""
        # Test male patient
        user_input = UserInput(
            demographics=Demographics(
                age_years=50,
                sex=Sex.MALE,  # Wrong sex
                anthropometrics=Anthropometrics(height_cm=175.0, weight_kg=70.0),
            ),
            # ... rest of input
        )
        score = self.model.compute_score(user_input)
        assert "N/A" in score
```
### Test Coverage Requirements
- **Ground Truth Validation**: Test against known reference values
- **Input Validation**: Test that invalid inputs raise `ValueError`
- **Edge Cases**: Test boundary conditions and edge cases
- **Inapplicable Cases**: Test cases where model should return "N/A"
- **Enum Usage**: Test that all enums are used correctly
- **Family History**: Test various family relationship combinations
- **Error Handling**: Test error conditions and exception handling
## Code Quality Requirements
### Pre-commit Hooks
All code must pass these pre-commit hooks:
- **unimport**: Remove unused imports
- **ruff format**: Code formatting
- **ruff check**: Linting and style checks
- **pylint**: Code quality analysis
- **darglint**: Docstring validation
- **pydocstyle**: Docstring style checks
- **codespell**: Spell checking
### Code Style
- Use type hints throughout
- Write clear, concise docstrings
- Follow PEP 8 style guidelines
- Use meaningful variable names
- Add comments for complex logic
- Handle edge cases gracefully
### Error Handling
```python
def compute_score(self, user: UserInput) -> str:
    """Compute the risk score for a given user profile."""
    # Validate inputs outside the try block so the ValueError reaches the caller
    is_valid, errors = self.validate_inputs(user)
    if not is_valid:
        raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

    # Model-specific validation
    if user.demographics.sex != Sex.FEMALE:
        return "N/A: Model is only applicable to female patients."

    try:
        # Calculate risk
        risk = self._calculate_risk(user)
        return f"{risk:.2f}"
    except Exception as e:
        return f"N/A: Error calculating risk - {e!s}"
```
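The three outcomes described in this specification (a raised `ValueError` for invalid inputs, an `N/A` string for inapplicable users, and a formatted percentage otherwise) can be exercised with a small stand-in model. Everything in this sketch is hypothetical: `StubUser`, `StubModel`, the stand-in `Sex` enum, and the fixed risk of 1.5 exist only so the example runs without Sentinel installed.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Sex(str, Enum):  # stand-in, values assumed
    FEMALE = "female"
    MALE = "male"


@dataclass
class StubUser:  # flattened stand-in for UserInput
    age_years: int | None
    sex: Sex


class StubModel:
    """Stand-in model that follows the compute_score contract."""

    name = "stub_model"

    def validate_inputs(self, user: StubUser) -> tuple[bool, list[str]]:
        errors = []
        if user.age_years is None:
            errors.append("demographics.age_years: required field is missing")
        return (not errors, errors)

    def compute_score(self, user: StubUser) -> str:
        # Invalid inputs raise, so callers cannot mistake them for a result.
        is_valid, errors = self.validate_inputs(user)
        if not is_valid:
            raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")
        # Inapplicable users get an N/A message instead of an exception.
        if user.sex != Sex.FEMALE:
            return "N/A: Model is only applicable to female patients."
        risk = 1.5  # placeholder value, not a real calculation
        return f"{risk:.2f}"


model = StubModel()
ok = model.compute_score(StubUser(age_years=50, sex=Sex.FEMALE))
not_applicable = model.compute_score(StubUser(age_years=50, sex=Sex.MALE))
```

Keeping validation failures as exceptions and inapplicability as a return value is what allows the test suite to use `pytest.raises(ValueError)` for one and an `"N/A" in score` assertion for the other.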
## Migration Checklist
When adapting an existing risk model to the new structure:
- [ ] Update imports to use new `user_input` module
- [ ] Add `REQUIRED_INPUTS` with Pydantic validation
- [ ] Refactor `compute_score` to use new `UserInput` structure
- [ ] Replace string literals with enums
- [ ] Update parameter extraction logic
- [ ] Add input validation at start of `compute_score`
- [ ] Update all test cases to use new `UserInput` structure
- [ ] Run full test suite to ensure 100% pass rate
- [ ] Run pre-commit hooks to ensure code quality
- [ ] Document any `UserInput` extensions needed
- [ ] Update model documentation and references
## Examples
### Complete Risk Model Template
```python
"""Your cancer risk model implementation."""

from typing import Annotated

from pydantic import Field

from sentinel.risk_models.base import RiskModel
from sentinel.user_input import (
    CancerType,
    Demographics,
    Ethnicity,
    FamilyMemberCancer,
    FamilyRelation,
    RelationshipDegree,
    Sex,
    UserInput,
)


class YourRiskModel(RiskModel):
    """Compute cancer risk using the Your model."""

    def __init__(self):
        super().__init__("your_model")

    REQUIRED_INPUTS: dict[str, tuple[type, bool]] = {
        "demographics.age_years": (Annotated[int, Field(ge=18, le=100)], True),
        "demographics.sex": (Sex, True),
        "demographics.ethnicity": (Ethnicity | None, False),
        "family_history": (list, False),  # list[FamilyMemberCancer]
    }

    def compute_score(self, user: UserInput) -> str:
        """Compute the risk score for a given user profile."""
        # Validate inputs first
        is_valid, errors = self.validate_inputs(user)
        if not is_valid:
            raise ValueError(f"Invalid inputs for Your: {'; '.join(errors)}")

        # Model-specific validation
        if user.demographics.sex != Sex.FEMALE:
            return "N/A: Model is only applicable to female patients."

        # Extract parameters
        age = user.demographics.age_years
        ethnicity = user.demographics.ethnicity

        # Count family history
        family_count = sum(
            1 for member in user.family_history
            if member.cancer_type == CancerType.BREAST
            and member.degree == RelationshipDegree.FIRST
        )

        # Calculate risk (example)
        risk = self._calculate_risk(age, family_count, ethnicity)
        return f"{risk:.2f}"

    def _calculate_risk(self, age: int, family_count: int, ethnicity: Ethnicity | None) -> float:
        """Calculate the actual risk value."""
        # Implementation here
        return 1.5  # Example

    def cancer_type(self) -> str:
        return "breast"

    def description(self) -> str:
        return "Your model description here."

    def interpretation(self) -> str:
        return "Interpretation guidance here."

    def references(self) -> list[str]:
        return ["Your reference here."]
```
This specification ensures consistency, maintainability, and quality across all risk models in the Sentinel system.