# Risk Models Specification

This document outlines the requirements and specifications for implementing risk models in the Sentinel cancer risk assessment system.

## Overview

Risk models in Sentinel are designed to calculate cancer risk scores using structured user input data. All risk models must follow a consistent architecture, use the new `UserInput` structure, implement proper validation, and maintain comprehensive test coverage.

## Core Architecture

### Base Class

All risk models must inherit from `RiskModel` in `src/sentinel/risk_models/base.py`:

```python
from sentinel.risk_models.base import RiskModel


class YourRiskModel(RiskModel):
    def __init__(self):
        super().__init__("your_model_name")
```

### Required Methods

Every risk model must implement these abstract methods:

```python
def compute_score(self, user: UserInput) -> str:
    """Compute the risk score for a given user profile.

    Args:
        user: The user profile containing demographics, medical history, etc.

    Returns:
        str: Risk percentage as a string or an N/A message if inapplicable.

    Raises:
        ValueError: If required inputs are missing or invalid.
    """

def cancer_type(self) -> str:
    """Return the cancer type this model assesses."""
    return "breast"  # or "lung", "prostate", etc.

def description(self) -> str:
    """Return a detailed description of the model."""

def interpretation(self) -> str:
    """Return guidance on how to interpret the results."""

def references(self) -> list[str]:
    """Return list of reference citations."""
```
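The contents of `base.py` are not reproduced in this document. As a rough sketch only, assuming the base class is built on Python's `abc` module, the contract might look like the following. `RiskModelSketch` and `IncompleteModel` are hypothetical names, and the `name` attribute is inferred from the `self.name` used in the validation examples. A subclass that leaves any abstract method unimplemented cannot be instantiated:

```python
from abc import ABC, abstractmethod


class RiskModelSketch(ABC):
    """Hypothetical stand-in for the RiskModel base class."""

    def __init__(self, name: str) -> None:
        # Error messages in compute_score refer to self.name
        self.name = name

    @abstractmethod
    def compute_score(self, user) -> str:
        """Return a risk percentage string or an "N/A: ..." message."""

    @abstractmethod
    def cancer_type(self) -> str:
        """Return the cancer type this model assesses."""

    @abstractmethod
    def description(self) -> str:
        """Return a detailed description of the model."""

    @abstractmethod
    def interpretation(self) -> str:
        """Return guidance on how to interpret the results."""

    @abstractmethod
    def references(self) -> list[str]:
        """Return list of reference citations."""


class IncompleteModel(RiskModelSketch):
    """Implements only cancer_type, so it remains abstract."""

    def cancer_type(self) -> str:
        return "breast"


try:
    IncompleteModel("incomplete")
except TypeError as exc:
    print(f"Cannot instantiate: {exc}")
```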

## UserInput Structure

### Required Imports

```python
from typing import Annotated
from pydantic import Field
from sentinel.risk_models.base import RiskModel
from sentinel.user_input import (
    # Import specific enums and models you need
    CancerType,
    ChronicCondition,
    Demographics,
    Ethnicity,
    FamilyMemberCancer,
    FamilyRelation,
    FamilySide,
    RelationshipDegree,
    Sex,
    SymptomEntry,
    UserInput,
    # ... other specific imports
)
```

### UserInput Hierarchy

The `UserInput` class follows a hierarchical structure:

```
UserInput
├── demographics: Demographics
│   ├── age_years: int
│   ├── sex: Sex (enum)
│   ├── ethnicity: Ethnicity | None
│   └── anthropometrics: Anthropometrics
│       ├── height_cm: float | None
│       └── weight_kg: float | None
├── lifestyle: Lifestyle
│   ├── smoking: SmokingHistory
│   └── alcohol: AlcoholConsumption
├── personal_medical_history: PersonalMedicalHistory
│   ├── chronic_conditions: list[ChronicCondition]
│   ├── previous_cancers: list[CancerType]
│   ├── genetic_mutations: list[GeneticMutation]
│   ├── tyrer_cuzick_polygenic_risk_score: float | None
│   └── # ... other fields
├── female_specific: FemaleSpecific | None
│   ├── menstrual: MenstrualHistory
│   ├── parity: ParityHistory
│   └── breast_health: BreastHealthHistory
├── symptoms: list[SymptomEntry]
└── family_history: list[FamilyMemberCancer]
```

## REQUIRED_INPUTS Specification

### Structure

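Every risk model must define a `REQUIRED_INPUTS` class attribute. Each key is a dotted attribute path into `UserInput`, and each value is a `(type, required)` tuple whose type can carry Pydantic `Field` constraints via `typing.Annotated`: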

```python
REQUIRED_INPUTS: dict[str, tuple[type, bool]] = {
    "demographics.age_years": (Annotated[int, Field(ge=18, le=100)], True),
    "demographics.sex": (Sex, True),
    "demographics.ethnicity": (Ethnicity | None, False),
    "demographics.anthropometrics.height_cm": (Annotated[float, Field(gt=0)], False),
    "demographics.anthropometrics.weight_kg": (Annotated[float, Field(gt=0)], False),
    "female_specific.menstrual.age_at_menarche": (Annotated[int, Field(ge=8, le=25)], False),
    "personal_medical_history.tyrer_cuzick_polygenic_risk_score": (Annotated[float, Field(gt=0)], False),
    "family_history": (list, False),  # list[FamilyMemberCancer]
    "symptoms": (list, False),  # list[SymptomEntry]
}
```
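Each `REQUIRED_INPUTS` key is a dot-separated path through the `UserInput` hierarchy. A minimal sketch of resolving such a path, assuming plain attribute access; `resolve_path` is hypothetical, `SimpleNamespace` stands in for the Pydantic models, and a `None` section such as `female_specific` is treated as a missing value:

```python
from types import SimpleNamespace
from typing import Any


def resolve_path(obj: Any, path: str) -> Any:
    """Follow a dotted attribute path, returning None if any step is missing or None."""
    current = obj
    for part in path.split("."):
        if current is None:
            return None
        current = getattr(current, part, None)
    return current


# Stand-in for a UserInput instance with no female-specific section
user = SimpleNamespace(
    demographics=SimpleNamespace(
        age_years=52,
        anthropometrics=SimpleNamespace(height_cm=178.0, weight_kg=None),
    ),
    female_specific=None,
)

print(resolve_path(user, "demographics.age_years"))  # 52
print(resolve_path(user, "demographics.anthropometrics.weight_kg"))  # None
print(resolve_path(user, "female_specific.menstrual.age_at_menarche"))  # None
```

Returning `None` for an unknown attribute keeps the sketch short, but it also hides typos in path keys; a stricter variant could raise instead so that a misspelled key fails loudly.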

### Field Constraints

Use appropriate `Field` constraints for validation:

- `ge=X`: Greater than or equal to X
- `le=X`: Less than or equal to X
- `gt=X`: Greater than X
- `lt=X`: Less than X

### Required vs Optional

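The second element of each tuple marks whether the field must be present:

- `True`: Required; validation fails if the field is missing
- `False`: Optional; the field may be absent, but a value that is present must still satisfy its type and constraints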
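Put together, the flag decides what happens when a value is absent, while a present value is always checked. The sketch below assumes values have already been looked up by path and uses simple predicates in place of Pydantic types; `check_inputs` is hypothetical, though its `(is_valid, errors)` result matches how `validate_inputs` is consumed in `compute_score`:

```python
from typing import Any, Callable

# Each spec pairs a validity check with the "required" flag, loosely
# mirroring the (type, bool) tuples in REQUIRED_INPUTS.
Spec = tuple[Callable[[Any], bool], bool]


def check_inputs(values: dict[str, Any], specs: dict[str, Spec]) -> tuple[bool, list[str]]:
    """Return (is_valid, errors) for values already looked up by path."""
    errors: list[str] = []
    for path, (is_ok, required) in specs.items():
        value = values.get(path)
        if value is None:
            if required:
                errors.append(f"{path}: required field is missing")
            continue  # Optional and absent: nothing to validate
        if not is_ok(value):
            errors.append(f"{path}: invalid value {value!r}")
    return (not errors, errors)


specs: dict[str, Spec] = {
    "demographics.age_years": (lambda v: 18 <= v <= 100, True),
    "demographics.anthropometrics.height_cm": (lambda v: v > 0, False),
}

print(check_inputs({"demographics.age_years": 40}, specs))  # (True, [])
print(check_inputs({"demographics.anthropometrics.height_cm": -1.0}, specs))
```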

## Input Validation

### Validation in compute_score

Every `compute_score` method must start with input validation:

```python
def compute_score(self, user: UserInput) -> str:
    """Compute the risk score for a given user profile."""
    # Validate inputs first
    is_valid, errors = self.validate_inputs(user)
    if not is_valid:
        raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

    # Continue with model-specific logic...
```

### Model-Specific Validation

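Add model-specific checks as needed. When the input is valid but falls outside the model's scope, return an `N/A` message rather than raising: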

```python
# Check sex applicability
if user.demographics.sex != Sex.FEMALE:
    return "N/A: Model is only applicable to female patients."

# Check age range
if not (35 <= user.demographics.age_years <= 85):
    return "N/A: Age is outside the validated range."

# Check required data availability
if user.female_specific is None:
    return "N/A: Missing female-specific information required for model."
```

## Extending UserInput

### When to Extend

If a risk model requires fields or enum values that don't exist in `UserInput`, **do not** substitute proxy values or work around the gap. Instead, propose extending `UserInput`:

1. **Missing Enums**: Add new values to existing enums (e.g., `ChronicCondition`, `SymptomType`)
2. **Missing Fields**: Add new fields to appropriate sections (e.g., `PersonalMedicalHistory`, `BreastHealthHistory`)
3. **Missing Models**: Create new Pydantic models if needed

### Extension Process

1. **Identify Missing Elements**: Document what's needed for the model
2. **Propose Extension**: Suggest specific additions to `UserInput`
3. **Implement Extension**: Add the new fields/enums to `src/sentinel/user_input.py`
4. **Update Tests**: Add tests for new fields in `tests/test_user_input.py`
5. **Update Model**: Use the new fields in your risk model
6. **Run Tests**: Ensure all tests pass

### Example Extensions

```python
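# Sketch of edits to src/sentinel/user_input.py (StrictBaseModel is assumed to be defined there)
from enum import Enum

from pydantic import Field

# Adding new ChronicCondition enum values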
class ChronicCondition(str, Enum):
    # ... existing values
    ENDOMETRIAL_POLYPS = "endometrial_polyps"
    ANAEMIA = "anaemia"

# Adding new fields to PersonalMedicalHistory
class PersonalMedicalHistory(StrictBaseModel):
    # ... existing fields
    tyrer_cuzick_polygenic_risk_score: float | None = Field(
        None,
        gt=0,
        description="Tyrer-Cuzick polygenic risk score as relative risk multiplier",
    )

# Adding new fields to BreastHealthHistory
class BreastHealthHistory(StrictBaseModel):
    # ... existing fields
    lobular_carcinoma_in_situ: bool | None = Field(
        None,
        description="History of lobular carcinoma in situ (LCIS) diagnosis",
    )
```
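Because `ChronicCondition` subclasses `str`, a new member works as soon as it is declared: stored string values convert back to members, and unknown strings are rejected. A sketch using a hypothetical two-member version of the enum (the real one has more values):

```python
from enum import Enum


class ChronicCondition(str, Enum):
    """Hypothetical subset showing only the two values added above."""

    ENDOMETRIAL_POLYPS = "endometrial_polyps"
    ANAEMIA = "anaemia"


# A stored string converts back to the enum member itself
print(ChronicCondition("anaemia") is ChronicCondition.ANAEMIA)  # True

# An unrecognized spelling is rejected rather than passed through
try:
    ChronicCondition("anemia")
except ValueError as exc:
    print(f"Rejected: {exc}")
```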

## Data Access Patterns

### Demographics

```python
age = user.demographics.age_years
sex = user.demographics.sex
ethnicity = user.demographics.ethnicity
height_cm = user.demographics.anthropometrics.height_cm
weight_kg = user.demographics.anthropometrics.weight_kg
```

### Female-Specific Data

```python
if user.female_specific is not None:
    fs = user.female_specific
    menarche_age = fs.menstrual.age_at_menarche
    menopause_age = fs.menstrual.age_at_menopause
    num_births = fs.parity.num_live_births
    first_birth_age = fs.parity.age_at_first_live_birth
    num_biopsies = fs.breast_health.num_biopsies
    atypical_hyperplasia = fs.breast_health.atypical_hyperplasia
    lcis = fs.breast_health.lobular_carcinoma_in_situ
```

### Medical History

```python
chronic_conditions = user.personal_medical_history.chronic_conditions
previous_cancers = user.personal_medical_history.previous_cancers
genetic_mutations = user.personal_medical_history.genetic_mutations
polygenic_score = user.personal_medical_history.tyrer_cuzick_polygenic_risk_score
```

### Family History

```python
for member in user.family_history:
    if member.cancer_type == CancerType.BREAST:
        relation = member.relation
        age_at_diagnosis = member.age_at_diagnosis
        degree = member.degree
        side = member.side
```

### Symptoms

```python
for symptom in user.symptoms:
    symptom_type = symptom.symptom_type
    severity = symptom.severity
    duration_days = symptom.duration_days
```

## Enum Usage

### Always Use Enums

Never compare against string literals. Always use the appropriate enum members:

```python
# ✅ Correct
if user.demographics.sex == Sex.FEMALE: ...
if member.cancer_type == CancerType.BREAST: ...
if member.relation == FamilyRelation.MOTHER: ...
if member.degree == RelationshipDegree.FIRST: ...
if member.side == FamilySide.MATERNAL: ...

# ❌ Incorrect
if user.demographics.sex == "female": ...
if member.cancer_type == "breast": ...
if member.relation == "mother": ...
```
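The rule matters even when a literal happens to work. Assuming `Sex` subclasses `str` the way `ChronicCondition` does (the member values below are assumptions), a correctly spelled literal compares equal to the member, a misspelled one silently evaluates to `False`, and a misspelled member name fails immediately:

```python
from enum import Enum


class Sex(str, Enum):
    """Stand-in for sentinel's Sex enum; member values are assumed."""

    FEMALE = "female"
    MALE = "male"


sex = Sex.FEMALE

print(sex == "female")  # True: a str-based enum equals its value
print(sex == "Female")  # False: a casing slip or typo fails silently

try:
    print(sex == Sex.FEMAL)  # Misspelled member name
except AttributeError:
    print("AttributeError raised: the typo is caught immediately")
```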

### Enum Mapping

When you need to map enums to model-specific codes:

```python
def _race_code_from_ethnicity(ethnicity: Ethnicity | None) -> int:
    """Map ethnicity enum to model-specific race code."""
    if ethnicity is None:
        return 1  # Unknown ethnicity: fall back to the default (White) code

    if ethnicity == Ethnicity.BLACK:
        return 2
    if ethnicity in {Ethnicity.ASIAN, Ethnicity.PACIFIC_ISLANDER}:
        return 3
    if ethnicity == Ethnicity.HISPANIC:
        return 6
    return 1  # Any other ethnicity: default to White
```

## Testing Requirements

### Test File Structure

Create comprehensive test files following this pattern:

```python
import pytest
from sentinel.user_input import (
    # Import all needed models and enums
    Anthropometrics,
    BreastHealthHistory,
    CancerType,
    Demographics,
    Ethnicity,
    FamilyMemberCancer,
    FamilyRelation,
    FamilySide,
    FemaleSpecific,
    Lifestyle,
    MenstrualHistory,
    ParityHistory,
    PersonalMedicalHistory,
    RelationshipDegree,
    Sex,
    SmokingHistory,
    SmokingStatus,
    UserInput,
)
from sentinel.risk_models import YourRiskModel

# Ground truth test cases
GROUND_TRUTH_CASES = [
    {
        "name": "test_case_name",
        "input": UserInput(
            demographics=Demographics(
                age_years=40,
                sex=Sex.FEMALE,
                ethnicity=Ethnicity.WHITE,
                anthropometrics=Anthropometrics(height_cm=165.0, weight_kg=65.0),
            ),
            lifestyle=Lifestyle(
                smoking=SmokingHistory(status=SmokingStatus.NEVER),
            ),
            personal_medical_history=PersonalMedicalHistory(),
            female_specific=FemaleSpecific(
                menstrual=MenstrualHistory(age_at_menarche=13),
                parity=ParityHistory(num_live_births=1, age_at_first_live_birth=25),
                breast_health=BreastHealthHistory(),
            ),
            family_history=[
                FamilyMemberCancer(
                    relation=FamilyRelation.MOTHER,
                    cancer_type=CancerType.BREAST,
                    age_at_diagnosis=55,
                    degree=RelationshipDegree.FIRST,
                    side=FamilySide.MATERNAL,
                )
            ],
        ),
        "expected": 1.5,  # Expected risk percentage
    },
    # ... more test cases
]

class TestYourRiskModel:
    """Test suite for YourRiskModel."""

    def setup_method(self):
        """Initialize model instance for testing."""
        self.model = YourRiskModel()

    @pytest.mark.parametrize("case", GROUND_TRUTH_CASES, ids=lambda x: x["name"])
    def test_ground_truth_validation(self, case):
        """Test against ground truth results."""
        user_input = case["input"]
        expected_risk = case["expected"]

        actual_risk_str = self.model.compute_score(user_input)

        if "N/A" in actual_risk_str:
            pytest.fail(f"Model returned N/A: {actual_risk_str}")

        actual_risk = float(actual_risk_str)
        assert actual_risk == pytest.approx(expected_risk, abs=0.01)

    def test_validation_errors(self):
        """Test that model raises ValueError for invalid inputs."""
        # Test invalid age
        user_input = UserInput(
            demographics=Demographics(
                age_years=30,  # Assumes 30 violates this model's REQUIRED_INPUTS age bound
                sex=Sex.FEMALE,
                anthropometrics=Anthropometrics(height_cm=165.0, weight_kg=65.0),
            ),
            # ... rest of input
        )

        with pytest.raises(ValueError, match=r"Invalid inputs for.*:"):
            self.model.compute_score(user_input)

    def test_inapplicable_cases(self):
        """Test cases where model returns N/A."""
        # Test male patient
        user_input = UserInput(
            demographics=Demographics(
                age_years=50,
                sex=Sex.MALE,  # Model applies to female patients only
                anthropometrics=Anthropometrics(height_cm=175.0, weight_kg=70.0),
            ),
            # ... rest of input
        )

        score = self.model.compute_score(user_input)
        assert "N/A" in score
```

### Test Coverage Requirements

- **Ground Truth Validation**: Test against known reference values
- **Input Validation**: Test that invalid inputs raise `ValueError`
- **Edge Cases**: Test boundary conditions and edge cases
- **Inapplicable Cases**: Test cases where model should return "N/A"
- **Enum Handling**: Test each enum value the model maps, including any default or fallback branch
- **Family History**: Test various family relationship combinations
- **Error Handling**: Test error conditions and exception handling
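For the edge-case item, inclusive ranges such as the 35 to 85 age window from the model-specific validation example are best covered at each endpoint and one step outside it. A hypothetical helper for integer ranges that generates those cases, for example to feed `pytest.mark.parametrize`:

```python
def inclusive_range_cases(low: int, high: int) -> list[tuple[int, bool]]:
    """Return (value, expected_in_range) pairs at and just outside [low, high]."""
    return [(low - 1, False), (low, True), (high, True), (high + 1, False)]


def in_validated_range(age: int) -> bool:
    """Example guard matching the 35 to 85 range used earlier."""
    return 35 <= age <= 85


cases = inclusive_range_cases(35, 85)
print(cases)  # [(34, False), (35, True), (85, True), (86, False)]
for age, expected in cases:
    assert in_validated_range(age) == expected, f"unexpected result at age {age}"
```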

## Code Quality Requirements

### Pre-commit Hooks

All code must pass these pre-commit hooks:

- **unimport**: Remove unused imports
- **ruff format**: Code formatting
- **ruff check**: Linting and style checks
- **pylint**: Code quality analysis
- **darglint**: Docstring validation
- **pydocstyle**: Docstring style checks
- **codespell**: Spell checking

### Code Style

- Use type hints throughout
- Write clear, concise docstrings
- Follow PEP 8 style guidelines
- Use meaningful variable names
- Add comments for complex logic
- Handle edge cases gracefully

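### Error Handling

Raise `ValueError` for invalid inputs outside any `try` block, and catch only errors from the calculation itself. If the validation `raise` sits inside a `try` with a broad `except Exception`, the `ValueError` is swallowed and turned into an `N/A` string, which breaks the contract checked by `test_validation_errors`.

```python
def compute_score(self, user: UserInput) -> str:
    """Compute the risk score for a given user profile."""
    # Validate inputs outside the try block so the ValueError reaches the caller
    is_valid, errors = self.validate_inputs(user)
    if not is_valid:
        raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

    # Model-specific validation: valid but out-of-scope profiles get N/A
    if user.demographics.sex != Sex.FEMALE:
        return "N/A: Model is only applicable to female patients."

    # Guard only the calculation; adjust the exception types to what it can raise
    try:
        risk = self._calculate_risk(user)
    except (ArithmeticError, ValueError) as e:
        return f"N/A: Error calculating risk - {e!s}"
    return f"{risk:.2f}"
```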

## Migration Checklist

When adapting an existing risk model to the new structure:

- [ ] Update imports to use new `user_input` module
- [ ] Add `REQUIRED_INPUTS` with Pydantic validation
- [ ] Refactor `compute_score` to use new `UserInput` structure
- [ ] Replace string literals with enums
- [ ] Update parameter extraction logic
- [ ] Add input validation at start of `compute_score`
- [ ] Update all test cases to use new `UserInput` structure
- [ ] Run full test suite to ensure 100% pass rate
- [ ] Run pre-commit hooks to ensure code quality
- [ ] Document any `UserInput` extensions needed
- [ ] Update model documentation and references

## Examples

### Complete Risk Model Template

```python
"""Your cancer risk model implementation."""

from typing import Annotated

from pydantic import Field

from sentinel.risk_models.base import RiskModel
from sentinel.user_input import (
    CancerType,
    Ethnicity,
    RelationshipDegree,
    Sex,
    UserInput,
)


class YourRiskModel(RiskModel):
    """Compute cancer risk using the Your model."""

    REQUIRED_INPUTS: dict[str, tuple[type, bool]] = {
        "demographics.age_years": (Annotated[int, Field(ge=18, le=100)], True),
        "demographics.sex": (Sex, True),
        "demographics.ethnicity": (Ethnicity | None, False),
        "family_history": (list, False),  # list[FamilyMemberCancer]
    }

    def __init__(self):
        """Register the model under its identifier."""
        super().__init__("your_model")

    def compute_score(self, user: UserInput) -> str:
        """Compute the risk score for a given user profile."""
        # Validate inputs first; invalid inputs raise rather than return N/A
        is_valid, errors = self.validate_inputs(user)
        if not is_valid:
            raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

        # Model-specific validation
        if user.demographics.sex != Sex.FEMALE:
            return "N/A: Model is only applicable to female patients."

        # Extract parameters
        age = user.demographics.age_years
        ethnicity = user.demographics.ethnicity

        # Count first-degree relatives with breast cancer
        family_count = sum(
            1
            for member in user.family_history
            if member.cancer_type == CancerType.BREAST
            and member.degree == RelationshipDegree.FIRST
        )

        # Calculate risk (example)
        risk = self._calculate_risk(age, family_count, ethnicity)
        return f"{risk:.2f}"

    def _calculate_risk(self, age: int, family_count: int, ethnicity: Ethnicity | None) -> float:
        """Calculate the actual risk value."""
        # Implementation here
        return 1.5  # Example

    def cancer_type(self) -> str:
        """Return the cancer type this model assesses."""
        return "breast"

    def description(self) -> str:
        """Return a detailed description of the model."""
        return "Your model description here."

    def interpretation(self) -> str:
        """Return guidance on how to interpret the results."""
        return "Interpretation guidance here."

    def references(self) -> list[str]:
        """Return list of reference citations."""
        return ["Your reference here."]
```

This specification ensures consistency, maintainability, and quality across all risk models in the Sentinel system.