	# Risk Models Specification

	This document outlines the requirements and specifications for implementing risk models in the Sentinel cancer risk assessment system.

	## Overview

	Risk models in Sentinel are designed to calculate cancer risk scores using structured user input data. All risk models must follow a consistent architecture, use the new `UserInput` structure, implement proper validation, and maintain comprehensive test coverage.

	## Core Architecture

	### Base Class

	All risk models must inherit from `RiskModel` in `src/sentinel/risk_models/base.py`:

	```python
	from sentinel.risk_models.base import RiskModel

	class YourRiskModel(RiskModel):
	    def __init__(self):
	        super().__init__("your_model_name")
	```

	### Required Methods

	Every risk model must implement these abstract methods:

	```python
	def compute_score(self, user: UserInput) -> str:
	    """Compute the risk score for a given user profile.

	    Args:
	        user: The user profile containing demographics, medical history, etc.

	    Returns:
	        str: Risk percentage as a string or an N/A message if inapplicable.

	    Raises:
	        ValueError: If required inputs are missing or invalid.
	    """

	def cancer_type(self) -> str:
	    """Return the cancer type this model assesses."""
	    return "breast"  # or "lung", "prostate", etc.

	def description(self) -> str:
	    """Return a detailed description of the model."""

	def interpretation(self) -> str:
	    """Return guidance on how to interpret the results."""

	def references(self) -> list[str]:
	    """Return list of reference citations."""
	```
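How `RiskModel` enforces these methods is not shown in this document. As a minimal sketch, assuming the base class relies on Python's `abc` module, a subclass that omits any abstract method cannot be instantiated. `SketchRiskModel`, `DemoModel`, `IncompleteModel`, and `can_instantiate` are hypothetical stand-ins, not the real `sentinel` classes:

```python
from abc import ABC, abstractmethod


class SketchRiskModel(ABC):
    """Hypothetical stand-in for sentinel's RiskModel base class."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def compute_score(self, user) -> str:
        """Return a risk percentage string or an N/A message."""

    @abstractmethod
    def cancer_type(self) -> str:
        """Return the cancer type this model assesses."""


class DemoModel(SketchRiskModel):
    """Implements every abstract method, so it can be instantiated."""

    def __init__(self):
        super().__init__("demo")

    def compute_score(self, user) -> str:
        return "1.50"

    def cancer_type(self) -> str:
        return "breast"


class IncompleteModel(SketchRiskModel):
    """Omits cancer_type, so instantiation raises TypeError."""

    def compute_score(self, user) -> str:
        return "1.50"


def can_instantiate(cls, *args) -> bool:
    """Report whether cls(*args) succeeds."""
    try:
        cls(*args)
        return True
    except TypeError:
        return False
```

With this pattern, a missing method surfaces when the model is constructed rather than when the method is first called.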

	## UserInput Structure

	### Required Imports

	```python
	from typing import Annotated
	from pydantic import Field
	from sentinel.risk_models.base import RiskModel
	from sentinel.user_input import (
	    # Import specific enums and models you need
	    CancerType,
	    ChronicCondition,
	    Demographics,
	    Ethnicity,
	    FamilyMemberCancer,
	    FamilyRelation,
	    FamilySide,
	    RelationshipDegree,
	    Sex,
	    SymptomEntry,
	    UserInput,
	    # ... other specific imports
	)
	```

	### UserInput Hierarchy

	The `UserInput` class follows a hierarchical structure:

	```
	UserInput
	├── demographics: Demographics
	│   ├── age_years: int
	│   ├── sex: Sex (enum)
	│   ├── ethnicity: Ethnicity | None
	│   └── anthropometrics: Anthropometrics
	│       ├── height_cm: float | None
	│       └── weight_kg: float | None
	├── lifestyle: Lifestyle
	│   ├── smoking: SmokingHistory
	│   └── alcohol: AlcoholConsumption
	├── personal_medical_history: PersonalMedicalHistory
	│   ├── chronic_conditions: list[ChronicCondition]
	│   ├── previous_cancers: list[CancerType]
	│   ├── genetic_mutations: list[GeneticMutation]
	│   ├── tyrer_cuzick_polygenic_risk_score: float | None
	│   └── # ... other fields
	├── female_specific: FemaleSpecific | None
	│   ├── menstrual: MenstrualHistory
	│   ├── parity: ParityHistory
	│   └── breast_health: BreastHealthHistory
	├── symptoms: list[SymptomEntry]
	└── family_history: list[FamilyMemberCancer]
	```

	## REQUIRED_INPUTS Specification

	### Structure

	Every risk model must define a `REQUIRED_INPUTS` class attribute that maps dotted `UserInput` paths to `(type, required)` tuples, using `typing.Annotated` types with Pydantic `Field` constraints:

	```python
	REQUIRED_INPUTS: dict[str, tuple[type, bool]] = {
	    "demographics.age_years": (Annotated[int, Field(ge=18, le=100)], True),
	    "demographics.sex": (Sex, True),
	    "demographics.ethnicity": (Ethnicity | None, False),
	    "demographics.anthropometrics.height_cm": (Annotated[float, Field(gt=0)], False),
	    "demographics.anthropometrics.weight_kg": (Annotated[float, Field(gt=0)], False),
	    "female_specific.menstrual.age_at_menarche": (Annotated[int, Field(ge=8, le=25)], False),
	    "personal_medical_history.tyrer_cuzick_polygenic_risk_score": (Annotated[float, Field(gt=0)], False),
	    "family_history": (list, False),  # list[FamilyMemberCancer]
	    "symptoms": (list, False),  # list[SymptomEntry]
	}
	```

	### Field Constraints

	Use appropriate `Field` constraints for validation:

	- `ge=X`: Greater than or equal to X
	- `le=X`: Less than or equal to X
	- `gt=X`: Greater than X
	- `lt=X`: Less than X

	### Required vs Optional

	- `True`: Field is required for the model
	- `False`: Field is optional but validated if present
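The keys of `REQUIRED_INPUTS` are dotted paths into the `UserInput` hierarchy. How `validate_inputs` resolves them is not shown in this document; the following is a minimal sketch of one way to do it, using `SimpleNamespace` objects as stand-ins for the Pydantic models. The helpers `resolve_path` and `missing_required` are hypothetical names, not part of `sentinel`.

```python
from types import SimpleNamespace
from typing import Any


def resolve_path(obj: Any, dotted_path: str) -> Any:
    """Walk a dotted path such as 'demographics.age_years'.

    Returns None as soon as any segment is missing or None, so optional
    sections like female_specific do not raise AttributeError.
    """
    current = obj
    for segment in dotted_path.split("."):
        if current is None:
            return None
        current = getattr(current, segment, None)
    return current


def missing_required(user: Any, required_inputs: dict[str, tuple[Any, bool]]) -> list[str]:
    """List the required paths whose value is absent (None)."""
    return [
        path
        for path, (_type, required) in required_inputs.items()
        if required and resolve_path(user, path) is None
    ]


# Stand-in user: female_specific is None and sex was not provided.
user = SimpleNamespace(
    demographics=SimpleNamespace(age_years=45, sex=None),
    female_specific=None,
)

# Simplified spec: plain types instead of Annotated[...] for brevity.
required_inputs = {
    "demographics.age_years": (int, True),
    "demographics.sex": (str, True),
    "female_specific.menstrual.age_at_menarche": (int, False),
}

print(missing_required(user, required_inputs))
```

In this sketch, type and range checks (the first tuple element) would run only on paths that resolved to a value, which matches the "optional but validated if present" rule.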

	## Input Validation

	### Validation in compute_score

	Every `compute_score` method must start with input validation:

	```python
	def compute_score(self, user: UserInput) -> str:
	    """Compute the risk score for a given user profile."""
	    # Validate inputs first
	    is_valid, errors = self.validate_inputs(user)
	    if not is_valid:
	        raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

	    # Continue with model-specific logic...
	```

	### Model-Specific Validation

	Add additional validation as needed:

	```python
	# Check sex applicability
	if user.demographics.sex != Sex.FEMALE:
	    return "N/A: Model is only applicable to female patients."

	# Check age range
	if not (35 <= user.demographics.age_years <= 85):
	    return "N/A: Age is outside the validated range."

	# Check required data availability
	if user.female_specific is None:
	    return "N/A: Missing female-specific information required for model."
	```
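These checks return an N/A string rather than raising, which keeps "the model does not apply" separate from "the input is invalid". A minimal sketch of collecting them in one helper follows. `applicability_message`, the local `Sex` enum and its string values, and the 35 to 85 range are illustrative stand-ins based on the snippet above, not a real model's limits.

```python
from __future__ import annotations

from enum import Enum


class Sex(str, Enum):
    """Local stand-in for sentinel.user_input.Sex (values are assumed)."""

    FEMALE = "female"
    MALE = "male"


def applicability_message(sex: Sex, age_years: int, has_female_specific: bool) -> str | None:
    """Return an N/A message if the model does not apply, else None."""
    if sex != Sex.FEMALE:
        return "N/A: Model is only applicable to female patients."
    if not (35 <= age_years <= 85):
        return "N/A: Age is outside the validated range."
    if not has_female_specific:
        return "N/A: Missing female-specific information required for model."
    return None
```

A `compute_score` implementation could call such a helper right after `validate_inputs` and return the message unchanged when it is not `None`.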

	## Extending UserInput

	### When to Extend

	If a risk model requires fields or enum values that don't exist in `UserInput`, do not substitute placeholder values or work around the gap. Instead, propose extending `UserInput`:

	1. Missing Enums: Add new values to existing enums (e.g., `ChronicCondition`, `SymptomType`)
	2. Missing Fields: Add new fields to appropriate sections (e.g., `PersonalMedicalHistory`, `BreastHealthHistory`)
	3. Missing Models: Create new Pydantic models if needed

	### Extension Process

	1. Identify Missing Elements: Document what's needed for the model
	2. Propose Extension: Suggest specific additions to `UserInput`
	3. Implement Extension: Add the new fields/enums to `src/sentinel/user_input.py`
	4. Update Tests: Add tests for new fields in `tests/test_user_input.py`
	5. Update Model: Use the new fields in your risk model
	6. Run Tests: Ensure all tests pass

	### Example Extensions

	```python
	# Adding new ChronicCondition enum values
	class ChronicCondition(str, Enum):
	    # ... existing values
	    ENDOMETRIAL_POLYPS = "endometrial_polyps"
	    ANAEMIA = "anaemia"

	# Adding new fields to PersonalMedicalHistory
	class PersonalMedicalHistory(StrictBaseModel):
	    # ... existing fields
	    tyrer_cuzick_polygenic_risk_score: float | None = Field(
	        None,
	        gt=0,
	        description="Tyrer-Cuzick polygenic risk score as relative risk multiplier",
	    )

	# Adding new fields to BreastHealthHistory
	class BreastHealthHistory(StrictBaseModel):
	    # ... existing fields
	    lobular_carcinoma_in_situ: bool | None = Field(
	        None,
	        description="History of lobular carcinoma in situ (LCIS) diagnosis",
	    )
	```
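Step 4 of the extension process asks for tests covering new fields and enum values. A minimal sketch of such a check for the two new enum values follows, using a local stand-in enum so that it runs without `sentinel` installed; real tests would import `ChronicCondition` from `sentinel.user_input`. The helpers `parse_condition` and `is_known_condition` are hypothetical.

```python
from enum import Enum


class ChronicCondition(str, Enum):
    """Local stand-in containing only the two values added above."""

    ENDOMETRIAL_POLYPS = "endometrial_polyps"
    ANAEMIA = "anaemia"


def parse_condition(raw: str) -> ChronicCondition:
    """Convert a raw string (e.g. from JSON) into the enum.

    Raises ValueError if the string is not a defined value.
    """
    return ChronicCondition(raw)


def is_known_condition(raw: str) -> bool:
    """Return True when raw matches a defined enum value."""
    try:
        parse_condition(raw)
        return True
    except ValueError:
        return False
```

Checking a near-miss spelling such as `"anemia"` guards against silently accepting values that the enum does not define.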

	## Data Access Patterns

	### Demographics

	```python
	age = user.demographics.age_years
	sex = user.demographics.sex
	ethnicity = user.demographics.ethnicity
	height_cm = user.demographics.anthropometrics.height_cm
	weight_kg = user.demographics.anthropometrics.weight_kg
	```
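Because `height_cm` and `weight_kg` are typed `float | None`, any derived quantity has to handle missing values. A minimal sketch of a BMI helper follows; `bmi_from_anthropometrics` is a hypothetical name, and whether a given model needs BMI at all depends on that model.

```python
from __future__ import annotations


def bmi_from_anthropometrics(height_cm: float | None, weight_kg: float | None) -> float | None:
    """Return body mass index in kg/m^2, or None if either input is missing."""
    if height_cm is None or weight_kg is None:
        return None
    if height_cm <= 0 or weight_kg <= 0:
        raise ValueError("height_cm and weight_kg must be positive")
    height_m = height_cm / 100.0  # convert centimetres to metres
    return weight_kg / (height_m * height_m)
```

Returning `None` lets the calling model decide whether a missing BMI means "N/A" or a fallback to a default category.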

	### Female-Specific Data

	```python
	if user.female_specific is not None:
	    fs = user.female_specific
	    menarche_age = fs.menstrual.age_at_menarche
	    menopause_age = fs.menstrual.age_at_menopause
	    num_births = fs.parity.num_live_births
	    first_birth_age = fs.parity.age_at_first_live_birth
	    num_biopsies = fs.breast_health.num_biopsies
	    atypical_hyperplasia = fs.breast_health.atypical_hyperplasia
	    lcis = fs.breast_health.lobular_carcinoma_in_situ
	```

	### Medical History

	```python
	chronic_conditions = user.personal_medical_history.chronic_conditions
	previous_cancers = user.personal_medical_history.previous_cancers
	genetic_mutations = user.personal_medical_history.genetic_mutations
	polygenic_score = user.personal_medical_history.tyrer_cuzick_polygenic_risk_score
	```

	### Family History

	```python
	for member in user.family_history:
	    if member.cancer_type == CancerType.BREAST:
	        relation = member.relation
	        age_at_diagnosis = member.age_at_diagnosis
	        degree = member.degree
	        side = member.side
	```
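Models usually reduce the family history list to a few summary numbers. A minimal sketch of that reduction follows, using reduced stand-in classes with only the fields it reads. `summarize_first_degree` is a hypothetical helper, the enum values are assumed, and treating `age_at_diagnosis` as optional is an assumption about the real model.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class CancerType(str, Enum):
    BREAST = "breast"
    LUNG = "lung"


class RelationshipDegree(str, Enum):
    FIRST = "first"
    SECOND = "second"


@dataclass
class FamilyMemberCancer:
    """Reduced stand-in with only the fields this sketch reads."""

    cancer_type: CancerType
    degree: RelationshipDegree
    age_at_diagnosis: int | None = None


def summarize_first_degree(
    family_history: list[FamilyMemberCancer], cancer_type: CancerType
) -> tuple[int, int | None]:
    """Return (count, earliest age at diagnosis) for first-degree relatives."""
    matches = [
        m
        for m in family_history
        if m.cancer_type == cancer_type and m.degree == RelationshipDegree.FIRST
    ]
    ages = [m.age_at_diagnosis for m in matches if m.age_at_diagnosis is not None]
    return len(matches), (min(ages) if ages else None)


history = [
    FamilyMemberCancer(CancerType.BREAST, RelationshipDegree.FIRST, 55),
    FamilyMemberCancer(CancerType.BREAST, RelationshipDegree.FIRST, 42),
    FamilyMemberCancer(CancerType.BREAST, RelationshipDegree.SECOND, 38),
    FamilyMemberCancer(CancerType.LUNG, RelationshipDegree.FIRST, 60),
]

print(summarize_first_degree(history, CancerType.BREAST))
```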

	### Symptoms

	```python
	for symptom in user.symptoms:
	    symptom_type = symptom.symptom_type
	    severity = symptom.severity
	    duration_days = symptom.duration_days
	```

	## Enum Usage

	### Always Use Enums

	Never compare against string literals. Always use the appropriate enum members:

	```python
	# ✅ Correct
	if user.demographics.sex == Sex.FEMALE: ...
	if member.cancer_type == CancerType.BREAST: ...
	if member.relation == FamilyRelation.MOTHER: ...
	if member.degree == RelationshipDegree.FIRST: ...
	if member.side == FamilySide.MATERNAL: ...

	# ❌ Incorrect
	if user.demographics.sex == "female": ...
	if member.cancer_type == "breast": ...
	if member.relation == "mother": ...
	```

	### Enum Mapping

	When you need to map enums to model-specific codes:

	```python
	def _race_code_from_ethnicity(ethnicity: Ethnicity | None) -> int:
	    """Map ethnicity enum to model-specific race code."""
	    if not ethnicity:
	        return 1  # Default

	    if ethnicity == Ethnicity.BLACK:
	        return 2
	    if ethnicity in {Ethnicity.ASIAN, Ethnicity.PACIFIC_ISLANDER}:
	        return 3
	    if ethnicity == Ethnicity.HISPANIC:
	        return 6
	    return 1  # Default to White
	```

	## Testing Requirements

	### Test File Structure

	Create comprehensive test files following this pattern:

	```python
	import pytest
	from sentinel.user_input import (
	    # Import all needed models and enums
	    Anthropometrics,
	    BreastHealthHistory,
	    CancerType,
	    Demographics,
	    Ethnicity,
	    FamilyMemberCancer,
	    FamilyRelation,
	    FamilySide,
	    FemaleSpecific,
	    Lifestyle,
	    MenstrualHistory,
	    ParityHistory,
	    PersonalMedicalHistory,
	    RelationshipDegree,
	    Sex,
	    SmokingHistory,
	    SmokingStatus,
	    UserInput,
	)
	from sentinel.risk_models import YourRiskModel

	# Ground truth test cases
	GROUND_TRUTH_CASES = [
	    {
	        "name": "test_case_name",
	        "input": UserInput(
	            demographics=Demographics(
	                age_years=40,
	                sex=Sex.FEMALE,
	                ethnicity=Ethnicity.WHITE,
	                anthropometrics=Anthropometrics(height_cm=165.0, weight_kg=65.0),
	            ),
	            lifestyle=Lifestyle(
	                smoking=SmokingHistory(status=SmokingStatus.NEVER),
	            ),
	            personal_medical_history=PersonalMedicalHistory(),
	            female_specific=FemaleSpecific(
	                menstrual=MenstrualHistory(age_at_menarche=13),
	                parity=ParityHistory(num_live_births=1, age_at_first_live_birth=25),
	                breast_health=BreastHealthHistory(),
	            ),
	            family_history=[
	                FamilyMemberCancer(
	                    relation=FamilyRelation.MOTHER,
	                    cancer_type=CancerType.BREAST,
	                    age_at_diagnosis=55,
	                    degree=RelationshipDegree.FIRST,
	                    side=FamilySide.MATERNAL,
	                )
	            ],
	        ),
	        "expected": 1.5,  # Expected risk percentage
	    },
	    # ... more test cases
	]

	class TestYourRiskModel:
	    """Test suite for YourRiskModel."""

	    def setup_method(self):
	        """Initialize model instance for testing."""
	        self.model = YourRiskModel()

	    @pytest.mark.parametrize("case", GROUND_TRUTH_CASES, ids=lambda x: x["name"])
	    def test_ground_truth_validation(self, case):
	        """Test against ground truth results."""
	        user_input = case["input"]
	        expected_risk = case["expected"]

	        actual_risk_str = self.model.compute_score(user_input)

	        if "N/A" in actual_risk_str:
	            pytest.fail(f"Model returned N/A: {actual_risk_str}")

	        actual_risk = float(actual_risk_str)
	        assert actual_risk == pytest.approx(expected_risk, abs=0.01)

	    def test_validation_errors(self):
	        """Test that model raises ValueError for invalid inputs."""
	        # Test invalid age
	        user_input = UserInput(
	            demographics=Demographics(
	                age_years=30,  # Below minimum
	                sex=Sex.FEMALE,
	                anthropometrics=Anthropometrics(height_cm=165.0, weight_kg=65.0),
	            ),
	            # ... rest of input
	        )

	        with pytest.raises(ValueError, match=r"Invalid inputs for.*:"):
	            self.model.compute_score(user_input)

	    def test_inapplicable_cases(self):
	        """Test cases where model returns N/A."""
	        # Test male patient
	        user_input = UserInput(
	            demographics=Demographics(
	                age_years=50,
	                sex=Sex.MALE,  # Wrong sex
	                anthropometrics=Anthropometrics(height_cm=175.0, weight_kg=70.0),
	            ),
	            # ... rest of input
	        )

	        score = self.model.compute_score(user_input)
	        assert "N/A" in score
	```

	### Test Coverage Requirements

	- Ground Truth Validation: Test against known reference values
	- Input Validation: Test that invalid inputs raise `ValueError`
	- Edge Cases: Test boundary conditions and edge cases
	- Inapplicable Cases: Test cases where model should return "N/A"
	- Enum Usage: Test that all enums are used correctly
	- Family History: Test various family relationship combinations
	- Error Handling: Test error conditions and exception handling
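For the edge-case item, one common approach is to test on and just outside each `Field` bound. A minimal sketch that generates those values for an inclusive integer range follows. `boundary_values` and `in_range` are hypothetical helpers, and the 18 to 100 range mirrors the `age_years` entry in the `REQUIRED_INPUTS` example earlier in this document.

```python
def boundary_values(ge: int, le: int) -> list[tuple[int, bool]]:
    """Return (value, expected_valid) pairs on and just outside an inclusive range."""
    if ge > le:
        raise ValueError("ge must not exceed le")
    return [
        (ge - 1, False),  # just below the lower bound
        (ge, True),       # lower bound itself (inclusive)
        (le, True),       # upper bound itself (inclusive)
        (le + 1, False),  # just above the upper bound
    ]


def in_range(value: int, ge: int, le: int) -> bool:
    """Reference check equivalent to Field(ge=..., le=...) for integers."""
    return ge <= value <= le


cases = boundary_values(18, 100)
```

The resulting pairs can be passed to `pytest.mark.parametrize` so that each bound becomes its own test case.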

	## Code Quality Requirements

	### Pre-commit Hooks

	All code must pass these pre-commit hooks:

	- unimport: Remove unused imports
	- ruff format: Code formatting
	- ruff check: Linting and style checks
	- pylint: Code quality analysis
	- darglint: Docstring validation
	- pydocstyle: Docstring style checks
	- codespell: Spell checking

	### Code Style

	- Use type hints throughout
	- Write clear, concise docstrings
	- Follow PEP 8 style guidelines
	- Use meaningful variable names
	- Add comments for complex logic
	- Handle edge cases gracefully

	### Error Handling

	```python
	def compute_score(self, user: UserInput) -> str:
	    """Compute the risk score for a given user profile."""
	    # Validate inputs outside the try block so the ValueError reaches the caller
	    is_valid, errors = self.validate_inputs(user)
	    if not is_valid:
	        raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

	    # Model-specific validation
	    if user.demographics.sex != Sex.FEMALE:
	        return "N/A: Model is only applicable to female patients."

	    try:
	        # Calculate risk
	        risk = self._calculate_risk(user)
	        return f"{risk:.2f}"
	    except Exception as e:
	        return f"N/A: Error calculating risk - {e!s}"
	```
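Since `compute_score` returns a string in both the success case and the N/A case, callers need to tell the two apart before doing arithmetic. A minimal sketch of such a caller-side helper follows; `parse_score` is a hypothetical name and not part of the `RiskModel` interface.

```python
from __future__ import annotations


def parse_score(score: str) -> float | None:
    """Return the numeric risk percentage, or None for an 'N/A: ...' result."""
    if score.startswith("N/A"):
        return None
    return float(score)
```

This assumes every non-numeric result starts with the `N/A` prefix, as in the examples in this document.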

	## Migration Checklist

	When adapting an existing risk model to the new structure:

	- [ ] Update imports to use new `user_input` module
	- [ ] Add `REQUIRED_INPUTS` with Pydantic validation
	- [ ] Refactor `compute_score` to use new `UserInput` structure
	- [ ] Replace string literals with enums
	- [ ] Update parameter extraction logic
	- [ ] Add input validation at start of `compute_score`
	- [ ] Update all test cases to use new `UserInput` structure
	- [ ] Run full test suite to ensure 100% pass rate
	- [ ] Run pre-commit hooks to ensure code quality
	- [ ] Document any `UserInput` extensions needed
	- [ ] Update model documentation and references

	## Examples

	### Complete Risk Model Template

	```python
	"""Your cancer risk model implementation."""

	from typing import Annotated
	from pydantic import Field
	from sentinel.risk_models.base import RiskModel
	from sentinel.user_input import (
	    CancerType,
	    Ethnicity,
	    RelationshipDegree,
	    Sex,
	    UserInput,
	)

	class YourRiskModel(RiskModel):
	    """Compute cancer risk using the Your model."""

	    def __init__(self):
	        super().__init__("your_model")

	    REQUIRED_INPUTS: dict[str, tuple[type, bool]] = {
	        "demographics.age_years": (Annotated[int, Field(ge=18, le=100)], True),
	        "demographics.sex": (Sex, True),
	        "demographics.ethnicity": (Ethnicity | None, False),
	        "family_history": (list, False),  # list[FamilyMemberCancer]
	    }

	    def compute_score(self, user: UserInput) -> str:
	        """Compute the risk score for a given user profile."""
	        # Validate inputs first
	        is_valid, errors = self.validate_inputs(user)
	        if not is_valid:
	            raise ValueError(f"Invalid inputs for Your: {'; '.join(errors)}")

	        # Model-specific validation
	        if user.demographics.sex != Sex.FEMALE:
	            return "N/A: Model is only applicable to female patients."

	        # Extract parameters
	        age = user.demographics.age_years
	        ethnicity = user.demographics.ethnicity

	        # Count first-degree relatives with breast cancer
	        family_count = sum(
	            1 for member in user.family_history
	            if member.cancer_type == CancerType.BREAST
	            and member.degree == RelationshipDegree.FIRST
	        )

	        # Calculate risk (example)
	        risk = self._calculate_risk(age, family_count, ethnicity)
	        return f"{risk:.2f}"

	    def _calculate_risk(self, age: int, family_count: int, ethnicity: Ethnicity | None) -> float:
	        """Calculate the actual risk value."""
	        # Implementation here
	        return 1.5  # Example

	    def cancer_type(self) -> str:
	        return "breast"

	    def description(self) -> str:
	        return "Your model description here."

	    def interpretation(self) -> str:
	        return "Interpretation guidance here."

	    def references(self) -> list[str]:
	        return ["Your reference here."]
	```

	This specification ensures consistency, maintainability, and quality across all risk models in the Sentinel system.