# Risk Models Specification

This document outlines the requirements and specifications for implementing risk models in the Sentinel cancer risk assessment system.

## Overview

Risk models in Sentinel are designed to calculate cancer risk scores using structured user input data. All risk models must follow a consistent architecture, use the new `UserInput` structure, implement proper validation, and maintain comprehensive test coverage.

## Core Architecture

### Base Class

All risk models must inherit from `RiskModel` in `src/sentinel/risk_models/base.py`:

```python
from sentinel.risk_models.base import RiskModel


class YourRiskModel(RiskModel):
    def __init__(self):
        super().__init__("your_model_name")
```
### Required Methods

Every risk model must implement these abstract methods:

```python
def compute_score(self, user: UserInput) -> str:
    """Compute the risk score for a given user profile.

    Args:
        user: The user profile containing demographics, medical history, etc.

    Returns:
        str: Risk percentage as a string or an N/A message if inapplicable.

    Raises:
        ValueError: If required inputs are missing or invalid.
    """

def cancer_type(self) -> str:
    """Return the cancer type this model assesses."""
    return "breast"  # or "lung", "prostate", etc.

def description(self) -> str:
    """Return a detailed description of the model."""

def interpretation(self) -> str:
    """Return guidance on how to interpret the results."""

def references(self) -> list[str]:
    """Return list of reference citations."""
```
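Because `compute_score` returns a string that is either a percentage or an N/A message, callers have to tell the two apart before doing any arithmetic. The following is a minimal sketch of one way to do that. `parse_score` is a hypothetical helper, not part of Sentinel, and it assumes that N/A messages start with `N/A`, as they do in the examples in this document.

```python
from typing import Optional


def parse_score(score: str) -> Optional[float]:
    """Return the risk percentage as a float, or None for an N/A message.

    Hypothetical helper: assumes N/A results start with "N/A".
    """
    text = score.strip()
    if text.startswith("N/A"):
        return None
    # float() raises ValueError if the string is not numeric.
    return float(text)
```

A caller could then branch on `None` instead of searching the string for `"N/A"` in several places.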
## UserInput Structure

### Required Imports

```python
from typing import Annotated

from pydantic import Field

from sentinel.risk_models.base import RiskModel
from sentinel.user_input import (
    # Import specific enums and models you need
    CancerType,
    ChronicCondition,
    Demographics,
    Ethnicity,
    FamilyMemberCancer,
    FamilyRelation,
    FamilySide,
    RelationshipDegree,
    Sex,
    SymptomEntry,
    UserInput,
    # ... other specific imports
)
```

### UserInput Hierarchy

The `UserInput` class follows a hierarchical structure:
```
UserInput
├── demographics: Demographics
│   ├── age_years: int
│   ├── sex: Sex (enum)
│   ├── ethnicity: Ethnicity | None
│   └── anthropometrics: Anthropometrics
│       ├── height_cm: float | None
│       └── weight_kg: float | None
├── lifestyle: Lifestyle
│   ├── smoking: SmokingHistory
│   └── alcohol: AlcoholConsumption
├── personal_medical_history: PersonalMedicalHistory
│   ├── chronic_conditions: list[ChronicCondition]
│   ├── previous_cancers: list[CancerType]
│   ├── genetic_mutations: list[GeneticMutation]
│   ├── tyrer_cuzick_polygenic_risk_score: float | None
│   └── # ... other fields
├── female_specific: FemaleSpecific | None
│   ├── menstrual: MenstrualHistory
│   ├── parity: ParityHistory
│   └── breast_health: BreastHealthHistory
├── symptoms: list[SymptomEntry]
└── family_history: list[FamilyMemberCancer]
```
## REQUIRED_INPUTS Specification

### Structure

Every risk model must define a `REQUIRED_INPUTS` class attribute using Pydantic's `Annotated` types with `Field` constraints:

```python
REQUIRED_INPUTS: dict[str, tuple[type, bool]] = {
    "demographics.age_years": (Annotated[int, Field(ge=18, le=100)], True),
    "demographics.sex": (Sex, True),
    "demographics.ethnicity": (Ethnicity | None, False),
    "demographics.anthropometrics.height_cm": (Annotated[float, Field(gt=0)], False),
    "demographics.anthropometrics.weight_kg": (Annotated[float, Field(gt=0)], False),
    "female_specific.menstrual.age_at_menarche": (Annotated[int, Field(ge=8, le=25)], False),
    "personal_medical_history.tyrer_cuzick_polygenic_risk_score": (Annotated[float, Field(gt=0)], False),
    "family_history": (list, False),  # list[FamilyMemberCancer]
    "symptoms": (list, False),  # list[SymptomEntry]
}
```
### Field Constraints

Use appropriate `Field` constraints for validation:

- `ge=X`: Greater than or equal to X
- `le=X`: Less than or equal to X
- `gt=X`: Greater than X
- `lt=X`: Less than X

### Required vs Optional

The boolean in each `REQUIRED_INPUTS` tuple marks whether the field is required:

- `True`: Field is required for the model
- `False`: Field is optional but validated if present
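How `validate_inputs` in the base class consumes `REQUIRED_INPUTS` is not shown in this document. As a rough illustration of the dotted-path keys and the required flag, the sketch below resolves each path with `getattr` and reports required fields that are missing. Everything in it is hypothetical: `Demographics` and `User` are dataclass stand-ins for the real Pydantic models, `resolve_path` and `find_missing_required` are illustrative names, and constraint checking (which Pydantic would perform) is left out.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Demographics:  # Stand-in for sentinel.user_input.Demographics
    age_years: Optional[int] = None
    ethnicity: Optional[str] = None


@dataclass
class User:  # Stand-in for UserInput
    demographics: Demographics


def resolve_path(obj: Any, dotted_path: str) -> Any:
    """Follow a dotted path such as "demographics.age_years"; None if absent."""
    current = obj
    for part in dotted_path.split("."):
        if current is None:
            return None
        current = getattr(current, part, None)
    return current


def find_missing_required(
    user: Any, required_inputs: dict[str, tuple[Any, bool]]
) -> list[str]:
    """List the dotted paths that are marked required but resolve to None."""
    return [
        path
        for path, (_field_type, is_required) in required_inputs.items()
        if is_required and resolve_path(user, path) is None
    ]


# Example spec: age is required, ethnicity is optional.
SPEC = {
    "demographics.age_years": (int, True),
    "demographics.ethnicity": (str, False),
}
missing = find_missing_required(User(Demographics()), SPEC)
```

In this sketch a missing optional field is simply skipped, which mirrors the "optional but validated if present" rule above.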
## Input Validation

### Validation in compute_score

Every `compute_score` method must start with input validation:

```python
def compute_score(self, user: UserInput) -> str:
    """Compute the risk score for a given user profile."""
    # Validate inputs first
    is_valid, errors = self.validate_inputs(user)
    if not is_valid:
        raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

    # Continue with model-specific logic...
```
### Model-Specific Validation

Add additional validation as needed:

```python
# Check sex applicability
if user.demographics.sex != Sex.FEMALE:
    return "N/A: Model is only applicable to female patients."

# Check age range
if not (35 <= user.demographics.age_years <= 85):
    return "N/A: Age is outside the validated range."

# Check required data availability
if user.female_specific is None:
    return "N/A: Missing female-specific information required for model."
```
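The applicability checks above can be gathered into one helper so that `compute_score` stays short and the checks are easy to test on their own. This is only a sketch: `applicability_message` is a hypothetical name, the `Sex` enum is a stand-in whose member values are assumed, and the 35 to 85 bounds are copied from the age check above.

```python
from enum import Enum
from typing import Optional


class Sex(str, Enum):  # Stand-in for sentinel.user_input.Sex; values are assumed
    FEMALE = "female"
    MALE = "male"


MIN_AGE = 35  # Bounds taken from the age check above
MAX_AGE = 85


def applicability_message(
    sex: Sex, age_years: int, has_female_specific: bool
) -> Optional[str]:
    """Return an N/A message if the model does not apply, otherwise None."""
    if sex != Sex.FEMALE:
        return "N/A: Model is only applicable to female patients."
    if not (MIN_AGE <= age_years <= MAX_AGE):
        return "N/A: Age is outside the validated range."
    if not has_female_specific:
        return "N/A: Missing female-specific information required for model."
    return None
```

Inside `compute_score`, the result would be returned as-is when it is not `None`.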
## Extending UserInput

### When to Extend

If a risk model requires fields or enums that don't exist in `UserInput`, **do not** substitute placeholder values or work around the gap. Instead, propose extending `UserInput`:

1. **Missing Enums**: Add new values to existing enums (e.g., `ChronicCondition`, `SymptomType`)
2. **Missing Fields**: Add new fields to appropriate sections (e.g., `PersonalMedicalHistory`, `BreastHealthHistory`)
3. **Missing Models**: Create new Pydantic models if needed

### Extension Process

1. **Identify Missing Elements**: Document what's needed for the model
2. **Propose Extension**: Suggest specific additions to `UserInput`
3. **Implement Extension**: Add the new fields/enums to `src/sentinel/user_input.py`
4. **Update Tests**: Add tests for new fields in `tests/test_user_input.py`
5. **Update Model**: Use the new fields in your risk model
6. **Run Tests**: Ensure all tests pass

### Example Extensions
```python
# Adding new ChronicCondition enum values
class ChronicCondition(str, Enum):
    # ... existing values
    ENDOMETRIAL_POLYPS = "endometrial_polyps"
    ANAEMIA = "anaemia"


# Adding new fields to PersonalMedicalHistory
class PersonalMedicalHistory(StrictBaseModel):
    # ... existing fields
    tyrer_cuzick_polygenic_risk_score: float | None = Field(
        None,
        gt=0,
        description="Tyrer-Cuzick polygenic risk score as relative risk multiplier",
    )


# Adding new fields to BreastHealthHistory
class BreastHealthHistory(StrictBaseModel):
    # ... existing fields
    lobular_carcinoma_in_situ: bool | None = Field(
        None,
        description="History of lobular carcinoma in situ (LCIS) diagnosis",
    )
```
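For the "Update Tests" step, a test for newly added enum values might look like the sketch below. It uses a reduced stand-in for `ChronicCondition` containing only the two values from the example above, and the test name is hypothetical. In the real suite the enum would be imported from `sentinel.user_input`.

```python
from enum import Enum


class ChronicCondition(str, Enum):  # Stand-in; the real enum has more members
    ENDOMETRIAL_POLYPS = "endometrial_polyps"
    ANAEMIA = "anaemia"


def test_new_chronic_conditions_round_trip() -> None:
    """New members should be constructible from their serialized values."""
    assert ChronicCondition("anaemia") is ChronicCondition.ANAEMIA
    assert ChronicCondition("endometrial_polyps") is ChronicCondition.ENDOMETRIAL_POLYPS
    # Values must stay unique so lookups by value are unambiguous.
    values = [member.value for member in ChronicCondition]
    assert len(values) == len(set(values))
```

The round-trip check matters because user input usually arrives as plain strings that are converted to enum members.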
## Data Access Patterns

### Demographics

```python
age = user.demographics.age_years
sex = user.demographics.sex
ethnicity = user.demographics.ethnicity
height_cm = user.demographics.anthropometrics.height_cm
weight_kg = user.demographics.anthropometrics.weight_kg
```
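Note that `height_cm` and `weight_kg` are typed `float | None`, so any derived quantity has to handle missing values. As an illustration only, the sketch below derives body mass index from the two fields. `body_mass_index` is a hypothetical helper, and whether a given model needs BMI at all is model-specific.

```python
from typing import Optional


def body_mass_index(
    height_cm: Optional[float], weight_kg: Optional[float]
) -> Optional[float]:
    """Return BMI in kg/m^2, or None if either measurement is missing.

    Assumes height_cm is positive, as a Field(gt=0) constraint would enforce.
    """
    if height_cm is None or weight_kg is None:
        return None
    height_m = height_cm / 100.0
    return weight_kg / (height_m**2)
```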
### Female-Specific Data

```python
if user.female_specific is not None:
    fs = user.female_specific
    menarche_age = fs.menstrual.age_at_menarche
    menopause_age = fs.menstrual.age_at_menopause
    num_births = fs.parity.num_live_births
    first_birth_age = fs.parity.age_at_first_live_birth
    num_biopsies = fs.breast_health.num_biopsies
    atypical_hyperplasia = fs.breast_health.atypical_hyperplasia
    lcis = fs.breast_health.lobular_carcinoma_in_situ
```

### Medical History

```python
chronic_conditions = user.personal_medical_history.chronic_conditions
previous_cancers = user.personal_medical_history.previous_cancers
genetic_mutations = user.personal_medical_history.genetic_mutations
polygenic_score = user.personal_medical_history.tyrer_cuzick_polygenic_risk_score
```

### Family History

```python
for member in user.family_history:
    if member.cancer_type == CancerType.BREAST:
        relation = member.relation
        age_at_diagnosis = member.age_at_diagnosis
        degree = member.degree
        side = member.side
```

### Symptoms

```python
for symptom in user.symptoms:
    symptom_type = symptom.symptom_type
    severity = symptom.severity
    duration_days = symptom.duration_days
```
## Enum Usage

### Always Use Enums

Never compare against string literals. Always use the appropriate enums:

```python
# Correct
if user.demographics.sex == Sex.FEMALE: ...
if member.cancer_type == CancerType.BREAST: ...
if member.relation == FamilyRelation.MOTHER: ...
if member.degree == RelationshipDegree.FIRST: ...
if member.side == FamilySide.MATERNAL: ...

# Incorrect
if user.demographics.sex == "female": ...
if member.cancer_type == "breast": ...
if member.relation == "mother": ...
```
### Enum Mapping

When you need to map enums to model-specific codes:

```python
def _race_code_from_ethnicity(ethnicity: Ethnicity | None) -> int:
    """Map ethnicity enum to model-specific race code."""
    if ethnicity is None:
        return 1  # Default
    if ethnicity == Ethnicity.BLACK:
        return 2
    if ethnicity in {Ethnicity.ASIAN, Ethnicity.PACIFIC_ISLANDER}:
        return 3
    if ethnicity == Ethnicity.HISPANIC:
        return 6
    return 1  # Default to White
```
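The same mapping can also be written as a lookup table, which keeps the codes in one place and makes the default explicit. This is an alternative sketch, not the project's implementation. The `Ethnicity` enum here is a stand-in with assumed member values, and `RACE_CODES` and `DEFAULT_RACE_CODE` are hypothetical names. The codes themselves are copied from the function above.

```python
from enum import Enum
from typing import Optional


class Ethnicity(str, Enum):  # Stand-in; member values here are assumptions
    WHITE = "white"
    BLACK = "black"
    ASIAN = "asian"
    PACIFIC_ISLANDER = "pacific_islander"
    HISPANIC = "hispanic"


DEFAULT_RACE_CODE = 1  # Used for White, unknown, and unmapped ethnicities

RACE_CODES = {
    Ethnicity.BLACK: 2,
    Ethnicity.ASIAN: 3,
    Ethnicity.PACIFIC_ISLANDER: 3,
    Ethnicity.HISPANIC: 6,
}


def race_code_from_ethnicity(ethnicity: Optional[Ethnicity]) -> int:
    """Map an ethnicity to a model-specific race code, with a default."""
    if ethnicity is None:
        return DEFAULT_RACE_CODE
    return RACE_CODES.get(ethnicity, DEFAULT_RACE_CODE)
```

A table like this is easy to compare against the code list published with the model being implemented.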
## Testing Requirements

### Test File Structure

Create comprehensive test files following this pattern:

```python
import pytest

from sentinel.risk_models import YourRiskModel
from sentinel.user_input import (
    # Import all needed models and enums
    Anthropometrics,
    BreastHealthHistory,
    CancerType,
    Demographics,
    Ethnicity,
    FamilyMemberCancer,
    FamilyRelation,
    FamilySide,
    FemaleSpecific,
    Lifestyle,
    MenstrualHistory,
    ParityHistory,
    PersonalMedicalHistory,
    RelationshipDegree,
    Sex,
    SmokingHistory,
    SmokingStatus,
    UserInput,
)

# Ground truth test cases
GROUND_TRUTH_CASES = [
    {
        "name": "test_case_name",
        "input": UserInput(
            demographics=Demographics(
                age_years=40,
                sex=Sex.FEMALE,
                ethnicity=Ethnicity.WHITE,
                anthropometrics=Anthropometrics(height_cm=165.0, weight_kg=65.0),
            ),
            lifestyle=Lifestyle(
                smoking=SmokingHistory(status=SmokingStatus.NEVER),
            ),
            personal_medical_history=PersonalMedicalHistory(),
            female_specific=FemaleSpecific(
                menstrual=MenstrualHistory(age_at_menarche=13),
                parity=ParityHistory(num_live_births=1, age_at_first_live_birth=25),
                breast_health=BreastHealthHistory(),
            ),
            family_history=[
                FamilyMemberCancer(
                    relation=FamilyRelation.MOTHER,
                    cancer_type=CancerType.BREAST,
                    age_at_diagnosis=55,
                    degree=RelationshipDegree.FIRST,
                    side=FamilySide.MATERNAL,
                )
            ],
        ),
        "expected": 1.5,  # Expected risk percentage
    },
    # ... more test cases
]


class TestYourRiskModel:
    """Test suite for YourRiskModel."""

    def setup_method(self):
        """Initialize model instance for testing."""
        self.model = YourRiskModel()

    @pytest.mark.parametrize("case", GROUND_TRUTH_CASES, ids=lambda x: x["name"])
    def test_ground_truth_validation(self, case):
        """Test against ground truth results."""
        user_input = case["input"]
        expected_risk = case["expected"]
        actual_risk_str = self.model.compute_score(user_input)
        if "N/A" in actual_risk_str:
            pytest.fail(f"Model returned N/A: {actual_risk_str}")
        actual_risk = float(actual_risk_str)
        assert actual_risk == pytest.approx(expected_risk, abs=0.01)

    def test_validation_errors(self):
        """Test that model raises ValueError for invalid inputs."""
        # Test invalid age
        user_input = UserInput(
            demographics=Demographics(
                age_years=30,  # Must fall below the minimum in this model's REQUIRED_INPUTS
                sex=Sex.FEMALE,
                anthropometrics=Anthropometrics(height_cm=165.0, weight_kg=65.0),
            ),
            # ... rest of input
        )
        with pytest.raises(ValueError, match=r"Invalid inputs for.*:"):
            self.model.compute_score(user_input)

    def test_inapplicable_cases(self):
        """Test cases where model returns N/A."""
        # Test male patient
        user_input = UserInput(
            demographics=Demographics(
                age_years=50,
                sex=Sex.MALE,  # Wrong sex
                anthropometrics=Anthropometrics(height_cm=175.0, weight_kg=70.0),
            ),
            # ... rest of input
        )
        score = self.model.compute_score(user_input)
        assert "N/A" in score
```
### Test Coverage Requirements

- **Ground Truth Validation**: Test against known reference values
- **Input Validation**: Test that invalid inputs raise `ValueError`
- **Edge Cases**: Test boundary conditions, such as the minimum and maximum validated ages
- **Inapplicable Cases**: Test cases where the model should return "N/A"
- **Enum Usage**: Test that all enums are used correctly
- **Family History**: Test various family relationship combinations
- **Error Handling**: Test error conditions and exception handling

## Code Quality Requirements

### Pre-commit Hooks

All code must pass these pre-commit hooks:

- **unimport**: Remove unused imports
- **ruff format**: Code formatting
- **ruff check**: Linting and style checks
- **pylint**: Code quality analysis
- **darglint**: Docstring validation
- **pydocstyle**: Docstring style checks
- **codespell**: Spell checking

### Code Style

- Use type hints throughout
- Write clear, concise docstrings
- Follow PEP 8 style guidelines
- Use meaningful variable names
- Add comments for complex logic
- Handle edge cases gracefully

### Error Handling
```python
def compute_score(self, user: UserInput) -> str:
    """Compute the risk score for a given user profile."""
    # Validate inputs outside the try block so the ValueError reaches the caller
    is_valid, errors = self.validate_inputs(user)
    if not is_valid:
        raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

    # Model-specific validation
    if user.demographics.sex != Sex.FEMALE:
        return "N/A: Model is only applicable to female patients."

    try:
        # Calculate risk
        risk = self._calculate_risk(user)
    except Exception as e:
        return f"N/A: Error calculating risk - {e!s}"
    return f"{risk:.2f}"
```
## Migration Checklist

When adapting an existing risk model to the new structure:

- [ ] Update imports to use new `user_input` module
- [ ] Add `REQUIRED_INPUTS` with Pydantic validation
- [ ] Refactor `compute_score` to use new `UserInput` structure
- [ ] Replace string literals with enums
- [ ] Update parameter extraction logic
- [ ] Add input validation at start of `compute_score`
- [ ] Update all test cases to use new `UserInput` structure
- [ ] Run full test suite to ensure 100% pass rate
- [ ] Run pre-commit hooks to ensure code quality
- [ ] Document any `UserInput` extensions needed
- [ ] Update model documentation and references

## Examples

### Complete Risk Model Template
```python
"""Your cancer risk model implementation."""

from typing import Annotated

from pydantic import Field

from sentinel.risk_models.base import RiskModel
from sentinel.user_input import (
    CancerType,
    Ethnicity,
    RelationshipDegree,
    Sex,
    UserInput,
)


class YourRiskModel(RiskModel):
    """Compute cancer risk using the Your model."""

    REQUIRED_INPUTS: dict[str, tuple[type, bool]] = {
        "demographics.age_years": (Annotated[int, Field(ge=18, le=100)], True),
        "demographics.sex": (Sex, True),
        "demographics.ethnicity": (Ethnicity | None, False),
        "family_history": (list, False),  # list[FamilyMemberCancer]
    }

    def __init__(self):
        super().__init__("your_model")

    def compute_score(self, user: UserInput) -> str:
        """Compute the risk score for a given user profile."""
        # Validate inputs first
        is_valid, errors = self.validate_inputs(user)
        if not is_valid:
            raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

        # Model-specific validation
        if user.demographics.sex != Sex.FEMALE:
            return "N/A: Model is only applicable to female patients."

        # Extract parameters
        age = user.demographics.age_years
        ethnicity = user.demographics.ethnicity

        # Count first-degree relatives with breast cancer
        family_count = sum(
            1
            for member in user.family_history
            if member.cancer_type == CancerType.BREAST
            and member.degree == RelationshipDegree.FIRST
        )

        # Calculate risk (example)
        risk = self._calculate_risk(age, family_count, ethnicity)
        return f"{risk:.2f}"

    def _calculate_risk(self, age: int, family_count: int, ethnicity: Ethnicity | None) -> float:
        """Calculate the actual risk value."""
        # Implementation here
        return 1.5  # Example

    def cancer_type(self) -> str:
        """Return the cancer type this model assesses."""
        return "breast"

    def description(self) -> str:
        """Return a detailed description of the model."""
        return "Your model description here."

    def interpretation(self) -> str:
        """Return guidance on how to interpret the results."""
        return "Interpretation guidance here."

    def references(self) -> list[str]:
        """Return list of reference citations."""
        return ["Your reference here."]
```
This specification ensures consistency, maintainability, and quality across all risk models in the Sentinel system.