---
title: Sentinel - Cancer Risk Assessment Assistant
emoji: 🏥
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 8501
pinned: false
---

# LLM-based Cancer Risk Assessment Assistant

This project is an API service that provides preliminary cancer risk assessments based on user-provided data. It is built using FastAPI and LangChain, with a flexible architecture that supports both local and API-based LLMs.

## Development Setup

1. Create the virtual environment and install the dependencies:

```bash
uv sync
```

## External API Configuration

For risk models that require external APIs, such as CanRisk (BOADICEA model), fill in the following section of the `.env` file:

```bash
# .env
CANRISK_USERNAME=your_canrisk_username
CANRISK_PASSWORD=your_canrisk_password
```

Then source it: `source .env`

For CanRisk API access, register at https://www.canrisk.org/.
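
As a minimal sketch of how application code could pick up these values, the snippet below reads both variables from a mapping such as `os.environ` and fails early when one is missing. The helper name `load_canrisk_credentials` is hypothetical and is not part of this project; only the two variable names come from the configuration above.

```python
import os
from typing import Mapping, Tuple


def load_canrisk_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (username, password), or raise if either variable is unset or empty.

    Hypothetical helper for illustration; not taken from the project code.
    """
    missing = [
        name
        for name in ("CANRISK_USERNAME", "CANRISK_PASSWORD")
        if not env.get(name)
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["CANRISK_USERNAME"], env["CANRISK_PASSWORD"]
```

Passing the mapping as a parameter keeps the check easy to exercise without touching the real environment.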

## Using a Local LLM (Ollama)

1. Install [Ollama](https://ollama.com) for your platform.
2. Pull the default model from the command line:

```bash
ollama pull gemma3:4b
```
3. Ensure the Ollama desktop app or server is running. You can check your installed models with `ollama list`.
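
The same check can be done programmatically. The sketch below assumes the Ollama server listens on its default address, `http://localhost:11434`, and uses the `/api/tags` endpoint, which lists locally installed models. The function names are illustrative and not part of this project.

```python
import json
import urllib.request
from typing import List


def parse_model_names(payload: dict) -> List[str]:
    """Extract model names from the JSON body returned by Ollama's /api/tags."""
    return [entry["name"] for entry in payload.get("models", [])]


def installed_models(base_url: str = "http://localhost:11434") -> List[str]:
    """Query a running Ollama server; raises URLError if it is not reachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return parse_model_names(json.load(response))
```

With the server running, `"gemma3:4b" in installed_models()` indicates whether the default model has been pulled.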

## Using API-based LLMs (Google)

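1. Add your `GOOGLE_API_KEY` to the `.env` file in the project root. The `>>` operator creates the file if needed and appends otherwise, so existing entries such as the CanRisk credentials are kept:

   ```bash
   echo "GOOGLE_API_KEY=your_key_here" >> .env
   ```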

   Make sure the Generative AI API is enabled for your Google Cloud project.

2. Run the command line demo with the Google provider (default):

   ```bash
   uv run python apps/cli/main.py
   ```

   Switch to the local model with:

   ```bash
   uv run python apps/cli/main.py model=gemma3_4b
   ```

3. The `model` override also works with the Streamlit and FastAPI interfaces.
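
To illustrate the `key=value` override syntax used on the command line, here is a small sketch of how such arguments could be split into a dictionary. It is an illustration of the syntax only: this README does not show how the project itself parses overrides, and `parse_overrides` is a hypothetical name.

```python
from typing import Dict, List


def parse_overrides(args: List[str]) -> Dict[str, str]:
    """Split each key=value argument at the first '='; values stay strings."""
    overrides: Dict[str, str] = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected key=value, got {arg!r}")
        overrides[key] = value
    return overrides
```

For example, `parse_overrides(["model=gemma3_4b", "dev_mode=true"])` yields a dictionary with the keys `model` and `dev_mode`.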


## Interactive Demo

Run a simple command line demo with:

```bash
uv run python apps/cli/main.py
```

Enable developer mode and load user data from a file with:

```bash
uv run python apps/cli/main.py dev_mode=true user_file=examples/user_example.yaml
```

The script collects user data, prints the structured JSON assessment, and then allows follow-up questions in a chat-like loop. Type `quit` to exit.

The multi-page Streamlit app at `apps/streamlit_ui/main.py` provides an expert feedback interface.
The first page, **User Profile**, lets you upload or manually create a profile
before running assessments.
The **Configuration** page allows you to choose the model and knowledge base modules and shows a live preview of the full LLM prompt.
The **Assessment** page runs the model, shows a dashboard of results, and lets you export or chat with the assistant.

### Exporting Reports

After the initial assessment is displayed in the terminal, you will be prompted to export the full report to a formatted file. You can choose to generate a PDF, an Excel file, or both. The generated files (e.g., `Cancer_Risk_Report_20250626_213000.pdf`) will be saved in the root directory of the project.

**Note:** This feature requires the `openpyxl` and `reportlab` libraries.
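
The example file name above suggests a `YYYYMMDD_HHMMSS` timestamp suffix. Assuming that pattern, a name of this shape could be built as follows; `report_filename` is a hypothetical helper, not the project's export code.

```python
from datetime import datetime
from typing import Optional


def report_filename(extension: str, now: Optional[datetime] = None) -> str:
    """Build a name such as Cancer_Risk_Report_20250626_213000.pdf.

    Assumes the timestamp format is %Y%m%d_%H%M%S, inferred from the example name.
    """
    timestamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"Cancer_Risk_Report_{timestamp}.{extension}"
```

Including the time down to the second keeps successive exports from overwriting one another in the project root.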

You can also provide a JSON or YAML file with all user information to skip the
interactive prompts:

```bash
uv run python apps/cli/main.py user_file=examples/user_example.yaml
```

To launch the Streamlit interface, run the following command from the root of the
project:

```bash
uv run streamlit run apps/streamlit_ui/main.py
```

**Note:** To make the locally running app reachable through a public URL, you can use `ngrok`:

```bash
ngrok http 8501
```

## Important Note for Developers

When making changes to the project, check whether the following files should also be updated to reflect the changes:

- `README.md`
- `AGENTS.md`
- `GEMINI.md`

## Available Risk Models

The assistant currently includes the following built-in risk calculators:

- **Gail** - Breast cancer risk
- **Claus** - Breast cancer risk based on family history
- **Tyrer-Cuzick** - Breast cancer risk (IBIS model)
- **BOADICEA** - Breast and ovarian cancer risk (via CanRisk API)
- **PLCOm2012** - Lung cancer risk
- **LLPi** - Liverpool Lung Project improved model for lung cancer risk (8.7-year prediction)
- **CRC-PRO** - Colorectal cancer risk
- **PCPT** - Prostate cancer risk
- **Extended PBCG** - Prostate cancer risk (extended model)
- **Prostate Mortality** - Prostate cancer-specific mortality prediction
- **MRAT** - Melanoma risk (5-year prediction)
- **aMAP** - Hepatocellular carcinoma (liver cancer) risk
- **QCancer** - Multi-site cancer differential

## Generating Documentation

The project includes a comprehensive PDF documentation generator that creates detailed documentation of all implemented risk models and their input requirements.

### Generate Risk Model Documentation

To generate the PDF documentation:

```bash
uv run python scripts/generate_documentation.py
```

This will create a comprehensive PDF document (`docs/risk_model_documentation.pdf`) that includes:

1. **Overview Section**:
   - Cancer type coverage chart
   - Statistics on implemented risk scores and cancer types covered

2. **Detailed Model Information**:
   - Description, interpretation, and references for each risk model
   - Complete input requirements with field details, required status, units, and possible values/choices

3. **Input-to-Cancer Mapping**:
   - Reverse mapping showing which cancer types use each input field
   - Possible values for each field
   - Comprehensive coverage analysis
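
The reverse mapping in the third section can be pictured as inverting a "model to inputs" table. The sketch below shows the idea on made-up data: the model names, cancer types, and field names are placeholders, and the function is not taken from `scripts/generate_documentation.py`.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical example data: model name -> (cancer type, input fields).
MODEL_INPUTS: Dict[str, Tuple[str, List[str]]] = {
    "model_a": ("breast", ["age", "family_history"]),
    "model_b": ("lung", ["age", "smoking_status"]),
}


def fields_to_cancer_types(
    models: Dict[str, Tuple[str, List[str]]]
) -> Dict[str, List[str]]:
    """Invert model -> fields into field -> sorted list of cancer types."""
    reverse: Dict[str, Set[str]] = defaultdict(set)
    for cancer_type, fields in models.values():
        for field in fields:
            reverse[field].add(cancer_type)
    return {field: sorted(types) for field, types in reverse.items()}
```

In this placeholder data, `age` maps to both cancer types, while `smoking_status` maps only to `lung`.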

The documentation is regenerated from the current codebase each time the script runs, so it stays up-to-date as new risk models and input fields are added.

### Documentation Features

- **Comprehensive Coverage**: Documents all risk models and their input requirements
- **Visual Charts**: Includes cancer type coverage visualization
- **Detailed Tables**: Shows field specifications, constraints, and valid values
- **Professional Layout**: Clean, readable PDF format suitable for sharing
- **Auto-Generated**: Stays synchronized with code changes automatically