VibecoderMcSwaggins committed
Commit 599a754 · unverified · 1 Parent(s): 3cb2e43

fix: P0 provider mismatch and code quality audit fixes (#102)


* refactor: replace hacky try/except with proper pydantic validation

PROBLEM:
- CodeRabbit recommended try/except for ADVANCED_MAX_ROUNDS parsing
- This is hacky defensive programming, not Clean Code
- Silent fallbacks mask configuration errors

SOLUTION (Uncle Bob approved):
1. Add advanced_max_rounds to Settings with pydantic Field validation:
   - ge=1, le=20 bounds checking
   - Fails fast at startup with clear error message

2. Add advanced_timeout to Settings (60-900 seconds)

3. Remove os.getenv + try/except hack from advanced.py
   - Now uses settings.advanced_max_rounds directly

4. Fix domain.py: invalid domain strings now raise ValueError
   - Shows valid options in error message
   - Previously silently fell back to default

5. Update tests: 18 tests verifying fail-fast behavior
   - TestSettingsValidation: type errors, bounds violations all raise
   - TestGetDomainConfig: invalid strings raise with helpful message

PHILOSOPHY:
"If someone configures your app wrong, tell them loudly.
Don't pretend everything is fine." - Uncle Bob

* fix: P0 provider mismatch and code quality audit fixes

CRITICAL BUG FIX:
- get_model() now auto-detects available providers (OpenAI > Anthropic > HuggingFace)
- Raises clear ConfigurationError when no API keys configured
- Free Tier synthesis properly falls back to template when no HF_TOKEN

CODE QUALITY FIXES (from audit):
- Replace manual os.getenv with centralized settings properties (app.py)
- Add logging to p-value parsing (statistical_analyzer.py) - fixes silent pass
- Narrow exception handling to specific errors (pubmed.py)
- Use find() instead of try/except for string search (code_execution.py)
- Use centralized settings for Modal credentials (code_execution.py)

TESTS:
- Add 5 new TDD tests for get_model() auto-detection
- Fix regression tests for updated settings usage
- All 309 tests pass

DOCUMENTATION:
- Add P0_SYNTHESIS_PROVIDER_MISMATCH.md with full fix documentation
- Add AUDIT_FINDINGS_2025_11_30.md code quality audit
- Update ACTIVE_BUGS.md index

* test: add explicit has_huggingface_key per CodeRabbit review

docs/bugs/ACTIVE_BUGS.md CHANGED
@@ -3,10 +3,11 @@
 > Last updated: 2025-11-30
 >
 > **Note:** Completed bug docs archived to `docs/bugs/archive/`
+> **See also:** [Code Quality Audit Findings (2025-11-30)](AUDIT_FINDINGS_2025_11_30.md)
 
 ## P0 - Blocker
 
-*(None - P0 bugs resolved)*
+(None)
 
 ---
 
@@ -56,6 +57,16 @@
 
 ## Resolved Bugs
 
+### ~~P0 - Synthesis Fails with OpenAIError in Free Mode~~ FIXED
+**File:** `docs/bugs/P0_SYNTHESIS_PROVIDER_MISMATCH.md`
+**Found:** 2025-11-30 (Code Audit)
+**Resolved:** 2025-11-30
+
+- Problem: "Simple Mode" (Free Tier) crashed with `OpenAIError`.
+- Root Cause: `get_model()` defaulted to OpenAI regardless of available keys.
+- Fix: Implemented auto-detection in `judges.py` (OpenAI > Anthropic > HuggingFace).
+- Added extensive unit tests and regression tests.
+
 ### ~~P0 - Simple Mode Never Synthesizes~~ FIXED
 **PR:** [#71](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/71) (SPEC_06)
 **Commit**: `5cac97d` (2025-11-29)
docs/bugs/AUDIT_FINDINGS_2025_11_30.md ADDED
@@ -0,0 +1,70 @@
+# Code Quality Audit Findings - 2025-11-30
+
+**Auditor:** Senior Staff Engineer (Gemini)
+**Date:** 2025-11-30
+**Scope:** `src/` (services, tools, agents, orchestrators)
+**Focus:** Configuration validation, Error handling, Defensive programming anti-patterns
+
+## Summary
+
+The codebase is generally clean and modern, but exhibits specific anti-patterns related to configuration management and defensive error handling. The most critical finding is the reliance on manual `os.getenv` calls and "silent default" fallbacks which obscure configuration errors, directly contributing to the `OpenAIError` observed in production.
+
+## Findings
+
+### 1. Defensive Pass Block (Silent Failure) - MEDIUM
+**File:** `src/services/statistical_analyzer.py:246-247`
+```python
+try:
+    min_p = min(float(p) for p in p_values)
+    # ... logic ...
+except ValueError:
+    pass
+```
+**Problem:** If p-values are found by regex but fail to parse, the error is swallowed silently. This makes debugging parser issues impossible.
+**Fix:** Replace `pass` with `logger.warning("Failed to parse p-values: %s", p_values)` to aid debugging.
+
+### 2. Missing Pydantic Validation (Manual Config) - MEDIUM
+**File:** `src/tools/code_execution.py:75-76`
+```python
+self.modal_token_id = os.getenv("MODAL_TOKEN_ID")
+self.modal_token_secret = os.getenv("MODAL_TOKEN_SECRET")
+```
+**Problem:** Secrets are manually fetched from env vars, bypassing the centralized `Settings` validation.
+**Fix:** Move to `src/utils/config.py` in the `Settings` class and inject `settings` into `ModalCodeExecutor`.
+
+### 3. Broad Exception Swallowing - MEDIUM
+**File:** `src/tools/pubmed.py:129-130`
+```python
+except Exception:
+    continue  # Skip malformed articles
+```
+**Problem:** Catching `Exception` hides potential bugs (like `NameError` or `TypeError` in our own code), not just malformed data.
+**Fix:** Catch specific exceptions (e.g., `(KeyError, AttributeError, TypeError)`) OR log the error before continuing: `logger.debug(f"Skipping malformed article {pmid}: {e}")`.
+
+### 4. Missing Pydantic Validation (UI Layer) - LOW
+**File:** `src/app.py:115, 119`
+```python
+elif os.getenv("OPENAI_API_KEY"):
+    # ...
+elif os.getenv("ANTHROPIC_API_KEY"):
+```
+**Problem:** Application logic relies on raw environment variable checks to determine available backends, creating duplication and potential inconsistency with `config.py`.
+**Fix:** Centralize this logic in `src/utils/config.py` (e.g., `settings.has_openai`, `settings.has_anthropic`).
+
+### 5. Try/Except for Flow Control - LOW
+**File:** `src/tools/code_execution.py:244-249`
+```python
+try:
+    start_idx = text.index(start_marker) + len(start_marker)
+    # ...
+except ValueError:
+    return text.strip()
+```
+**Problem:** Using exceptions for expected "not found" cases is slower and less explicit.
+**Fix:** Use `find()` which returns `-1` on failure.
+
+## Action Plan
+
+1. **Refactor Configuration:** Eliminate `os.getenv` in favor of `src/utils/config.py` `Settings` model.
+2. **Fix Error Handling:** Remove empty `pass` blocks; add logging.
+3. **Address P0 Bug:** Fix the `OpenAIError` in synthesis (caused by Finding #4/General Config issue) by injecting the correct model into the orchestrator.
docs/bugs/P0_SYNTHESIS_PROVIDER_MISMATCH.md ADDED
@@ -0,0 +1,273 @@
+# P0 - Systemic Provider Mismatch Across All Modes
+
+**Status:** RESOLVED
+**Priority:** P0 (Blocker for Free Tier/Demo)
+**Found:** 2025-11-30 (during Audit)
+**Resolved:** 2025-11-30
+**Component:** Multiple files across orchestrators, agents, services
+
+## Resolution Summary
+
+The critical provider mismatch bug has been fixed by implementing auto-detection in `src/agent_factory/judges.py`.
+The `get_model()` function now checks for actual API key availability (`has_openai_key`, `has_anthropic_key`, `has_huggingface_key`)
+instead of relying on the static `settings.llm_provider` configuration.
+
+### Fix Details
+
+- **Auto-Detection Implemented**: `get_model()` prioritizes OpenAI > Anthropic > HuggingFace based on *available keys*.
+- **Fail-Fast on No Keys**: If no API keys are configured, `get_model()` raises `ConfigurationError` with clear message.
+- **HuggingFace Requires Token**: Free Tier via `HuggingFaceModel` requires `HF_TOKEN` (PydanticAI requirement).
+- **Synthesis Fallback**: When `get_model()` fails, synthesis gracefully falls back to template.
+- **Audit Fixes Applied**:
+  - Replaced manual `os.getenv` checks with centralized `settings` properties in `src/app.py`.
+  - Added logging to `src/services/statistical_analyzer.py` (fixed silent `pass`).
+  - Narrowed exception handling in `src/tools/pubmed.py`.
+  - Optimized string search in `src/tools/code_execution.py`.
+
+### Key Clarification
+
+The **Free Tier** in Simple Mode uses `HFInferenceJudgeHandler` (which uses `huggingface_hub.InferenceClient`)
+for judging - this does NOT require `HF_TOKEN`. However, synthesis via `get_model()` uses PydanticAI's
+`HuggingFaceModel` which DOES require `HF_TOKEN`. When no tokens are configured, synthesis falls back to
+the template-based summary (which is still useful).
+
+### Verification
+
+- **Unit Tests**: 5 new TDD tests in `tests/unit/agent_factory/test_get_model_auto_detect.py` pass.
+- **All Tests**: 309 tests pass (`make check` succeeds).
+- **Regression Tests**: Fixed and verified `tests/unit/agent_factory/test_judges_factory.py`.
+
+---
+
+## Symptom (Archive)
+
+When running in "Simple Mode" (Free Tier / No API Key), the synthesis step fails to generate a narrative and falls back to a structured summary template. The user sees:
+
+```text
+> ⚠️ Note: AI narrative synthesis unavailable. Showing structured summary.
+> _Error: OpenAIError_
+```
+
+## Affected Files (COMPREHENSIVE AUDIT)
+
+### Files Calling `get_model()` Directly (9 locations)
+
+| File | Line | Context | Impact |
+|------|------|---------|--------|
+| `simple.py` | 547 | Synthesis step | Free Tier broken |
+| `statistical_analyzer.py` | 75 | Analysis agent | Free Tier broken |
+| `judge_agent_llm.py` | 18 | LLM Judge | Free Tier broken |
+| `graph/nodes.py` | 177 | LangGraph hypothesis | Free Tier broken |
+| `graph/nodes.py` | 249 | LangGraph synthesis | Free Tier broken |
+| `report_agent.py` | 45 | Report generation | Free Tier broken |
+| `hypothesis_agent.py` | 44 | Hypothesis generation | Free Tier broken |
+| `judges.py` | 100 | JudgeHandler default | OK (accepts param) |
+
+### Files Hardcoding `OpenAIChatClient` (Architecturally OpenAI-Only)
+
+| File | Lines | Context |
+|------|-------|---------|
+| `advanced.py` | 100, 121 | Manager client |
+| `magentic_agents.py` | 29, 70, 129, 173 | All 4 agents |
+| `retrieval_agent.py` | 62 | Retrieval agent |
+| `code_executor_agent.py` | 52 | Code executor |
+| `llm_factory.py` | 42 | Factory default |
+
+**Note:** Advanced mode is architecturally locked to OpenAI via `agent_framework.openai.OpenAIChatClient`. This is by design - see `app.py:188-194` which falls back to Simple mode if no OpenAI key. However, users are not clearly informed of this limitation.
+
+## Root Cause
+
+**Settings/Runtime Sync Gap - Two Separate Backend Selection Systems.**
+
+The codebase has **two independent** systems for selecting the LLM backend:
+1. `settings.llm_provider` (config.py default: "openai")
+2. `app.py` runtime detection via `os.getenv()` checks
+
+These are **never synchronized**, causing the Judge and Synthesis steps to use different backends.
+
+### Detailed Call Chain
+
+1. **`src/app.py:115-126`** (runtime detection):
+```python
+# app.py bypasses settings entirely for JudgeHandler selection
+elif os.getenv("OPENAI_API_KEY"):
+    judge_handler = JudgeHandler(model=None, domain=domain)
+elif os.getenv("ANTHROPIC_API_KEY"):
+    judge_handler = JudgeHandler(model=None, domain=domain)
+else:
+    judge_handler = HFInferenceJudgeHandler(domain=domain)  # Free Tier
+```
+**Note:** This creates the correct handler but does NOT update `settings.llm_provider`.
+
+2. **`src/orchestrators/simple.py:546-552`** (synthesis step):
+```python
+from src.agent_factory.judges import get_model
+agent: Agent[None, str] = Agent(model=get_model(), ...)  # <-- BUG!
+```
+Synthesis calls `get_model()` directly instead of using the injected judge's model.
+
+3. **`src/agent_factory/judges.py:56-78`** (`get_model()`):
+```python
+def get_model() -> Any:
+    llm_provider = settings.llm_provider  # <-- Reads from settings (still "openai")
+    # ...
+    openai_provider = OpenAIProvider(api_key=settings.openai_api_key)  # <-- None!
+    return OpenAIChatModel(settings.openai_model, provider=openai_provider)
+```
+**Result:** Creates OpenAI model with `api_key=None` → `OpenAIError`
+
+### Why Free Tier Fails
+
+| Step | System Used | Backend Selected |
+|------|-------------|------------------|
+| JudgeHandler | `app.py` runtime | HFInferenceJudgeHandler ✅ |
+| Synthesis | `settings.llm_provider` | OpenAI (default) ❌ |
+
+The Judge works because app.py explicitly creates `HFInferenceJudgeHandler`.
+Synthesis fails because it calls `get_model()` which reads `settings.llm_provider = "openai"` (unchanged from default).
+
+## Impact
+
+- **User Experience:** Free tier users (Demo users) never see the high-quality narrative synthesis, only the fallback.
+- **System Integrity:** The orchestrator ignores the runtime backend selection.
+
+## Implemented Fix
+
+**Strategy: Fix `get_model()` to Auto-Detect Available Provider**
+
+### Actual Implementation (Merged)
+
+**File:** `src/agent_factory/judges.py`
+
+This is the **single point of fix** that resolves all 7 broken `get_model()` call sites.
+
+```python
+def get_model() -> Any:
+    """Get the LLM model based on available API keys.
+
+    Priority order:
+    1. OpenAI (if OPENAI_API_KEY set)
+    2. Anthropic (if ANTHROPIC_API_KEY set)
+    3. HuggingFace (if HF_TOKEN set)
+
+    Raises:
+        ConfigurationError: If no API keys are configured.
+
+    Note: settings.llm_provider is ignored in favor of actual key availability.
+    This ensures the model matches what app.py selected for JudgeHandler.
+    """
+    from src.utils.exceptions import ConfigurationError
+
+    # Priority 1: OpenAI (most common, best tool calling)
+    if settings.has_openai_key:
+        openai_provider = OpenAIProvider(api_key=settings.openai_api_key)
+        return OpenAIChatModel(settings.openai_model, provider=openai_provider)
+
+    # Priority 2: Anthropic
+    if settings.has_anthropic_key:
+        provider = AnthropicProvider(api_key=settings.anthropic_api_key)
+        return AnthropicModel(settings.anthropic_model, provider=provider)
+
+    # Priority 3: HuggingFace (requires HF_TOKEN)
+    if settings.has_huggingface_key:
+        model_name = settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
+        hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
+        return HuggingFaceModel(model_name, provider=hf_provider)
+
+    # No keys configured - fail fast with clear error
+    raise ConfigurationError(
+        "No LLM API key configured. Set one of: OPENAI_API_KEY, ANTHROPIC_API_KEY, or HF_TOKEN"
+    )
+```
+
+**Why this works:**
+- Single fix location updates all 7 broken call sites
+- Matches app.py's detection logic (key availability, not settings.llm_provider)
+- HuggingFace works when HF_TOKEN is available
+- Raises clear error when no keys configured (callers can catch and fallback)
+- No changes needed to orchestrators, agents, or services
+
+### What This Does NOT Fix (By Design)
+
+**Advanced Mode remains OpenAI-only.** The following files use `agent_framework.openai.OpenAIChatClient` which only supports OpenAI:
+
+- `advanced.py` (Manager + agents)
+- `magentic_agents.py` (SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent)
+- `retrieval_agent.py`, `code_executor_agent.py`
+
+This is **by design** - the Microsoft Agent Framework library (`agent-framework-core`) only provides `OpenAIChatClient`. To support other providers in Advanced mode would require:
+1. Wait for `agent-framework` to add Anthropic/HuggingFace clients, OR
+2. Write our own `ChatClient` implementations (significant effort)
+
+**The current app.py behavior is correct:** it falls back to Simple mode when no OpenAI key is present (lines 188-194). The UI message could be clearer about why.
+
+## Test Plan (Implemented)
+
+### Unit Tests (Verified Passing)
+
+```python
+# tests/unit/agent_factory/test_get_model_auto_detect.py
+
+import pytest
+from src.agent_factory.judges import get_model
+from src.utils.config import settings
+from src.utils.exceptions import ConfigurationError
+
+class TestGetModelAutoDetect:
+    """Test that get_model() auto-detects available providers."""
+
+    def test_returns_openai_when_key_present(self, monkeypatch):
+        """OpenAI key present → OpenAI model."""
+        monkeypatch.setattr(settings, "openai_api_key", "sk-test")
+        monkeypatch.setattr(settings, "anthropic_api_key", None)
+        monkeypatch.setattr(settings, "hf_token", None)
+        model = get_model()
+        assert isinstance(model, OpenAIChatModel)
+
+    def test_returns_anthropic_when_only_anthropic_key(self, monkeypatch):
+        """Only Anthropic key → Anthropic model."""
+        monkeypatch.setattr(settings, "openai_api_key", None)
+        monkeypatch.setattr(settings, "anthropic_api_key", "sk-ant-test")
+        monkeypatch.setattr(settings, "hf_token", None)
+        model = get_model()
+        assert isinstance(model, AnthropicModel)
+
+    def test_returns_huggingface_when_hf_token_present(self, monkeypatch):
+        """HF_TOKEN present (no paid keys) → HuggingFace model."""
+        monkeypatch.setattr(settings, "openai_api_key", None)
+        monkeypatch.setattr(settings, "anthropic_api_key", None)
+        monkeypatch.setattr(settings, "hf_token", "hf_test_token")
+        model = get_model()
+        assert isinstance(model, HuggingFaceModel)
+
+    def test_raises_error_when_no_keys(self, monkeypatch):
+        """No keys at all → ConfigurationError."""
+        monkeypatch.setattr(settings, "openai_api_key", None)
+        monkeypatch.setattr(settings, "anthropic_api_key", None)
+        monkeypatch.setattr(settings, "hf_token", None)
+        with pytest.raises(ConfigurationError) as exc_info:
+            get_model()
+        assert "No LLM API key configured" in str(exc_info.value)
+
+    def test_openai_takes_priority_over_anthropic(self, monkeypatch):
+        """Both keys present → OpenAI wins."""
+        monkeypatch.setattr(settings, "openai_api_key", "sk-test")
+        monkeypatch.setattr(settings, "anthropic_api_key", "sk-ant-test")
+        model = get_model()
+        assert isinstance(model, OpenAIChatModel)
+```
+
+### Full Test Suite
+
+```bash
+$ make check
+# 309 passed in 238.16s (0:03:58)
+# All checks passed!
+```
+
+### Manual Verification
+
+1. **Unset all API keys**: `unset OPENAI_API_KEY ANTHROPIC_API_KEY HF_TOKEN`
+2. **Run app**: `uv run python -m src.app`
+3. **Submit query**: "What drugs improve female libido?"
+4. **Verify**: Synthesis falls back to template (shows `ConfigurationError` in logs, but user sees structured summary)
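
The "Synthesis Fallback" behaviour described in this doc can be pictured with a minimal sketch, assuming the caller simply catches the `ConfigurationError` that `get_model()` now raises; the function name `synthesize_or_fallback` and the returned strings are illustrative, not the actual code in `src/orchestrators/simple.py`:

```python
from src.agent_factory.judges import get_model
from src.utils.exceptions import ConfigurationError


def synthesize_or_fallback(evidence_summary: str) -> str:
    """Illustrative sketch of the template fallback around get_model()."""
    try:
        model = get_model()  # fail-fast: raises when no API key is configured
    except ConfigurationError as err:
        # Free Tier without HF_TOKEN lands here: show the structured template.
        return (
            "⚠️ Note: AI narrative synthesis unavailable. Showing structured summary.\n"
            f"_Error: {err}_\n\n"
            f"{evidence_summary}"
        )
    # An LLM is available; the orchestrator would build its PydanticAI Agent
    # around `model` here and return the generated narrative instead.
    return f"(narrative synthesis would run against {model!r})"
```
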
src/agent_factory/judges.py CHANGED
@@ -54,28 +54,41 @@ def _extract_titles_from_evidence(
 
 
 def get_model() -> Any:
-    """Get the LLM model based on configuration.
+    """Get the LLM model based on available API keys.
 
-    Explicitly passes API keys from settings to avoid requiring
-    users to export environment variables manually.
+    Priority order:
+    1. OpenAI (if OPENAI_API_KEY set)
+    2. Anthropic (if ANTHROPIC_API_KEY set)
+    3. HuggingFace (if HF_TOKEN set)
+
+    Raises:
+        ConfigurationError: If no API keys are configured.
+
+    Note: settings.llm_provider is ignored in favor of actual key availability.
+    This ensures the model matches what app.py selected for JudgeHandler.
     """
-    llm_provider = settings.llm_provider
+    from src.utils.exceptions import ConfigurationError
+
+    # Priority 1: OpenAI (most common, best tool calling)
+    if settings.has_openai_key:
+        openai_provider = OpenAIProvider(api_key=settings.openai_api_key)
+        return OpenAIChatModel(settings.openai_model, provider=openai_provider)
 
-    if llm_provider == "anthropic":
+    # Priority 2: Anthropic
+    if settings.has_anthropic_key:
         provider = AnthropicProvider(api_key=settings.anthropic_api_key)
         return AnthropicModel(settings.anthropic_model, provider=provider)
 
-    if llm_provider == "huggingface":
-        # Free tier - uses HF_TOKEN from environment if available
+    # Priority 3: HuggingFace (requires HF_TOKEN)
+    if settings.has_huggingface_key:
         model_name = settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
         hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
         return HuggingFaceModel(model_name, provider=hf_provider)
 
-    if llm_provider != "openai":
-        logger.warning("Unknown LLM provider, defaulting to OpenAI", provider=llm_provider)
-
-    openai_provider = OpenAIProvider(api_key=settings.openai_api_key)
-    return OpenAIChatModel(settings.openai_model, provider=openai_provider)
+    # No keys configured - fail fast with clear error
+    raise ConfigurationError(
+        "No LLM API key configured. Set one of: OPENAI_API_KEY, ANTHROPIC_API_KEY, or HF_TOKEN"
+    )
 
 
 class JudgeHandler:
src/app.py CHANGED
@@ -112,11 +112,11 @@ def configure_orchestrator(
         judge_handler = JudgeHandler(model=model, domain=domain)
 
     # 3. Environment API Keys (fallback)
-    elif os.getenv("OPENAI_API_KEY"):
+    elif settings.has_openai_key:
        judge_handler = JudgeHandler(model=None, domain=domain)  # Uses env key
        backend_info = "Paid API (OpenAI from env)"
 
-    elif os.getenv("ANTHROPIC_API_KEY"):
+    elif settings.has_anthropic_key:
        judge_handler = JudgeHandler(model=None, domain=domain)  # Uses env key
        backend_info = "Paid API (Anthropic from env)"
 
@@ -177,8 +177,8 @@ async def research_agent(
     user_api_key = (api_key_str.strip() or api_key_state_str.strip()) or None
 
     # Check available keys
-    has_openai = bool(os.getenv("OPENAI_API_KEY"))
-    has_anthropic = bool(os.getenv("ANTHROPIC_API_KEY"))
+    has_openai = settings.has_openai_key
+    has_anthropic = settings.has_anthropic_key
     # Check for OpenAI user key
     is_openai_user_key = (
         user_api_key and user_api_key.startswith("sk-") and not user_api_key.startswith("sk-ant-")
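
The `settings.has_openai_key` / `settings.has_anthropic_key` properties used here (and `has_huggingface_key` in `judges.py`) are assumed to already exist on the `Settings` model; their definitions are not part of this diff. A minimal sketch of what such properties typically look like, assuming a pydantic-settings style class (the `SettingsSketch` name and import path are illustrative, not the project's actual `config.py`):

```python
from pydantic_settings import BaseSettings  # pydantic v2 style; assumption


class SettingsSketch(BaseSettings):
    # Field names map to OPENAI_API_KEY / ANTHROPIC_API_KEY / HF_TOKEN env vars
    # under pydantic-settings' default, case-insensitive env mapping.
    openai_api_key: str | None = None
    anthropic_api_key: str | None = None
    hf_token: str | None = None

    @property
    def has_openai_key(self) -> bool:
        return bool(self.openai_api_key)

    @property
    def has_anthropic_key(self) -> bool:
        return bool(self.anthropic_api_key)

    @property
    def has_huggingface_key(self) -> bool:
        return bool(self.hf_token)
```
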
src/config/domain.py CHANGED
@@ -122,7 +122,8 @@ def get_domain_config(domain: ResearchDomain | str | None = None) -> DomainConfi
     if isinstance(domain, str):
         try:
             domain = ResearchDomain(domain)
-        except ValueError:
-            domain = DEFAULT_DOMAIN
+        except ValueError as e:
+            valid_domains = [d.value for d in ResearchDomain]
+            raise ValueError(f"Invalid domain '{domain}'. Valid domains: {valid_domains}") from e
 
     return DOMAIN_CONFIGS[domain]
src/orchestrators/advanced.py CHANGED
@@ -15,7 +15,6 @@ Design Patterns:
 """
 
 import asyncio
-import os
 from collections.abc import AsyncGenerator
 from typing import TYPE_CHECKING, Any
 
@@ -85,27 +84,11 @@ class AdvancedOrchestrator(OrchestratorProtocol):
         if not chat_client and not api_key:
             check_magentic_requirements()
 
-        # Environment-configurable rounds (default 5 for demos)
-        raw_rounds = os.getenv("ADVANCED_MAX_ROUNDS", "5")
-        try:
-            env_rounds = int(raw_rounds)
-        except ValueError:
-            logger.warning(
-                "Invalid ADVANCED_MAX_ROUNDS value %r, falling back to 5",
-                raw_rounds,
-            )
-            env_rounds = 5
-
-        if env_rounds < 1:
-            logger.warning(
-                "ADVANCED_MAX_ROUNDS must be >= 1, got %d; using 1 instead",
-                env_rounds,
-            )
-            env_rounds = 1
-
-        self._max_rounds = max_rounds if max_rounds is not None else env_rounds
-
-        self._timeout_seconds = timeout_seconds
+        # Use pydantic-validated settings (fails fast on invalid config)
+        self._max_rounds = max_rounds if max_rounds is not None else settings.advanced_max_rounds
+        self._timeout_seconds = (
+            timeout_seconds if timeout_seconds != 300.0 else settings.advanced_timeout
+        )
         self.domain = domain
         self.domain_config = get_domain_config(domain)
         self._chat_client: OpenAIChatClient | None
src/services/statistical_analyzer.py CHANGED
@@ -12,6 +12,8 @@ import re
 from functools import lru_cache, partial
 from typing import Any, Literal
 
+import structlog
+
 # Type alias for verdict values
 VerdictType = Literal["SUPPORTED", "REFUTED", "INCONCLUSIVE"]
 
@@ -26,6 +28,8 @@ from src.tools.code_execution import (
 )
 from src.utils.models import Evidence
 
+logger = structlog.get_logger()
+
 
 class AnalysisResult(BaseModel):
     """Result of statistical analysis."""
@@ -244,7 +248,7 @@ Generate executable Python code to analyze this evidence."""
             else:
                 return 0.60
         except ValueError:
-            pass
+            logger.debug("Failed to parse p-values", p_values=p_values)
 
         return 0.70  # Default
 
src/tools/code_execution.py CHANGED
@@ -4,12 +4,13 @@ This module provides sandboxed Python code execution using Modal's serverless in
 It's designed for running LLM-generated statistical analysis code safely.
 """
 
-import os
 from functools import lru_cache
 from typing import Any
 
 import structlog
 
+from src.utils.config import settings
+
 logger = structlog.get_logger(__name__)
 
 # Shared library versions for Modal sandbox - used by both executor and LLM prompts
@@ -72,8 +73,8 @@ class ModalCodeExecutor:
         Execution will fail at runtime without valid credentials.
         """
         # Check for Modal credentials
-        self.modal_token_id = os.getenv("MODAL_TOKEN_ID")
-        self.modal_token_secret = os.getenv("MODAL_TOKEN_SECRET")
+        self.modal_token_id = settings.modal_token_id
+        self.modal_token_secret = settings.modal_token_secret
 
         if not self.modal_token_id or not self.modal_token_secret:
             logger.warning(
@@ -241,13 +242,16 @@ print(json.dumps({{"__RESULT__": result}}))
 
     def _extract_output(self, text: str, start_marker: str, end_marker: str) -> str:
         """Extract content between markers."""
-        try:
-            start_idx = text.index(start_marker) + len(start_marker)
-            end_idx = text.index(end_marker)
-            return text[start_idx:end_idx].strip()
-        except ValueError:
-            # Markers not found, return original text
+        start_idx = text.find(start_marker)
+        if start_idx == -1:
             return text.strip()
+        start_idx += len(start_marker)
+
+        end_idx = text.find(end_marker, start_idx)
+        if end_idx == -1:
+            return text.strip()
+
+        return text[start_idx:end_idx].strip()
 
 
 @lru_cache(maxsize=1)
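
The `settings.modal_token_id` / `settings.modal_token_secret` fields read above are not declared anywhere in this diff; a minimal sketch of how they would sit on the `Settings` model, assuming pydantic-settings' default mapping to the `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET` environment variables (the `ModalSettingsSketch` name is illustrative):

```python
from pydantic import Field
from pydantic_settings import BaseSettings  # assumption: pydantic v2 import path


class ModalSettingsSketch(BaseSettings):
    """Illustrative only: the real fields would live on src/utils/config.py's Settings."""

    # pydantic-settings maps these field names to MODAL_TOKEN_ID / MODAL_TOKEN_SECRET
    # environment variables by default, so ModalCodeExecutor can keep warning (not
    # failing) when the credentials are absent.
    modal_token_id: str | None = Field(default=None)
    modal_token_secret: str | None = Field(default=None)
```
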
src/tools/pubmed.py CHANGED
@@ -3,6 +3,7 @@
 from typing import Any
 
 import httpx
+import structlog
 import xmltodict
 from tenacity import retry, stop_after_attempt, wait_exponential
 
@@ -12,6 +13,8 @@ from src.utils.config import settings
 from src.utils.exceptions import RateLimitError, SearchError
 from src.utils.models import Citation, Evidence
 
+logger = structlog.get_logger()
+
 
 class PubMedTool:
     """Search tool for PubMed/NCBI."""
@@ -126,8 +129,9 @@ class PubMedTool:
                 evidence = self._article_to_evidence(article)
                 if evidence:
                     evidence_list.append(evidence)
-            except Exception:
-                continue  # Skip malformed articles
+            except (KeyError, AttributeError, TypeError) as e:
+                logger.debug("Skipping malformed article", error=str(e))
+                continue
 
         return evidence_list
 
src/utils/config.py CHANGED
@@ -60,10 +60,22 @@ class Settings(BaseSettings):
 
     # Agent Configuration
     max_iterations: int = Field(default=10, ge=1, le=50)
+    advanced_max_rounds: int = Field(
+        default=5,
+        ge=1,
+        le=20,
+        description="Max coordination rounds for Advanced mode (default 5 for faster demos)",
+    )
+    advanced_timeout: float = Field(
+        default=300.0,
+        ge=60.0,
+        le=900.0,
+        description="Timeout for Advanced mode in seconds (default 5 min)",
+    )
     search_timeout: int = Field(default=30, description="Seconds to wait for search")
     magentic_timeout: int = Field(
         default=600,
-        description="Timeout for Magentic mode in seconds",
+        description="Timeout for Magentic mode in seconds (deprecated, use advanced_timeout)",
    )
 
     # Logging
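
Assuming pydantic-settings' default environment mapping (no custom `env_prefix`), the new fields remain configurable via `ADVANCED_MAX_ROUNDS` / `ADVANCED_TIMEOUT`, with out-of-range values now failing fast at startup instead of being silently clamped; a small illustrative check (env-var names and behaviour are assumptions based on the old `os.getenv` code):

```python
import os

from pydantic import ValidationError

from src.utils.config import Settings

# Values are read from the environment by pydantic-settings when Settings() is built.
os.environ["ADVANCED_MAX_ROUNDS"] = "10"
os.environ["ADVANCED_TIMEOUT"] = "120.5"
print(Settings().advanced_max_rounds)  # 10

# Out-of-range values violate ge=1 and raise at construction time.
os.environ["ADVANCED_MAX_ROUNDS"] = "0"
try:
    Settings()
except ValidationError as err:
    print(err)  # mentions "greater than or equal to 1"
```
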
tests/unit/agent_factory/test_get_model_auto_detect.py ADDED
@@ -0,0 +1,59 @@
+import pytest
+from pydantic_ai.models.anthropic import AnthropicModel
+from pydantic_ai.models.huggingface import HuggingFaceModel
+from pydantic_ai.models.openai import OpenAIChatModel
+
+from src.agent_factory.judges import get_model
+from src.utils.config import settings
+from src.utils.exceptions import ConfigurationError
+
+
+class TestGetModelAutoDetect:
+    """Test that get_model() auto-detects available providers."""
+
+    def test_returns_openai_when_key_present(self, monkeypatch):
+        """OpenAI key present → OpenAI model."""
+        # Mock the settings properties (settings is a singleton)
+        monkeypatch.setattr(settings, "openai_api_key", "sk-test")
+        monkeypatch.setattr(settings, "anthropic_api_key", None)
+        monkeypatch.setattr(settings, "hf_token", None)
+
+        model = get_model()
+        assert isinstance(model, OpenAIChatModel)
+
+    def test_returns_anthropic_when_only_anthropic_key(self, monkeypatch):
+        """Only Anthropic key → Anthropic model."""
+        monkeypatch.setattr(settings, "openai_api_key", None)
+        monkeypatch.setattr(settings, "anthropic_api_key", "sk-ant-test")
+        monkeypatch.setattr(settings, "hf_token", None)
+
+        model = get_model()
+        assert isinstance(model, AnthropicModel)
+
+    def test_returns_huggingface_when_hf_token_present(self, monkeypatch):
+        """HF_TOKEN present (no paid keys) → HuggingFace model."""
+        monkeypatch.setattr(settings, "openai_api_key", None)
+        monkeypatch.setattr(settings, "anthropic_api_key", None)
+        monkeypatch.setattr(settings, "hf_token", "hf_test_token")
+
+        model = get_model()
+        assert isinstance(model, HuggingFaceModel)
+
+    def test_raises_error_when_no_keys(self, monkeypatch):
+        """No keys at all → ConfigurationError."""
+        monkeypatch.setattr(settings, "openai_api_key", None)
+        monkeypatch.setattr(settings, "anthropic_api_key", None)
+        monkeypatch.setattr(settings, "hf_token", None)
+
+        with pytest.raises(ConfigurationError) as exc_info:
+            get_model()
+
+        assert "No LLM API key configured" in str(exc_info.value)
+
+    def test_openai_takes_priority_over_anthropic(self, monkeypatch):
+        """Both keys present → OpenAI wins."""
+        monkeypatch.setattr(settings, "openai_api_key", "sk-test")
+        monkeypatch.setattr(settings, "anthropic_api_key", "sk-ant-test")
+
+        model = get_model()
+        assert isinstance(model, OpenAIChatModel)
tests/unit/agent_factory/test_judges_factory.py CHANGED
@@ -24,6 +24,7 @@ def mock_settings():
 def test_get_model_openai(mock_settings):
     """Test that OpenAI model is returned when provider is openai."""
     mock_settings.llm_provider = "openai"
+    mock_settings.has_openai_key = True
     mock_settings.openai_api_key = "sk-test"
     mock_settings.openai_model = "gpt-5"
 
@@ -35,6 +36,8 @@ def test_get_model_openai(mock_settings):
 def test_get_model_anthropic(mock_settings):
     """Test that Anthropic model is returned when provider is anthropic."""
     mock_settings.llm_provider = "anthropic"
+    mock_settings.has_openai_key = False
+    mock_settings.has_anthropic_key = True
     mock_settings.anthropic_api_key = "sk-ant-test"
     mock_settings.anthropic_model = "claude-sonnet-4-5-20250929"
 
@@ -46,6 +49,9 @@ def test_get_model_anthropic(mock_settings):
 def test_get_model_huggingface(mock_settings):
     """Test that HuggingFace model is returned when provider is huggingface."""
     mock_settings.llm_provider = "huggingface"
+    mock_settings.has_openai_key = False
+    mock_settings.has_anthropic_key = False
+    mock_settings.has_huggingface_key = True  # CodeRabbit: explicitly set for auto-detect
     mock_settings.hf_token = "hf_test_token"
     mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
 
@@ -57,6 +63,7 @@ def test_get_model_huggingface(mock_settings):
 def test_get_model_default_fallback(mock_settings):
     """Test fallback to OpenAI if provider is unknown."""
     mock_settings.llm_provider = "unknown_provider"
+    mock_settings.has_openai_key = True
     mock_settings.openai_api_key = "sk-test"
     mock_settings.openai_model = "gpt-5"
 
tests/unit/config/test_domain.py CHANGED
@@ -28,10 +28,14 @@ class TestGetDomainConfig:
         config = get_domain_config("sexual_health")
         assert "Sexual Health" in config.name
 
-    def test_invalid_string_returns_default(self):
-        # Invalid domains fall back to default (sexual_health)
-        config = get_domain_config("invalid_domain")
-        assert config.name == "Sexual Health Research"
+    def test_invalid_string_raises_value_error(self):
+        # Invalid domains should fail fast with clear error
+        import pytest
+
+        with pytest.raises(ValueError) as exc_info:
+            get_domain_config("invalid_domain")
+        assert "Invalid domain" in str(exc_info.value)
+        assert "sexual_health" in str(exc_info.value)  # Shows valid options
 
     def test_config_has_required_fields(self):
         required_fields = [
tests/unit/orchestrators/test_advanced_orchestrator.py CHANGED
@@ -1,9 +1,12 @@
-import os
+"""Tests for AdvancedOrchestrator configuration."""
+
 from unittest.mock import patch
 
 import pytest
+from pydantic import ValidationError
 
 from src.orchestrators.advanced import AdvancedOrchestrator
+from src.utils.config import Settings
 
 
 @pytest.mark.unit
@@ -11,63 +14,71 @@ class TestAdvancedOrchestratorConfig:
     """Tests for configuration options."""
 
     def test_default_max_rounds_is_five(self) -> None:
-        """Default max_rounds should be 5 for faster demos."""
-        with (
-            patch.dict(os.environ, {}, clear=True),
-            patch("src.orchestrators.advanced.check_magentic_requirements"),
-        ):
-            # Clear any existing env var
-            os.environ.pop("ADVANCED_MAX_ROUNDS", None)
+        """Default max_rounds should be 5 from settings."""
+        with patch("src.orchestrators.advanced.check_magentic_requirements"):
             orch = AdvancedOrchestrator()
             assert orch._max_rounds == 5
 
-    def test_max_rounds_from_env(self) -> None:
-        """max_rounds should be configurable via environment."""
-        with (
-            patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "3"}),
-            patch("src.orchestrators.advanced.check_magentic_requirements"),
-        ):
-            orch = AdvancedOrchestrator()
-            assert orch._max_rounds == 3
-
-    def test_explicit_max_rounds_overrides_env(self) -> None:
-        """Explicit parameter should override environment."""
-        with (
-            patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "3"}),
-            patch("src.orchestrators.advanced.check_magentic_requirements"),
-        ):
+    def test_explicit_max_rounds_overrides_settings(self) -> None:
+        """Explicit parameter should override settings."""
+        with patch("src.orchestrators.advanced.check_magentic_requirements"):
            orch = AdvancedOrchestrator(max_rounds=7)
            assert orch._max_rounds == 7
 
     def test_timeout_default_is_five_minutes(self) -> None:
-        """Default timeout should be 300s (5 min) for faster failure."""
+        """Default timeout should be 300s (5 min) from settings."""
         with patch("src.orchestrators.advanced.check_magentic_requirements"):
             orch = AdvancedOrchestrator()
             assert orch._timeout_seconds == 300.0
 
-    def test_invalid_env_rounds_falls_back_to_default(self) -> None:
-        """Invalid ADVANCED_MAX_ROUNDS should fall back to 5."""
-        with (
-            patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "not_a_number"}),
-            patch("src.orchestrators.advanced.check_magentic_requirements"),
-        ):
-            orch = AdvancedOrchestrator()
-            assert orch._max_rounds == 5
+    def test_explicit_timeout_overrides_settings(self) -> None:
+        """Explicit timeout parameter should override settings."""
+        with patch("src.orchestrators.advanced.check_magentic_requirements"):
+            orch = AdvancedOrchestrator(timeout_seconds=120.0)
+            assert orch._timeout_seconds == 120.0
 
-    def test_zero_env_rounds_clamps_to_one(self) -> None:
-        """ADVANCED_MAX_ROUNDS=0 should clamp to 1."""
-        with (
-            patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "0"}),
-            patch("src.orchestrators.advanced.check_magentic_requirements"),
-        ):
-            orch = AdvancedOrchestrator()
-            assert orch._max_rounds == 1
-
-    def test_negative_env_rounds_clamps_to_one(self) -> None:
-        """Negative ADVANCED_MAX_ROUNDS should clamp to 1."""
-        with (
-            patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "-5"}),
-            patch("src.orchestrators.advanced.check_magentic_requirements"),
-        ):
-            orch = AdvancedOrchestrator()
-            assert orch._max_rounds == 1
+
+@pytest.mark.unit
+class TestSettingsValidation:
+    """Tests for pydantic Settings validation (fail-fast behavior)."""
+
+    def test_invalid_max_rounds_type_raises(self) -> None:
+        """Non-integer ADVANCED_MAX_ROUNDS should fail at startup."""
+        with pytest.raises(ValidationError) as exc_info:
+            Settings(advanced_max_rounds="not_a_number")  # type: ignore[arg-type]
+        assert "advanced_max_rounds" in str(exc_info.value)
+
+    def test_zero_max_rounds_raises(self) -> None:
+        """ADVANCED_MAX_ROUNDS=0 should fail validation (ge=1)."""
+        with pytest.raises(ValidationError) as exc_info:
+            Settings(advanced_max_rounds=0)
+        assert "greater than or equal to 1" in str(exc_info.value)
+
+    def test_negative_max_rounds_raises(self) -> None:
+        """Negative ADVANCED_MAX_ROUNDS should fail validation."""
+        with pytest.raises(ValidationError) as exc_info:
+            Settings(advanced_max_rounds=-5)
+        assert "greater than or equal to 1" in str(exc_info.value)
+
+    def test_max_rounds_above_limit_raises(self) -> None:
+        """ADVANCED_MAX_ROUNDS > 20 should fail validation (le=20)."""
+        with pytest.raises(ValidationError) as exc_info:
+            Settings(advanced_max_rounds=100)
+        assert "less than or equal to 20" in str(exc_info.value)
+
+    def test_valid_max_rounds_accepted(self) -> None:
+        """Valid ADVANCED_MAX_ROUNDS should be accepted."""
+        s = Settings(advanced_max_rounds=10)
+        assert s.advanced_max_rounds == 10
+
+    def test_timeout_too_low_raises(self) -> None:
+        """ADVANCED_TIMEOUT < 60 should fail validation."""
+        with pytest.raises(ValidationError) as exc_info:
+            Settings(advanced_timeout=30.0)
+        assert "greater than or equal to 60" in str(exc_info.value)
+
+    def test_timeout_too_high_raises(self) -> None:
+        """ADVANCED_TIMEOUT > 900 should fail validation."""
+        with pytest.raises(ValidationError) as exc_info:
+            Settings(advanced_timeout=1000.0)
+        assert "less than or equal to 900" in str(exc_info.value)
tests/unit/test_app_domain.py CHANGED
@@ -25,10 +25,17 @@ class TestAppDomain:
         )
 
     @patch.dict("os.environ", {}, clear=True)
+    @patch("src.app.settings")
     @patch("src.app.create_orchestrator")
     @patch("src.app.HFInferenceJudgeHandler")
-    def test_configure_orchestrator_passes_domain_free_tier(self, mock_hf_judge, mock_create):
+    def test_configure_orchestrator_passes_domain_free_tier(
+        self, mock_hf_judge, mock_create, mock_settings
+    ):
         """Test domain is passed when using free tier (no API keys)."""
+        # Simulate no keys in settings
+        mock_settings.has_openai_key = False
+        mock_settings.has_anthropic_key = False
+
         configure_orchestrator(use_mock=False, mode="simple", domain=ResearchDomain.SEXUAL_HEALTH)
 
         # HFInferenceJudgeHandler should receive domain (no API keys = free tier)
@@ -42,8 +49,13 @@
             domain=ResearchDomain.SEXUAL_HEALTH,
         )
 
+    @patch("src.app.settings")
     @patch("src.app.configure_orchestrator")
-    async def test_research_agent_passes_domain(self, mock_config):
+    async def test_research_agent_passes_domain(self, mock_config, mock_settings):
+        # Mock settings to have some state
+        mock_settings.has_openai_key = False
+        mock_settings.has_anthropic_key = False
+
         # Mock orchestrator
         mock_orch = MagicMock()
         mock_orch.run.return_value = []  # Async iterator?