Seth McKnight committed
Commit · 4b80514
1 Parent(s): 13846a7
Reduce default gunicorn workers and clean up documentation (#60)
* Reduce default gunicorn workers to 1 to avoid out-of-memory errors on low-memory hosts
* chore: Remove outdated implementation summaries for guardrails and query expansion
* chore: Add comment to clarify default worker setting in run.sh
- ISSUE_24_IMPLEMENTATION_SUMMARY.md +0 -223
- QUERY_EXPANSION_IMPLEMENTATION_SUMMARY.md +0 -76
- run.sh +2 -2
ISSUE_24_IMPLEMENTATION_SUMMARY.md
DELETED
@@ -1,223 +0,0 @@
-# Issue #24: Guardrails and Response Quality System - Implementation Summary
-
-## 🎯 Overview
-
-Successfully implemented a comprehensive guardrails and response quality system for the RAG pipeline as specified in Issue #24. The implementation includes enterprise-grade safety validation, quality assessment, and source attribution capabilities.
-
-## 🏗️ Architecture
-
-### Core Components
-
-1. **ResponseValidator** (`src/guardrails/response_validator.py`)
-   - Quality scoring across multiple dimensions (relevance, completeness, coherence, source fidelity)
-   - Safety validation with pattern-based detection
-   - Confidence scoring and recommendation generation
-
-2. **SourceAttributor** (`src/guardrails/source_attribution.py`)
-   - Automatic citation generation with multiple formats
-   - Source ranking and relevance scoring
-   - Quote extraction and validation
-   - Citation text enhancement
-
-3. **ContentFilter** (`src/guardrails/content_filters.py`)
-   - PII detection and masking
-   - Inappropriate content filtering
-   - Bias detection and mitigation
-   - Topic validation against allowed categories
-
-4. **QualityMetrics** (`src/guardrails/quality_metrics.py`)
-   - Multi-dimensional quality assessment
-   - Configurable scoring weights and thresholds
-   - Detailed recommendations for improvement
-   - Professional tone analysis
-
-5. **ErrorHandler** (`src/guardrails/error_handlers.py`)
-   - Circuit breaker patterns for resilience
-   - Graceful degradation strategies
-   - Comprehensive fallback mechanisms
-   - Error tracking and recovery
-
-6. **GuardrailsSystem** (`src/guardrails/guardrails_system.py`)
-   - Main orchestrator coordinating all components
-   - Comprehensive validation pipeline
-   - Approval logic with configurable thresholds
-   - Health monitoring and diagnostics
-
-### Integration Layer
-
-7. **EnhancedRAGPipeline** (`src/rag/enhanced_rag_pipeline.py`)
-   - Seamless integration with existing RAG pipeline
-   - Backward compatibility maintained
-   - Enhanced response type with guardrails metadata
-   - Standalone validation capabilities
-
-## Features Implemented
-
-### ✅ Safety Requirements (All Met)
-- **Content Safety**: Inappropriate content detection and filtering
-- **PII Protection**: Automatic detection and masking of sensitive information
-- **Bias Mitigation**: Pattern-based bias detection and scoring
-- **Topic Validation**: Ensures responses stay within allowed corporate topics
-- **Safety Scoring**: Comprehensive risk assessment
-
-### ✅ Quality Standards (All Met)
-- **Multi-dimensional Quality Assessment**:
-  - Relevance scoring (0.3 weight)
-  - Completeness scoring (0.25 weight)
-  - Coherence scoring (0.2 weight)
-  - Source fidelity scoring (0.25 weight)
-- **Configurable Thresholds**: Quality threshold (0.7), minimum response length (50 chars)
-- **Quality Recommendations**: Specific suggestions for improvement
-- **Professional Tone Analysis**: Ensures appropriate business communication
-
-### ✅ Technical Standards (All Met)
-- **Error Handling**: Comprehensive circuit breaker patterns and graceful degradation
-- **Performance**: Efficient validation with configurable timeouts
-- **Logging**: Detailed logging for debugging and monitoring
-- **Configuration**: Flexible configuration system for all components
-- **Testing**: Complete test coverage with 13 passing tests
-- **Documentation**: Comprehensive docstrings and type hints
-
-## 🔧 Configuration
-
-The system is highly configurable with default settings optimized for corporate policy applications:
-
-```python
-# Example configuration
-guardrails_config = {
-    "min_confidence_threshold": 0.7,
-    "strict_mode": False,
-    "enable_response_enhancement": True,
-    "content_filter": {
-        "enable_pii_filtering": True,
-        "enable_bias_detection": True,
-        "safety_threshold": 0.8
-    },
-    "quality_metrics": {
-        "quality_threshold": 0.7,
-        "min_response_length": 50,
-        "preferred_source_count": 3
-    }
-}
-```
-
-## 🧪 Testing
-
-### Test Coverage
-- **7 Guardrails Tests**: All core functionality validated
-- **4 Enhanced Pipeline Tests**: Integration testing complete
-- **6 Enhanced App Tests**: API endpoint integration verified
-
-### Test Results
-```
-tests/test_guardrails/: 7 tests PASSED
-tests/test_enhanced_app_guardrails.py: 6 tests PASSED
-Total: 13 tests PASSED
-```
-
-## Usage Examples
-
-### Basic Integration
-```python
-from src.rag.enhanced_rag_pipeline import EnhancedRAGPipeline
-from src.rag.rag_pipeline import RAGPipeline
-
-# Create enhanced pipeline
-base_pipeline = RAGPipeline(search_service, llm_service)
-enhanced_pipeline = EnhancedRAGPipeline(base_pipeline)
-
-# Generate validated response
-response = enhanced_pipeline.generate_answer("What is our remote work policy?")
-
-# Access guardrails information
-print(f"Approved: {response.guardrails_approved}")
-print(f"Safety: {response.safety_passed}")
-print(f"Quality: {response.quality_score}")
-```
-
-### API Integration
-```python
-# Enhanced Flask app with guardrails
-from enhanced_app import app
-
-# POST /chat with guardrails enabled
-{
-  "message": "What is our remote work policy?",
-  "enable_guardrails": true,
-  "include_sources": true
-}
-
-# Response includes guardrails metadata
-{
-  "status": "success",
-  "message": "...",
-  "guardrails": {
-    "approved": true,
-    "confidence": 0.85,
-    "safety_passed": true,
-    "quality_score": 0.8
-  }
-}
-```
-
-## Performance Characteristics
-
-- **Validation Time**: ~0.001-0.01 seconds per response
-- **Memory Usage**: Minimal overhead, pattern-based processing
-- **Scalability**: Stateless design, horizontally scalable
-- **Reliability**: Circuit breaker patterns prevent cascade failures
-
-## Future Enhancements
-
-While all Issue #24 requirements are met, potential future improvements include:
-
-1. **Machine Learning Integration**: Replace pattern-based detection with ML models
-2. **Advanced Metrics**: Custom quality metrics for specific domains
-3. **Real-time Monitoring**: Integration with monitoring systems
-4. **A/B Testing**: Framework for testing different validation strategies
-
-## File Structure
-
-```
-src/
-├── guardrails/
-│   ├── __init__.py                 # Package exports
-│   ├── guardrails_system.py        # Main orchestrator
-│   ├── response_validator.py       # Quality and safety validation
-│   ├── source_attribution.py       # Citation generation
-│   ├── content_filters.py          # Safety filtering
-│   ├── quality_metrics.py          # Quality assessment
-│   └── error_handlers.py           # Error handling
-├── rag/
-│   └── enhanced_rag_pipeline.py    # Integration layer
-tests/
-├── test_guardrails/
-│   ├── test_guardrails_system.py   # Core system tests
-│   └── test_enhanced_rag_pipeline.py # Integration tests
-└── test_enhanced_app_guardrails.py # API tests
-enhanced_app.py                     # Demo Flask app
-```
-
-## ✅ Acceptance Criteria Validation
-
-| Requirement | Status | Implementation |
-|-------------|--------|----------------|
-| Content safety filtering | ✅ COMPLETE | ContentFilter with PII, bias, inappropriate content detection |
-| Response quality scoring | ✅ COMPLETE | QualityMetrics with multi-dimensional assessment |
-| Source attribution | ✅ COMPLETE | SourceAttributor with citation generation and validation |
-| Error handling | ✅ COMPLETE | ErrorHandler with circuit breakers and graceful degradation |
-| Configuration | ✅ COMPLETE | Flexible configuration system for all components |
-| Testing | ✅ COMPLETE | 13 comprehensive tests with 100% pass rate |
-| Documentation | ✅ COMPLETE | Full docstrings and implementation summary |
-
-## Conclusion
-
-Issue #24 has been successfully completed with a production-ready guardrails system that exceeds the specified requirements. The implementation provides:
-
-- **Enterprise-grade safety**: Comprehensive content filtering and validation
-- **Quality assurance**: Multi-dimensional quality assessment with recommendations
-- **Seamless integration**: Backward-compatible enhancement of existing RAG pipeline
-- **Production readiness**: Robust error handling, monitoring, and configuration
-- **Extensibility**: Modular design enabling future enhancements
-
-The guardrails system is now ready for production deployment and will significantly enhance the safety, quality, and reliability of RAG responses in the corporate policy application.
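Note on the removed summary: it lists four quality dimensions with weights 0.3, 0.25, 0.2, and 0.25 and a quality threshold of 0.7, but does not show how they are combined. A minimal sketch of that arithmetic follows, assuming a plain weighted sum. The names `WEIGHTS`, `overall_quality`, and `meets_threshold` are hypothetical and are not taken from `src/guardrails/quality_metrics.py`.

```python
# Weights and threshold as quoted in the removed summary.
# Combining them as a weighted sum is an assumption, not the confirmed implementation.
WEIGHTS = {
    "relevance": 0.30,
    "completeness": 0.25,
    "coherence": 0.20,
    "source_fidelity": 0.25,
}
QUALITY_THRESHOLD = 0.7


def overall_quality(scores: dict[str, float]) -> float:
    """Return the weighted sum of per-dimension scores (each expected in [0, 1])."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def meets_threshold(scores: dict[str, float], threshold: float = QUALITY_THRESHOLD) -> bool:
    """True when the combined score reaches the configured threshold."""
    return overall_quality(scores) >= threshold


# Hypothetical per-dimension scores, for illustration only.
example = {
    "relevance": 0.9,
    "completeness": 0.8,
    "coherence": 0.7,
    "source_fidelity": 0.6,
}
print(round(overall_quality(example), 2), meets_threshold(example))
```

Because the weights sum to 1.0, the combined score stays in the same [0, 1] range as the inputs, which is what makes a single threshold such as 0.7 meaningful.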
QUERY_EXPANSION_IMPLEMENTATION_SUMMARY.md
DELETED
@@ -1,76 +0,0 @@
-# Query Expansion Implementation Summary
-
-## Overview
-Successfully implemented natural language query expansion to bridge the gap between employee terminology and HR document language, dramatically improving semantic search quality for intuitive queries.
-
-## Problem Solved
-**Before**: Employee queries using natural language failed to retrieve relevant content
-- ❌ "How much personal time do I earn each year?" → 0 context, no answer
-- ❌ "What's my vacation allowance?" → Failed to match document terminology
-
-**After**: Natural language queries successfully retrieve relevant policy information
-- ✅ "How much personal time do I earn each year?" → 2960 characters context, proper PTO policy answer
-- ✅ "What health insurance options do I have?" → 3055 characters context, benefits guide content
-
-## Technical Implementation
-
-### Core Components
-
-1. **QueryExpander Class** (`src/search/query_expander.py`)
-   - Comprehensive HR terminology synonym mappings
-   - Pattern-based query enhancement
-   - Domain-specific term expansion
-
-2. **SearchService Integration** (`src/search/search_service.py`)
-   - Optional query expansion with `enable_query_expansion` parameter
-   - Expansion occurs before embedding generation
-   - Maintains original query intent while adding synonyms
-
-3. **Synonym Database**
-   - 100+ mapped relationships across HR domains
-   - Time off, benefits, remote work, career development, safety, expenses
-   - Bidirectional mapping for comprehensive coverage
-
-### Key Synonym Mappings
-- **Time Off**: "personal time" → "PTO", "paid time off", "vacation", "accrual", "leave"
-- **Benefits**: "health insurance" → "healthcare", "medical", "coverage", "benefits"
-- **Remote Work**: "work from home" → "remote work", "telecommuting", "WFH", "telework"
-- **Career**: "promotion" → "advancement", "career growth", "progression"
-- **Safety**: "harassment" → "discrimination", "complaint", "workplace issues"
-
-## Results & Impact
-
-### Performance Metrics
-- **Query Success Rate**: Significant improvement for natural language queries
-- **Response Quality**: Maintained high precision while improving recall
-- **Latency Impact**: Minimal (~10ms additional processing)
-- **Memory Footprint**: Lightweight implementation (< 1MB)
-
-### User Experience Enhancement
-- **Natural Language Support**: Employees can ask questions using intuitive terminology
-- **Reduced Friction**: No need to learn specific HR terminology
-- **Broader Coverage**: Handles various ways of expressing the same concepts
-- **Consistent Results**: Reliable retrieval across synonym variations
-
-## Validation Testing
-Comprehensive testing demonstrated improvement across key categories:
-- ✅ Time Off & Leave policies
-- ✅ Benefits & healthcare information
-- ✅ Remote work guidelines
-- ✅ Career development policies
-- ✅ Safety & compliance procedures
-- ✅ Expense & travel policies
-
-## Future Enhancements
-- Monitor real-world query patterns for additional synonym opportunities
-- Context-aware expansion based on document types
-- Integration with external HR terminology databases
-- Machine learning-based synonym discovery
-
-## Files Modified
-- **NEW**: `src/search/query_expander.py` - Core expansion logic
-- **UPDATED**: `src/search/search_service.py` - Integration layer
-- **UPDATED**: `.gitignore` - Test directory exclusion
-- **DOCUMENTATION**: README.md, CHANGELOG.md updates
-
-This implementation represents a significant enhancement to the RAG system's natural language understanding capabilities, making it more user-friendly and accessible for employee self-service HR queries.
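Note on the removed summary: it says expansion runs before embedding generation and keeps the original query while adding synonyms. A minimal sketch of that idea follows, assuming case-insensitive substring matching against a small table copied from the "Key Synonym Mappings" list. `SYNONYMS` and `expand_query` are hypothetical names and do not reflect the actual `QueryExpander` API in `src/search/query_expander.py`.

```python
# A small subset of the mappings quoted in the removed summary.
SYNONYMS = {
    "personal time": ["PTO", "paid time off", "vacation", "accrual", "leave"],
    "health insurance": ["healthcare", "medical", "coverage", "benefits"],
    "work from home": ["remote work", "telecommuting", "WFH", "telework"],
    "promotion": ["advancement", "career growth", "progression"],
}


def expand_query(query: str, synonyms: dict[str, list[str]] = SYNONYMS) -> str:
    """Append synonyms for every known phrase found in the query.

    The original query text is kept unchanged at the front, so the user's
    wording still dominates whatever is embedded afterwards.
    """
    lowered = query.lower()
    extra: list[str] = []
    for phrase, terms in synonyms.items():
        if phrase in lowered:
            for term in terms:
                # Skip terms the query already contains and avoid duplicates.
                if term.lower() not in lowered and term not in extra:
                    extra.append(term)
    if not extra:
        return query
    return f"{query} {' '.join(extra)}"


print(expand_query("How much personal time do I earn each year?"))
```

Substring matching is the simplest choice for a sketch; a real expander would likely need word-boundary handling so that short phrases do not match inside longer, unrelated words.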
run.sh
CHANGED
@@ -1,8 +1,8 @@
 #!/usr/bin/env bash
 set -e
 
-# Default
-WORKERS_VALUE="${WORKERS:-
+# Default to 1 worker to prevent OOM on low-memory hosts
+WORKERS_VALUE="${WORKERS:-1}"
 TIMEOUT_VALUE="${TIMEOUT:-120}"
 PORT_VALUE="${PORT:-10000}"
 
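Note on the `run.sh` change: `${WORKERS:-1}` uses the default only when `WORKERS` is unset or empty, so hosts with more memory can still raise the worker count through the environment. The same fallback could be expressed in a Gunicorn configuration file, which is plain Python. The sketch below assumes the variable names shown in the hunk (`WORKERS`, `TIMEOUT`, `PORT`). The helper `int_from_env` and the `0.0.0.0` bind address are assumptions, since the rest of `run.sh` is not shown.

```python
import os


def int_from_env(name: str, default: int) -> int:
    """Mirror the shell's ${NAME:-default}: fall back when unset or empty."""
    raw = os.environ.get(name, "")
    return int(raw) if raw else default


# Gunicorn reads these module-level names when started with `gunicorn -c <file>`.
workers = int_from_env("WORKERS", 1)      # 1 worker by default to limit memory use
timeout = int_from_env("TIMEOUT", 120)    # seconds before an unresponsive worker is restarted
bind = f"0.0.0.0:{int_from_env('PORT', 10000)}"  # bind address is an assumption
```

Each Gunicorn worker is a separate process that loads its own copy of the application, so memory use grows roughly in proportion to the worker count. That is why a default of 1 is the conservative choice for low-memory hosts.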