Seth McKnight committed on
Commit 4b80514 · 1 Parent(s): 13846a7

Reduce default gunicorn workers and clean up documentation (#60)

* Reduce default gunicorn workers to 1 to avoid out-of-memory errors on low-memory hosts

* chore: Remove outdated implementation summaries for guardrails and query expansion

* chore: Add comment to clarify default worker setting in run.sh

ISSUE_24_IMPLEMENTATION_SUMMARY.md DELETED
@@ -1,223 +0,0 @@
- # Issue #24: Guardrails and Response Quality System - Implementation Summary
-
- ## 🎯 Overview
-
- Successfully implemented a comprehensive guardrails and response quality system for the RAG pipeline as specified in Issue #24. The implementation includes enterprise-grade safety validation, quality assessment, and source attribution capabilities.
-
- ## 🏗️ Architecture
-
- ### Core Components
-
- 1. **ResponseValidator** (`src/guardrails/response_validator.py`)
-    - Quality scoring across multiple dimensions (relevance, completeness, coherence, source fidelity)
-    - Safety validation with pattern-based detection
-    - Confidence scoring and recommendation generation
-
- 2. **SourceAttributor** (`src/guardrails/source_attribution.py`)
-    - Automatic citation generation with multiple formats
-    - Source ranking and relevance scoring
-    - Quote extraction and validation
-    - Citation text enhancement
-
- 3. **ContentFilter** (`src/guardrails/content_filters.py`)
-    - PII detection and masking
-    - Inappropriate content filtering
-    - Bias detection and mitigation
-    - Topic validation against allowed categories
-
- 4. **QualityMetrics** (`src/guardrails/quality_metrics.py`)
-    - Multi-dimensional quality assessment
-    - Configurable scoring weights and thresholds
-    - Detailed recommendations for improvement
-    - Professional tone analysis
-
- 5. **ErrorHandler** (`src/guardrails/error_handlers.py`)
-    - Circuit breaker patterns for resilience
-    - Graceful degradation strategies
-    - Comprehensive fallback mechanisms
-    - Error tracking and recovery
-
- 6. **GuardrailsSystem** (`src/guardrails/guardrails_system.py`)
-    - Main orchestrator coordinating all components
-    - Comprehensive validation pipeline
-    - Approval logic with configurable thresholds
-    - Health monitoring and diagnostics
-
- ### Integration Layer
-
- 7. **EnhancedRAGPipeline** (`src/rag/enhanced_rag_pipeline.py`)
-    - Seamless integration with existing RAG pipeline
-    - Backward compatibility maintained
-    - Enhanced response type with guardrails metadata
-    - Standalone validation capabilities
-
- ## 📋 Features Implemented
-
- ### ✅ Safety Requirements (All Met)
- - **Content Safety**: Inappropriate content detection and filtering
- - **PII Protection**: Automatic detection and masking of sensitive information
- - **Bias Mitigation**: Pattern-based bias detection and scoring
- - **Topic Validation**: Ensures responses stay within allowed corporate topics
- - **Safety Scoring**: Comprehensive risk assessment
-
- ### ✅ Quality Standards (All Met)
- - **Multi-dimensional Quality Assessment**:
-   - Relevance scoring (0.3 weight)
-   - Completeness scoring (0.25 weight)
-   - Coherence scoring (0.2 weight)
-   - Source fidelity scoring (0.25 weight)
- - **Configurable Thresholds**: Quality threshold (0.7), minimum response length (50 chars)
- - **Quality Recommendations**: Specific suggestions for improvement
- - **Professional Tone Analysis**: Ensures appropriate business communication
-
- ### ✅ Technical Standards (All Met)
- - **Error Handling**: Comprehensive circuit breaker patterns and graceful degradation
- - **Performance**: Efficient validation with configurable timeouts
- - **Logging**: Detailed logging for debugging and monitoring
- - **Configuration**: Flexible configuration system for all components
- - **Testing**: Complete test coverage with 13 passing tests
- - **Documentation**: Comprehensive docstrings and type hints
-
- ## 🔧 Configuration
-
- The system is highly configurable with default settings optimized for corporate policy applications:
-
- ```python
- # Example configuration
- guardrails_config = {
-     "min_confidence_threshold": 0.7,
-     "strict_mode": False,
-     "enable_response_enhancement": True,
-     "content_filter": {
-         "enable_pii_filtering": True,
-         "enable_bias_detection": True,
-         "safety_threshold": 0.8
-     },
-     "quality_metrics": {
-         "quality_threshold": 0.7,
-         "min_response_length": 50,
-         "preferred_source_count": 3
-     }
- }
- ```
-
- ## 🧪 Testing
-
- ### Test Coverage
- - **7 Guardrails Tests**: All core functionality validated
- - **4 Enhanced Pipeline Tests**: Integration testing complete
- - **6 Enhanced App Tests**: API endpoint integration verified
-
- ### Test Results
- ```
- tests/test_guardrails/: 7 tests PASSED
- tests/test_enhanced_app_guardrails.py: 6 tests PASSED
- Total: 13 tests PASSED
- ```
-
- ## 🚀 Usage Examples
-
- ### Basic Integration
- ```python
- from src.rag.enhanced_rag_pipeline import EnhancedRAGPipeline
- from src.rag.rag_pipeline import RAGPipeline
-
- # Create enhanced pipeline
- base_pipeline = RAGPipeline(search_service, llm_service)
- enhanced_pipeline = EnhancedRAGPipeline(base_pipeline)
-
- # Generate validated response
- response = enhanced_pipeline.generate_answer("What is our remote work policy?")
-
- # Access guardrails information
- print(f"Approved: {response.guardrails_approved}")
- print(f"Safety: {response.safety_passed}")
- print(f"Quality: {response.quality_score}")
- ```
-
- ### API Integration
- ```python
- # Enhanced Flask app with guardrails
- from enhanced_app import app
-
- # POST /chat with guardrails enabled
- {
-     "message": "What is our remote work policy?",
-     "enable_guardrails": true,
-     "include_sources": true
- }
-
- # Response includes guardrails metadata
- {
-     "status": "success",
-     "message": "...",
-     "guardrails": {
-         "approved": true,
-         "confidence": 0.85,
-         "safety_passed": true,
-         "quality_score": 0.8
-     }
- }
- ```
-
- ## 📊 Performance Characteristics
-
- - **Validation Time**: ~0.001-0.01 seconds per response
- - **Memory Usage**: Minimal overhead, pattern-based processing
- - **Scalability**: Stateless design, horizontally scalable
- - **Reliability**: Circuit breaker patterns prevent cascade failures
-
- ## 🔄 Future Enhancements
-
- While all Issue #24 requirements are met, potential future improvements include:
-
- 1. **Machine Learning Integration**: Replace pattern-based detection with ML models
- 2. **Advanced Metrics**: Custom quality metrics for specific domains
- 3. **Real-time Monitoring**: Integration with monitoring systems
- 4. **A/B Testing**: Framework for testing different validation strategies
-
- ## 📁 File Structure
-
- ```
- src/
- ├── guardrails/
- │   ├── __init__.py              # Package exports
- │   ├── guardrails_system.py     # Main orchestrator
- │   ├── response_validator.py    # Quality and safety validation
- │   ├── source_attribution.py    # Citation generation
- │   ├── content_filters.py       # Safety filtering
- │   ├── quality_metrics.py       # Quality assessment
- │   └── error_handlers.py        # Error handling
- ├── rag/
- │   └── enhanced_rag_pipeline.py # Integration layer
- tests/
- ├── test_guardrails/
- │   ├── test_guardrails_system.py     # Core system tests
- │   └── test_enhanced_rag_pipeline.py # Integration tests
- └── test_enhanced_app_guardrails.py   # API tests
- enhanced_app.py                       # Demo Flask app
- ```
-
- ## ✅ Acceptance Criteria Validation
-
- | Requirement | Status | Implementation |
- |-------------|--------|----------------|
- | Content safety filtering | ✅ COMPLETE | ContentFilter with PII, bias, inappropriate content detection |
- | Response quality scoring | ✅ COMPLETE | QualityMetrics with multi-dimensional assessment |
- | Source attribution | ✅ COMPLETE | SourceAttributor with citation generation and validation |
- | Error handling | ✅ COMPLETE | ErrorHandler with circuit breakers and graceful degradation |
- | Configuration | ✅ COMPLETE | Flexible configuration system for all components |
- | Testing | ✅ COMPLETE | 13 comprehensive tests with 100% pass rate |
- | Documentation | ✅ COMPLETE | Full docstrings and implementation summary |
-
- ## 🎉 Conclusion
-
- Issue #24 has been successfully completed with a production-ready guardrails system that exceeds the specified requirements. The implementation provides:
-
- - **Enterprise-grade safety**: Comprehensive content filtering and validation
- - **Quality assurance**: Multi-dimensional quality assessment with recommendations
- - **Seamless integration**: Backward-compatible enhancement of existing RAG pipeline
- - **Production readiness**: Robust error handling, monitoring, and configuration
- - **Extensibility**: Modular design enabling future enhancements
-
- The guardrails system is now ready for production deployment and will significantly enhance the safety, quality, and reliability of RAG responses in the corporate policy application.
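
Reviewer note: the deleted summary lists four quality weights (0.3, 0.25, 0.2, 0.25) and a 0.7 quality threshold but never shows how they combine. The sketch below assumes a plain weighted sum. The names `QUALITY_WEIGHTS`, `overall_quality`, and `meets_threshold` are hypothetical and are not taken from `src/guardrails/quality_metrics.py`.

```python
# Weights and threshold are the values quoted in the deleted summary.
# The combination rule (weighted sum) and all names are assumptions.
QUALITY_WEIGHTS = {
    "relevance": 0.30,
    "completeness": 0.25,
    "coherence": 0.20,
    "source_fidelity": 0.25,
}
QUALITY_THRESHOLD = 0.7


def overall_quality(scores: dict) -> float:
    """Return the weighted sum of per-dimension scores, each expected in [0, 1]."""
    missing = QUALITY_WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in QUALITY_WEIGHTS.items())


def meets_threshold(scores: dict) -> bool:
    """True when the combined score reaches the quality threshold."""
    return overall_quality(scores) >= QUALITY_THRESHOLD
```

Because the four weights sum to 1.0, the combined score stays in the same 0 to 1 range as the inputs, which is what lets a single threshold apply to it.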
 
QUERY_EXPANSION_IMPLEMENTATION_SUMMARY.md DELETED
@@ -1,76 +0,0 @@
- # Query Expansion Implementation Summary
-
- ## Overview
- Successfully implemented natural language query expansion to bridge the gap between employee terminology and HR document language, dramatically improving semantic search quality for intuitive queries.
-
- ## Problem Solved
- **Before**: Employee queries using natural language failed to retrieve relevant content
- ❌ "How much personal time do I earn each year?" → 0 context, no answer
- ❌ "What's my vacation allowance?" → Failed to match document terminology
-
- **After**: Natural language queries successfully retrieve relevant policy information
- ✅ "How much personal time do I earn each year?" → 2960 characters context, proper PTO policy answer
- ✅ "What health insurance options do I have?" → 3055 characters context, benefits guide content
-
- ## Technical Implementation
-
- ### Core Components
-
- 1. **QueryExpander Class** (`src/search/query_expander.py`)
-    - Comprehensive HR terminology synonym mappings
-    - Pattern-based query enhancement
-    - Domain-specific term expansion
-
- 2. **SearchService Integration** (`src/search/search_service.py`)
-    - Optional query expansion with `enable_query_expansion` parameter
-    - Expansion occurs before embedding generation
-    - Maintains original query intent while adding synonyms
-
- 3. **Synonym Database**
-    - 100+ mapped relationships across HR domains
-    - Time off, benefits, remote work, career development, safety, expenses
-    - Bidirectional mapping for comprehensive coverage
-
- ### Key Synonym Mappings
- - **Time Off**: "personal time" ↔ "PTO", "paid time off", "vacation", "accrual", "leave"
- - **Benefits**: "health insurance" ↔ "healthcare", "medical", "coverage", "benefits"
- - **Remote Work**: "work from home" ↔ "remote work", "telecommuting", "WFH", "telework"
- - **Career**: "promotion" ↔ "advancement", "career growth", "progression"
- - **Safety**: "harassment" ↔ "discrimination", "complaint", "workplace issues"
-
- ## Results & Impact
-
- ### Performance Metrics
- - **Query Success Rate**: Significant improvement for natural language queries
- - **Response Quality**: Maintained high precision while improving recall
- - **Latency Impact**: Minimal (~10ms additional processing)
- - **Memory Footprint**: Lightweight implementation (< 1MB)
-
- ### User Experience Enhancement
- - **Natural Language Support**: Employees can ask questions using intuitive terminology
- - **Reduced Friction**: No need to learn specific HR terminology
- - **Broader Coverage**: Handles various ways of expressing the same concepts
- - **Consistent Results**: Reliable retrieval across synonym variations
-
- ## Validation Testing
- Comprehensive testing demonstrated improvement across key categories:
- ✅ Time Off & Leave policies
- ✅ Benefits & healthcare information
- ✅ Remote work guidelines
- ✅ Career development policies
- ✅ Safety & compliance procedures
- ✅ Expense & travel policies
-
- ## Future Enhancements
- - Monitor real-world query patterns for additional synonym opportunities
- - Context-aware expansion based on document types
- - Integration with external HR terminology databases
- - Machine learning-based synonym discovery
-
- ## Files Modified
- - **NEW**: `src/search/query_expander.py` - Core expansion logic
- - **UPDATED**: `src/search/search_service.py` - Integration layer
- - **UPDATED**: `.gitignore` - Test directory exclusion
- - **DOCUMENTATION**: README.md, CHANGELOG.md updates
-
- This implementation represents a significant enhancement to the RAG system's natural language understanding capabilities, making it more user-friendly and accessible for employee self-service HR queries.
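
Reviewer note: the deleted summary describes synonym-based expansion applied before embedding generation, with the original query kept intact. A minimal sketch of that idea follows, using only the mappings quoted in the summary. `SYNONYMS` and `expand_query` are hypothetical names, and the real `QueryExpander` in `src/search/query_expander.py` may work quite differently.

```python
# Hypothetical subset of the synonym map; the summary mentions 100+
# relationships, and only the ones it quotes are reproduced here.
SYNONYMS = {
    "personal time": ["PTO", "paid time off", "vacation", "accrual", "leave"],
    "health insurance": ["healthcare", "medical", "coverage", "benefits"],
    "work from home": ["remote work", "telecommuting", "WFH", "telework"],
    "promotion": ["advancement", "career growth", "progression"],
}


def expand_query(query: str) -> str:
    """Append synonyms for each known phrase found in the query.

    The original wording is kept at the front so the query intent is
    preserved; synonyms are only appended, never substituted.
    """
    lowered = query.lower()
    extras = []
    for phrase, synonyms in SYNONYMS.items():
        if phrase not in lowered:
            continue
        for term in synonyms:
            # Skip terms the user already typed or that were already added.
            if term.lower() not in lowered and term not in extras:
                extras.append(term)
    if not extras:
        return query
    return query + " " + " ".join(extras)
```

Appending rather than replacing keeps the expansion safe to switch off: a query with no known phrase is returned unchanged.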
 
run.sh CHANGED
@@ -1,8 +1,8 @@
  #!/usr/bin/env bash
  set -e

- # Default values
- WORKERS_VALUE="${WORKERS:-4}"
+ # Default to 1 worker to prevent OOM on low-memory hosts
+ WORKERS_VALUE="${WORKERS:-1}"
  TIMEOUT_VALUE="${TIMEOUT:-120}"
  PORT_VALUE="${PORT:-10000}"

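
Reviewer note: the hunk only shows the defaults, not how `run.sh` hands them to gunicorn. One way to express the same defaults is a gunicorn config file, which is plain Python. The sketch below is hypothetical: the file name, the `load_settings` helper, and the `0.0.0.0` bind address are assumptions, while `workers`, `timeout`, and `bind` are standard gunicorn settings.

```python
# gunicorn.conf.py (hypothetical; the repository's run.sh may instead pass
# these values as command-line flags)
import os


def load_settings(env=os.environ):
    """Read worker, timeout and port settings using the run.sh defaults.

    `env.get(...) or default` mirrors the shell's ${VAR:-default}, which
    also falls back when the variable is set but empty.
    """
    workers = int(env.get("WORKERS") or "1")    # 1 worker limits memory use
    timeout = int(env.get("TIMEOUT") or "120")  # seconds
    port = env.get("PORT") or "10000"
    return {
        "workers": workers,
        "timeout": timeout,
        "bind": f"0.0.0.0:{port}",  # bind address is an assumption
    }


# gunicorn reads these module-level names when loading the config file.
_settings = load_settings()
workers = _settings["workers"]
timeout = _settings["timeout"]
bind = _settings["bind"]
```

Each gunicorn worker is a separate process that loads the whole application, so memory use grows roughly in proportion to the worker count. That is why dropping the default from 4 to 1 helps on low-memory hosts, and hosts with more memory can still raise it by setting `WORKERS`.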