Seth McKnight
# Monitoring Memory Usage in Production on Render
This document provides guidance on monitoring memory usage in production for the RAG application deployed on Render's free tier, which has a 512MB memory limit.
## Integrated Memory Monitoring Tools
The application includes enhanced memory monitoring specifically optimized for Render deployments:
### 1. Memory Status Endpoint
The application exposes a dedicated endpoint for monitoring memory usage:
```
GET /memory/render-status
```
This endpoint returns detailed information about current memory usage, including:
- Current memory usage in MB
- Peak memory usage since startup
- Memory usage trends (5-minute and 1-hour)
- Current memory status (normal, warning, critical, emergency)
- Actions taken if memory thresholds were exceeded
Example response:
```json
{
  "status": "success",
  "is_render": true,
  "memory_status": {
    "timestamp": "2023-10-25T14:32:15.123456",
    "memory_mb": 342.5,
    "peak_memory_mb": 398.2,
    "context": "api_request",
    "status": "warning",
    "action_taken": "light_cleanup",
    "memory_limit_mb": 512.0
  },
  "memory_trends": {
    "current_mb": 342.5,
    "peak_mb": 398.2,
    "samples_count": 356,
    "trend_5min_mb": 12.5,
    "trend_1hour_mb": -24.3
  },
  "render_limit_mb": 512
}
```
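To make the response easier to act on, the headline numbers can be reduced to a short summary. The following is a minimal sketch that assumes the field names shown in the example response above; `summarize_memory_status` is a hypothetical helper, not part of the application.

```python
import json


def summarize_memory_status(payload: dict) -> dict:
    """Reduce a /memory/render-status response to a few headline numbers."""
    status = payload["memory_status"]
    used_mb = status["memory_mb"]
    limit_mb = status["memory_limit_mb"]
    return {
        "level": status["status"],
        "used_mb": used_mb,
        "headroom_mb": round(limit_mb - used_mb, 1),
        "percent_of_limit": round(100 * used_mb / limit_mb, 1),
    }


# Example response from this document, trimmed to the fields used here.
sample = json.loads("""
{
  "status": "success",
  "memory_status": {
    "memory_mb": 342.5,
    "peak_memory_mb": 398.2,
    "status": "warning",
    "memory_limit_mb": 512.0
  }
}
""")

summary = summarize_memory_status(sample)
print(summary)
```

For the example response, 342.5 MB of 512 MB is about 66.9% of the limit, leaving 169.5 MB of headroom.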
### 2. Detailed Diagnostics
For more detailed memory diagnostics, use:
```
GET /memory/diagnostics
```
This provides a deeper look at memory allocation and usage patterns.
### 3. Force Memory Cleanup
If you notice memory usage approaching critical levels, you can trigger a manual cleanup:
```
POST /memory/force-clean
```
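The cleanup call can be scripted with the Python standard library. This is a sketch only: the base URL is a placeholder, and it assumes the endpoint accepts an empty POST body and needs no authentication, neither of which is confirmed by this document.

```python
import urllib.request


def build_force_clean_request(base_url: str) -> urllib.request.Request:
    """Build a POST request for /memory/force-clean (assumes an empty body is accepted)."""
    url = base_url.rstrip("/") + "/memory/force-clean"
    return urllib.request.Request(url, method="POST")


def force_clean(base_url: str, timeout: float = 10.0) -> int:
    """Send the cleanup request and return the HTTP status code."""
    request = build_force_clean_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder host; replace with the URL of your own Render service.
request = build_force_clean_request("https://your-service.example.com/")
print(request.get_method(), request.full_url)
```

Calling `force_clean(...)` performs the actual network request; building the request separately makes it easy to inspect before sending.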
## Setting Up External Monitoring
### Using Uptime Robot or Similar Services
1. Set up a monitor to check the `/health` endpoint every 5 minutes
2. Set up a separate monitor to check the `/memory/render-status` endpoint every 15 minutes
### Automated Alerting
Configure alerts based on memory thresholds:
1. **Warning Alert**: When memory usage exceeds 400MB (78% of limit)
2. **Critical Alert**: When memory usage exceeds 450MB (88% of limit)
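The alert thresholds above can be expressed as a small classification function, which also confirms the percentages quoted against the 512MB limit. This is an illustrative sketch; `alert_level` and `percent_of_limit` are hypothetical names, and "exceeds" is read as strictly greater than.

```python
from typing import Optional

RENDER_LIMIT_MB = 512.0  # free-tier limit stated in this document
WARNING_MB = 400.0
CRITICAL_MB = 450.0


def percent_of_limit(memory_mb: float, limit_mb: float = RENDER_LIMIT_MB) -> int:
    """Memory usage as a whole-number percentage of the limit."""
    return round(100 * memory_mb / limit_mb)


def alert_level(memory_mb: float) -> Optional[str]:
    """Return "critical", "warning", or None when no alert should fire."""
    if memory_mb > CRITICAL_MB:
        return "critical"
    if memory_mb > WARNING_MB:
        return "warning"
    return None


for reading in (342.5, 420.0, 460.0):
    print(reading, percent_of_limit(reading), alert_level(reading))
```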
### Monitoring Logs in Render Dashboard
1. Log in to your Render dashboard
2. Navigate to the service logs
3. Filter for memory-related log messages:
   - `[MEMORY CHECKPOINT]`
   - `[MEMORY MILESTONE]`
   - `Memory usage`
   - `WARNING: Memory usage`
   - `CRITICAL: Memory usage`
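If logs are downloaded for offline review, the same filters can be applied with a few lines of Python. This sketch assumes plain-text log lines; the sample lines are invented for illustration and do not reproduce the application's actual log format.

```python
# Substrings listed above. "Memory usage" also matches the WARNING and
# CRITICAL variants, so they need no separate entries.
MEMORY_MARKERS = ("[MEMORY CHECKPOINT]", "[MEMORY MILESTONE]", "Memory usage")


def memory_lines(lines):
    """Keep only the log lines that contain one of the memory markers."""
    return [line for line in lines if any(marker in line for marker in MEMORY_MARKERS)]


# Hypothetical log lines, for illustration only.
sample_log = [
    "INFO: request handled in 120 ms",
    "[MEMORY CHECKPOINT] after_startup",
    "WARNING: Memory usage high",
    "INFO: health check ok",
]

for line in memory_lines(sample_log):
    print(line)
```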
## Memory Usage Patterns to Watch For
### Warning Signs
1. **Steadily Increasing Memory**: Memory trends show continuous growth instead of leveling off
2. **High Peak After Ingestion**: Memory spikes above 450MB after document ingestion
3. **Failure to Release Memory**: Memory doesn't decrease after operations complete
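Two of these signs can be checked directly against the `memory_trends` block of the `/memory/render-status` response. The sketch below assumes that `trend_5min_mb` and `trend_1hour_mb` are changes in MB over each window (positive meaning growth), which this document does not state explicitly; `warning_signs` is a hypothetical helper.

```python
from typing import List


def warning_signs(trends: dict, peak_threshold_mb: float = 450.0) -> List[str]:
    """Flag warning signs from a memory_trends dictionary."""
    signs = []
    # Assumption: positive trend values mean memory grew over that window.
    if trends["trend_5min_mb"] > 0 and trends["trend_1hour_mb"] > 0:
        signs.append("steady_growth")
    if trends["peak_mb"] > peak_threshold_mb:
        signs.append("high_peak")
    return signs


# Values taken from the example response in this document.
example_trends = {"peak_mb": 398.2, "trend_5min_mb": 12.5, "trend_1hour_mb": -24.3}
print(warning_signs(example_trends))
```

With the example values, the one-hour trend is negative and the peak is below 450MB, so no sign is flagged.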
### Preventative Actions
1. **Regular Cleanup**: Call `/memory/force-clean` on a schedule during low-traffic periods
2. **Batch Processing**: For large document sets, ingest in smaller batches
3. **Monitoring Before Bulk Operations**: Check memory status before starting resource-intensive operations
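Batching and pre-checks combine naturally: check memory before each batch and stop early when headroom runs out. This is a sketch under stated assumptions; `ingest_batch` and `current_memory_mb` are hypothetical callables you would supply (for example, a wrapper around your ingestion call and a reader of `/memory/render-status`), and the batch size and 400MB cutoff are illustrative defaults.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def ingest_in_batches(documents, ingest_batch, current_memory_mb,
                      batch_size=10, max_start_mb=400.0):
    """Ingest documents batch by batch, stopping when memory is too high.

    Returns the number of documents that were ingested.
    """
    processed = 0
    for batch in batches(documents, batch_size):
        # Pre-check: do not start a new batch above the cutoff.
        if current_memory_mb() > max_start_mb:
            break
        ingest_batch(batch)
        processed += len(batch)
    return processed


# Stand-ins for the real callables: fixed memory readings and a recorder.
readings = iter([300.0, 350.0, 420.0])
ingested = []
count = ingest_in_batches(list(range(25)), ingested.append, lambda: next(readings))
print(count, [len(batch) for batch in ingested])
```

In the stand-in run, the third reading is above the cutoff, so the last batch is skipped and can be retried after a cleanup.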
## Memory Optimization Features
The application includes several memory optimization features:
1. **Automatic Thresholds**: Memory is monitored against configured thresholds (400MB, 450MB, 480MB)
2. **Progressive Cleanup**: Different levels of cleanup based on severity
3. **Request Circuit Breaker**: Rejects new requests when memory is critically high
4. **Memory Metrics Export**: Memory metrics are saved to `/tmp/render_metrics/` for later analysis
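The threshold and circuit-breaker logic can be pictured roughly as follows. This is not the application's implementation: the mapping of 400MB, 450MB, and 480MB to the `warning`, `critical`, and `emergency` statuses is an assumption based on their order, and so is the choice to reject requests from the `critical` level upward.

```python
# Assumed mapping, highest threshold first.
THRESHOLDS_MB = [
    (480.0, "emergency"),
    (450.0, "critical"),
    (400.0, "warning"),
]


def memory_status(memory_mb: float) -> str:
    """Return the most severe status whose threshold is exceeded."""
    for threshold, name in THRESHOLDS_MB:
        if memory_mb > threshold:
            return name
    return "normal"


def should_reject_request(memory_mb: float) -> bool:
    """Circuit-breaker decision (assumed to trip at critical and above)."""
    return memory_status(memory_mb) in {"critical", "emergency"}


for reading in (350.0, 410.0, 455.0, 490.0):
    print(reading, memory_status(reading), should_reject_request(reading))
```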
## Troubleshooting Memory Issues
If you encounter persistent memory issues:
1. **Review Logs**: Check Render logs for memory checkpoints and milestones
2. **Analyze Trends**: Use the `/memory/render-status` endpoint to identify patterns
3. **Check Operations Timing**: Check whether high memory correlates with specific operations, such as document ingestion
4. **Adjust Configuration**: Consider adjusting `EMBEDDING_BATCH_SIZE` or other parameters in `config.py`
## Available Environment Variables
These environment variables can be configured in Render:
- `MEMORY_DEBUG=1`: Enable detailed memory diagnostics
- `MEMORY_LOG_INTERVAL=10`: Log memory usage every 10 seconds
- `ENABLE_TRACEMALLOC=1`: Enable tracemalloc for detailed memory allocation tracking
- `RENDER=1`: Enable Render-specific optimizations (automatically set on Render)
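For reference, flags like these are typically read from the process environment at startup. The sketch below is illustrative rather than the application's code: `MemorySettings` and `load_memory_settings` are hypothetical names, the 10-second fallback for `MEMORY_LOG_INTERVAL` is assumed, and treating `true` as equivalent to `1` is a convenience assumption.

```python
import os
from dataclasses import dataclass


@dataclass
class MemorySettings:
    debug: bool
    log_interval_s: int
    tracemalloc: bool
    on_render: bool


def load_memory_settings(env=None) -> MemorySettings:
    """Read the memory-related variables from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env

    def flag(name: str) -> bool:
        # Assumption: "1" or "true" (any case) enables a flag.
        return env.get(name, "0").strip().lower() in {"1", "true"}

    return MemorySettings(
        debug=flag("MEMORY_DEBUG"),
        # Assumed fallback of 10 seconds when the variable is unset.
        log_interval_s=int(env.get("MEMORY_LOG_INTERVAL", "10")),
        tracemalloc=flag("ENABLE_TRACEMALLOC"),
        on_render=flag("RENDER"),
    )


settings = load_memory_settings({"MEMORY_DEBUG": "1", "MEMORY_LOG_INTERVAL": "30"})
print(settings)
```

Passing a dictionary instead of relying on `os.environ` keeps the function easy to test locally.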