# Monitoring Memory Usage in Production on Render

This document provides guidance on monitoring memory usage in production for the RAG application deployed on Render's free tier, which has a 512MB memory limit.

## Integrated Memory Monitoring Tools

The application includes memory monitoring tailored to Render deployments:

### 1. Memory Status Endpoint

The application exposes a dedicated endpoint for monitoring memory usage:

```
GET /memory/render-status
```

This endpoint returns detailed information about current memory usage, including:

- Current memory usage in MB
- Peak memory usage since startup
- Memory usage trends (5-minute and 1-hour)
- Current memory status (normal, warning, critical, emergency)
- Actions taken if memory thresholds were exceeded

Example response:

```json
{
  "status": "success",
  "is_render": true,
  "memory_status": {
    "timestamp": "2023-10-25T14:32:15.123456",
    "memory_mb": 342.5,
    "peak_memory_mb": 398.2,
    "context": "api_request",
    "status": "warning",
    "action_taken": "light_cleanup",
    "memory_limit_mb": 512.0
  },
  "memory_trends": {
    "current_mb": 342.5,
    "peak_mb": 398.2,
    "samples_count": 356,
    "trend_5min_mb": 12.5,
    "trend_1hour_mb": -24.3
  },
  "render_limit_mb": 512
}
```
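A script could poll this endpoint and condense the response along the following lines. This is a minimal sketch using only the Python standard library: the helper names `fetch_status` and `summarize_status` are hypothetical, the base URL is whatever your service uses, and the field names are taken from the example response above.

```python
import json
import urllib.request


def fetch_status(base_url: str, timeout: float = 10.0) -> dict:
    """GET /memory/render-status and decode the JSON body."""
    url = base_url.rstrip("/") + "/memory/render-status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_status(payload: dict) -> str:
    """Condense the response into a one-line summary."""
    mem = payload["memory_status"]
    used = mem["memory_mb"]
    limit = mem["memory_limit_mb"]
    percent = 100.0 * used / limit
    return (
        f"{mem['status']}: {used:.1f}/{limit:.0f} MB "
        f"({percent:.0f}%), peak {mem['peak_memory_mb']:.1f} MB"
    )
```

Calling `summarize_status(fetch_status(base_url))` from a cron job or a local terminal gives a quick reading without opening the Render dashboard.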

### 2. Detailed Diagnostics

For more detailed memory diagnostics, use:

```
GET /memory/diagnostics
```

This provides a deeper look at memory allocation and usage patterns.

### 3. Force Memory Cleanup

If you notice memory usage approaching critical levels, you can trigger a manual cleanup:

```
POST /memory/force-clean
```
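The cleanup call can also be scripted. The sketch below assumes the endpoint accepts a POST with an empty body and needs no authentication; neither is stated in this document, so adjust it if the application requires a payload or a token. The function names are hypothetical.

```python
import urllib.request


def build_force_clean_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /memory/force-clean."""
    url = base_url.rstrip("/") + "/memory/force-clean"
    # Assumption: an empty body is sufficient for this endpoint.
    return urllib.request.Request(url, data=b"", method="POST")


def force_clean(base_url: str, timeout: float = 30.0) -> int:
    """Send the cleanup request and return the HTTP status code."""
    request = build_force_clean_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```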

## Setting Up External Monitoring

### Using Uptime Robot or Similar Services

1. Set up a monitor to check the `/health` endpoint every 5 minutes
2. Set up a separate monitor to check the `/memory/render-status` endpoint every 15 minutes

### Automated Alerting

Configure alerts based on memory thresholds:

1. **Warning Alert**: When memory usage exceeds 400MB (78% of limit)
2. **Critical Alert**: When memory usage exceeds 450MB (88% of limit)
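If your monitoring service supports custom scripts, the two thresholds can be expressed as a small classifier. This is a sketch of the alerting rule described above, not code from the application; `alert_level` and `percent_of_limit` are hypothetical names, and "exceeds" is read as strictly greater than.

```python
WARNING_MB = 400.0   # warning threshold from this guide
CRITICAL_MB = 450.0  # critical threshold from this guide
LIMIT_MB = 512.0     # Render free tier memory limit


def alert_level(memory_mb: float) -> str:
    """Map a memory reading to an alert level."""
    if memory_mb > CRITICAL_MB:
        return "critical"
    if memory_mb > WARNING_MB:
        return "warning"
    return "ok"


def percent_of_limit(memory_mb: float) -> int:
    """Express a reading as a rounded percentage of the 512MB limit."""
    return round(100.0 * memory_mb / LIMIT_MB)
```

Feeding it the `memory_mb` value from `/memory/render-status` lets the same rule drive email, Slack, or pager notifications.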

### Monitoring Logs in Render Dashboard

1. Log into your Render dashboard
2. Navigate to the service logs
3. Filter for memory-related log messages:
   - `[MEMORY CHECKPOINT]`
   - `[MEMORY MILESTONE]`
   - `Memory usage`
   - `WARNING: Memory usage`
   - `CRITICAL: Memory usage`
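The same filters can be applied offline to logs copied or downloaded from Render. The sketch below matches only on the markers listed above and assumes nothing else about the log line format; the function names are hypothetical.

```python
MEMORY_MARKERS = ("[MEMORY CHECKPOINT]", "[MEMORY MILESTONE]", "Memory usage")


def memory_lines(lines):
    """Keep only lines that contain one of the memory markers.

    "Memory usage" also matches the WARNING and CRITICAL variants.
    """
    return [line for line in lines if any(m in line for m in MEMORY_MARKERS)]


def severity(line: str) -> str:
    """Classify a memory log line by its prefix marker."""
    if "CRITICAL: Memory usage" in line:
        return "critical"
    if "WARNING: Memory usage" in line:
        return "warning"
    return "info"
```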

## Memory Usage Patterns to Watch For

### Warning Signs

1. **Steadily Increasing Memory**: Memory trends show continuous growth instead of leveling off
2. **High Peak After Ingestion**: Memory spikes above 450MB after document ingestion
3. **Failure to Release Memory**: Memory doesn't decrease after operations complete
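These checks can be automated against the `memory_trends` object returned by `/memory/render-status`. The sketch below assumes that `trend_5min_mb` and `trend_1hour_mb` are the change in MB over each window, with positive values meaning growth; this document does not define them, so verify that against the application. The 10MB tolerance in `not_released` is an arbitrary illustrative value, and all function names are hypothetical.

```python
def warning_signs(trends: dict, spike_mb: float = 450.0) -> list:
    """Return the names of warning signs found in a memory_trends object."""
    signs = []
    # Assumption: positive trend values mean memory grew over the window.
    if trends["trend_5min_mb"] > 0 and trends["trend_1hour_mb"] > 0:
        signs.append("steady_growth")
    if trends["peak_mb"] > spike_mb:
        signs.append("high_peak")
    return signs


def not_released(before_mb: float, after_mb: float, tolerance_mb: float = 10.0) -> bool:
    """True if memory stayed elevated after an operation completed."""
    return after_mb - before_mb > tolerance_mb
```

`not_released` needs two readings, one taken before the operation and one some time after it finishes.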

### Preventative Actions

1. **Regular Cleanup**: Schedule calls to `/memory/force-clean` during low-traffic periods
2. **Batch Processing**: For large document sets, ingest in smaller batches
3. **Monitoring Before Bulk Operations**: Check memory status before starting resource-intensive operations
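Batching and pre-checks combine naturally in an ingestion script. The following is a sketch, not the application's ingestion code: `ingest` and `current_memory_mb` are caller-supplied callables (for example, a function that posts a batch to your ingestion route and one that reads `memory_mb` from `/memory/render-status`), and the batch size and 400MB ceiling are illustrative defaults.

```python
from typing import Callable, Iterator, Sequence


def batches(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def ingest_in_batches(
    docs: Sequence,
    ingest: Callable[[Sequence], None],
    current_memory_mb: Callable[[], float],
    batch_size: int = 10,
    max_mb: float = 400.0,
) -> int:
    """Ingest batch by batch; stop early if memory is above max_mb.

    Returns the number of documents ingested.
    """
    done = 0
    for batch in batches(docs, batch_size):
        if current_memory_mb() > max_mb:
            break  # leave the rest for a later run, after cleanup
        ingest(batch)
        done += len(batch)
    return done
```

The return value tells the caller where to resume once memory has dropped.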

## Memory Optimization Features

The application includes several memory optimization features:

1. **Automatic Thresholds**: Memory is monitored against configured thresholds (400MB, 450MB, and 480MB, roughly 78%, 88%, and 94% of the 512MB limit)
2. **Progressive Cleanup**: Different levels of cleanup based on severity
3. **Request Circuit Breaker**: Rejects new requests when memory is critically high
4. **Memory Metrics Export**: Memory metrics are saved to `/tmp/render_metrics/` for later analysis
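To illustrate the circuit-breaker idea, the sketch below wraps a request handler and refuses to run it when memory is above a cutoff. It is not the application's implementation: the names `memory_guard` and `MemoryCircuitOpen` are hypothetical, and the 450MB cutoff is an assumption, since this document does not say which threshold trips the breaker.

```python
import functools


class MemoryCircuitOpen(RuntimeError):
    """Raised when a request is rejected because memory is too high."""


def memory_guard(read_memory_mb, reject_above_mb: float = 450.0):
    """Decorator factory: reject calls while memory exceeds the cutoff."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            used = read_memory_mb()
            if used > reject_above_mb:
                raise MemoryCircuitOpen(
                    f"memory at {used:.1f} MB exceeds {reject_above_mb:.0f} MB"
                )
            return handler(*args, **kwargs)
        return wrapper
    return decorator
```

In a web framework, the exception would typically be translated into an HTTP 503 response so that clients know to retry later.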

## Troubleshooting Memory Issues

If you encounter persistent memory issues:

1. **Review Logs**: Check Render logs for memory checkpoints and milestones
2. **Analyze Trends**: Use the `/memory/render-status` endpoint to identify patterns
3. **Check Operation Timing**: Compare the timing of memory spikes with specific operations, such as document ingestion, to find what triggers them
4. **Adjust Configuration**: Consider adjusting `EMBEDDING_BATCH_SIZE` or other parameters in `config.py`

## Available Environment Variables

These environment variables can be configured in Render:

- `MEMORY_DEBUG=1`: Enable detailed memory diagnostics
- `MEMORY_LOG_INTERVAL=10`: Log memory usage every 10 seconds
- `ENABLE_TRACEMALLOC=1`: Enable tracemalloc for detailed memory allocation tracking
- `RENDER=1`: Enable Render-specific optimizations (automatically set on Render)
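For reference, flags like these are usually read from the process environment along the following lines. This is a generic sketch, not the application's configuration code: the `MemorySettings` class and `load_settings` function are hypothetical, a flag counts as enabled only when its value is exactly `1`, and an unset `MEMORY_LOG_INTERVAL` is reported as `None` because this document does not state the default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MemorySettings:
    debug: bool
    log_interval_s: Optional[int]
    tracemalloc: bool
    on_render: bool


def load_settings(env: Mapping[str, str] = os.environ) -> MemorySettings:
    """Parse the memory-related variables from an environment mapping."""
    def flag(name: str) -> bool:
        # Assumption: only the literal value "1" enables a flag.
        return env.get(name, "0") == "1"

    interval = env.get("MEMORY_LOG_INTERVAL")
    return MemorySettings(
        debug=flag("MEMORY_DEBUG"),
        log_interval_s=int(interval) if interval is not None else None,
        tracemalloc=flag("ENABLE_TRACEMALLOC"),
        on_render=flag("RENDER"),
    )
```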