Seth McKnight
Add memory diagnostics endpoints and logging enhancements (#80)
0a7f9b4

Monitoring Memory Usage in Production on Render

This document provides guidance on monitoring memory usage in production for the RAG application deployed on Render's free tier, which has a 512MB memory limit.

Integrated Memory Monitoring Tools

The application includes built-in memory monitoring tailored to Render deployments:

1. Memory Status Endpoint

The application exposes a dedicated endpoint for monitoring memory usage:

GET /memory/render-status

This endpoint returns detailed information about current memory usage, including:

  • Current memory usage in MB
  • Peak memory usage since startup
  • Memory usage trends (5-minute and 1-hour)
  • Current memory status (normal, warning, critical, emergency)
  • Actions taken if memory thresholds were exceeded

Example response:

{
  "status": "success",
  "is_render": true,
  "memory_status": {
    "timestamp": "2023-10-25T14:32:15.123456",
    "memory_mb": 342.5,
    "peak_memory_mb": 398.2,
    "context": "api_request",
    "status": "warning",
    "action_taken": "light_cleanup",
    "memory_limit_mb": 512.0
  },
  "memory_trends": {
    "current_mb": 342.5,
    "peak_mb": 398.2,
    "samples_count": 356,
    "trend_5min_mb": 12.5,
    "trend_1hour_mb": -24.3
  },
  "render_limit_mb": 512
}
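A response like the one above can be turned into a one-line status summary for scripts or chat alerts. The following is a minimal sketch using only the standard library. It assumes the field names shown in the example response. BASE_URL is a hypothetical placeholder for your service URL.

```python
import json
import urllib.request

BASE_URL = "https://your-service.onrender.com"  # hypothetical; replace with your Render URL


def fetch_render_status(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """GET /memory/render-status and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/memory/render-status", timeout=timeout) as resp:
        return json.load(resp)


def summarize(payload: dict) -> str:
    """Build a one-line summary from the fields in the example response."""
    status = payload["memory_status"]
    trends = payload["memory_trends"]
    limit = payload["render_limit_mb"]
    used_pct = 100.0 * status["memory_mb"] / limit
    return (
        f"{status['status']}: {status['memory_mb']:.1f}/{limit} MB "
        f"({used_pct:.0f}%), peak {trends['peak_mb']:.1f} MB, "
        f"5min {trends['trend_5min_mb']:+.1f} MB, 1h {trends['trend_1hour_mb']:+.1f} MB"
    )
```

For the example response above, `print(summarize(fetch_render_status()))` would print `warning: 342.5/512 MB (67%), peak 398.2 MB, 5min +12.5 MB, 1h -24.3 MB`.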

2. Detailed Diagnostics

For more detailed memory diagnostics, use:

GET /memory/diagnostics

This provides a deeper look at memory allocation and usage patterns.

3. Force Memory Cleanup

If you notice memory usage approaching critical levels, you can trigger a manual cleanup:

POST /memory/force-clean
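A cleanup can also be triggered from a script, and only once memory is actually high. The sketch below makes several assumptions: the endpoint accepts an empty POST body and returns JSON, and the 450MB trigger is borrowed from the critical alert level suggested under Automated Alerting. BASE_URL is a hypothetical placeholder.

```python
import json
import urllib.request

BASE_URL = "https://your-service.onrender.com"  # hypothetical; use your own service URL
CRITICAL_MB = 450.0  # assumed trigger, matching the suggested critical alert level


def should_force_clean(memory_mb: float, threshold_mb: float = CRITICAL_MB) -> bool:
    """Only trigger a manual cleanup once memory is at or above the threshold."""
    return memory_mb >= threshold_mb


def build_force_clean_request(base_url: str = BASE_URL) -> urllib.request.Request:
    # Empty POST body: this sketch assumes the endpoint needs no payload.
    return urllib.request.Request(f"{base_url}/memory/force-clean", data=b"", method="POST")


def force_clean(base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    with urllib.request.urlopen(build_force_clean_request(base_url), timeout=timeout) as resp:
        body = resp.read()
    try:
        return json.loads(body)  # assumes a JSON response body
    except json.JSONDecodeError:
        return {"raw": body.decode("utf-8", errors="replace")}
```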

Setting Up External Monitoring

Using Uptime Robot or Similar Services

  1. Set up a monitor to check the /health endpoint every 5 minutes
  2. Set up a separate monitor to check the /memory/render-status endpoint every 15 minutes
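If an external service is not available, the same two checks can be approximated with a small self-hosted poller. This is a rough sketch, not a replacement for a managed uptime service. It uses the 5-minute and 15-minute intervals listed above. BASE_URL is hypothetical, and tick_s must divide both intervals evenly or a check will never fire.

```python
import time
import urllib.request
from typing import Dict, List

BASE_URL = "https://your-service.onrender.com"  # hypothetical
# Intervals from the checklist above: /health every 5 min, render-status every 15 min.
CHECKS: Dict[str, int] = {"/health": 5 * 60, "/memory/render-status": 15 * 60}


def due_checks(elapsed_s: int, checks: Dict[str, int] = CHECKS) -> List[str]:
    """Return the paths whose interval divides the elapsed time exactly."""
    return [path for path, interval in checks.items() if elapsed_s % interval == 0]


def run(base_url: str = BASE_URL, tick_s: int = 60) -> None:
    """Poll forever; tick_s must divide every interval in CHECKS."""
    elapsed = 0
    while True:
        for path in due_checks(elapsed):
            try:
                with urllib.request.urlopen(base_url + path, timeout=10) as resp:
                    print(path, resp.status)
            except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
                print(path, "FAILED", exc)
        time.sleep(tick_s)
        elapsed += tick_s
```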

Automated Alerting

Configure alerts based on memory thresholds:

  1. Warning Alert: When memory usage exceeds 400MB (about 78% of the 512MB limit)
  2. Critical Alert: When memory usage exceeds 450MB (about 88% of the limit)
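The two alert levels above can be expressed as a small classifier. The sketch also emits an alert only when the level changes, so that a sustained high reading does not page repeatedly. It assumes readings come in as a sequence of MB values, for example memory_mb from /memory/render-status. A return to normal is reported with a level of None.

```python
from typing import Iterable, Iterator, Optional, Tuple

WARNING_MB = 400.0   # about 78% of 512 MB
CRITICAL_MB = 450.0  # about 88% of 512 MB


def alert_level(memory_mb: float) -> Optional[str]:
    """Map a reading to an alert level; None means no alert."""
    if memory_mb > CRITICAL_MB:
        return "critical"
    if memory_mb > WARNING_MB:
        return "warning"
    return None


def alerts_for(samples: Iterable[float]) -> Iterator[Tuple[float, Optional[str]]]:
    """Yield (memory_mb, level) only when the level changes, including recovery to None."""
    previous: Optional[str] = None
    for mb in samples:
        level = alert_level(mb)
        if level != previous:
            yield mb, level
            previous = level
```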

Monitoring Logs in Render Dashboard

  1. Log into your Render dashboard
  2. Navigate to the service logs
  3. Filter for memory-related log messages:
    • [MEMORY CHECKPOINT]
    • [MEMORY MILESTONE]
    • Memory usage
    • WARNING: Memory usage
    • CRITICAL: Memory usage
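The same filter can be applied offline to log text copied or exported from the dashboard. This is a minimal sketch that matches only the substrings listed above. It makes no assumptions about the rest of the log line format.

```python
from typing import Iterable, Iterator, Sequence

# The markers listed above; "Memory usage" also matches the WARNING/CRITICAL variants.
MEMORY_MARKERS = (
    "[MEMORY CHECKPOINT]",
    "[MEMORY MILESTONE]",
    "Memory usage",
    "WARNING: Memory usage",
    "CRITICAL: Memory usage",
)


def memory_lines(lines: Iterable[str], markers: Sequence[str] = MEMORY_MARKERS) -> Iterator[str]:
    """Yield only the log lines that contain one of the memory markers."""
    for line in lines:
        if any(marker in line for marker in markers):
            yield line.rstrip("\n")


def severity(line: str) -> str:
    """Rough severity bucket based on the WARNING/CRITICAL prefixes above."""
    if "CRITICAL: Memory usage" in line:
        return "critical"
    if "WARNING: Memory usage" in line:
        return "warning"
    return "info"
```

For example, with logs saved to a local file (render.log is a hypothetical name), iterate `memory_lines(open("render.log"))` and print `severity(line)` next to each line.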

Memory Usage Patterns to Watch For

Warning Signs

  1. Steadily Increasing Memory: Memory trends show continuous growth over time
  2. High Peak After Ingestion: Memory spikes above 450MB after document ingestion
  3. Failure to Release Memory: Memory doesn't decrease after operations complete
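These three warning signs can be checked mechanically from a series of readings taken during and after an operation, such as repeated memory_mb values from /memory/render-status. The sketch below makes several assumptions. It uses a least-squares slope to detect steady growth and reuses the 450MB peak from the list above. It flags "not released" when the last reading stays more than a tolerance above a baseline taken before the operation. The growth and tolerance defaults are illustrative assumptions.

```python
from typing import List, Sequence


def slope_mb_per_sample(samples: Sequence[float]) -> float:
    """Least-squares slope of memory (MB) against sample index."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def warning_signs(
    samples_mb: Sequence[float],
    baseline_mb: float,
    growth_mb_per_sample: float = 1.0,   # assumed threshold for "steady" growth
    peak_limit_mb: float = 450.0,        # peak level named in the list above
    release_tolerance_mb: float = 20.0,  # assumed allowance above baseline
) -> List[str]:
    signs = []
    if len(samples_mb) >= 2 and slope_mb_per_sample(samples_mb) > growth_mb_per_sample:
        signs.append("steadily_increasing")
    if max(samples_mb) > peak_limit_mb:
        signs.append("high_peak")
    if samples_mb[-1] > baseline_mb + release_tolerance_mb:
        signs.append("not_released")
    return signs
```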

Preventative Actions

  1. Regular Cleanup: Schedule calls to /memory/force-clean during low-traffic periods
  2. Batch Processing: For large document sets, ingest in smaller batches
  3. Monitoring Before Bulk Operations: Check memory status before starting resource-intensive operations
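Batch processing and checking memory before bulk work can be combined. Ingestion proceeds in small batches, and memory is read before each batch. The following is a sketch under stated assumptions. ingest_batch and get_memory_mb are hypothetical callables you would supply, for instance a wrapper around your ingestion code and a call to /memory/render-status. The 400MB ceiling simply reuses the warning level.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Split items into consecutive lists of at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def ingest_in_batches(
    documents: Sequence[T],
    ingest_batch: Callable[[List[T]], None],  # hypothetical ingestion hook
    get_memory_mb: Callable[[], float],       # hypothetical memory probe
    batch_size: int = 10,
    ceiling_mb: float = 400.0,                # assumed: the warning threshold
) -> int:
    """Ingest in batches, stopping before any batch that starts above ceiling_mb.

    Returns the number of documents ingested. A real pipeline might call
    /memory/force-clean and retry instead of stopping.
    """
    done = 0
    for batch in batched(documents, batch_size):
        if get_memory_mb() > ceiling_mb:
            break
        ingest_batch(batch)
        done += len(batch)
    return done
```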

Memory Optimization Features

The application includes several memory optimization features:

  1. Automatic Thresholds: Memory is monitored against configured thresholds (400MB, 450MB, 480MB)
  2. Progressive Cleanup: Different levels of cleanup based on severity
  3. Request Circuit Breaker: New requests are rejected while memory usage is critically high
  4. Memory Metrics Export: Memory metrics are saved to /tmp/render_metrics/ for later analysis
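One way the thresholds, progressive cleanup, and circuit breaker could fit together is sketched below. This is an illustration, not the application's code. The mapping of 400/450/480MB to warning/critical/emergency is an assumption. The example response above reports "warning" at 342.5MB, so the service's real cutoffs may differ. Only the light_cleanup action name comes from this document; full_cleanup is hypothetical.

```python
import gc

# Assumed mapping of the configured thresholds to the status names used by /memory/render-status.
THRESHOLDS_MB = (
    (480.0, "emergency"),
    (450.0, "critical"),
    (400.0, "warning"),
)


def memory_status(memory_mb: float) -> str:
    """Return the first status whose threshold the reading exceeds."""
    for limit, status in THRESHOLDS_MB:
        if memory_mb > limit:
            return status
    return "normal"


def cleanup_for(status: str) -> str:
    """Progressively heavier cleanup; returns the (partly hypothetical) action name."""
    if status == "warning":
        gc.collect(0)  # collect only the youngest generation
        return "light_cleanup"
    if status in ("critical", "emergency"):
        gc.collect()  # full collection across all generations
        return "full_cleanup"
    return "none"


def should_reject_request(memory_mb: float) -> bool:
    """Circuit breaker: refuse new work at the emergency level (assumed trigger)."""
    return memory_status(memory_mb) == "emergency"
```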

Troubleshooting Memory Issues

If you encounter persistent memory issues:

  1. Review Logs: Check Render logs for memory checkpoints and milestones
  2. Analyze Trends: Use the /memory/render-status endpoint to identify patterns
  3. Check Operations Timing: Determine whether memory spikes coincide with specific operations, such as document ingestion
  4. Adjust Configuration: Consider adjusting EMBEDDING_BATCH_SIZE or other parameters in config.py

Available Environment Variables

These environment variables can be configured in Render:

  • MEMORY_DEBUG=1: Enable detailed memory diagnostics
  • MEMORY_LOG_INTERVAL=10: Log memory usage every 10 seconds
  • ENABLE_TRACEMALLOC=1: Enable tracemalloc for detailed memory allocation tracking
  • RENDER=1: Enable Render-specific optimizations (automatically set on Render)
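A process might read these variables and, when ENABLE_TRACEMALLOC is set, report its largest allocation sites using the standard tracemalloc module. This is a sketch only. The set of accepted truthy spellings is an assumption rather than the application's parsing rule; it includes "true" because platforms often set flags that way. No default is assumed for MEMORY_LOG_INTERVAL.

```python
import os
import tracemalloc
from typing import List, Mapping, Optional

TRUTHY = {"1", "true", "yes"}  # assumed accepted spellings


def _flag(env: Mapping[str, str], name: str) -> bool:
    return env.get(name, "").strip().lower() in TRUTHY


def memory_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the variables listed above into a plain dict."""
    interval: Optional[int] = None
    if "MEMORY_LOG_INTERVAL" in env:
        interval = int(env["MEMORY_LOG_INTERVAL"])  # seconds between memory log lines
    return {
        "debug": _flag(env, "MEMORY_DEBUG"),
        "log_interval_s": interval,
        "tracemalloc": _flag(env, "ENABLE_TRACEMALLOC"),
        "render": _flag(env, "RENDER"),
    }


def top_allocations(limit: int = 5) -> List[str]:
    """Largest allocation sites (grouped by file and line) since tracemalloc.start()."""
    if not tracemalloc.is_tracing():
        return []
    snapshot = tracemalloc.take_snapshot()
    return [str(stat) for stat in snapshot.statistics("lineno")[:limit]]
```

At startup, call `tracemalloc.start()` when `memory_settings()["tracemalloc"]` is true. Later, log `top_allocations()` to see which source lines hold the most memory. Tracing adds overhead of its own, so it is best left off in normal operation.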