Transcript Debugging Guide
Issue: Empty Transcripts ("No transcript available")
Complete Flow Analysis
1. Django App → API Request (slaq-version-c/diagnosis/ai_engine/detect_stuttering.py)
Location: Line 269-274
response = requests.post(
    self.api_url,
    files=files,
    data={
        "transcript": proper_transcript if proper_transcript else "",
        "language": lang_code,
    },
    timeout=self.api_timeout
)
Status: ✅ Sending transcript parameter correctly
2. API Receives Request (slaq-version-c-ai-enginee/app.py)
Location: Line 70-73
@app.post("/analyze")
async def analyze_audio(
    audio: UploadFile = File(...),
    transcript: str = Form("")  # ✅ Fixed: Now uses Form() for multipart
):
Status: ✅ Fixed - Now correctly receives transcript via Form()
3. API Calls Model (slaq-version-c-ai-enginee/app.py)
Location: Line 106
result = detector.analyze_audio(temp_file, transcript)
Status: ✅ Passing transcript correctly
4. Model Transcribes Audio (slaq-version-c-ai-enginee/diagnosis/ai_engine/detect_stuttering.py)
Location: Line 313-369 (_transcribe_with_timestamps)
Potential Issues:
- ❓ IndicWav2Vec decoding might not work with processor.batch_decode()
- ❓ Need to use the tokenizer directly
- ❓ Model might not be producing valid predictions
Status: ⚠️ LIKELY ISSUE HERE - Decoding method may be incorrect
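For reference, a minimal sketch of the standard Wav2Vec2 CTC decode path that this step relies on. Attribute names such as self.processor, self.model, and self.device are assumptions about the detector class; if processor.batch_decode() is the wrong entry point for IndicWav2Vec, this is where an empty transcript would originate:
import torch

def _baseline_decode(self, waveform_16k):
    # Standard Wav2Vec2 CTC path: features -> logits -> argmax -> decode.
    inputs = self.processor(
        waveform_16k, sampling_rate=16000, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits = self.model(inputs.input_values.to(self.device)).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    # Suspected failure point: batch_decode() can return empty strings if the
    # processor's tokenizer does not match the model's CTC vocabulary.
    return self.processor.batch_decode(predicted_ids)[0]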
5. Model Returns Result (slaq-version-c-ai-enginee/diagnosis/ai_engine/detect_stuttering.py)
Location: Line 787-794
actual_transcript = transcript if transcript else ""
target_transcript = proper_transcript if proper_transcript else transcript if transcript else ""
return {
    'actual_transcript': actual_transcript,
    'target_transcript': target_transcript,
    ...
}
Status: ✅ Returns transcripts correctly (if transcript is not empty)
6. API Returns Response (slaq-version-c-ai-enginee/app.py)
Location: Line 109-113
actual = result.get('actual_transcript', '')
target = result.get('target_transcript', '')
logger.info(f"📝 Result transcripts - Actual: '{actual[:100]}' (len: {len(actual)}), Target: '{target[:100]}' (len: {len(target)})")
return result
Status: ✅ Returns JSON with transcripts
7. Django Receives Response (slaq-version-c/diagnosis/ai_engine/detect_stuttering.py)
Location: Line 279-410
result = response.json()
# ... formatting ...
actual_transcript = str(api_result.get('actual_transcript', '')).strip()
target_transcript = str(api_result.get('target_transcript', '')).strip()
Status: ✅ Extracts transcripts correctly
8. Django Saves to Database (slaq-version-c/diagnosis/tasks.py)
Location: Line 141-142
actual_transcript=actual_transcript,
target_transcript=target_transcript,
Status: ✅ Saves correctly
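For context, the surrounding ORM call presumably looks something like this (a sketch; the import path and the elided fields are assumptions, only the two transcript kwargs come from the task above):
from diagnosis.models import AnalysisResult  # assumed import path

AnalysisResult.objects.create(
    # ... other analysis fields ...
    actual_transcript=actual_transcript,
    target_transcript=target_transcript,
)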
Root Cause Analysis
Most Likely Issue: Transcription Decoding
The IndicWav2Vec model (ai4bharat/indicwav2vec-hindi) may require:
- Direct tokenizer access instead of processor.batch_decode()
- CTC decoding with the proper tokenizer
- Special handling for Indic scripts
Fix Applied
Updated _transcribe_with_timestamps() (sketched below) to:
- Try multiple decoding methods
- Use tokenizer directly if available
- Add comprehensive error logging
- Log predicted IDs for debugging
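A minimal sketch of that fallback logic, assuming the module-level logger and the self.processor attribute used elsewhere in the file:
import logging

logger = logging.getLogger(__name__)  # assumed to already exist in detect_stuttering.py

def _decode_predicted_ids(self, predicted_ids):
    """Try several decoding paths and log what each one produces."""
    logger.info(f"🔢 Predicted IDs (first 50): {predicted_ids[0][:50].tolist()}")
    # 1) Standard processor decode
    try:
        text = self.processor.batch_decode(predicted_ids)[0]
        if text.strip():
            return text
        logger.warning("processor.batch_decode() returned an empty string")
    except Exception as exc:
        logger.error(f"processor.batch_decode() failed: {exc}")
    # 2) Fall back to the tokenizer directly, if the processor exposes one
    tokenizer = getattr(self.processor, "tokenizer", None)
    if tokenizer is not None:
        try:
            return tokenizer.decode(predicted_ids[0], skip_special_tokens=True)
        except Exception as exc:
            logger.error(f"tokenizer.decode() failed: {exc}")
    return ""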
Debugging Steps
1. Check API Logs
When processing audio, look for:
📝 Transcribed text: '...' (length: X)
📝 Final return - Actual: '...' (len: X), Target: '...' (len: Y)
📝 Result transcripts - Actual: '...' (len: X), Target: '...' (len: Y)
2. Check Django Logs
Look for:
📝 Final transcripts - Actual: X chars, Target: Y chars
📝 Saving transcripts - Actual: X chars, Target: Y chars
3. Check Database
Query the AnalysisResult table:
SELECT actual_transcript, target_transcript, LENGTH(actual_transcript) as actual_len, LENGTH(target_transcript) as target_len
FROM diagnosis_analysisresult
ORDER BY created_at DESC LIMIT 5;
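Or, equivalently, from the Django shell (assuming the model behind the diagnosis_analysisresult table is diagnosis.models.AnalysisResult):
from django.db.models.functions import Length
from diagnosis.models import AnalysisResult  # assumed import path

rows = (
    AnalysisResult.objects
    .annotate(
        actual_len=Length("actual_transcript"),
        target_len=Length("target_transcript"),
    )
    .order_by("-created_at")
    .values("actual_transcript", "target_transcript", "actual_len", "target_len")[:5]
)
for row in rows:
    print(row)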
4. Test API Directly
curl -X POST "http://localhost:7860/analyze" \
  -F "audio=@/path/to/audio.wav" \
  -F "transcript=test transcript" \
  -F "language=hin"
Check the response JSON for actual_transcript and target_transcript.
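The same check from Python, if that is more convenient (a sketch; adjust the file path):
import requests

with open("/path/to/audio.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:7860/analyze",
        files={"audio": f},
        data={"transcript": "test transcript", "language": "hin"},
        timeout=120,
    )
resp.raise_for_status()
result = resp.json()
print("actual_transcript:", result.get("actual_transcript", ""))
print("target_transcript:", result.get("target_transcript", ""))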
Next Steps
- Rebuild Docker image with latest changes
- Check logs during audio processing
- Verify processor structure - logs will show processor attributes
- Test with Hindi audio - model is optimized for Hindi
- Check if model is loaded correctly - verify HF_TOKEN is working (see the check sketched below)
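A quick standalone check that the model and processor load with the token (a sketch; whether IndicWav2Vec exposes a standard Wav2Vec2Processor and tokenizer is exactly what the logs are meant to confirm):
import os
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "ai4bharat/indicwav2vec-hindi"
token = os.environ.get("HF_TOKEN")  # must be set for authenticated access

processor = Wav2Vec2Processor.from_pretrained(model_id, token=token)
model = Wav2Vec2ForCTC.from_pretrained(model_id, token=token)

print(type(processor))                               # expect Wav2Vec2Processor
print(type(getattr(processor, "tokenizer", None)))   # expect Wav2Vec2CTCTokenizer
print(model.config.vocab_size)                       # sanity check on the CTC vocab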
Expected Log Output (Success)
🚀 Initializing Advanced AI Engine on cpu...
✅ HF_TOKEN found - using authenticated model access
📋 Processor type: <class 'transformers.models.wav2vec2.processing_wav2vec2.Wav2Vec2Processor'>
📋 Processor attributes: ['batch_decode', 'decode', 'feature_extractor', 'tokenizer', ...]
📋 Tokenizer type: <class 'transformers.models.wav2vec2.tokenization_wav2vec2.Wav2Vec2CTCTokenizer'>
📝 Transcribed text: 'नमस्ते मैं हिंदी बोल रहा हूं' (length: 25)
📝 Final return - Actual: 'नमस्ते मैं हिंदी बोल रहा हूं' (len: 25), Target: '...' (len: X)
If Still Empty
- Model may not be loaded correctly - check HF_TOKEN
- Audio format issue - ensure 16kHz mono WAV (a quick check is sketched after this list)
- Model not producing predictions - check predicted_ids in logs
- Tokenizer mismatch - IndicWav2Vec may need special tokenizer initialization
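A quick way to confirm the audio-format assumption (a sketch using soundfile, which may or may not already be a project dependency):
import soundfile as sf

info = sf.info("/path/to/audio.wav")
print(info.samplerate, info.channels, info.format)
if info.samplerate != 16000 or info.channels != 1:
    print("⚠️ Not 16kHz mono - resample/downmix before sending to /analyze")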