anfastech committed on
Commit
d8013e7
·
1 Parent(s): c9f07b9

Adding Readme

  1. README.md +8 -558
README.md CHANGED
@@ -1,561 +1,11 @@
1
- # 🚀 SLAQ Version C AI Engine
2
-
3
- **FastAPI-based Stutter Detection API for SLAQ Django Application**
4
-
5
- This is the AI engine microservice that provides stuttering analysis capabilities for the SLAQ Django application. It uses advanced ML models (MMS-1B) to detect and analyze stuttering events in audio recordings, with support for multiple Indian languages.
6
-
7
- ---
8
-
9
- ## 📋 Table of Contents
10
-
11
- - [Overview](#overview)
12
- - [API Endpoints](#api-endpoints)
13
- - [Request/Response Formats](#requestresponse-formats)
14
- - [Language Support](#language-support)
15
- - [Integration with Django App](#integration-with-django-app)
16
- - [Configuration](#configuration)
17
- - [Error Handling](#error-handling)
18
- - [Health Checks](#health-checks)
19
- - [Deployment](#deployment)
20
- - [Recent Enhancements](#recent-enhancements)
21
-
22
- ---
23
-
24
- ## 🎯 Overview
25
-
26
- The SLAQ AI Engine is a FastAPI service that:
27
-
28
- - **Analyzes audio files** for stuttering patterns using Meta's MMS-1B model
29
- - **Supports 15+ Indian languages** including Hindi, Tamil, Telugu, Bengali, and more
30
- - **Provides detailed analysis** including:
31
- - Transcription accuracy
32
- - Stutter event detection (repetitions, prolongations, blocks)
33
- - Severity classification (none, mild, moderate, severe)
34
- - Confidence scores and timestamps
35
- - **Integrates seamlessly** with the Django SLAQ application via HTTP API
36
-
37
- **Base URL:** `https://anfastech-slaq-version-c-ai-enginee.hf.space`
38
-
39
- ---
40
-
41
- ## 🔌 API Endpoints
42
-
43
- ### 1. Health Check
44
-
45
- **Endpoint:** `GET /health`
46
-
47
- **Description:** Check if the API is healthy and models are loaded.
48
-
49
- **Response:**
50
- ```json
51
- {
52
- "status": "healthy",
53
- "models_loaded": true,
54
- "timestamp": "2024-01-15 10:30:45"
55
- }
56
- ```
57
-
58
- **Status Codes:**
59
- - `200`: Service is healthy
60
- - `503`: Models not loaded yet
61
-
62
- ---
63
-
64
- ### 2. Analyze Audio
65
-
66
- **Endpoint:** `POST /analyze`
67
-
68
- **Description:** Analyze an audio file for stuttering patterns.
69
-
70
- **Request Format:** `multipart/form-data`
71
-
72
- **Parameters:**
73
-
74
- | Parameter | Type | Required | Default | Description |
75
- |-----------|------|----------|---------|-------------|
76
- | `audio` | File | ✅ Yes | - | Audio file (WAV, MP3, OGG, WebM) |
77
- | `transcript` | String | ❌ No | `""` | Optional expected transcript for comparison |
78
- | `language` | String | ❌ No | `"english"` | Language code (see [Language Support](#language-support)) |
79
-
80
- **Example Request (cURL):**
81
- ```bash
82
- curl -X POST "https://anfastech-slaq-version-c-ai-enginee.hf.space/analyze" \
83
- -F "audio=@recording.wav" \
84
- -F "transcript=Hello world" \
85
- -F "language=hindi"
86
- ```
87
-
88
- **Example Request (Python):**
89
- ```python
90
- import requests
91
-
92
- files = {"audio": ("recording.wav", open("recording.wav", "rb"), "audio/wav")}
93
- data = {
94
- "transcript": "Hello world",
95
- "language": "hindi"
96
- }
97
-
98
- response = requests.post(
99
- "https://anfastech-slaq-version-c-ai-enginee.hf.space/analyze",
100
- files=files,
101
- data=data
102
- )
103
-
104
- result = response.json()
105
- ```
106
-
107
- **Response Format:**
108
- ```json
109
- {
110
- "actual_transcript": "Hello world",
111
- "target_transcript": "Hello world",
112
- "mismatched_chars": [],
113
- "mismatch_percentage": 0.0,
114
- "ctc_loss_score": 0.15,
115
- "stutter_timestamps": [
116
- {
117
- "type": "repetition",
118
- "start": 1.5,
119
- "end": 2.0,
120
- "duration": 0.5,
121
- "confidence": 0.85,
122
- "text": "he-he"
123
- }
124
- ],
125
- "total_stutter_duration": 0.5,
126
- "stutter_frequency": 2.5,
127
- "severity": "mild",
128
- "confidence_score": 0.92,
129
- "analysis_duration_seconds": 3.45,
130
- "model_version": "external-api-v1",
131
- "language_detected": "hin"
132
- }
133
- ```
134
-
135
- **Response Fields:**
136
-
137
- | Field | Type | Description |
138
- |-------|------|-------------|
139
- | `actual_transcript` | String | Transcribed text from audio |
140
- | `target_transcript` | String | Expected transcript (if provided) |
141
- | `mismatched_chars` | Array | List of character-level mismatches |
142
- | `mismatch_percentage` | Float | Percentage of mismatched characters (0-100) |
143
- | `ctc_loss_score` | Float | CTC loss score from model |
144
- | `stutter_timestamps` | Array | List of detected stutter events |
145
- | `total_stutter_duration` | Float | Total duration of stuttering in seconds |
146
- | `stutter_frequency` | Float | Frequency of stuttering events per minute |
147
- | `severity` | String | Severity classification: `none`, `mild`, `moderate`, `severe` |
148
- | `confidence_score` | Float | Overall confidence in analysis (0-1) |
149
- | `analysis_duration_seconds` | Float | Time taken for analysis |
150
- | `model_version` | String | Version of the model used |
151
- | `language_detected` | String | Detected/used language code |
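As an illustration of how a client might consume the response fields listed above, here is a minimal sketch. The `StutterEvent` class and the `summarize` function are hypothetical names introduced only for this example; the field names are taken from the response format in this README, and nothing here is confirmed to match the Django client's actual code.

```python
import json
from dataclasses import dataclass


@dataclass
class StutterEvent:
    """One entry of `stutter_timestamps` (fields as documented above)."""
    type: str
    start: float
    end: float
    duration: float
    confidence: float
    text: str


def summarize(response_body: str) -> dict:
    """Hypothetical helper: reduce an /analyze response to a few key values."""
    data = json.loads(response_body)
    events = [StutterEvent(**event) for event in data["stutter_timestamps"]]
    return {
        "severity": data["severity"],
        "event_count": len(events),
        # Recomputed from the events rather than read from the response
        "total_duration": sum(event.duration for event in events),
    }
```

Recomputing the total from the individual events gives a simple cross-check against the `total_stutter_duration` value reported by the API.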
152
-
153
- **Stutter Event Format:**
154
- ```json
155
- {
156
- "type": "repetition" | "prolongation" | "block" | "dysfluency",
157
- "start": 1.5,
158
- "end": 2.0,
159
- "duration": 0.5,
160
- "confidence": 0.85,
161
- "text": "he-he"
162
- }
163
- ```
164
-
165
- **Status Codes:**
166
- - `200`: Analysis successful
167
- - `400`: Invalid request (missing audio file, invalid format)
168
- - `500`: Analysis failed (internal error)
169
- - `503`: Models not loaded yet
170
-
171
- ---
172
-
173
- ### 3. API Documentation
174
-
175
- **Endpoint:** `GET /`
176
-
177
- **Description:** Get API information and documentation.
178
-
179
- **Response:**
180
- ```json
181
- {
182
- "name": "SLAQ Stutter Detector API",
183
- "version": "1.0.0",
184
- "status": "running",
185
- "endpoints": {
186
- "health": "GET /health",
187
- "analyze": "POST /analyze (multipart form: audio file, transcript (optional), language (optional, default: 'english'))",
188
- "docs": "GET /docs (interactive API docs)"
189
- },
190
- "models": {
191
- "base": "facebook/wav2vec2-base-960h",
192
- "large": "facebook/wav2vec2-large-960h-lv60-self",
193
- "xlsr": "jonatasgrosman/wav2vec2-large-xlsr-53-english"
194
- }
195
- }
196
- ```
197
-
198
- **Interactive Docs:** `GET /docs` (Swagger UI)
199
-
200
- ---
201
-
202
- ## 🌐 Language Support
203
-
204
- The API supports **15+ Indian languages** through the MMS-1B model:
205
-
206
- ### Supported Languages
207
-
208
- | Language | Code | Language | Code |
209
- |----------|------|----------|------|
210
- | Hindi | `hindi` / `hin` | Tamil | `tamil` / `tam` |
211
- | Telugu | `telugu` / `tel` | Bengali | `bengali` / `ben` |
212
- | Marathi | `marathi` / `mar` | Gujarati | `gujarati` / `guj` |
213
- | Kannada | `kannada` / `kan` | Malayalam | `malayalam` / `mal` |
214
- | Punjabi | `punjabi` / `pan` | Urdu | `urdu` / `urd` |
215
- | Assamese | `assamese` / `asm` | Odia | `odia` / `ory` |
216
- | Bhojpuri | `bhojpuri` / `bho` | Maithili | `maithili` / `mai` |
217
- | English | `english` / `eng` | - | - |
218
-
219
- **Usage:**
220
- - You can use either the full language name (`"hindi"`) or the 3-letter code (`"hin"`)
221
- - Default language is `"english"` if not specified
222
- - Language is automatically resolved to the correct MMS language code
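The name-or-code resolution described above could look roughly like the following. This is a minimal sketch built only from the table in this README; the function name `resolve_language` and the choice to raise `ValueError` for unknown input are assumptions, not the engine's actual implementation.

```python
# Hypothetical mapping from the language names in the table above to
# their 3-letter codes. The real engine's mapping may differ.
LANGUAGE_CODES = {
    "hindi": "hin", "tamil": "tam", "telugu": "tel", "bengali": "ben",
    "marathi": "mar", "gujarati": "guj", "kannada": "kan", "malayalam": "mal",
    "punjabi": "pan", "urdu": "urd", "assamese": "asm", "odia": "ory",
    "bhojpuri": "bho", "maithili": "mai", "english": "eng",
}


def resolve_language(language: str = "english") -> str:
    """Return the 3-letter code for a language name or code."""
    key = language.strip().lower()
    if key in LANGUAGE_CODES:            # full name, e.g. "hindi"
        return LANGUAGE_CODES[key]
    if key in LANGUAGE_CODES.values():   # already a code, e.g. "hin"
        return key
    raise ValueError(f"Unsupported language: {language!r}")
```

Rejecting unknown values up front is one possible design; a service could instead fall back to the default language.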
223
-
224
- ---
225
-
226
- ## 🔗 Integration with Django App
227
-
228
- ### Django Configuration
229
-
230
- The Django application (`slaq-version-c`) connects to this AI engine via HTTP API. Configuration is done in `slaq_project/settings.py`:
231
-
232
- ```python
233
- # AI Engine API Configuration
234
- STUTTER_API_URL = env('STUTTER_API_URL', default='https://anfastech-slaq-version-c-ai-enginee.hf.space/analyze')
235
- STUTTER_API_TIMEOUT = env.int('STUTTER_API_TIMEOUT', default=300) # 5 minutes
236
- DEFAULT_LANGUAGE = env('DEFAULT_LANGUAGE', default='hindi')
237
- STUTTER_API_MAX_RETRIES = env.int('STUTTER_API_MAX_RETRIES', default=3)
238
- STUTTER_API_RETRY_DELAY = env.int('STUTTER_API_RETRY_DELAY', default=5) # seconds
239
- ```
240
-
241
- ### Environment Variables
242
-
243
- Add to your Django `.env` file:
244
-
245
- ```env
246
- STUTTER_API_URL=https://anfastech-slaq-version-c-ai-enginee.hf.space/analyze
247
- STUTTER_API_TIMEOUT=300
248
- DEFAULT_LANGUAGE=hindi
249
- STUTTER_API_MAX_RETRIES=3
250
- STUTTER_API_RETRY_DELAY=5
251
- ```
252
-
253
- ### Django Integration Flow
254
-
255
- 1. **User uploads audio** via Django web interface
256
- 2. **Django creates Celery task** (`process_audio_recording`)
257
- 3. **Celery worker calls** `StutterDetector.analyze_audio()`
258
- 4. **StutterDetector sends HTTP POST** to this AI engine API
259
- 5. **AI engine processes audio** using MMS-1B model
260
- 6. **Results returned** to Django and saved to database
261
-
262
- ### Request/Response Compatibility
263
-
264
- ✅ **Verified Compatible:**
265
-
266
- - **Django sends:** `multipart/form-data` with:
267
- - `files={"audio": (filename, file_obj, mime_type)}`
268
- - `data={"transcript": "...", "language": "..."}`
269
-
270
- - **FastAPI receives:**
271
- - `audio: UploadFile = File(...)`
272
- - `transcript: str = Form("")`
273
- - `language: str = Form("english")`
274
-
275
- ✅ **Format is fully compatible and tested.**
276
-
277
- ---
278
-
279
- ## ⚙️ Configuration
280
-
281
- ### Environment Variables
282
-
283
- | Variable | Default | Description |
284
- |----------|---------|-------------|
285
- | `PORT` | `7860` | Server port (HuggingFace Spaces uses 7860) |
286
- | `PYTHONUNBUFFERED` | `1` | Enable unbuffered Python output |
287
-
288
- ### Model Configuration
289
-
290
- Models are loaded automatically on startup:
291
- - **MMS-1B Model:** `facebook/mms-1b-all` (for transcription)
292
- - **Language ID Model:** `facebook/mms-lid-126` (for language detection)
293
- - **Device:** Auto-detects CUDA if available, otherwise CPU
294
-
295
- ---
296
-
297
- ## 🛡️ Error Handling
298
-
299
- ### Error Response Format
300
-
301
- ```json
302
- {
303
- "detail": "Error message describing what went wrong"
304
- }
305
- ```
306
-
307
- ### Common Error Scenarios
308
-
309
- | Status Code | Scenario | Solution |
310
- |------------|----------|----------|
311
- | `400` | Missing audio file | Ensure `audio` parameter is included |
312
- | `400` | Invalid file format | Use supported formats: WAV, MP3, OGG, WebM |
313
- | `500` | Analysis failed | Check logs for detailed error, retry request |
314
- | `503` | Models not loaded | Wait a few seconds and retry (models load on startup) |
315
- | `504` | Request timeout | Increase timeout or use smaller audio file |
316
-
317
- ### Retry Logic (Django Side)
318
-
319
- The Django application implements automatic retry logic:
320
-
321
- - **Max Retries:** 3 attempts (configurable)
322
- - **Retry Delay:** 5 seconds between retries (configurable)
323
- - **Retries on:** Connection errors, timeouts, 503 (Service Unavailable)
324
- - **No retry on:** 4xx client errors, such as invalid requests
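The retry policy above can be sketched as follows. This is an illustration under stated assumptions, not the Django app's code: `TransientError` and `call_with_retries` are hypothetical names, "3 attempts" is read as three calls in total, and the caller is assumed to translate connection errors, timeouts, and 503 responses into `TransientError`.

```python
import time


class TransientError(Exception):
    """Stands in for connection errors, timeouts, and HTTP 503 responses."""


def call_with_retries(send, max_retries=3, retry_delay=5, sleep=time.sleep):
    """Call send() up to max_retries times, pausing after transient failures.

    Any other exception (for example one raised for a 4xx response)
    propagates immediately without a retry.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return send()
        except TransientError:
            if attempt == max_retries:
                raise  # out of attempts: surface the last error
            sleep(retry_delay)
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.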
325
-
326
- ---
327
-
328
- ## 🏥 Health Checks
329
-
330
- ### Health Check Endpoint
331
-
332
- **Endpoint:** `GET /health`
333
-
334
- **Use Case:** Monitor API availability and model loading status.
335
-
336
- **Response:**
337
- ```json
338
- {
339
- "status": "healthy",
340
- "models_loaded": true,
341
- "timestamp": "2024-01-15 10:30:45"
342
- }
343
- ```
344
-
345
- ### Django Health Check Integration
346
-
347
- The Django app includes a `check_api_health()` method in `StutterDetector`:
348
-
349
- ```python
350
- from diagnosis.ai_engine.detect_stuttering import StutterDetector
351
-
352
- detector = StutterDetector()
353
- health = detector.check_api_health()
354
-
355
- if health['healthy']:
356
-     print(f"✅ API is healthy (response time: {health['response_time']}s)")
357
- else:
358
-     print(f"❌ API is unhealthy: {health['message']}")
359
- ```
360
-
361
- **Health Check Response:**
362
- ```python
363
- {
364
- 'healthy': True,
365
- 'status_code': 200,
366
- 'message': 'API is healthy and accessible',
367
- 'response_time': 0.15, # seconds
368
- 'details': {
369
- 'status': 'healthy',
370
- 'models_loaded': True
371
- }
372
- }
373
- ```
374
-
375
- ---
376
-
377
- ## 🚀 Deployment
378
-
379
- ### HuggingFace Spaces
380
-
381
- This AI engine is deployed on **HuggingFace Spaces**:
382
-
383
- **Space URL:** `https://huggingface.co/spaces/anfastech/slaq-version-c-ai-enginee`
384
-
385
- **Deployment Configuration:**
386
- - **SDK:** Docker
387
- - **Hardware:** GPU (if available)
388
- - **Port:** 7860 (HuggingFace default)
389
-
390
- ### Local Development
391
-
392
- 1. **Install Dependencies:**
393
- ```bash
394
- pip install -r requirements.txt
395
- ```
396
-
397
- 2. **Run Locally:**
398
- ```bash
399
- python app.py
400
- ```
401
-
402
- 3. **Access API:**
403
- - API: `http://localhost:7860`
404
- - Docs: `http://localhost:7860/docs`
405
- - Health: `http://localhost:7860/health`
406
-
407
- ### Docker Deployment
408
-
409
- ```bash
410
- docker build -t slaq-ai-engine .
411
- docker run -p 7860:7860 slaq-ai-engine
412
- ```
413
-
414
- ---
415
-
416
- ## ✨ Recent Enhancements
417
-
418
- ### Version 1.0.0 (Latest)
419
-
420
- #### ✅ 1. Fixed API URL
421
- - **Changed:** API URL updated from `slaq-version-d-ai-test-engine` to `slaq-version-c-ai-enginee`
422
- - **Location:** `slaq-version-c/diagnosis/ai_engine/detect_stuttering.py:25`
423
- - **Impact:** Django app now correctly points to the version C AI engine
424
-
425
- #### ✅ 2. Language Parameter Support
426
- - **Added:** `language` parameter to `/analyze` endpoint
427
- - **Format:** `Form("english")` - accepts language name or code
428
- - **Default:** `"english"` if not provided
429
- - **Impact:** Enables multi-language stutter detection
430
-
431
- #### ✅ 3. Django Settings Configuration
432
- - **Added:** Configurable API settings via environment variables
433
- - `STUTTER_API_URL`
434
- - `STUTTER_API_TIMEOUT`
435
- - `DEFAULT_LANGUAGE`
436
- - `STUTTER_API_MAX_RETRIES`
437
- - `STUTTER_API_RETRY_DELAY`
438
- - **Impact:** Easy configuration without code changes
439
-
440
- #### ✅ 4. Enhanced Error Handling & Retry Logic
441
- - **Added:** Automatic retry mechanism (3 attempts by default)
442
- - **Features:**
443
- - Configurable retry count and delay
444
- - Smart retry on transient errors (timeout, connection errors, 503)
445
- - No retry on permanent errors (4xx)
446
- - Detailed logging for each attempt
447
- - **Impact:** Improved reliability and resilience
448
-
449
- #### ✅ 5. Health Check Functionality
450
- - **Added:** `check_api_health()` method in Django `StutterDetector`
451
- - **Features:**
452
- - Checks API connectivity
453
- - Measures response time
454
- - Returns detailed health status
455
- - **Impact:** Better monitoring and debugging
456
-
457
- #### ✅ 6. Request/Response Format Verification
458
- - **Verified:** Full compatibility between Django and FastAPI
459
- - **Format:** `multipart/form-data` with proper field mapping
460
- - **Impact:** Reliable integration between services
461
-
462
- ---
463
-
464
- ## 📊 Performance
465
-
466
- ### Typical Response Times
467
-
468
- | Audio Duration | Analysis Time | Total Time (with network) |
469
- |---------------|---------------|---------------------------|
470
- | 5 seconds | ~2-3 seconds | ~3-4 seconds |
471
- | 30 seconds | ~5-8 seconds | ~6-10 seconds |
472
- | 2 minutes | ~15-25 seconds | ~20-30 seconds |
473
- | 5 minutes | ~40-60 seconds | ~50-70 seconds |
474
-
475
- *Times may vary based on audio complexity, language, and server load.*
476
-
477
- ### Timeout Configuration
478
-
479
- - **Default Timeout:** 300 seconds (5 minutes)
480
- - **Configurable:** Via `STUTTER_API_TIMEOUT` environment variable
481
- - **Recommendation:** Set timeout to at least 2x expected analysis time
482
-
483
  ---
484
-
485
- ## 🔍 Troubleshooting
486
-
487
- ### Common Issues
488
-
489
- #### 1. Models Not Loading
490
- **Symptom:** `503 Service Unavailable` or `models_loaded: false`
491
-
492
- **Solution:**
493
- - Wait 30-60 seconds after deployment (models load on startup)
494
- - Check logs for model loading errors
495
- - Verify sufficient memory/GPU resources
496
-
497
- #### 2. Request Timeout
498
- **Symptom:** `504 Gateway Timeout` or timeout errors
499
-
500
- **Solution:**
501
- - Increase `STUTTER_API_TIMEOUT` in Django settings
502
- - Use shorter audio files for testing
503
- - Check network connectivity
504
-
505
- #### 3. Language Not Supported
506
- **Symptom:** Incorrect transcription or errors
507
-
508
- **Solution:**
509
- - Verify language code is in supported list
510
- - Use full language name or 3-letter code
511
- - Check language code mapping in Django `detect_stuttering.py`
512
-
513
- #### 4. File Format Issues
514
- **Symptom:** `400 Bad Request` or analysis fails
515
-
516
- **Solution:**
517
- - Use supported formats: WAV, MP3, OGG, WebM
518
- - Ensure file is valid audio (not corrupted)
519
- - Check file size (max recommended: 10MB)
520
-
521
- ---
522
-
523
- ## 📝 API Changelog
524
-
525
- ### 2024-01-15 - Version 1.0.0
526
- - ✅ Added language parameter support
526
- - ✅ Enhanced error handling
527
- - ✅ Added health check endpoint
528
- - ✅ Improved logging and monitoring
529
- - ✅ Fixed API URL to point to version C engine
531
-
532
- ---
533
-
534
- ## 📚 Additional Resources
535
-
536
- - **Django Integration:** See `slaq-version-c/diagnosis/ai_engine/detect_stuttering.py`
537
- - **API Documentation:** Visit `/docs` endpoint for interactive Swagger UI
538
- - **HuggingFace Spaces:** https://huggingface.co/docs/hub/spaces
539
- - **FastAPI Docs:** https://fastapi.tiangolo.com/
540
-
541
- ---
542
-
543
- ## 📄 License
544
-
545
- This project is part of the SLAQ (Speech Language Assessment & Quantification) system.
546
-
547
- ---
548
-
549
- ## 🤝 Support
550
-
551
- For issues or questions:
552
- 1. Check the troubleshooting section above
553
- 2. Review API logs for detailed error messages
554
- 3. Verify Django configuration matches this documentation
555
- 4. Check health endpoint: `GET /health`
556
-
557
  ---
558
 
559
- **Last Updated:** 2024-01-15
560
- **API Version:** 1.0.0
561
- **Status:** ✅ Production Ready
1
  ---
2
+ title: Slaq Version C Ai Enginee
3
+ emoji: 🐠
4
+ colorFrom: yellow
5
+ colorTo: red
6
+ sdk: docker
7
+ pinned: false
8
+ short_description: slaq version c ai enginee deployment
9
  ---
10
 
11
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference