Spaces:

anilyanamandra
/

music-flamingo-demo

Runtime error

App Files Files Community

anilyanamandra commited on 25 days ago

Commit

8c2765a

1 Parent(s): 678e362

Add Mermaid code flow diagrams

Browse files

Files changed (2) hide show

code-flow.md +185 -0
main-flow.md +130 -0

code-flow.md ADDED Viewed

	@@ -0,0 +1,185 @@

+# Music Flamingo Code Flow
+```mermaid
+flowchart TD
+    Start([App Starts]) --> Init[Initialize App]
+    Init --> LoadModel[Load Music Flamingo Model<br/>processor & model from MODEL_ID]
+    LoadModel --> SetupProxy{Check for<br/>SSH Proxy?}
+    SetupProxy -->|Yes| CreateTunnel[Create SSH Tunnel]
+    SetupProxy -->|No| Ready[App Ready]
+    CreateTunnel --> Ready
+    Ready --> UI[Gradio UI Loaded]
+    UI --> UserInput{User Input}
+    UserInput -->|Upload Audio| AudioFile[Audio File Path]
+    UserInput -->|YouTube URL| YouTubeURL[YouTube URL String]
+    UserInput -->|Load Button| LoadYouTube[Load YouTube Audio]
+    LoadYouTube --> DownloadYT[download_youtube_audio]
+    DownloadYT --> CheckCache{URL in<br/>Cache?}
+    CheckCache -->|Yes & Exists| ReturnCached[Return Cached File]
+    CheckCache -->|No| ValidateURL[Validate YouTube URL<br/>with Regex]
+    ValidateURL -->|Invalid| Error1[Return Error Message]
+    ValidateURL -->|Valid| YTDL[yt-dlp Download]
+    YTDL --> ExtractAudio[Extract Audio to MP3]
+    ExtractAudio --> CacheFile[Cache File Path]
+    CacheFile --> ReturnFile[Return File Path]
+    ReturnCached --> AudioFile
+    ReturnFile --> AudioFile
+    AudioFile --> UserPrompt[User Enters Prompt]
+    UserPrompt --> ClickGenerate[Click Generate Button]
+    ClickGenerate --> Infer[infer Function]
+    Infer --> DetermineSource{Audio Source?}
+    DetermineSource -->|File Upload| UseFile[Use audio_path]
+    DetermineSource -->|YouTube| DownloadIfNeeded[Download if not cached]
+    DownloadIfNeeded --> UseFile
+    UseFile --> CreateConversation[Create Conversation Format]
+    CreateConversation --> FormatInput["conversations = [<br/>  [{<br/>    'role': 'user',<br/>    'content': [<br/>      {'type': 'text', 'text': prompt},<br/>      {'type': 'audio', 'path': file}<br/>    ]<br/>  }]<br/>]"]
+    FormatInput --> ApplyTemplate[processor.apply_chat_template]
+    ApplyTemplate --> Tokenize[Tokenize Input]
+    Tokenize --> MoveToDevice[Move to model.device]
+    MoveToDevice --> Generate[model.generate<br/>max_new_tokens=4096]
+    Generate --> Decode[processor.batch_decode]
+    Decode --> FormatOutput[Format Result with Status]
+    FormatOutput --> Display[Display in Gradio UI]
+    Error1 --> Display
+    style Start fill:#90EE90
+    style LoadModel fill:#FFD700
+    style Generate fill:#FF6B6B
+    style Display fill:#4ECDC4
+    style Error1 fill:#FF6B6B
+```
+## Detailed Function Flow
+### 1. Initialization Flow
+```mermaid
+sequenceDiagram
+    participant App
+    participant Model
+    participant Proxy
+    App->>Proxy: Check SSH environment variables
+    alt Proxy Available
+        Proxy->>Proxy: Create SSH tunnel
+        Proxy->>App: PROXY_URL set
+    end
+    App->>Model: Load processor from MODEL_ID
+    App->>Model: Load model with device_map="auto"
+    Model->>App: Model ready
+    App->>App: Launch Gradio UI
+```
+### 2. YouTube Download Flow
+```mermaid
+flowchart LR
+    A[YouTube URL] --> B{Valid URL?}
+    B -->|No| C[Return Error]
+    B -->|Yes| D{Cached?}
+    D -->|Yes| E{File Exists?}
+    E -->|Yes| F[Return Cached]
+    E -->|No| G[Download]
+    D -->|No| G
+    G --> H[yt-dlp Download]
+    H --> I[Extract to MP3]
+    I --> J[Cache File]
+    J --> K[Return Path]
+    style C fill:#FF6B6B
+    style F fill:#90EE90
+    style K fill:#90EE90
+```
+### 3. Model Inference Flow
+```mermaid
+sequenceDiagram
+    participant User
+    participant UI
+    participant Download
+    participant Processor
+    participant Model
+    User->>UI: Upload audio or YouTube URL
+    UI->>Download: Get audio file path
+    Download->>UI: Return file path
+    User->>UI: Enter prompt
+    User->>UI: Click Generate
+    UI->>Processor: Create conversation format
+    Processor->>Processor: apply_chat_template()
+    Processor->>Processor: Tokenize input
+    Processor->>Model: Send batch to device
+    Model->>Model: Generate tokens (max 4096)
+    Model->>Processor: Return token IDs
+    Processor->>Processor: batch_decode()
+    Processor->>UI: Return text result
+    UI->>User: Display response
+```
+## Key Functions
+### download_youtube_audio()
+```mermaid
+flowchart TD
+    Start[download_youtube_audio] --> Validate[Validate URL with Regex]
+    Validate -->|Invalid| ReturnError[Return None, Error]
+    Validate -->|Valid| CheckCache{URL in Cache?}
+    CheckCache -->|Yes| CheckFile{File Exists?}
+    CheckFile -->|Yes| ReturnCached[Return Cached Path]
+    CheckFile -->|No| Download[Download Audio]
+    CheckCache -->|No| Download
+    Download --> YTDL[yt-dlp with Options]
+    YTDL --> Extract[Extract to MP3]
+    Extract --> Cache[Store in Cache]
+    Cache --> ReturnPath[Return Path, Status]
+    style ReturnError fill:#FF6B6B
+    style ReturnCached fill:#90EE90
+    style ReturnPath fill:#90EE90
+```
+### infer()
+```mermaid
+flowchart TD
+    Start[infer Function] --> GetAudio{Get Audio}
+    GetAudio -->|File Upload| UseFile[Use audio_path]
+    GetAudio -->|YouTube| DownloadYT[Download YouTube]
+    DownloadYT -->|Success| UseFile
+    DownloadYT -->|Error| ReturnError[Return Error]
+    UseFile --> CreateConv[Create Conversation]
+    CreateConv --> ApplyTemplate[Apply Chat Template]
+    ApplyTemplate --> Generate[Model Generate]
+    Generate --> Decode[Decode Output]
+    Decode --> Format[Format Result]
+    Format --> Return[Return Text]
+    style ReturnError fill:#FF6B6B
+    style Return fill:#90EE90
+```
+## Data Flow
+```mermaid
+flowchart LR
+    A[User Input] --> B{Input Type}
+    B -->|Audio File| C[File Path]
+    B -->|YouTube URL| D[Download Function]
+    D --> C
+    C --> E[Conversation Format]
+    E --> F[Processor]
+    F --> G[Model]
+    G --> H[Generated Text]
+    H --> I[UI Display]
+    style A fill:#4ECDC4
+    style G fill:#FF6B6B
+    style I fill:#90EE90
+```

main-flow.md ADDED Viewed

	@@ -0,0 +1,130 @@

+# Music Flamingo - Main Code Flow
+## Simplified Main Flow
+```mermaid
+flowchart TD
+    Start([App Launch]) --> Init[Initialize]
+    Init --> LoadModel[Load Model & Processor]
+    LoadModel --> UI[Gradio UI Ready]
+    UI --> UserAction{User Action}
+    UserAction -->|Upload File| FilePath[Audio File Path]
+    UserAction -->|Enter YouTube URL| YTURL[YouTube URL]
+    UserAction -->|Click Load| LoadYT[Load YouTube Audio]
+    LoadYT --> Download[download_youtube_audio]
+    Download -->|Success| FilePath
+    Download -->|Error| ErrorMsg[Show Error]
+    FilePath --> EnterPrompt[User Enters Prompt]
+    EnterPrompt --> ClickGen[Click Generate]
+    ClickGen --> Infer[infer Function Called]
+    Infer --> GetAudio[Get Audio File Path]
+    GetAudio --> CreateConv["Create Conversation:<br/>{'role': 'user',<br/> 'content': [<br/>  {'type': 'text', ...},<br/>  {'type': 'audio', 'path': ...}<br/>]"]
+    CreateConv --> Process["processor.apply_chat_template()<br/>- Tokenize<br/>- Format"]
+    Process --> Generate["model.generate()<br/>max_new_tokens=4096"]
+    Generate --> Decode["processor.batch_decode()<br/>Skip special tokens"]
+    Decode --> Format[Format Output]
+    Format --> Display[Display in UI]
+    ErrorMsg --> Display
+    style Start fill:#90EE90
+    style LoadModel fill:#FFD700
+    style Generate fill:#FF6B6B
+    style Display fill:#4ECDC4
+    style ErrorMsg fill:#FF6B6B
+```
+## Component Interaction
+```mermaid
+graph TB
+    subgraph "User Interface"
+        UI[Gradio Blocks]
+        AudioInput[Audio Component]
+        YTInput[YouTube Textbox]
+        PromptInput[Prompt Textbox]
+        Output[Output Textbox]
+        GenButton[Generate Button]
+    end
+    subgraph "Processing Layer"
+        DownloadFunc[download_youtube_audio]
+        InferFunc[infer Function]
+    end
+    subgraph "Model Layer"
+        Processor[AutoProcessor]
+        Model[AutoModel]
+    end
+    UI --> AudioInput
+    UI --> YTInput
+    UI --> PromptInput
+    UI --> GenButton
+    UI --> Output
+    YTInput --> DownloadFunc
+    DownloadFunc --> AudioInput
+    GenButton --> InferFunc
+    AudioInput --> InferFunc
+    PromptInput --> InferFunc
+    InferFunc --> Processor
+    Processor --> Model
+    Model --> Processor
+    Processor --> InferFunc
+    InferFunc --> Output
+    style UI fill:#4ECDC4
+    style Model fill:#FF6B6B
+    style Output fill:#90EE90
+```
+## Function Call Sequence
+```mermaid
+sequenceDiagram
+    autonumber
+    participant U as User
+    participant G as Gradio UI
+    participant D as download_youtube_audio
+    participant I as infer()
+    participant P as Processor
+    participant M as Model
+    U->>G: Enter YouTube URL
+    U->>G: Click Load
+    G->>D: download_youtube_audio(url)
+    D->>D: Validate URL
+    D->>D: Check cache
+    D->>D: Download with yt-dlp
+    D->>G: Return file path
+    G->>G: Update audio component
+    U->>G: Enter prompt
+    U->>G: Click Generate
+    G->>I: infer(audio_path, youtube_url, prompt)
+    I->>I: Determine audio source
+    I->>I: Create conversation format
+    I->>P: apply_chat_template(conversations)
+    P->>P: Tokenize & format
+    P->>M: Send batch to device
+    M->>M: Generate tokens
+    M->>P: Return token IDs
+    P->>P: batch_decode()
+    P->>I: Return decoded text
+    I->>G: Return formatted result
+    G->>U: Display response
+```