music-flamingo-demo / main-flow.md
anilyanamandra's picture
Add Mermaid code flow diagrams
8c2765a

A newer version of the Gradio SDK is available: 6.1.0

Upgrade

Music Flamingo - Main Code Flow

Simplified Main Flow

flowchart TD
    Start([App Launch]) --> Init[Initialize]
    Init --> LoadModel[Load Model & Processor]
    LoadModel --> UI[Gradio UI Ready]
    
    UI --> UserAction{User Action}
    
    UserAction -->|Upload File| FilePath[Audio File Path]
    UserAction -->|Enter YouTube URL| YTURL[YouTube URL]
    UserAction -->|Click Load| LoadYT[Load YouTube Audio]
    
    LoadYT --> Download[download_youtube_audio]
    Download -->|Success| FilePath
    Download -->|Error| ErrorMsg[Show Error]
    
    FilePath --> EnterPrompt[User Enters Prompt]
    EnterPrompt --> ClickGen[Click Generate]
    
    ClickGen --> Infer[infer Function Called]
    
    Infer --> GetAudio[Get Audio File Path]
    GetAudio --> CreateConv["Create Conversation:<br/>{'role': 'user',<br/> 'content': [<br/>  {'type': 'text', ...},<br/>  {'type': 'audio', 'path': ...}<br/>]"]
    
    CreateConv --> Process["processor.apply_chat_template()<br/>- Tokenize<br/>- Format"]
    
    Process --> Generate["model.generate()<br/>max_new_tokens=4096"]
    
    Generate --> Decode["processor.batch_decode()<br/>Skip special tokens"]
    
    Decode --> Format[Format Output]
    Format --> Display[Display in UI]
    
    ErrorMsg --> Display
    
    style Start fill:#90EE90
    style LoadModel fill:#FFD700
    style Generate fill:#FF6B6B
    style Display fill:#4ECDC4
    style ErrorMsg fill:#FF6B6B

Component Interaction

graph TB
    subgraph "User Interface"
        UI[Gradio Blocks]
        AudioInput[Audio Component]
        YTInput[YouTube Textbox]
        PromptInput[Prompt Textbox]
        Output[Output Textbox]
        GenButton[Generate Button]
    end
    
    subgraph "Processing Layer"
        DownloadFunc[download_youtube_audio]
        InferFunc[infer Function]
    end
    
    subgraph "Model Layer"
        Processor[AutoProcessor]
        Model[AutoModel]
    end
    
    UI --> AudioInput
    UI --> YTInput
    UI --> PromptInput
    UI --> GenButton
    UI --> Output
    
    YTInput --> DownloadFunc
    DownloadFunc --> AudioInput
    
    GenButton --> InferFunc
    AudioInput --> InferFunc
    PromptInput --> InferFunc
    
    InferFunc --> Processor
    Processor --> Model
    Model --> Processor
    Processor --> InferFunc
    InferFunc --> Output
    
    style UI fill:#4ECDC4
    style Model fill:#FF6B6B
    style Output fill:#90EE90

Function Call Sequence

sequenceDiagram
    autonumber
    participant U as User
    participant G as Gradio UI
    participant D as download_youtube_audio
    participant I as infer()
    participant P as Processor
    participant M as Model
    
    U->>G: Enter YouTube URL
    U->>G: Click Load
    G->>D: download_youtube_audio(url)
    D->>D: Validate URL
    D->>D: Check cache
    D->>D: Download with yt-dlp
    D->>G: Return file path
    G->>G: Update audio component
    
    U->>G: Enter prompt
    U->>G: Click Generate
    G->>I: infer(audio_path, youtube_url, prompt)
    I->>I: Determine audio source
    I->>I: Create conversation format
    I->>P: apply_chat_template(conversations)
    P->>P: Tokenize & format
    P->>M: Send batch to device
    M->>M: Generate tokens
    M->>P: Return token IDs
    P->>P: batch_decode()
    P->>I: Return decoded text
    I->>G: Return formatted result
    G->>U: Display response