music-flamingo-demo / code-flow.md
anilyanamandra's picture
Add Mermaid code flow diagrams
8c2765a

A newer version of the Gradio SDK is available: 6.1.0

Upgrade

Music Flamingo Code Flow

flowchart TD
    Start([App Starts]) --> Init[Initialize App]
    Init --> LoadModel[Load Music Flamingo Model<br/>processor & model from MODEL_ID]
    LoadModel --> SetupProxy{Check for<br/>SSH Proxy?}
    SetupProxy -->|Yes| CreateTunnel[Create SSH Tunnel]
    SetupProxy -->|No| Ready[App Ready]
    CreateTunnel --> Ready
    
    Ready --> UI[Gradio UI Loaded]
    UI --> UserInput{User Input}
    
    UserInput -->|Upload Audio| AudioFile[Audio File Path]
    UserInput -->|YouTube URL| YouTubeURL[YouTube URL String]
    UserInput -->|Load Button| LoadYouTube[Load YouTube Audio]
    
    LoadYouTube --> DownloadYT[download_youtube_audio]
    DownloadYT --> CheckCache{URL in<br/>Cache?}
    CheckCache -->|Yes & Exists| ReturnCached[Return Cached File]
    CheckCache -->|No| ValidateURL[Validate YouTube URL<br/>with Regex]
    ValidateURL -->|Invalid| Error1[Return Error Message]
    ValidateURL -->|Valid| YTDL[yt-dlp Download]
    YTDL --> ExtractAudio[Extract Audio to MP3]
    ExtractAudio --> CacheFile[Cache File Path]
    CacheFile --> ReturnFile[Return File Path]
    ReturnCached --> AudioFile
    ReturnFile --> AudioFile
    
    AudioFile --> UserPrompt[User Enters Prompt]
    UserPrompt --> ClickGenerate[Click Generate Button]
    
    ClickGenerate --> Infer[infer Function]
    Infer --> DetermineSource{Audio Source?}
    DetermineSource -->|File Upload| UseFile[Use audio_path]
    DetermineSource -->|YouTube| DownloadIfNeeded[Download if not cached]
    DownloadIfNeeded --> UseFile
    
    UseFile --> CreateConversation[Create Conversation Format]
    CreateConversation --> FormatInput["conversations = [<br/>  [{<br/>    'role': 'user',<br/>    'content': [<br/>      {'type': 'text', 'text': prompt},<br/>      {'type': 'audio', 'path': file}<br/>    ]<br/>  }]<br/>]"]
    
    FormatInput --> ApplyTemplate[processor.apply_chat_template]
    ApplyTemplate --> Tokenize[Tokenize Input]
    Tokenize --> MoveToDevice[Move to model.device]
    
    MoveToDevice --> Generate[model.generate<br/>max_new_tokens=4096]
    Generate --> Decode[processor.batch_decode]
    Decode --> FormatOutput[Format Result with Status]
    FormatOutput --> Display[Display in Gradio UI]
    
    Error1 --> Display
    
    style Start fill:#90EE90
    style LoadModel fill:#FFD700
    style Generate fill:#FF6B6B
    style Display fill:#4ECDC4
    style Error1 fill:#FF6B6B

Detailed Function Flow

1. Initialization Flow

sequenceDiagram
    participant App
    participant Model
    participant Proxy
    
    App->>Proxy: Check SSH environment variables
    alt Proxy Available
        Proxy->>Proxy: Create SSH tunnel
        Proxy->>App: PROXY_URL set
    end
    App->>Model: Load processor from MODEL_ID
    App->>Model: Load model with device_map="auto"
    Model->>App: Model ready
    App->>App: Launch Gradio UI

2. YouTube Download Flow

flowchart LR
    A[YouTube URL] --> B{Valid URL?}
    B -->|No| C[Return Error]
    B -->|Yes| D{Cached?}
    D -->|Yes| E{File Exists?}
    E -->|Yes| F[Return Cached]
    E -->|No| G[Download]
    D -->|No| G
    G --> H[yt-dlp Download]
    H --> I[Extract to MP3]
    I --> J[Cache File]
    J --> K[Return Path]
    
    style C fill:#FF6B6B
    style F fill:#90EE90
    style K fill:#90EE90

3. Model Inference Flow

sequenceDiagram
    participant User
    participant UI
    participant Download
    participant Processor
    participant Model
    
    User->>UI: Upload audio or YouTube URL
    UI->>Download: Get audio file path
    Download->>UI: Return file path
    User->>UI: Enter prompt
    User->>UI: Click Generate
    UI->>Processor: Create conversation format
    Processor->>Processor: apply_chat_template()
    Processor->>Processor: Tokenize input
    Processor->>Model: Send batch to device
    Model->>Model: Generate tokens (max 4096)
    Model->>Processor: Return token IDs
    Processor->>Processor: batch_decode()
    Processor->>UI: Return text result
    UI->>User: Display response

Key Functions

download_youtube_audio()

flowchart TD
    Start[download_youtube_audio] --> Validate[Validate URL with Regex]
    Validate -->|Invalid| ReturnError[Return None, Error]
    Validate -->|Valid| CheckCache{URL in Cache?}
    CheckCache -->|Yes| CheckFile{File Exists?}
    CheckFile -->|Yes| ReturnCached[Return Cached Path]
    CheckFile -->|No| Download[Download Audio]
    CheckCache -->|No| Download
    Download --> YTDL[yt-dlp with Options]
    YTDL --> Extract[Extract to MP3]
    Extract --> Cache[Store in Cache]
    Cache --> ReturnPath[Return Path, Status]
    
    style ReturnError fill:#FF6B6B
    style ReturnCached fill:#90EE90
    style ReturnPath fill:#90EE90

infer()

flowchart TD
    Start[infer Function] --> GetAudio{Get Audio}
    GetAudio -->|File Upload| UseFile[Use audio_path]
    GetAudio -->|YouTube| DownloadYT[Download YouTube]
    DownloadYT -->|Success| UseFile
    DownloadYT -->|Error| ReturnError[Return Error]
    UseFile --> CreateConv[Create Conversation]
    CreateConv --> ApplyTemplate[Apply Chat Template]
    ApplyTemplate --> Generate[Model Generate]
    Generate --> Decode[Decode Output]
    Decode --> Format[Format Result]
    Format --> Return[Return Text]
    
    style ReturnError fill:#FF6B6B
    style Return fill:#90EE90

Data Flow

flowchart LR
    A[User Input] --> B{Input Type}
    B -->|Audio File| C[File Path]
    B -->|YouTube URL| D[Download Function]
    D --> C
    C --> E[Conversation Format]
    E --> F[Processor]
    F --> G[Model]
    G --> H[Generated Text]
    H --> I[UI Display]
    
    style A fill:#4ECDC4
    style G fill:#FF6B6B
    style I fill:#90EE90