Spaces:
Runtime error
Runtime error
A newer version of the Gradio SDK is available:
6.1.0
Music Flamingo - Main Code Flow
Simplified Main Flow
flowchart TD
Start([App Launch]) --> Init[Initialize]
Init --> LoadModel[Load Model & Processor]
LoadModel --> UI[Gradio UI Ready]
UI --> UserAction{User Action}
UserAction -->|Upload File| FilePath[Audio File Path]
UserAction -->|Enter YouTube URL| YTURL[YouTube URL]
UserAction -->|Click Load| LoadYT[Load YouTube Audio]
LoadYT --> Download[download_youtube_audio]
Download -->|Success| FilePath
Download -->|Error| ErrorMsg[Show Error]
FilePath --> EnterPrompt[User Enters Prompt]
EnterPrompt --> ClickGen[Click Generate]
ClickGen --> Infer[infer Function Called]
Infer --> GetAudio[Get Audio File Path]
GetAudio --> CreateConv["Create Conversation:<br/>{'role': 'user',<br/> 'content': [<br/> {'type': 'text', ...},<br/> {'type': 'audio', 'path': ...}<br/>]"]
CreateConv --> Process["processor.apply_chat_template()<br/>- Tokenize<br/>- Format"]
Process --> Generate["model.generate()<br/>max_new_tokens=4096"]
Generate --> Decode["processor.batch_decode()<br/>Skip special tokens"]
Decode --> Format[Format Output]
Format --> Display[Display in UI]
ErrorMsg --> Display
style Start fill:#90EE90
style LoadModel fill:#FFD700
style Generate fill:#FF6B6B
style Display fill:#4ECDC4
style ErrorMsg fill:#FF6B6B
Component Interaction
graph TB
subgraph "User Interface"
UI[Gradio Blocks]
AudioInput[Audio Component]
YTInput[YouTube Textbox]
PromptInput[Prompt Textbox]
Output[Output Textbox]
GenButton[Generate Button]
end
subgraph "Processing Layer"
DownloadFunc[download_youtube_audio]
InferFunc[infer Function]
end
subgraph "Model Layer"
Processor[AutoProcessor]
Model[AutoModel]
end
UI --> AudioInput
UI --> YTInput
UI --> PromptInput
UI --> GenButton
UI --> Output
YTInput --> DownloadFunc
DownloadFunc --> AudioInput
GenButton --> InferFunc
AudioInput --> InferFunc
PromptInput --> InferFunc
InferFunc --> Processor
Processor --> Model
Model --> Processor
Processor --> InferFunc
InferFunc --> Output
style UI fill:#4ECDC4
style Model fill:#FF6B6B
style Output fill:#90EE90
Function Call Sequence
sequenceDiagram
autonumber
participant U as User
participant G as Gradio UI
participant D as download_youtube_audio
participant I as infer()
participant P as Processor
participant M as Model
U->>G: Enter YouTube URL
U->>G: Click Load
G->>D: download_youtube_audio(url)
D->>D: Validate URL
D->>D: Check cache
D->>D: Download with yt-dlp
D->>G: Return file path
G->>G: Update audio component
U->>G: Enter prompt
U->>G: Click Generate
G->>I: infer(audio_path, youtube_url, prompt)
I->>I: Determine audio source
I->>I: Create conversation format
I->>P: apply_chat_template(conversations)
P->>P: Tokenize & format
P->>M: Send batch to device
M->>M: Generate tokens
M->>P: Return token IDs
P->>P: batch_decode()
P->>I: Return decoded text
I->>G: Return formatted result
G->>U: Display response