Spaces:
Runtime error
Runtime error
Commit
·
8c2765a
1
Parent(s):
678e362
Add Mermaid code flow diagrams
Browse files- code-flow.md +185 -0
- main-flow.md +130 -0
code-flow.md
ADDED
|
@@ -0,0 +1,185 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Music Flamingo Code Flow
|
| 2 |
+
|
| 3 |
+
```mermaid
|
| 4 |
+
flowchart TD
|
| 5 |
+
Start([App Starts]) --> Init[Initialize App]
|
| 6 |
+
Init --> LoadModel[Load Music Flamingo Model<br/>processor & model from MODEL_ID]
|
| 7 |
+
LoadModel --> SetupProxy{Check for<br/>SSH Proxy?}
|
| 8 |
+
SetupProxy -->|Yes| CreateTunnel[Create SSH Tunnel]
|
| 9 |
+
SetupProxy -->|No| Ready[App Ready]
|
| 10 |
+
CreateTunnel --> Ready
|
| 11 |
+
|
| 12 |
+
Ready --> UI[Gradio UI Loaded]
|
| 13 |
+
UI --> UserInput{User Input}
|
| 14 |
+
|
| 15 |
+
UserInput -->|Upload Audio| AudioFile[Audio File Path]
|
| 16 |
+
UserInput -->|YouTube URL| YouTubeURL[YouTube URL String]
|
| 17 |
+
UserInput -->|Load Button| LoadYouTube[Load YouTube Audio]
|
| 18 |
+
|
| 19 |
+
LoadYouTube --> DownloadYT[download_youtube_audio]
|
| 20 |
+
DownloadYT --> CheckCache{URL in<br/>Cache?}
|
| 21 |
+
CheckCache -->|Yes & Exists| ReturnCached[Return Cached File]
|
| 22 |
+
CheckCache -->|No| ValidateURL[Validate YouTube URL<br/>with Regex]
|
| 23 |
+
ValidateURL -->|Invalid| Error1[Return Error Message]
|
| 24 |
+
ValidateURL -->|Valid| YTDL[yt-dlp Download]
|
| 25 |
+
YTDL --> ExtractAudio[Extract Audio to MP3]
|
| 26 |
+
ExtractAudio --> CacheFile[Cache File Path]
|
| 27 |
+
CacheFile --> ReturnFile[Return File Path]
|
| 28 |
+
ReturnCached --> AudioFile
|
| 29 |
+
ReturnFile --> AudioFile
|
| 30 |
+
|
| 31 |
+
AudioFile --> UserPrompt[User Enters Prompt]
|
| 32 |
+
UserPrompt --> ClickGenerate[Click Generate Button]
|
| 33 |
+
|
| 34 |
+
ClickGenerate --> Infer[infer Function]
|
| 35 |
+
Infer --> DetermineSource{Audio Source?}
|
| 36 |
+
DetermineSource -->|File Upload| UseFile[Use audio_path]
|
| 37 |
+
DetermineSource -->|YouTube| DownloadIfNeeded[Download if not cached]
|
| 38 |
+
DownloadIfNeeded --> UseFile
|
| 39 |
+
|
| 40 |
+
UseFile --> CreateConversation[Create Conversation Format]
|
| 41 |
+
CreateConversation --> FormatInput["conversations = [<br/> [{<br/> 'role': 'user',<br/> 'content': [<br/> {'type': 'text', 'text': prompt},<br/> {'type': 'audio', 'path': file}<br/> ]<br/> }]<br/>]"]
|
| 42 |
+
|
| 43 |
+
FormatInput --> ApplyTemplate[processor.apply_chat_template]
|
| 44 |
+
ApplyTemplate --> Tokenize[Tokenize Input]
|
| 45 |
+
Tokenize --> MoveToDevice[Move to model.device]
|
| 46 |
+
|
| 47 |
+
MoveToDevice --> Generate[model.generate<br/>max_new_tokens=4096]
|
| 48 |
+
Generate --> Decode[processor.batch_decode]
|
| 49 |
+
Decode --> FormatOutput[Format Result with Status]
|
| 50 |
+
FormatOutput --> Display[Display in Gradio UI]
|
| 51 |
+
|
| 52 |
+
Error1 --> Display
|
| 53 |
+
|
| 54 |
+
style Start fill:#90EE90
|
| 55 |
+
style LoadModel fill:#FFD700
|
| 56 |
+
style Generate fill:#FF6B6B
|
| 57 |
+
style Display fill:#4ECDC4
|
| 58 |
+
style Error1 fill:#FF6B6B
|
| 59 |
+
```
|
| 60 |
+
|
| 61 |
+
## Detailed Function Flow
|
| 62 |
+
|
| 63 |
+
### 1. Initialization Flow
|
| 64 |
+
```mermaid
|
| 65 |
+
sequenceDiagram
|
| 66 |
+
participant App
|
| 67 |
+
participant Model
|
| 68 |
+
participant Proxy
|
| 69 |
+
|
| 70 |
+
App->>Proxy: Check SSH environment variables
|
| 71 |
+
alt Proxy Available
|
| 72 |
+
Proxy->>Proxy: Create SSH tunnel
|
| 73 |
+
Proxy->>App: PROXY_URL set
|
| 74 |
+
end
|
| 75 |
+
App->>Model: Load processor from MODEL_ID
|
| 76 |
+
App->>Model: Load model with device_map="auto"
|
| 77 |
+
Model->>App: Model ready
|
| 78 |
+
App->>App: Launch Gradio UI
|
| 79 |
+
```
|
| 80 |
+
|
| 81 |
+
### 2. YouTube Download Flow
|
| 82 |
+
```mermaid
|
| 83 |
+
flowchart LR
|
| 84 |
+
A[YouTube URL] --> B{Valid URL?}
|
| 85 |
+
B -->|No| C[Return Error]
|
| 86 |
+
B -->|Yes| D{Cached?}
|
| 87 |
+
D -->|Yes| E{File Exists?}
|
| 88 |
+
E -->|Yes| F[Return Cached]
|
| 89 |
+
E -->|No| G[Download]
|
| 90 |
+
D -->|No| G
|
| 91 |
+
G --> H[yt-dlp Download]
|
| 92 |
+
H --> I[Extract to MP3]
|
| 93 |
+
I --> J[Cache File]
|
| 94 |
+
J --> K[Return Path]
|
| 95 |
+
|
| 96 |
+
style C fill:#FF6B6B
|
| 97 |
+
style F fill:#90EE90
|
| 98 |
+
style K fill:#90EE90
|
| 99 |
+
```
|
| 100 |
+
|
| 101 |
+
### 3. Model Inference Flow
|
| 102 |
+
```mermaid
|
| 103 |
+
sequenceDiagram
|
| 104 |
+
participant User
|
| 105 |
+
participant UI
|
| 106 |
+
participant Download
|
| 107 |
+
participant Processor
|
| 108 |
+
participant Model
|
| 109 |
+
|
| 110 |
+
User->>UI: Upload audio or YouTube URL
|
| 111 |
+
UI->>Download: Get audio file path
|
| 112 |
+
Download->>UI: Return file path
|
| 113 |
+
User->>UI: Enter prompt
|
| 114 |
+
User->>UI: Click Generate
|
| 115 |
+
UI->>Processor: Create conversation format
|
| 116 |
+
Processor->>Processor: apply_chat_template()
|
| 117 |
+
Processor->>Processor: Tokenize input
|
| 118 |
+
Processor->>Model: Send batch to device
|
| 119 |
+
Model->>Model: Generate tokens (max 4096)
|
| 120 |
+
Model->>Processor: Return token IDs
|
| 121 |
+
Processor->>Processor: batch_decode()
|
| 122 |
+
Processor->>UI: Return text result
|
| 123 |
+
UI->>User: Display response
|
| 124 |
+
```
|
| 125 |
+
|
| 126 |
+
## Key Functions
|
| 127 |
+
|
| 128 |
+
### download_youtube_audio()
|
| 129 |
+
```mermaid
|
| 130 |
+
flowchart TD
|
| 131 |
+
Start[download_youtube_audio] --> Validate[Validate URL with Regex]
|
| 132 |
+
Validate -->|Invalid| ReturnError[Return None, Error]
|
| 133 |
+
Validate -->|Valid| CheckCache{URL in Cache?}
|
| 134 |
+
CheckCache -->|Yes| CheckFile{File Exists?}
|
| 135 |
+
CheckFile -->|Yes| ReturnCached[Return Cached Path]
|
| 136 |
+
CheckFile -->|No| Download[Download Audio]
|
| 137 |
+
CheckCache -->|No| Download
|
| 138 |
+
Download --> YTDL[yt-dlp with Options]
|
| 139 |
+
YTDL --> Extract[Extract to MP3]
|
| 140 |
+
Extract --> Cache[Store in Cache]
|
| 141 |
+
Cache --> ReturnPath[Return Path, Status]
|
| 142 |
+
|
| 143 |
+
style ReturnError fill:#FF6B6B
|
| 144 |
+
style ReturnCached fill:#90EE90
|
| 145 |
+
style ReturnPath fill:#90EE90
|
| 146 |
+
```
|
| 147 |
+
|
| 148 |
+
### infer()
|
| 149 |
+
```mermaid
|
| 150 |
+
flowchart TD
|
| 151 |
+
Start[infer Function] --> GetAudio{Get Audio}
|
| 152 |
+
GetAudio -->|File Upload| UseFile[Use audio_path]
|
| 153 |
+
GetAudio -->|YouTube| DownloadYT[Download YouTube]
|
| 154 |
+
DownloadYT -->|Success| UseFile
|
| 155 |
+
DownloadYT -->|Error| ReturnError[Return Error]
|
| 156 |
+
UseFile --> CreateConv[Create Conversation]
|
| 157 |
+
CreateConv --> ApplyTemplate[Apply Chat Template]
|
| 158 |
+
ApplyTemplate --> Generate[Model Generate]
|
| 159 |
+
Generate --> Decode[Decode Output]
|
| 160 |
+
Decode --> Format[Format Result]
|
| 161 |
+
Format --> Return[Return Text]
|
| 162 |
+
|
| 163 |
+
style ReturnError fill:#FF6B6B
|
| 164 |
+
style Return fill:#90EE90
|
| 165 |
+
```
|
| 166 |
+
|
| 167 |
+
## Data Flow
|
| 168 |
+
|
| 169 |
+
```mermaid
|
| 170 |
+
flowchart LR
|
| 171 |
+
A[User Input] --> B{Input Type}
|
| 172 |
+
B -->|Audio File| C[File Path]
|
| 173 |
+
B -->|YouTube URL| D[Download Function]
|
| 174 |
+
D --> C
|
| 175 |
+
C --> E[Conversation Format]
|
| 176 |
+
E --> F[Processor]
|
| 177 |
+
F --> G[Model]
|
| 178 |
+
G --> H[Generated Text]
|
| 179 |
+
H --> I[UI Display]
|
| 180 |
+
|
| 181 |
+
style A fill:#4ECDC4
|
| 182 |
+
style G fill:#FF6B6B
|
| 183 |
+
style I fill:#90EE90
|
| 184 |
+
```
|
| 185 |
+
|
main-flow.md
ADDED
|
@@ -0,0 +1,130 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Music Flamingo - Main Code Flow
|
| 2 |
+
|
| 3 |
+
## Simplified Main Flow
|
| 4 |
+
|
| 5 |
+
```mermaid
|
| 6 |
+
flowchart TD
|
| 7 |
+
Start([App Launch]) --> Init[Initialize]
|
| 8 |
+
Init --> LoadModel[Load Model & Processor]
|
| 9 |
+
LoadModel --> UI[Gradio UI Ready]
|
| 10 |
+
|
| 11 |
+
UI --> UserAction{User Action}
|
| 12 |
+
|
| 13 |
+
UserAction -->|Upload File| FilePath[Audio File Path]
|
| 14 |
+
UserAction -->|Enter YouTube URL| YTURL[YouTube URL]
|
| 15 |
+
UserAction -->|Click Load| LoadYT[Load YouTube Audio]
|
| 16 |
+
|
| 17 |
+
LoadYT --> Download[download_youtube_audio]
|
| 18 |
+
Download -->|Success| FilePath
|
| 19 |
+
Download -->|Error| ErrorMsg[Show Error]
|
| 20 |
+
|
| 21 |
+
FilePath --> EnterPrompt[User Enters Prompt]
|
| 22 |
+
EnterPrompt --> ClickGen[Click Generate]
|
| 23 |
+
|
| 24 |
+
ClickGen --> Infer[infer Function Called]
|
| 25 |
+
|
| 26 |
+
Infer --> GetAudio[Get Audio File Path]
|
| 27 |
+
GetAudio --> CreateConv["Create Conversation:<br/>{'role': 'user',<br/> 'content': [<br/> {'type': 'text', ...},<br/> {'type': 'audio', 'path': ...}<br/>]"]
|
| 28 |
+
|
| 29 |
+
CreateConv --> Process["processor.apply_chat_template()<br/>- Tokenize<br/>- Format"]
|
| 30 |
+
|
| 31 |
+
Process --> Generate["model.generate()<br/>max_new_tokens=4096"]
|
| 32 |
+
|
| 33 |
+
Generate --> Decode["processor.batch_decode()<br/>Skip special tokens"]
|
| 34 |
+
|
| 35 |
+
Decode --> Format[Format Output]
|
| 36 |
+
Format --> Display[Display in UI]
|
| 37 |
+
|
| 38 |
+
ErrorMsg --> Display
|
| 39 |
+
|
| 40 |
+
style Start fill:#90EE90
|
| 41 |
+
style LoadModel fill:#FFD700
|
| 42 |
+
style Generate fill:#FF6B6B
|
| 43 |
+
style Display fill:#4ECDC4
|
| 44 |
+
style ErrorMsg fill:#FF6B6B
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
## Component Interaction
|
| 48 |
+
|
| 49 |
+
```mermaid
|
| 50 |
+
graph TB
|
| 51 |
+
subgraph "User Interface"
|
| 52 |
+
UI[Gradio Blocks]
|
| 53 |
+
AudioInput[Audio Component]
|
| 54 |
+
YTInput[YouTube Textbox]
|
| 55 |
+
PromptInput[Prompt Textbox]
|
| 56 |
+
Output[Output Textbox]
|
| 57 |
+
GenButton[Generate Button]
|
| 58 |
+
end
|
| 59 |
+
|
| 60 |
+
subgraph "Processing Layer"
|
| 61 |
+
DownloadFunc[download_youtube_audio]
|
| 62 |
+
InferFunc[infer Function]
|
| 63 |
+
end
|
| 64 |
+
|
| 65 |
+
subgraph "Model Layer"
|
| 66 |
+
Processor[AutoProcessor]
|
| 67 |
+
Model[AutoModel]
|
| 68 |
+
end
|
| 69 |
+
|
| 70 |
+
UI --> AudioInput
|
| 71 |
+
UI --> YTInput
|
| 72 |
+
UI --> PromptInput
|
| 73 |
+
UI --> GenButton
|
| 74 |
+
UI --> Output
|
| 75 |
+
|
| 76 |
+
YTInput --> DownloadFunc
|
| 77 |
+
DownloadFunc --> AudioInput
|
| 78 |
+
|
| 79 |
+
GenButton --> InferFunc
|
| 80 |
+
AudioInput --> InferFunc
|
| 81 |
+
PromptInput --> InferFunc
|
| 82 |
+
|
| 83 |
+
InferFunc --> Processor
|
| 84 |
+
Processor --> Model
|
| 85 |
+
Model --> Processor
|
| 86 |
+
Processor --> InferFunc
|
| 87 |
+
InferFunc --> Output
|
| 88 |
+
|
| 89 |
+
style UI fill:#4ECDC4
|
| 90 |
+
style Model fill:#FF6B6B
|
| 91 |
+
style Output fill:#90EE90
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
## Function Call Sequence
|
| 95 |
+
|
| 96 |
+
```mermaid
|
| 97 |
+
sequenceDiagram
|
| 98 |
+
autonumber
|
| 99 |
+
participant U as User
|
| 100 |
+
participant G as Gradio UI
|
| 101 |
+
participant D as download_youtube_audio
|
| 102 |
+
participant I as infer()
|
| 103 |
+
participant P as Processor
|
| 104 |
+
participant M as Model
|
| 105 |
+
|
| 106 |
+
U->>G: Enter YouTube URL
|
| 107 |
+
U->>G: Click Load
|
| 108 |
+
G->>D: download_youtube_audio(url)
|
| 109 |
+
D->>D: Validate URL
|
| 110 |
+
D->>D: Check cache
|
| 111 |
+
D->>D: Download with yt-dlp
|
| 112 |
+
D->>G: Return file path
|
| 113 |
+
G->>G: Update audio component
|
| 114 |
+
|
| 115 |
+
U->>G: Enter prompt
|
| 116 |
+
U->>G: Click Generate
|
| 117 |
+
G->>I: infer(audio_path, youtube_url, prompt)
|
| 118 |
+
I->>I: Determine audio source
|
| 119 |
+
I->>I: Create conversation format
|
| 120 |
+
I->>P: apply_chat_template(conversations)
|
| 121 |
+
P->>P: Tokenize & format
|
| 122 |
+
P->>M: Send batch to device
|
| 123 |
+
M->>M: Generate tokens
|
| 124 |
+
M->>P: Return token IDs
|
| 125 |
+
P->>P: batch_decode()
|
| 126 |
+
P->>I: Return decoded text
|
| 127 |
+
I->>G: Return formatted result
|
| 128 |
+
G->>U: Display response
|
| 129 |
+
```
|
| 130 |
+
|