anilyanamandra commited on
Commit
8c2765a
·
1 Parent(s): 678e362

Add Mermaid code flow diagrams

Browse files
Files changed (2) hide show
  1. code-flow.md +185 -0
  2. main-flow.md +130 -0
code-flow.md ADDED
@@ -0,0 +1,185 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Music Flamingo Code Flow
2
+
3
+ ```mermaid
4
+ flowchart TD
5
+ Start([App Starts]) --> Init[Initialize App]
6
+ Init --> LoadModel[Load Music Flamingo Model<br/>processor & model from MODEL_ID]
7
+ LoadModel --> SetupProxy{Check for<br/>SSH Proxy?}
8
+ SetupProxy -->|Yes| CreateTunnel[Create SSH Tunnel]
9
+ SetupProxy -->|No| Ready[App Ready]
10
+ CreateTunnel --> Ready
11
+
12
+ Ready --> UI[Gradio UI Loaded]
13
+ UI --> UserInput{User Input}
14
+
15
+ UserInput -->|Upload Audio| AudioFile[Audio File Path]
16
+ UserInput -->|YouTube URL| YouTubeURL[YouTube URL String]
17
+ UserInput -->|Load Button| LoadYouTube[Load YouTube Audio]
18
+
19
+ LoadYouTube --> DownloadYT[download_youtube_audio]
20
+ DownloadYT --> CheckCache{URL in<br/>Cache?}
21
+ CheckCache -->|Yes & Exists| ReturnCached[Return Cached File]
22
+ CheckCache -->|No| ValidateURL[Validate YouTube URL<br/>with Regex]
23
+ ValidateURL -->|Invalid| Error1[Return Error Message]
24
+ ValidateURL -->|Valid| YTDL[yt-dlp Download]
25
+ YTDL --> ExtractAudio[Extract Audio to MP3]
26
+ ExtractAudio --> CacheFile[Cache File Path]
27
+ CacheFile --> ReturnFile[Return File Path]
28
+ ReturnCached --> AudioFile
29
+ ReturnFile --> AudioFile
30
+
31
+ AudioFile --> UserPrompt[User Enters Prompt]
32
+ UserPrompt --> ClickGenerate[Click Generate Button]
33
+
34
+ ClickGenerate --> Infer[infer Function]
35
+ Infer --> DetermineSource{Audio Source?}
36
+ DetermineSource -->|File Upload| UseFile[Use audio_path]
37
+ DetermineSource -->|YouTube| DownloadIfNeeded[Download if not cached]
38
+ DownloadIfNeeded --> UseFile
39
+
40
+ UseFile --> CreateConversation[Create Conversation Format]
41
+ CreateConversation --> FormatInput["conversations = [<br/> [{<br/> 'role': 'user',<br/> 'content': [<br/> {'type': 'text', 'text': prompt},<br/> {'type': 'audio', 'path': file}<br/> ]<br/> }]<br/>]"]
42
+
43
+ FormatInput --> ApplyTemplate[processor.apply_chat_template]
44
+ ApplyTemplate --> Tokenize[Tokenize Input]
45
+ Tokenize --> MoveToDevice[Move to model.device]
46
+
47
+ MoveToDevice --> Generate[model.generate<br/>max_new_tokens=4096]
48
+ Generate --> Decode[processor.batch_decode]
49
+ Decode --> FormatOutput[Format Result with Status]
50
+ FormatOutput --> Display[Display in Gradio UI]
51
+
52
+ Error1 --> Display
53
+
54
+ style Start fill:#90EE90
55
+ style LoadModel fill:#FFD700
56
+ style Generate fill:#FF6B6B
57
+ style Display fill:#4ECDC4
58
+ style Error1 fill:#FF6B6B
59
+ ```
60
+
61
+ ## Detailed Function Flow
62
+
63
+ ### 1. Initialization Flow
64
+ ```mermaid
65
+ sequenceDiagram
66
+ participant App
67
+ participant Model
68
+ participant Proxy
69
+
70
+ App->>Proxy: Check SSH environment variables
71
+ alt Proxy Available
72
+ Proxy->>Proxy: Create SSH tunnel
73
+ Proxy->>App: PROXY_URL set
74
+ end
75
+ App->>Model: Load processor from MODEL_ID
76
+ App->>Model: Load model with device_map="auto"
77
+ Model->>App: Model ready
78
+ App->>App: Launch Gradio UI
79
+ ```
80
+
81
+ ### 2. YouTube Download Flow
82
+ ```mermaid
83
+ flowchart LR
84
+ A[YouTube URL] --> B{Valid URL?}
85
+ B -->|No| C[Return Error]
86
+ B -->|Yes| D{Cached?}
87
+ D -->|Yes| E{File Exists?}
88
+ E -->|Yes| F[Return Cached]
89
+ E -->|No| G[Download]
90
+ D -->|No| G
91
+ G --> H[yt-dlp Download]
92
+ H --> I[Extract to MP3]
93
+ I --> J[Cache File]
94
+ J --> K[Return Path]
95
+
96
+ style C fill:#FF6B6B
97
+ style F fill:#90EE90
98
+ style K fill:#90EE90
99
+ ```
100
+
101
+ ### 3. Model Inference Flow
102
+ ```mermaid
103
+ sequenceDiagram
104
+ participant User
105
+ participant UI
106
+ participant Download
107
+ participant Processor
108
+ participant Model
109
+
110
+ User->>UI: Upload audio or YouTube URL
111
+ UI->>Download: Get audio file path
112
+ Download->>UI: Return file path
113
+ User->>UI: Enter prompt
114
+ User->>UI: Click Generate
115
+ UI->>Processor: Create conversation format
116
+ Processor->>Processor: apply_chat_template()
117
+ Processor->>Processor: Tokenize input
118
+ Processor->>Model: Send batch to device
119
+ Model->>Model: Generate tokens (max 4096)
120
+ Model->>Processor: Return token IDs
121
+ Processor->>Processor: batch_decode()
122
+ Processor->>UI: Return text result
123
+ UI->>User: Display response
124
+ ```
125
+
126
+ ## Key Functions
127
+
128
+ ### download_youtube_audio()
129
+ ```mermaid
130
+ flowchart TD
131
+ Start[download_youtube_audio] --> Validate[Validate URL with Regex]
132
+ Validate -->|Invalid| ReturnError[Return None, Error]
133
+ Validate -->|Valid| CheckCache{URL in Cache?}
134
+ CheckCache -->|Yes| CheckFile{File Exists?}
135
+ CheckFile -->|Yes| ReturnCached[Return Cached Path]
136
+ CheckFile -->|No| Download[Download Audio]
137
+ CheckCache -->|No| Download
138
+ Download --> YTDL[yt-dlp with Options]
139
+ YTDL --> Extract[Extract to MP3]
140
+ Extract --> Cache[Store in Cache]
141
+ Cache --> ReturnPath[Return Path, Status]
142
+
143
+ style ReturnError fill:#FF6B6B
144
+ style ReturnCached fill:#90EE90
145
+ style ReturnPath fill:#90EE90
146
+ ```
147
+
148
+ ### infer()
149
+ ```mermaid
150
+ flowchart TD
151
+ Start[infer Function] --> GetAudio{Get Audio}
152
+ GetAudio -->|File Upload| UseFile[Use audio_path]
153
+ GetAudio -->|YouTube| DownloadYT[Download YouTube]
154
+ DownloadYT -->|Success| UseFile
155
+ DownloadYT -->|Error| ReturnError[Return Error]
156
+ UseFile --> CreateConv[Create Conversation]
157
+ CreateConv --> ApplyTemplate[Apply Chat Template]
158
+ ApplyTemplate --> Generate[Model Generate]
159
+ Generate --> Decode[Decode Output]
160
+ Decode --> Format[Format Result]
161
+ Format --> Return[Return Text]
162
+
163
+ style ReturnError fill:#FF6B6B
164
+ style Return fill:#90EE90
165
+ ```
166
+
167
+ ## Data Flow
168
+
169
+ ```mermaid
170
+ flowchart LR
171
+ A[User Input] --> B{Input Type}
172
+ B -->|Audio File| C[File Path]
173
+ B -->|YouTube URL| D[Download Function]
174
+ D --> C
175
+ C --> E[Conversation Format]
176
+ E --> F[Processor]
177
+ F --> G[Model]
178
+ G --> H[Generated Text]
179
+ H --> I[UI Display]
180
+
181
+ style A fill:#4ECDC4
182
+ style G fill:#FF6B6B
183
+ style I fill:#90EE90
184
+ ```
185
+
main-flow.md ADDED
@@ -0,0 +1,130 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Music Flamingo - Main Code Flow
2
+
3
+ ## Simplified Main Flow
4
+
5
+ ```mermaid
6
+ flowchart TD
7
+ Start([App Launch]) --> Init[Initialize]
8
+ Init --> LoadModel[Load Model & Processor]
9
+ LoadModel --> UI[Gradio UI Ready]
10
+
11
+ UI --> UserAction{User Action}
12
+
13
+ UserAction -->|Upload File| FilePath[Audio File Path]
14
+ UserAction -->|Enter YouTube URL| YTURL[YouTube URL]
15
+ UserAction -->|Click Load| LoadYT[Load YouTube Audio]
16
+
17
+ LoadYT --> Download[download_youtube_audio]
18
+ Download -->|Success| FilePath
19
+ Download -->|Error| ErrorMsg[Show Error]
20
+
21
+ FilePath --> EnterPrompt[User Enters Prompt]
22
+ EnterPrompt --> ClickGen[Click Generate]
23
+
24
+ ClickGen --> Infer[infer Function Called]
25
+
26
+ Infer --> GetAudio[Get Audio File Path]
27
+ GetAudio --> CreateConv["Create Conversation:<br/>{'role': 'user',<br/> 'content': [<br/> {'type': 'text', ...},<br/> {'type': 'audio', 'path': ...}<br/>]"]
28
+
29
+ CreateConv --> Process["processor.apply_chat_template()<br/>- Tokenize<br/>- Format"]
30
+
31
+ Process --> Generate["model.generate()<br/>max_new_tokens=4096"]
32
+
33
+ Generate --> Decode["processor.batch_decode()<br/>Skip special tokens"]
34
+
35
+ Decode --> Format[Format Output]
36
+ Format --> Display[Display in UI]
37
+
38
+ ErrorMsg --> Display
39
+
40
+ style Start fill:#90EE90
41
+ style LoadModel fill:#FFD700
42
+ style Generate fill:#FF6B6B
43
+ style Display fill:#4ECDC4
44
+ style ErrorMsg fill:#FF6B6B
45
+ ```
46
+
47
+ ## Component Interaction
48
+
49
+ ```mermaid
50
+ graph TB
51
+ subgraph "User Interface"
52
+ UI[Gradio Blocks]
53
+ AudioInput[Audio Component]
54
+ YTInput[YouTube Textbox]
55
+ PromptInput[Prompt Textbox]
56
+ Output[Output Textbox]
57
+ GenButton[Generate Button]
58
+ end
59
+
60
+ subgraph "Processing Layer"
61
+ DownloadFunc[download_youtube_audio]
62
+ InferFunc[infer Function]
63
+ end
64
+
65
+ subgraph "Model Layer"
66
+ Processor[AutoProcessor]
67
+ Model[AutoModel]
68
+ end
69
+
70
+ UI --> AudioInput
71
+ UI --> YTInput
72
+ UI --> PromptInput
73
+ UI --> GenButton
74
+ UI --> Output
75
+
76
+ YTInput --> DownloadFunc
77
+ DownloadFunc --> AudioInput
78
+
79
+ GenButton --> InferFunc
80
+ AudioInput --> InferFunc
81
+ PromptInput --> InferFunc
82
+
83
+ InferFunc --> Processor
84
+ Processor --> Model
85
+ Model --> Processor
86
+ Processor --> InferFunc
87
+ InferFunc --> Output
88
+
89
+ style UI fill:#4ECDC4
90
+ style Model fill:#FF6B6B
91
+ style Output fill:#90EE90
92
+ ```
93
+
94
+ ## Function Call Sequence
95
+
96
+ ```mermaid
97
+ sequenceDiagram
98
+ autonumber
99
+ participant U as User
100
+ participant G as Gradio UI
101
+ participant D as download_youtube_audio
102
+ participant I as infer()
103
+ participant P as Processor
104
+ participant M as Model
105
+
106
+ U->>G: Enter YouTube URL
107
+ U->>G: Click Load
108
+ G->>D: download_youtube_audio(url)
109
+ D->>D: Validate URL
110
+ D->>D: Check cache
111
+ D->>D: Download with yt-dlp
112
+ D->>G: Return file path
113
+ G->>G: Update audio component
114
+
115
+ U->>G: Enter prompt
116
+ U->>G: Click Generate
117
+ G->>I: infer(audio_path, youtube_url, prompt)
118
+ I->>I: Determine audio source
119
+ I->>I: Create conversation format
120
+ I->>P: apply_chat_template(conversations)
121
+ P->>P: Tokenize & format
122
+ P->>M: Send batch to device
123
+ M->>M: Generate tokens
124
+ M->>P: Return token IDs
125
+ P->>P: batch_decode()
126
+ P->>I: Return decoded text
127
+ I->>G: Return formatted result
128
+ G->>U: Display response
129
+ ```
130
+