Charles hugie (bigpappic)
1 follower · 4 following
AI & ML interests
None yet
Recent Activity
New activity 10 days ago in hover-nlp/hover: Update README.md
Replied to sayakpaul's post 12 days ago:
Fast LoRA inference for Flux with Diffusers and PEFT 🚨

There are great materials that demonstrate how to optimize inference for popular image generation models, such as Flux. However, very few cover how to serve LoRAs fast, despite LoRAs being an inseparable part of their adoption.

In our latest post, @BenjaminB and I show different techniques to optimize LoRA inference for the Flux family of image generation models. Our recipe includes:

1. `torch.compile`
2. Flash Attention 3 (when compatible)
3. Dynamic FP8 weight quantization (when compatible)
4. Hotswapping to avoid recompilation when swapping in new LoRAs 🤯

We tested our recipe with Flux.1-Dev on both an H100 and an RTX 4090, achieving at least a *2x speedup* on both GPUs. We believe our recipe is grounded in the reality of how LoRA-based use cases are generally served, so we hope it will be beneficial to the community 🤗

Even though our recipe was tested primarily with NVIDIA GPUs, it should also work with AMD GPUs. Learn the details and the full code here: https://huggingface.co/blog/lora-fast
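The hotswapping idea from the post can be sketched in plain PyTorch. This is a minimal illustration of the principle, not the blog's actual implementation: the `HotswapLoRALinear` class and its method names are hypothetical. The key trick is keeping the LoRA matrices in fixed-shape buffers and copying new adapter weights into them in place, so tensor identities never change and a `torch.compile`d graph does not need to recompile when the adapter is swapped.

```python
import torch
import torch.nn as nn

class HotswapLoRALinear(nn.Module):
    """Hypothetical linear layer with in-place-swappable LoRA weights."""

    def __init__(self, in_features, out_features, rank=4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        # Fixed-shape LoRA buffers, zero-initialized (i.e. a no-op adapter).
        # Swapping copies into them in place, so tensor identities (and the
        # guards of a torch.compile'd graph) stay stable across swaps.
        self.register_buffer("lora_A", torch.zeros(rank, in_features))
        self.register_buffer("lora_B", torch.zeros(out_features, rank))

    def swap_adapter(self, A, B):
        # In-place copy: no new tensors are created, so nothing recompiles.
        self.lora_A.copy_(A)
        self.lora_B.copy_(B)

    def forward(self, x):
        # Base projection plus the low-rank LoRA correction x @ A^T @ B^T.
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T

layer = HotswapLoRALinear(16, 16, rank=4)
x = torch.randn(2, 16)
y0 = layer(x)  # zero-initialized LoRA: identical to the base output
layer.swap_adapter(torch.randn(4, 16), torch.randn(16, 4))
y1 = layer(x)  # same module, same buffers, new adapter weights
```

In the real recipe, Diffusers exposes the same idea through `load_lora_weights(..., hotswap=True)`; the sketch only shows why in-place replacement avoids recompilation.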
Organizations
None yet
bigpappic's activity
New activity in hover-nlp/hover (10 days ago): Update README.md (#11, opened 10 days ago by bigpappic)
New activity in nvidia/ToolScale (12 days ago): Update README.md (#4, opened 12 days ago by bigpappic)
New activity in deepseek-ai/DeepSeek-OCR (21 days ago): Request: DOI (#99, opened 28 days ago by Amer9i)
New activity in bigpappic/Hideout (2 months ago): Update README.md (#1, opened 2 months ago by bigpappic)
New activity in deepseek-ai/DeepSeek-OCR (2 months ago):
- Hello (#39, opened 2 months ago by anmolkumarjha)
- Make compatible with newer transformers (#38, opened 2 months ago by harpreetsahota)
- fixing hardcoded cuda() for cpu inference (#21, opened 2 months ago by alexgambashidze)
- DeepSeek-OCR running in google colab (#27, opened 2 months ago by Javedalam)