Charles hugie (bigpappic)
1 follower · 4 following
AI & ML interests
None yet
Recent Activity
New activity 10 days ago in hover-nlp/hover: Update README.md
Replied to sayakpaul's post 12 days ago:
Fast LoRA inference for Flux with Diffusers and PEFT 🚨

There are great materials that demonstrate how to optimize inference for popular image generation models, such as Flux. However, very few cover how to serve LoRAs fast, despite LoRAs being an inseparable part of their adoption.

In our latest post, @BenjaminB and I show different techniques to optimize LoRA inference for the Flux family of image generation models. Our recipe includes:

1. `torch.compile`
2. Flash Attention 3 (when compatible)
3. Dynamic FP8 weight quantization (when compatible)
4. Hotswapping to avoid recompilation when swapping in new LoRAs 🤯

We tested our recipe with Flux.1-Dev on both an H100 and an RTX 4090, achieving at least a *2x speedup* on both GPUs. We believe our recipe is grounded in the reality of how LoRA-based use cases are generally served, so we hope it will be beneficial to the community 🤗

Even though our recipe was tested primarily with NVIDIA GPUs, it should also work with AMD GPUs. Learn the details and the full code here: https://huggingface.co/blog/lora-fast
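The hotswapping idea from the post can be sketched in plain PyTorch. This is a minimal illustration of the principle, not the blog's actual implementation: the `HotswapLoRALinear` class and its method names are hypothetical. The key trick is keeping the LoRA matrices in fixed-shape buffers and copying new adapter weights into them in place, so tensor identities never change and a `torch.compile`d graph does not need to recompile when the adapter is swapped.

```python
import torch
import torch.nn as nn

class HotswapLoRALinear(nn.Module):
    """Hypothetical linear layer with in-place-swappable LoRA weights."""

    def __init__(self, in_features, out_features, rank=4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        # Fixed-shape LoRA buffers, zero-initialized (i.e. a no-op adapter).
        # Swapping copies into them in place, so tensor identities (and the
        # guards of a torch.compile'd graph) stay stable across swaps.
        self.register_buffer("lora_A", torch.zeros(rank, in_features))
        self.register_buffer("lora_B", torch.zeros(out_features, rank))

    def swap_adapter(self, A, B):
        # In-place copy: no new tensors are created, so nothing recompiles.
        self.lora_A.copy_(A)
        self.lora_B.copy_(B)

    def forward(self, x):
        # Base projection plus the low-rank LoRA correction x @ A^T @ B^T.
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T

layer = HotswapLoRALinear(16, 16, rank=4)
x = torch.randn(2, 16)
y0 = layer(x)  # zero-initialized LoRA: identical to the base output
layer.swap_adapter(torch.randn(4, 16), torch.randn(16, 4))
y1 = layer(x)  # same module, same buffers, new adapter weights
```

In the real recipe, Diffusers exposes the same idea through `load_lora_weights(..., hotswap=True)`; the sketch only shows why in-place replacement avoids recompilation.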
Organizations
None yet
bigpappic's activity
New activity in hover-nlp/hover (10 days ago): Update README.md (#11, opened 10 days ago by bigpappic)
New activity in nvidia/ToolScale (12 days ago): Update README.md (#4, opened 12 days ago by bigpappic)
New activity in deepseek-ai/DeepSeek-OCR (21 days ago): Request: DOI (#99, opened 28 days ago by Amer9i)
New activity in bigpappic/Hideout (2 months ago): Update README.md (#1, opened 2 months ago by bigpappic)
New activity in deepseek-ai/DeepSeek-OCR (2 months ago):
- Hello (#39, opened 2 months ago by anmolkumarjha)
- Make compatible with newer transformers (#38, opened 2 months ago by harpreetsahota)
- fixing hardcoded cuda() for cpu inference (#21, opened 2 months ago by alexgambashidze)
- DeepSeek-OCR running in google colab (#27, opened 2 months ago by Javedalam)