ngxson posted an update Jan 14
Check out my collection of pre-made GGUF LoRA adapters!

This allows you to use both the normal and abliterated versions of popular models like Llama, Qwen, etc., without having to double the amount of VRAM usage.

ngxson/gguf_lora_collection
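
For example, here's a minimal llama-cpp-python sketch of loading one base model with an adapter applied on top. The file names are hypothetical placeholders for a base GGUF and a matching adapter from the collection (and, as a comment below notes, older llama-cpp-python builds may error when applying GGUF LoRA adapters):

```python
from llama_cpp import Llama

# Hypothetical file names: any base GGUF plus a matching LoRA adapter
# from the collection. The adapter is applied on top of the base weights,
# so only one full copy of the model needs to sit in VRAM.
llm = Llama(
    model_path="Qwen2.5-7B-Instruct-Q4_K_M.gguf",           # base model
    lora_path="Qwen2.5-7B-Instruct-abliterated-LoRA.gguf",  # GGUF LoRA adapter
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(out["choices"][0]["message"]["content"])
```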

Tagging @bartowski, @MaziyarPanahi, and @mradermacher: you may want to give this a try!

With my llama-cpp-python (0.3.4), the fix for the issue below may not have been merged yet, so an error occurs when applying the LoRA. I tried it with Qwen 2.5 14B Instruct. Well, it will be updated eventually. 🙄
https://github.com/ggerganov/llama.cpp/issues/9114

This is super cool!!! Would you mind sharing the process behind these GGUF LoRA adapters? Did you convert a LoRA into GGUF, or make the LoRA from the GGUF itself?

Yes, sure!

The first step is to generate a PEFT-compatible LoRA adapter; I used mergekit-extract-lora to do that. Please note that some bigger models (Qwen/Llama 70B) give errors that I don't know how to fix; hopefully that will be fixed soon. You can find more info about mergekit here: https://github.com/arcee-ai/mergekit
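
For anyone curious what this extraction step does under the hood: conceptually it's a low-rank approximation of the difference between the finetuned and base weights. Here's a toy sketch of that idea (illustrative only, not mergekit's actual implementation, which handles every layer and writes a PEFT adapter):

```python
import torch

def extract_lora(w_base: torch.Tensor, w_finetuned: torch.Tensor, rank: int = 16):
    """Approximate (w_finetuned - w_base) with two low-rank factors A and B,
    so that w_base + B @ A ~= w_finetuned. Illustrative only."""
    delta = w_finetuned - w_base
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    b = u[:, :rank] * s[:rank]  # (out_features, rank), like LoRA's B matrix
    a = vh[:rank, :]            # (rank, in_features), like LoRA's A matrix
    return a, b

# Toy example with random "weights" whose difference is genuinely low-rank
w_base = torch.randn(64, 64)
w_finetuned = w_base + torch.randn(64, 8) @ torch.randn(8, 64)
a, b = extract_lora(w_base, w_finetuned, rank=8)
print(torch.dist(w_base + b @ a, w_finetuned))  # ~0: the delta is captured
```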

The next step is to convert the PEFT adapter to GGUF; I used this space: https://huggingface.co/spaces/ggml-org/gguf-my-lora
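
If you'd rather convert locally instead of using the space, llama.cpp ships a convert_lora_to_gguf.py script for this. A rough sketch of invoking it (the paths are placeholders, and the exact flags may differ between llama.cpp versions):

```python
import subprocess

# Hypothetical paths. convert_lora_to_gguf.py lives in the llama.cpp repo;
# it reads a PEFT adapter directory and writes a GGUF adapter file.
subprocess.run(
    [
        "python", "convert_lora_to_gguf.py",
        "my_lora_adapter",               # PEFT adapter directory
        "--base", "path/to/base-model",  # original (HF) base model weights
        "--outtype", "f16",
    ],
    check=True,
)
```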

Then it's good to go!

Please note that the space can convert any PEFT LoRA adapter to GGUF, so if you're using something like Unsloth, it's straightforward to convert it into a GGUF LoRA (no need to merge into the base model); see the sketch below.
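
For reference, the "PEFT LoRA adapter" the space expects is just the directory that PEFT's save_pretrained writes out. A minimal sketch, where the base model and LoRA hyperparameters are placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model and LoRA config.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
peft_model = get_peft_model(
    base,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]),
)

# ... training would happen here ...

# Writes adapter_config.json + adapter_model.safetensors; this directory is
# what the gguf-my-lora space (or convert_lora_to_gguf.py) takes as input.
peft_model.save_pretrained("my_lora_adapter")
```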
