AI & ML interests

Optimised quants for high-throughput deployments! Compatible with Transformers, TGI & vLLM 🤗

Xenova 
posted an update 4 months ago
Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯
Demo (+ source code): webml-community/DINOv3-video-tracking

This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! 😍

How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold
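
The four steps above can be sketched in a few lines. This is a plain-Python toy, not the demo's code: the demo gets its features from DINOv3 through Transformers.js, while the function names, the tiny 3-dimensional embeddings, and the 0.6 threshold here are assumptions made purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def highlight_patches(frame_features, selected, threshold=0.6):
    """Return indices of patches whose best similarity to any selected
    embedding is above the threshold (threshold value is an assumption)."""
    if not selected:
        return []
    hits = []
    for idx, patch in enumerate(frame_features):
        best = max(cosine_similarity(patch, ref) for ref in selected)
        if best > threshold:
            hits.append(idx)
    return hits


# Toy data: one frame with three patches (hypothetical embeddings).
frame = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
selected = [frame[0]]  # the user clicked on patch 0
matches = highlight_patches(frame, selected)
```

Because a patch is kept if it resembles any entry in `selected`, the list can hold more than one embedding, for instance patches picked from different frames.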

... et voilà! 🥳

You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.

Excited to see what the community builds with it!
Xenova 
posted an update 4 months ago
The next generation of AI-powered websites is going to be WILD! 🤯

In-browser tool calling & MCP are finally here, allowing LLMs to interact with websites programmatically.

To show what's possible, I built a demo using Liquid AI's new LFM2 model, powered by 🤗 Transformers.js: LiquidAI/LFM2-WebGPU

As always, the demo is open source (you can find the code under the "Files" tab), so I'm excited to see how the community builds upon this! 🚀
Xenova 
posted an update 5 months ago
Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯
🗣️ Transcribe videos, meeting notes, songs and more
🔐 Runs on-device, meaning no data is sent to a server
🌎 Multilingual (8 languages)
🤗 Completely free (forever) & open source

That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥

Try it out yourself! 👇
webml-community/Voxtral-WebGPU
danieldk 
posted an update 5 months ago
kernels 0.8.0 is out: https://github.com/huggingface/kernels/releases/tag/v0.8.0

This release refines kernel selection in the kernelize function:

• You can now register kernels for certain CUDA capability ranges.
• Rather than requiring an exact match of modes, kernelize now falls back to other compatible modes. If you are kernelizing for inference, but you only registered a training + torch.compile kernel, it will use that kernel since it is compatible with inference as well.
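
To make the two points above concrete, here is a toy model of the selection logic. It does not use the kernels package and is not its implementation: `KernelEntry`, `select_kernel`, the integer capability encoding, and the compatibility rule (a kernel registered for training or torch.compile can also serve requests that do not need those features) are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Mode(Flag):
    INFERENCE = auto()
    TRAINING = auto()
    TORCH_COMPILE = auto()


@dataclass(frozen=True)
class KernelEntry:
    """Hypothetical registry entry, not a type from the kernels package."""
    name: str
    mode: Mode                 # mode the kernel was registered for
    min_capability: int = 0    # e.g. 80 stands for CUDA capability 8.0
    max_capability: int = 999


def features(mode):
    """Features a mode provides (when registered) or needs (when requested)."""
    feats = set()
    if Mode.TRAINING in mode:
        feats.add("backward")
    if Mode.TORCH_COMPILE in mode:
        feats.add("compile")
    return feats


def select_kernel(entries, requested, capability):
    """Pick a kernel whose capability range fits and whose mode is compatible.

    Assumed rule: an exact mode match wins; otherwise fall back to the
    compatible kernel that provides the fewest unneeded features.
    """
    needed = features(requested)
    candidates = [
        e for e in entries
        if e.min_capability <= capability <= e.max_capability
        and needed <= features(e.mode)
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda e: (e.mode != requested, len(features(e.mode) - needed)),
    )


entries = [
    KernelEntry("train_compile_kernel", Mode.TRAINING | Mode.TORCH_COMPILE,
                min_capability=80),
]
chosen = select_kernel(entries, Mode.INFERENCE, capability=90)
too_old = select_kernel(entries, Mode.INFERENCE, capability=75)
```

In this sketch `chosen` is the training + torch.compile kernel, mirroring the inference fallback described in the release note, while `too_old` is `None` because the device falls outside the registered capability range.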
danieldk 
posted an update 5 months ago
Kernels 0.7.0 is out: https://github.com/huggingface/kernels/releases/tag/v0.7.0 🚀

This release makes it possible to register multiple kernels for a layer. Do you have a super-fast kernel for inference and another kernel for training? Register them both and kernelize will pick the kernel depending on whether you are going to do training or inference.
dvilasuero 
posted an update 6 months ago
Super excited to launch Hugging Face Sheets: Spreadsheets meet AI and unstructured data.

A few months ago, we started imagining new ways to build and transform datasets with the latest open-source models.

Today, I'm thrilled to introduce our first step in this direction.


In a nutshell:

📁 Effortlessly run prompts and models over your data.
🌐 Agentic search for accuracy and real-time information.
🖼️ Familiar, minimalistic interface for interacting with data.
🎯 Human feedback 2.0: Your input directly improves generated data.
💯 Access hundreds of open models and leading inference providers.

Go to this space to try it out!

aisheets/sheets

Leave your questions below, we're just getting started!
Xenova 
posted an update 6 months ago
NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯

🔐 Privacy by design (no data leaves your device)
💰 Completely free... forever
📦 Zero installation required, just visit a website
⚡️ Blazingly-fast WebGPU-accelerated inference

Try it out: webml-community/conversational-webgpu

For those interested, here's how it works:
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text to speech
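
One way the four stages could be chained is sketched below. The post does not say how the demo wires them together, so the turn-taking rule (respond once silence follows speech) and every function name are assumptions; the callables are stand-ins for the real models.

```python
from typing import Callable, Iterable, Iterator


def conversation_loop(
    audio_chunks: Iterable[bytes],
    is_speech: Callable[[bytes], bool],   # stand-in for Silero VAD
    transcribe: Callable[[bytes], str],   # stand-in for Whisper
    generate: Callable[[str], str],       # stand-in for SmolLM2-1.7B
    synthesize: Callable[[str], bytes],   # stand-in for Kokoro
) -> Iterator[bytes]:
    """Yield one synthesized reply per detected utterance."""
    buffered = b""
    for chunk in audio_chunks:
        if is_speech(chunk):
            buffered += chunk             # keep collecting the utterance
        elif buffered:
            # Silence after speech: assume the speaker has finished.
            yield synthesize(generate(transcribe(buffered)))
            buffered = b""
    if buffered:
        # The stream ended while the user was still speaking.
        yield synthesize(generate(transcribe(buffered)))


# Toy stand-ins so the sketch runs without any model.
replies = list(
    conversation_loop(
        [b"hi", b" there", b"\x00", b"bye"],
        is_speech=lambda c: c != b"\x00",
        transcribe=lambda audio: audio.decode(),
        generate=lambda text: text.upper(),
        synthesize=lambda text: text.encode(),
    )
)
```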

Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!
danieldk 
posted an update 6 months ago
We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub! 🚀

We plan to give kernels a proper introduction soon. But for those who have been following along, we are happy to announce a new release:

- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal 🤘 Kernels.
- Generate wheels from Hub kernels for legacy deployments.

Full release notes here: https://github.com/huggingface/kernels/releases/tag/v0.6.0
Xenova 
posted an update 8 months ago
Reasoning models like o3 and o4-mini are advancing faster than ever, but imagine what will be possible when they can run locally in your browser! 🤯

Well, with 🤗 Transformers.js, you can do just that! Here's Zyphra's new ZR1 model running at over 100 tokens/second on WebGPU! ⚡️

Giving models access to browser APIs (like File System, Screen Capture, and more) could unlock an entirely new class of web experiences that are personalized, interactive, and run locally in a secure, sandboxed environment.

For now, try out the demo! 👇
webml-community/Zyphra-ZR1-WebGPU
alvarobartt 
posted an update 10 months ago
🔥 Agents can do anything! @microsoft Research just announced the release of Magma 8B!

Magma is a new Visual Language Model (VLM) with 8B parameters for multi-modal agents designed to handle complex interactions across virtual and real environments; and it's MIT licensed!

Magma comes with exciting new features such as:
- Introduces the Set-of-Mark and Trace-of-Mark techniques for fine-tuning
- Leverages a large amount of unlabeled video data to learn the spatial-temporal grounding and planning
- Strong generalization and the ability to be fine-tuned for other agentic tasks
- SOTA on multi-modal benchmarks spanning UI navigation, robotics manipulation, image / video understanding, and spatial understanding and reasoning
- Generates goal-driven visual plans and actions for agentic use cases

Model: microsoft/Magma-8B
Technical Report: Magma: A Foundation Model for Multimodal AI Agents (2502.13130)
Xenova 
posted an update 10 months ago
We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses
🌍 Multilingual support (only phonemization left)
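
For the sentence-splitting step, a minimal sketch of the idea: buffer incoming text and emit each sentence as soon as it is complete, so speech synthesis can start before the full text has arrived. The regex rule and the name `stream_sentences` are assumptions; this is not the splitter planned for the demo, and a naive rule like this will mis-split abbreviations such as "Dr.".

```python
import re
from typing import Iterable, Iterator

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(text_chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in the chunk stream."""
    buffer = ""
    for chunk in text_chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]  # possibly incomplete, keep for the next chunk
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left at the end


sentences = list(stream_sentences(["Hello wor", "ld. How are", " you? Fine"]))
```

Each yielded sentence could then be handed to the TTS model on its own, which is what makes a streamed response possible.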

Who wants to help?