---
license: apache-2.0
pipeline_tag: text-generation
language:
- en
- he
tags:
- pretrained
inference:
parameters:
temperature: 0.6
---
[<img src="https://i.ibb.co/5Lbwyr1/dicta-logo.jpg" width="300px"/>](https://dicta.org.il)
# Dicta-LM 3.0: Advancing The Frontier of Hebrew Sovereign LLMs
Dicta-LM 3.0 is a powerful open-weight collection of LLMs, trained on extensive corpora of Hebrew and English text. The models are available for download and unlimited use, and they set a new state of the art (SOTA) for Hebrew in their weight class, both as base models and as chat models.
This is the 1.7-billion-parameter *reasoning* model, originally initialized from [Qwen3-1.7B-Base](https://huggingface.co/Qwen/Qwen3-1.7B-Base).
This version of the model is quantized to 4-bit weights (with 16-bit activations), allowing inference with significantly less memory, albeit with some degradation in quality.
This is a reasoning chat model: before responding to a user message, it first works out how to respond inside a designated thinking block.
For full details of this model, please read our [release blog post](https://dicta.org.il/dicta-lm-3) or the [technical report](https://www.dicta.org.il/publications/DictaLM_3_0___Techincal_Report.pdf).
You can view and access the full collection of base/instruct unquantized/quantized versions of `DictaLM 3.0` [here](https://huggingface.co/collections/dicta-il/dictalm-30-collection).
## Instruction format
In order to leverage instruction fine-tuning, your prompt should be rendered with the chat template specified for this model. Most inference libraries apply the template automatically, but you can also render it yourself, as sketched below.
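For illustration, here is a minimal sketch of rendering the template manually with `transformers` (the example message is arbitrary):
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dicta-il/DictaLM-3.0-1.7B-Thinking-W4A16")
messages = [{"role": "user", "content": "Hello, how are you?"}]
# Render the prompt as a plain string (no tokenization) to inspect the template.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```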
## Usage
We recommend using vLLM, but you can use Transformers as well:
### Transformers
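A minimal sketch using the standard `transformers` text-generation API; loading this W4A16 checkpoint may additionally require the `compressed-tensors` package, and the sampling settings below are illustrative:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dicta-il/DictaLM-3.0-1.7B-Thinking-W4A16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Hello, how are you?"}]
# The chat template inserts the generation prompt, including the thinking scaffold.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024, do_sample=True, temperature=0.6)
# Decode only the newly generated tokens (thinking block followed by the answer).
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```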
### vLLM
```bash
vllm serve dicta-il/DictaLM-3.0-1.7B-Thinking-W4A16 --enable-auto-tool-choice --tool-call-parser hermes --reasoning-parser deepseek_r1
```
You can then query it via the OpenAI client library:
```python
from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-no-key-required",
)

response = client.chat.completions.create(
    model="dicta-il/DictaLM-3.0-1.7B-Thinking-W4A16",
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ],
)
print(response.choices[0].message.content)
```
> The reasoning trace is returned in a designated field of the response structure.
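For example, with a reasoning parser enabled, vLLM's OpenAI-compatible API separates the trace from the final answer:
```python
# The thinking trace, extracted by the reasoning parser (vLLM-specific field).
print(response.choices[0].message.reasoning_content)
```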
The model supports tool calling, enabling integration with external tools and APIs. For an example of how to use tool calling, see the [vLLM documentation](https://docs.vllm.ai/en/stable/features/tool_calling/#tool-calling).
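As a minimal sketch against the server started above (the `get_weather` tool schema here is a hypothetical example):
```python
# A hypothetical tool schema, following the OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="dicta-il/DictaLM-3.0-1.7B-Thinking-W4A16",
    messages=[{"role": "user", "content": "What is the weather in Jerusalem?"}],
    tools=tools,
)
# If the model decided to call the tool, the structured call appears here:
print(response.choices[0].message.tool_calls)
```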
## Citation
If you use this model, please cite:
```bibtex
@article{Shmidman2025DictaLM3,
  title={{Dicta-LM 3.0: Advancing The Frontier of Hebrew Sovereign LLMs}},
  author={Shaltiel Shmidman and Avi Shmidman and Amir DN Cohen and Moshe Koppel},
  year={2025},
  publisher={{DICTA / Jerusalem, Israel}},
  note={https://www.dicta.org.il/publications/DictaLM_3_0___Techincal_Report.pdf}
}
```