Commit 8534af7 (verified) by xiaoyan001 · 1 Parent(s): f92aa7c

Upload UNO Scorer (initial version)

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,100 @@
+ # UNO-Scorer: A Unified General Scoring Model for UNO-Bench
+
+ <div align="center">
+
+ [![Paper](https://img.shields.io/badge/Paper-Arxiv%3A2510.18915-red)](https://arxiv.org/abs/2510.18915)
+ [![Base Model](https://img.shields.io/badge/Base%20Model-Qwen3--14B-blue)](https://huggingface.co/Qwen/Qwen3-14B)
+ [![License](https://img.shields.io/badge/License-Apache%202.0-green)]()
+
+ </div>
+
+ ## 📖 Introduction
+
+ **UNO-Scorer** is a lightweight yet high-precision general scoring model developed as part of **UNO-Bench**. It is designed to automate the evaluation of Large Multimodal Models (LMMs) efficiently and with minimal computational overhead.
+
+ Built on the **Qwen3-14B** backbone, UNO-Scorer is fine-tuned on 13K high-quality in-house examples. It overcomes the limitations of traditional Overall Reward Models (ORMs) by supporting **6 distinct question types**, and it performs particularly well on **Multi-Step Open-Ended Questions (MO)**.
+
+ ## 📊 Performance
+
+ UNO-Scorer delivers superior accuracy in automated evaluation, particularly on complex **Multi-Step Open-Ended Questions**. We compared the accuracy of our scorer against other advanced evaluators:
+
+ | Model | Accuracy |
+ | :--- | :--- |
+ | Seed-1.5-VL | 0.9118 |
+ | GPT-4.1 | 0.9457 |
+ | **UNO-Scorer (Ours)** | **0.9505** |
+
+ Experiments show that UNO-Scorer surpasses even proprietary frontier models such as GPT-4.1 in this evaluation domain, at a lower cost.
+
+ ## 💻 Usage
+
+ ### 0. Quick Start
+
+ ```bash
+ pip install -U transformers
+ python3 test_scorer_hf.py --model-name /path/to/your/model
+ ```
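+
+ For programmatic use, you can call the helpers defined in the bundled `test_scorer_hf.py` directly. A minimal sketch (the question, reference, and response strings are illustrative only):
+
+ ```python
+ # Minimal programmatic scoring using the helpers defined in test_scorer_hf.py
+ # (run from the directory containing that file).
+ from test_scorer_hf import load_model, process_score_prompt, generate, parse_from_score_model
+
+ tokenizer, model = load_model("/path/to/your/model")
+
+ # Single-question format: one sub-question worth the full 10 points.
+ prompt = process_score_prompt(
+     question="In which year were the Beijing Olympics held?",   # illustrative example
+     reference="小问1:2008,总分10分,无需关注推理过程,最终答案正确即可",
+     response="The Beijing Olympics were held in 2008.",
+ )
+ raw = generate(model, tokenizer, prompt)   # grading rationale ending in <score>...</score>
+ score = parse_from_score_model(raw)        # total score normalized to [0, 1]
+ print(score)                               # a fully correct answer should yield 1.0
+ ```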
+
+ We recommend vLLM for inference, as it is significantly more efficient than the standard HuggingFace approach. Follow the steps below to set up the environment and run the inference script provided in our official repository.
+
+ ### 1. Clone the Repository
+ First, clone the UNO-Bench repository:
+
+ ```bash
+ git clone https://github.com/meituan-longcat/UNO-Bench.git
+ cd UNO-Bench/uno_eval
+ ```
+
+ ### 2. Install Dependencies
+ Install the necessary Python libraries:
+
+ ```bash
+ pip install -r requirement.txt
+ ```
+
+ ### 3. Run Inference
+ We provide an example script based on **vLLM** for efficient model inference. Run the following command to test the scorer:
+
+ ```bash
+ bash examples/test_scorer.sh
+ ```
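+
+ If you prefer to call vLLM directly rather than through the wrapper script, a minimal offline-inference sketch looks like the following. It reuses the prompt builder and score parser from the bundled `test_scorer_hf.py`; the sampling values mirror `generation_config.json`, and the model path is a placeholder:
+
+ ```python
+ # A minimal vLLM offline-inference sketch; the official script is examples/test_scorer.sh.
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ from test_scorer_hf import process_score_prompt, parse_from_score_model
+
+ model_path = "/path/to/your/model"  # placeholder
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ llm = LLM(model=model_path)
+ # Sampling values mirror generation_config.json.
+ sampling = SamplingParams(temperature=0.6, top_p=0.95, top_k=20, max_tokens=16384)
+
+ prompt = process_score_prompt(
+     question="In which year were the Beijing Olympics held?",   # illustrative example
+     reference="小问1:2008,总分10分,无需关注推理过程,最终答案正确即可",
+     response="2008.",
+ )
+ # Apply the same chat template as the HF script, then generate.
+ text = tokenizer.apply_chat_template(
+     [{"role": "system", "content": "You are a helpful assistant."},
+      {"role": "user", "content": prompt}],
+     tokenize=False, add_generation_prompt=True,
+ )
+ outputs = llm.generate([text], sampling)
+ print(parse_from_score_model(outputs[0].outputs[0].text))
+ ```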
+
+ ### 4. Adapt Your Reference Answer
+ The most critical part of using UNO-Scorer is formatting the Reference Answer correctly. Specifically:
+
+ 1. Assign point values to the answer components; the points for a question should typically sum to 10.
+ 2. Optionally, add detailed scoring criteria to each reference answer to suit your needs (e.g., clarifying how to judge cases where the final choice is correct but the reasoning is flawed).
+
+ Note: Since the model is primarily trained on Chinese corpora, it follows these instructions more accurately when they are written in Chinese.
+
+ You can structure the Reference Answer as follows (a Python sketch of both formats appears after the table):
+
+ | Question Type | Scenario | **Reference Answer** | Example |
+ | :--- | :--- | :--- | :--- |
+ | **Single Question** | The model only needs to check whether the final result matches. | Format as a single sub-question (Sub-question 1) worth exactly 10 points.<br><br>Template:<br>`小问1:{Answer},总分10分,无需关注推理过程,最终答案正确即可` | **Raw Answer:** "C"<br>**Input Answer:** `小问1:C,总分10分,无需关注推理过程,最终答案正确即可` |
+ | **Multiple Question** | The model needs to grade specific checkpoints. | Break the answer into numbered sub-steps with assigned points (summing to exactly 10).<br><br>Template:<br>`1. {Sub-Answer A} ({X} points); 2. {Sub-Answer B} ({Y} points).` | **Raw Answer:** "5 apples, 6 bananas"<br>**Input Answer:** `1. 5 apples (4 points); 2. 6 bananas (6 points).` |
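+
+ As a convenience, both templates can be generated with a small helper. The functions below are hypothetical (not part of the repository) and only illustrate the two formats:
+
+ ```python
+ # Hypothetical helpers (not in the repo) illustrating the two reference-answer formats.
+
+ def single_question_reference(answer: str) -> str:
+     # One sub-question worth the full 10 points; only the final answer is checked.
+     return f"小问1:{answer},总分10分,无需关注推理过程,最终答案正确即可"
+
+ def multi_question_reference(parts: list[tuple[str, int]]) -> str:
+     # Numbered sub-answers with point values; the points should sum to 10.
+     assert sum(points for _, points in parts) == 10, "point values should sum to 10"
+     return "; ".join(
+         f"{i}. {answer} ({points} points)" for i, (answer, points) in enumerate(parts, 1)
+     ) + "."
+
+ print(single_question_reference("C"))
+ # -> 小问1:C,总分10分,无需关注推理过程,最终答案正确即可
+ print(multi_question_reference([("5 apples", 4), ("6 bananas", 6)]))
+ # -> 1. 5 apples (4 points); 2. 6 bananas (6 points).
+ ```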
+
+ ## 📜 Citation
+
+ If you find this model or UNO-Bench useful for your research, please cite our paper:
+
+ ```bibtex
+ @misc{chen2025unobench,
+   title={UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models},
+   author={Chen Chen and ZeYang Hu and Fengjiao Chen and Liya Ma and Jiaxing Liu and Xiaoyu Li and Ziwen Wang and Xuezhi Cao and Xunliang Cai},
+   year={2025},
+   eprint={2510.18915},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2510.18915},
+ }
+ ```
+
+ ---
+
+ **Disclaimer:** This model is based on Qwen3-14B. Please strictly follow the license and usage policy of the original Qwen model series.
added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "</think>": 151668,
+   "</tool_call>": 151658,
+   "</tool_response>": 151666,
+   "<think>": 151667,
+   "<tool_call>": 151657,
+   "<tool_response>": 151665,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
all_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "epoch": 2.9891956782713085,
+   "eval_loss": 0.15651731193065643,
+   "eval_runtime": 8.6007,
+   "eval_samples_per_second": 15.696,
+   "eval_steps_per_second": 1.977,
+   "total_flos": 170865984536576.0,
+   "train_loss": 0.1184023514103431,
+   "train_runtime": 9824.5273,
+   "train_samples_per_second": 4.068,
+   "train_steps_per_second": 0.064
+ }
config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "architectures": [
+     "Qwen3ForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "eos_token_id": 151645,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 17408,
+   "max_position_embeddings": 40960,
+   "max_window_layers": 40,
+   "model_type": "qwen3",
+   "num_attention_heads": 40,
+   "num_hidden_layers": 40,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.0",
+   "use_cache": false,
+   "use_sliding_window": false,
+   "vocab_size": 151936
+ }
eval_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "epoch": 2.9891956782713085,
+   "eval_loss": 0.15651731193065643,
+   "eval_runtime": 8.6007,
+   "eval_samples_per_second": 15.696,
+   "eval_steps_per_second": 1.977
+ }
generation_config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "pad_token_id": 151643,
+   "temperature": 0.6,
+   "top_k": 20,
+   "top_p": 0.95,
+   "transformers_version": "4.51.0"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bdb25da8e44943b0d0c4ae36ef642823a85cd73e88d837d3741ef0ada03af74f
+ size 4984780784
model-00002-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f10f2ce02ecbff623315b99527ee701752f22de57d8b17402b2ec7eec5e92bb
+ size 4980892048
model-00003-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a82a8b6e9fd7bca4e5a160fed235c9d0c1a51e1b130e25144291d3fcc67971de
+ size 4928485104
model-00004-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1c3d62883a9ba5cae605192a27066ce27d6b1dca4f30a9aae1ba3ed02f9e8482
+ size 4980892112
model-00005-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:85fb9f1a718b34a7a08bd56e5862f27fc7e4956128a03ece1b04dd2e1bd82f2e
+ size 4928485104
model-00006-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:df79aad074fc95e9706491ababe75f872d20e8b0afff2d1d6a56bdf955a3f9f6
+ size 4733130504
model.safetensors.index.json ADDED
@@ -0,0 +1,450 @@
+ {
+   "metadata": {
+     "total_size": 29536614400
+   },
+   "weight_map": {
+     "lm_head.weight": "model-00006-of-00006.safetensors",
+     "model.embed_tokens.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.self_attn.k_norm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.self_attn.q_norm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.self_attn.k_norm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.self_attn.q_norm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.10.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.self_attn.k_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.self_attn.q_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.self_attn.k_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.self_attn.q_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.12.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.12.self_attn.k_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.self_attn.q_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.13.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.13.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.13.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.13.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.13.self_attn.k_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.13.self_attn.q_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.14.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.14.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.14.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.14.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.14.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.14.self_attn.k_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.14.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.14.self_attn.q_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.14.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.14.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.self_attn.k_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.self_attn.q_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.self_attn.k_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.self_attn.q_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.self_attn.k_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.self_attn.q_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.self_attn.k_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.self_attn.q_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.self_attn.k_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.self_attn.q_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.2.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.self_attn.k_norm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.self_attn.q_norm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.20.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.20.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.20.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.20.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.20.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.20.self_attn.k_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.self_attn.q_norm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.21.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.21.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.21.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.21.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.21.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.21.self_attn.k_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.21.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.21.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.21.self_attn.q_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.21.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.21.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.22.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.22.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.22.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.22.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.22.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.22.self_attn.k_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.22.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.22.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.22.self_attn.q_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.22.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.22.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.self_attn.k_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.self_attn.q_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.self_attn.k_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.self_attn.q_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.self_attn.k_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.self_attn.q_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.self_attn.k_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.self_attn.q_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.27.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.27.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.27.self_attn.k_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.self_attn.q_norm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.28.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.28.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.28.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.28.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.28.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.28.self_attn.k_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.28.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.28.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.28.self_attn.q_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.28.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.28.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.29.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.29.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.29.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.29.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.29.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.29.self_attn.k_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.29.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.29.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.29.self_attn.q_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.29.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.29.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.3.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.self_attn.k_norm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.self_attn.q_norm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.30.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.self_attn.k_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.self_attn.q_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.self_attn.k_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.self_attn.q_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.self_attn.k_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.self_attn.q_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.self_attn.k_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.self_attn.q_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.self_attn.k_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.self_attn.q_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.35.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.35.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.35.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.35.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.35.self_attn.k_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.self_attn.q_norm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.36.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.36.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.36.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.36.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.36.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.36.self_attn.k_norm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.36.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.36.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.36.self_attn.q_norm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.36.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.36.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.37.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.37.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.37.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.37.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.37.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.37.self_attn.k_norm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.37.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.37.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.37.self_attn.q_norm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.37.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.37.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.self_attn.k_norm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.self_attn.q_norm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.self_attn.k_norm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.self_attn.q_norm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.4.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.self_attn.k_norm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.self_attn.q_norm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.5.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.5.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.5.self_attn.k_norm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.self_attn.q_norm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.6.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.6.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.6.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.6.self_attn.k_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.6.self_attn.q_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.self_attn.k_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.self_attn.q_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.self_attn.k_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.self_attn.q_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.self_attn.k_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.self_attn.q_norm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.norm.weight": "model-00006-of-00006.safetensors"
+   }
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
test_scorer_hf.py ADDED
@@ -0,0 +1,194 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import re
2
+ import argparse
3
+ from tqdm import tqdm
4
+ from transformers import AutoModelForCausalLM, AutoTokenizer
5
+
6
+ def extract_last_boxed(text):
7
+ try:
8
+ pattern = r'<score>([\d.]+)</score>'
9
+ matches = re.findall(pattern, text)
10
+ if matches:
11
+ return float(matches[-1])
12
+ else:
13
+ return 0.0
14
+ except Exception as e:
15
+ print(f"Error extracting boxed content: {e}")
16
+ return 0.0
17
+
18
+ def parse_from_score_model(response: str, scale_factor=10) -> float:
19
+ score = extract_last_boxed(response)
20
+ score = score / scale_factor
21
+ return score
22
+
23
+ def load_model(model_name: str) -> AutoModelForCausalLM:
24
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
25
+ model = AutoModelForCausalLM.from_pretrained(
26
+ model_name,
27
+ torch_dtype="auto",
28
+ device_map="auto"
29
+ )
30
+ return tokenizer, model
31
+
32
+ def generate(model, tokenizer, prompt: str) -> str:
33
+ messages = [
34
+ {
35
+ "role": "system",
36
+ "content": "You are a helpful assistant."
37
+ },
38
+ {
39
+ "role": "user",
40
+ "content": prompt
41
+ }
42
+ ]
43
+ text = tokenizer.apply_chat_template(
44
+ messages,
45
+ tokenize=False,
46
+ add_generation_prompt=True,
47
+ )
48
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
49
+
50
+ # conduct text completion
51
+ generated_ids = model.generate(
52
+ **model_inputs,
53
+ max_new_tokens=16384,
54
+ do_sample=False
55
+ )
56
+ output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
57
+
58
+ content = tokenizer.decode(output_ids, skip_special_tokens=True)
59
+ return content
60
+
61
+ def remove_thought_block(text: str) -> str:
62
+ pattern = r"^(<think>.*?</think>|.*?)"
63
+ match = re.match(pattern, text, flags=re.DOTALL)
64
+ if match:
65
+ end_of_match = match.end()
66
+ return text[end_of_match:].lstrip()
67
+ return text
68
+
69
+ def process_score_prompt(question, reference, response):
70
+ promt_template = """请先通读问题信息,然后基于参考答案对模型回复的结果进行正确性打分。每道题可能包含多个小问,每个小问都已给出了相应的参考答案和分值,请逐小问校验模型回复是否正确,正确得对应分值,错误或漏答得0分,累计计分,有如下要求。
71
+
72
+ ---
73
+
74
+ ### 要求1:信息梳理
75
+
76
+ - 梳理出如下信息
77
+ - 问题内容
78
+ - 参考答案(可适度完善表达,但不改变核心内容)
79
+ - 模型回复(需要将模型回复中的指代关系与参考答案对齐)
80
+ - 分值
81
+
82
+ ### 要求2:判断题型
83
+
84
+ - 明确该小问属于以下哪种题型之一,并基于该类型的打分标准进行打分,需要给出详细的比对过程。
85
+ - **数值型**,要求模型回复与标准答案的数值完全相同,不允许有误差。例,`问题:北京奥运会是哪一年?参考答案:2008,模型回复:2004,打分结果:错误。`
86
+ - **枚举型**,要求模型回复列举出参考答案的全部对象,缺一不可、错一不可,允许同义词等语义相近的表达,题中有顺序要求则必须按顺序枚举。例,`图中出现了哪些动物?参考答案:大熊猫、河马、长颈鹿,模型回复:河马、小熊猫、长颈鹿,打分结果:错误。 `注:“/”表示“或”,如,XXA/XXB,表示回答出任意一项即可。
87
+ - **选择题**,要求模型回复与参考答案相同的选项或选项内容。例,`问题:李白是哪个朝代的诗人?A. 唐朝 B. 宋朝 C. 元朝,模型回复:李白是唐朝诗人,打分结果:正确。`
88
+ - **判断题**,要求模型回复与参考答案的判断一致。例,`问题:图中鼠标是否放在了笔记本电脑左侧?参考答案:是,模型回复:图中鼠标在笔记本电脑的左侧。打分结果:正确。`
89
+ - **简答题**,要求模型回复包括与参考答案语义一致的短语或表达,允许表达方式不同。例,`问题:视频中最后放入锅中的食材是什么?参考答案:洋葱,模型回复:胡萝卜。打分结果:错误。`
90
+ - **论述题**,要求模型回复包含参考答案的核心观点。例,`问题:请简要论述为什么要保护生物多样性。参考答案:维持生态平衡,模型回复:保护生物多样性能够让生态系统保持稳定,促进人类社会的可持续发展。打分结果:正确。`
91
+
92
+ ### 要求3:打分标准
93
+
94
+ - **完全正确**:得满分。
95
+ - **错误或漏答**:得0分。
96
+ - 如模型回复与参考答案大意相同但细节略有差别,且非核心内容,视为正确,具体参考参考答案的详细要求。
97
+ - 若模型回复未直接给出答案,需主动归纳总结结论,只关注结论是否一致。
98
+ - 每小问独立打分,前序错误不影响后续小问的结果。
99
+
100
+ ### 要求4:输出格式
101
+
102
+ - 逐小问列出得分说明。
103
+ - 所有小问得分相加,在<score></score>中给出总分,例如:<score>5</score>
104
+
105
+ ---
106
+
107
+ ## 问题���息
108
+ {{question}}
109
+ ## 参考答案
110
+ {{reference}}
111
+ ## 模型回复
112
+ {{response}}
113
+ ## 逐小问打分"""
114
+
115
+ prompt = promt_template.replace("{{question}}", remove_thought_block(question.strip()))
116
+ prompt = prompt.replace("{{reference}}", reference)
117
+ prompt = prompt.replace("{{response}}", response)
118
+ return prompt
119
+
120
+
121
+ if __name__ == "__main__":
122
+
123
+ parser = argparse.ArgumentParser(description="Run evaluation on a given model and dataset.")
124
+ parser.add_argument("--model-name", type=str, default="", help="The model name or path.")
125
+
126
+ args = parser.parse_args()
127
+ tokenizer, model = load_model(args.model_name)
128
+
129
+ test_cases = [
130
+ # Multiple-choice question
131
+ {
132
+ "question_type": "multiple-choice",
133
+ "question": "Which one is the best depiction of this image?\nA. A beautiful woman wearing a hat is holding a rose in her right hand and is opening her mouth big to eat it.\nB. A young woman is wearing a hat and her right hand is pierced by a bouquet of pink roses, and she opens her mouth to express pain.\nC. A young girl wearing brown clothes is holding a blooming pink flower in her right hand, with her mouth slightly open.\nD. A young girl wearing brown clothes is holding a blooming magic wand in her right hand, with her mouth slightly open.",
134
+ "answer": "C",
135
+ "response": "Well, if we're talking about the best depiction of this image, I'd say it's C. The young girl in brown clothes holding a blooming pink flower in her right hand with her mouth slightly open seems to match the image the most. What do you think? Do you agree with that?",
136
+ "target_score": 1
137
+ },
138
+ # Yes/No question
139
+ {
140
+ "question_type": "yes/no",
141
+ "question": "The following images show two graphs. Each graph is represented in a separate image. Are these graphs isomorphic?Answer Yes or No.",
142
+ "answer": "Yes",
143
+ "response": "No. If you have any other questions about these graphs or anything else, feel free to ask!",
144
+ "target_score": 0
145
+ },
146
+ # Open-ended question
147
+ {
148
+ "question_type": "open-ended",
149
+ "question": "How many seasons did 90210 have?",
150
+ "answer": "Ten",
151
+ "response": "I'm not sure which \"90210\" you're referring to. There are different shows or things with that name. If it's the TV show \"90210\", it had 7 seasons. But if it's something else, you might need to give me a bit more context.",
152
+ "target_score": 0
153
+ },
154
+ # Multiple-question answer
155
+ {
156
+ # If the question type is multi-question, you must specify the answer in detail.
157
+ "question_type": "multiple-question",
158
+ "question": "David is playing an escape room game. In one of the levels, he needs to find the key to the room. The game rules state that he must locate the box containing the key among a row of identical-looking boxes, and correctly enter the box's password to obtain the key to escape. It is known that the box with the key meets the following conditions: it is not at either end; it is not in the very middle; the number of boxes on both its left and right sides is even. In addition, each box has a three-digit password, with each digit being one of the numbers from 0 to 9. The arrangement of the boxes and the password puzzle are hidden in image <image_1>. The image puzzle provides two decryption examples, and you need to use these examples to decode which digits the two question marks at the bottom of the image represent. Among all the numbers given in the provided examples and obtained after decoding, the smallest number is interference information and should be excluded. Partial clues about the box containing the key and its password are hidden in audio <audio_1>. Based on the above information, and combining the image and audio clues, answer:\n1. Which box (counting from left to right) contains the key?\n2. What is the correct password for the box containing the key?",
159
+ "answer": "1. The 5th box from the left (4 points);\n2. The password is 569 (6 points).",
160
+ "response": "Hmm… let me think. According to the hint in the audio, the key to decryption lies in the Roman numerals hidden in the words, and these numbers are all no greater than 10. From the image, there are six boxes. Based on the rules, the key is neither at the ends nor in the middle, so it should be in the 5th box from the left. As for the password, arranging in ascending order, it should be Victory = 2, give = 3. So the password for the box with the key is 23. If you have any other ideas or questions, feel free to let me know.",
161
+ "target_score": 0.4
162
+ }
163
+ ]
164
+
165
+ prompts = []
166
+ for case in test_cases:
167
+ answer = case["answer"]
168
+ if case["question_type"] != "multiple-question":
169
+ # A Chinese-language rule works better because the scorer model was trained on Chinese data.
170
+ answer = f"小问1:{answer},总分10分,无需关注推理过程,最终答案正确即可"
171
+ question = case["question"]
172
+ response = remove_thought_block(case["response"])
173
+ prompt = process_score_prompt(question=question, reference=answer, response=response)
174
+ prompts.append(prompt)
175
+
176
+ score_responses = []
177
+ for prompt in tqdm(prompts):
178
+ score_response = generate(model, tokenizer, prompt)
179
+ score_responses.append(score_response)
180
+
181
+ pass_cnt = 0
182
+ for score_response, case in zip(score_responses, test_cases):
183
+ print("="*32)
184
+ score = parse_from_score_model(score_response)
185
+ for key, value in case.items():
186
+ print(f"{key}: {value}")
187
+ print("Score response:\n", score_response)
188
+ print(f"Score: {score}, Target Score: {case['target_score']}")
189
+
190
+ if score == case["target_score"]:
191
+ pass_cnt += 1
192
+ print("*"*32)
193
+ print(f"Pass: {pass_cnt}/{len(test_cases)}")
194
+
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
3
+ size 11422654
tokenizer_config.json ADDED
@@ -0,0 +1,241 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ },
181
+ "151665": {
182
+ "content": "<tool_response>",
183
+ "lstrip": false,
184
+ "normalized": false,
185
+ "rstrip": false,
186
+ "single_word": false,
187
+ "special": false
188
+ },
189
+ "151666": {
190
+ "content": "</tool_response>",
191
+ "lstrip": false,
192
+ "normalized": false,
193
+ "rstrip": false,
194
+ "single_word": false,
195
+ "special": false
196
+ },
197
+ "151667": {
198
+ "content": "<think>",
199
+ "lstrip": false,
200
+ "normalized": false,
201
+ "rstrip": false,
202
+ "single_word": false,
203
+ "special": false
204
+ },
205
+ "151668": {
206
+ "content": "</think>",
207
+ "lstrip": false,
208
+ "normalized": false,
209
+ "rstrip": false,
210
+ "single_word": false,
211
+ "special": false
212
+ }
213
+ },
214
+ "additional_special_tokens": [
215
+ "<|im_start|>",
216
+ "<|im_end|>",
217
+ "<|object_ref_start|>",
218
+ "<|object_ref_end|>",
219
+ "<|box_start|>",
220
+ "<|box_end|>",
221
+ "<|quad_start|>",
222
+ "<|quad_end|>",
223
+ "<|vision_start|>",
224
+ "<|vision_end|>",
225
+ "<|vision_pad|>",
226
+ "<|image_pad|>",
227
+ "<|video_pad|>"
228
+ ],
229
+ "bos_token": null,
230
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and 
enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
231
+ "clean_up_tokenization_spaces": false,
232
+ "eos_token": "<|im_end|>",
233
+ "errors": "replace",
234
+ "extra_special_tokens": {},
235
+ "model_max_length": 131072,
236
+ "pad_token": "<|endoftext|>",
237
+ "padding_side": "right",
238
+ "split_special_tokens": false,
239
+ "tokenizer_class": "Qwen2Tokenizer",
240
+ "unk_token": null
241
+ }
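The `chat_template` above is what serializes a conversation into the scorer's prompt format. As a minimal sketch of how it is applied (assuming a recent `transformers` release that forwards extra keyword arguments such as `enable_thinking` to the template; the model path is a placeholder):

```python
from transformers import AutoTokenizer

# Placeholder path: point this at a local copy of the UNO-Scorer weights.
tokenizer = AutoTokenizer.from_pretrained("/path/to/UNO-Scorer")

# A single-turn scoring request; in practice the content would come from
# process_score_prompt, as in the test script above.
messages = [{"role": "user", "content": "<scoring prompt here>"}]

# With enable_thinking=False, the template's final branch emits an empty
# <think>\n\n</think> block so the model answers without a reasoning trace.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
print(text)
```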
train_results.json ADDED
@@ -0,0 +1,8 @@
1
+ {
2
+ "epoch": 2.9891956782713085,
3
+ "total_flos": 170865984536576.0,
4
+ "train_loss": 0.1184023514103431,
5
+ "train_runtime": 9824.5273,
6
+ "train_samples_per_second": 4.068,
7
+ "train_steps_per_second": 0.064
8
+ }
trainer_log.jsonl ADDED
@@ -0,0 +1,75 @@
1
+ {"current_steps": 10, "total_steps": 624, "loss": 0.4239, "lr": 1.4285714285714286e-06, "epoch": 0.04801920768307323, "percentage": 1.6, "elapsed_time": "0:02:33", "remaining_time": "2:37:16"}
2
+ {"current_steps": 20, "total_steps": 624, "loss": 0.3052, "lr": 3.015873015873016e-06, "epoch": 0.09603841536614646, "percentage": 3.21, "elapsed_time": "0:04:55", "remaining_time": "2:28:51"}
3
+ {"current_steps": 30, "total_steps": 624, "loss": 0.2483, "lr": 4.603174603174604e-06, "epoch": 0.14405762304921968, "percentage": 4.81, "elapsed_time": "0:07:20", "remaining_time": "2:25:26"}
4
+ {"current_steps": 40, "total_steps": 624, "loss": 0.2312, "lr": 6.1904761904761914e-06, "epoch": 0.19207683073229292, "percentage": 6.41, "elapsed_time": "0:09:58", "remaining_time": "2:25:35"}
5
+ {"current_steps": 50, "total_steps": 624, "loss": 0.1988, "lr": 7.77777777777778e-06, "epoch": 0.24009603841536614, "percentage": 8.01, "elapsed_time": "0:12:33", "remaining_time": "2:24:11"}
6
+ {"current_steps": 50, "total_steps": 624, "eval_loss": 0.2022361308336258, "epoch": 0.24009603841536614, "percentage": 8.01, "elapsed_time": "0:12:42", "remaining_time": "2:25:51"}
7
+ {"current_steps": 60, "total_steps": 624, "loss": 0.1855, "lr": 9.365079365079366e-06, "epoch": 0.28811524609843936, "percentage": 9.62, "elapsed_time": "0:15:20", "remaining_time": "2:24:10"}
8
+ {"current_steps": 70, "total_steps": 624, "loss": 0.1874, "lr": 9.99717787871887e-06, "epoch": 0.33613445378151263, "percentage": 11.22, "elapsed_time": "0:17:42", "remaining_time": "2:20:12"}
9
+ {"current_steps": 80, "total_steps": 624, "loss": 0.1732, "lr": 9.979943117513265e-06, "epoch": 0.38415366146458585, "percentage": 12.82, "elapsed_time": "0:20:03", "remaining_time": "2:16:23"}
10
+ {"current_steps": 90, "total_steps": 624, "loss": 0.1798, "lr": 9.947095408534483e-06, "epoch": 0.43217286914765907, "percentage": 14.42, "elapsed_time": "0:22:31", "remaining_time": "2:13:38"}
11
+ {"current_steps": 100, "total_steps": 624, "loss": 0.1671, "lr": 9.898737734799134e-06, "epoch": 0.4801920768307323, "percentage": 16.03, "elapsed_time": "0:24:57", "remaining_time": "2:10:49"}
12
+ {"current_steps": 100, "total_steps": 624, "eval_loss": 0.17217175662517548, "epoch": 0.4801920768307323, "percentage": 16.03, "elapsed_time": "0:25:06", "remaining_time": "2:11:35"}
13
+ {"current_steps": 110, "total_steps": 624, "loss": 0.1672, "lr": 9.835021705636201e-06, "epoch": 0.5282112845138055, "percentage": 17.63, "elapsed_time": "0:27:39", "remaining_time": "2:09:12"}
14
+ {"current_steps": 120, "total_steps": 624, "loss": 0.1661, "lr": 9.756147081366673e-06, "epoch": 0.5762304921968787, "percentage": 19.23, "elapsed_time": "0:30:05", "remaining_time": "2:06:23"}
15
+ {"current_steps": 130, "total_steps": 624, "loss": 0.1678, "lr": 9.66236114702178e-06, "epoch": 0.6242496998799519, "percentage": 20.83, "elapsed_time": "0:32:29", "remaining_time": "2:03:27"}
16
+ {"current_steps": 140, "total_steps": 624, "loss": 0.1604, "lr": 9.55395793706341e-06, "epoch": 0.6722689075630253, "percentage": 22.44, "elapsed_time": "0:34:53", "remaining_time": "2:00:36"}
17
+ {"current_steps": 150, "total_steps": 624, "loss": 0.1581, "lr": 9.43127731353729e-06, "epoch": 0.7202881152460985, "percentage": 24.04, "elapsed_time": "0:37:08", "remaining_time": "1:57:23"}
18
+ {"current_steps": 150, "total_steps": 624, "eval_loss": 0.15990422666072845, "epoch": 0.7202881152460985, "percentage": 24.04, "elapsed_time": "0:37:17", "remaining_time": "1:57:51"}
19
+ {"current_steps": 160, "total_steps": 624, "loss": 0.1608, "lr": 9.294703900549096e-06, "epoch": 0.7683073229291717, "percentage": 25.64, "elapsed_time": "0:39:45", "remaining_time": "1:55:18"}
20
+ {"current_steps": 170, "total_steps": 624, "loss": 0.162, "lr": 9.14466587840408e-06, "epoch": 0.8163265306122449, "percentage": 27.24, "elapsed_time": "0:42:05", "remaining_time": "1:52:24"}
21
+ {"current_steps": 180, "total_steps": 624, "loss": 0.1566, "lr": 8.981633641190779e-06, "epoch": 0.8643457382953181, "percentage": 28.85, "elapsed_time": "0:44:37", "remaining_time": "1:50:05"}
22
+ {"current_steps": 190, "total_steps": 624, "loss": 0.1486, "lr": 8.806118322017525e-06, "epoch": 0.9123649459783914, "percentage": 30.45, "elapsed_time": "0:47:01", "remaining_time": "1:47:25"}
23
+ {"current_steps": 200, "total_steps": 624, "loss": 0.1513, "lr": 8.61867019052535e-06, "epoch": 0.9603841536614646, "percentage": 32.05, "elapsed_time": "0:49:25", "remaining_time": "1:44:46"}
24
+ {"current_steps": 200, "total_steps": 624, "eval_loss": 0.1510133445262909, "epoch": 0.9603841536614646, "percentage": 32.05, "elapsed_time": "0:49:33", "remaining_time": "1:45:04"}
25
+ {"current_steps": 210, "total_steps": 624, "loss": 0.1452, "lr": 8.41987692770139e-06, "epoch": 1.0048019207683074, "percentage": 33.65, "elapsed_time": "0:54:20", "remaining_time": "1:47:08"}
26
+ {"current_steps": 220, "total_steps": 624, "loss": 0.1086, "lr": 8.210361783401491e-06, "epoch": 1.0528211284513807, "percentage": 35.26, "elapsed_time": "0:56:50", "remaining_time": "1:44:22"}
27
+ {"current_steps": 230, "total_steps": 624, "loss": 0.1098, "lr": 7.990781622358535e-06, "epoch": 1.1008403361344539, "percentage": 36.86, "elapsed_time": "0:59:14", "remaining_time": "1:41:29"}
28
+ {"current_steps": 240, "total_steps": 624, "loss": 0.1042, "lr": 7.76182486480253e-06, "epoch": 1.148859543817527, "percentage": 38.46, "elapsed_time": "1:01:38", "remaining_time": "1:38:36"}
29
+ {"current_steps": 250, "total_steps": 624, "loss": 0.1104, "lr": 7.524209328148995e-06, "epoch": 1.1968787515006003, "percentage": 40.06, "elapsed_time": "1:04:00", "remaining_time": "1:35:45"}
30
+ {"current_steps": 250, "total_steps": 624, "eval_loss": 0.15508781373500824, "epoch": 1.1968787515006003, "percentage": 40.06, "elapsed_time": "1:04:09", "remaining_time": "1:35:58"}
31
+ {"current_steps": 260, "total_steps": 624, "loss": 0.1024, "lr": 7.278679976522279e-06, "epoch": 1.2448979591836735, "percentage": 41.67, "elapsed_time": "1:06:31", "remaining_time": "1:33:08"}
32
+ {"current_steps": 270, "total_steps": 624, "loss": 0.1067, "lr": 7.026006585169467e-06, "epoch": 1.2929171668667467, "percentage": 43.27, "elapsed_time": "1:08:59", "remaining_time": "1:30:27"}
33
+ {"current_steps": 280, "total_steps": 624, "loss": 0.1106, "lr": 6.766981327087271e-06, "epoch": 1.34093637454982, "percentage": 44.87, "elapsed_time": "1:11:27", "remaining_time": "1:27:47"}
34
+ {"current_steps": 290, "total_steps": 624, "loss": 0.1027, "lr": 6.502416289428282e-06, "epoch": 1.3889555822328932, "percentage": 46.47, "elapsed_time": "1:13:54", "remaining_time": "1:25:07"}
35
+ {"current_steps": 300, "total_steps": 624, "loss": 0.1068, "lr": 6.233140927473033e-06, "epoch": 1.4369747899159664, "percentage": 48.08, "elapsed_time": "1:16:23", "remaining_time": "1:22:30"}
36
+ {"current_steps": 300, "total_steps": 624, "eval_loss": 0.14931099116802216, "epoch": 1.4369747899159664, "percentage": 48.08, "elapsed_time": "1:16:32", "remaining_time": "1:22:39"}
37
+ {"current_steps": 310, "total_steps": 624, "loss": 0.1043, "lr": 5.959999464150101e-06, "epoch": 1.4849939975990396, "percentage": 49.68, "elapsed_time": "1:18:56", "remaining_time": "1:19:57"}
38
+ {"current_steps": 320, "total_steps": 624, "loss": 0.1058, "lr": 5.683848243257181e-06, "epoch": 1.5330132052821128, "percentage": 51.28, "elapsed_time": "1:21:17", "remaining_time": "1:17:13"}
39
+ {"current_steps": 330, "total_steps": 624, "loss": 0.1035, "lr": 5.40555304468122e-06, "epoch": 1.581032412965186, "percentage": 52.88, "elapsed_time": "1:23:38", "remaining_time": "1:14:31"}
40
+ {"current_steps": 340, "total_steps": 624, "loss": 0.1032, "lr": 5.125986370034862e-06, "epoch": 1.6290516206482593, "percentage": 54.49, "elapsed_time": "1:26:02", "remaining_time": "1:11:51"}
41
+ {"current_steps": 350, "total_steps": 624, "loss": 0.1006, "lr": 4.846024707219149e-06, "epoch": 1.6770708283313325, "percentage": 56.09, "elapsed_time": "1:28:25", "remaining_time": "1:09:13"}
42
+ {"current_steps": 350, "total_steps": 624, "eval_loss": 0.14417614042758942, "epoch": 1.6770708283313325, "percentage": 56.09, "elapsed_time": "1:28:33", "remaining_time": "1:09:19"}
43
+ {"current_steps": 360, "total_steps": 624, "loss": 0.1019, "lr": 4.566545782488554e-06, "epoch": 1.725090036014406, "percentage": 57.69, "elapsed_time": "1:31:04", "remaining_time": "1:06:47"}
44
+ {"current_steps": 370, "total_steps": 624, "loss": 0.0976, "lr": 4.2884258086335755e-06, "epoch": 1.773109243697479, "percentage": 59.29, "elapsed_time": "1:33:32", "remaining_time": "1:04:13"}
45
+ {"current_steps": 380, "total_steps": 624, "loss": 0.1003, "lr": 4.012536737908288e-06, "epoch": 1.8211284513805523, "percentage": 60.9, "elapsed_time": "1:35:46", "remaining_time": "1:01:30"}
46
+ {"current_steps": 390, "total_steps": 624, "loss": 0.0991, "lr": 3.7397435283153795e-06, "epoch": 1.8691476590636253, "percentage": 62.5, "elapsed_time": "1:38:08", "remaining_time": "0:58:53"}
47
+ {"current_steps": 400, "total_steps": 624, "loss": 0.1029, "lr": 3.4709014318193298e-06, "epoch": 1.9171668667466988, "percentage": 64.1, "elapsed_time": "1:40:48", "remaining_time": "0:56:27"}
48
+ {"current_steps": 400, "total_steps": 624, "eval_loss": 0.14110355079174042, "epoch": 1.9171668667466988, "percentage": 64.1, "elapsed_time": "1:40:57", "remaining_time": "0:56:32"}
49
+ {"current_steps": 410, "total_steps": 624, "loss": 0.1035, "lr": 3.2068533129896273e-06, "epoch": 1.9651860744297718, "percentage": 65.71, "elapsed_time": "1:46:03", "remaining_time": "0:55:21"}
50
+ {"current_steps": 420, "total_steps": 624, "loss": 0.0912, "lr": 2.948427006480528e-06, "epoch": 2.009603841536615, "percentage": 67.31, "elapsed_time": "1:48:16", "remaining_time": "0:52:35"}
51
+ {"current_steps": 430, "total_steps": 624, "loss": 0.059, "lr": 2.696432721632082e-06, "epoch": 2.057623049219688, "percentage": 68.91, "elapsed_time": "1:50:42", "remaining_time": "0:49:56"}
52
+ {"current_steps": 440, "total_steps": 624, "loss": 0.0567, "lr": 2.4516605023294626e-06, "epoch": 2.1056422569027613, "percentage": 70.51, "elapsed_time": "1:53:10", "remaining_time": "0:47:19"}
53
+ {"current_steps": 450, "total_steps": 624, "loss": 0.0617, "lr": 2.2148777500843125e-06, "epoch": 2.1536614645858343, "percentage": 72.12, "elapsed_time": "1:55:34", "remaining_time": "0:44:41"}
54
+ {"current_steps": 450, "total_steps": 624, "eval_loss": 0.1581123322248459, "epoch": 2.1536614645858343, "percentage": 72.12, "elapsed_time": "1:55:43", "remaining_time": "0:44:44"}
55
+ {"current_steps": 460, "total_steps": 624, "loss": 0.0584, "lr": 1.9868268181037186e-06, "epoch": 2.2016806722689077, "percentage": 73.72, "elapsed_time": "1:58:09", "remaining_time": "0:42:07"}
56
+ {"current_steps": 470, "total_steps": 624, "loss": 0.058, "lr": 1.768222683889757e-06, "epoch": 2.2496998799519807, "percentage": 75.32, "elapsed_time": "2:00:29", "remaining_time": "0:39:28"}
57
+ {"current_steps": 480, "total_steps": 624, "loss": 0.0588, "lr": 1.5597507076664187e-06, "epoch": 2.297719087635054, "percentage": 76.92, "elapsed_time": "2:02:49", "remaining_time": "0:36:50"}
58
+ {"current_steps": 490, "total_steps": 624, "loss": 0.0555, "lr": 1.362064483661617e-06, "epoch": 2.345738295318127, "percentage": 78.53, "elapsed_time": "2:05:16", "remaining_time": "0:34:15"}
59
+ {"current_steps": 500, "total_steps": 624, "loss": 0.0584, "lr": 1.1757837909808628e-06, "epoch": 2.3937575030012006, "percentage": 80.13, "elapsed_time": "2:07:44", "remaining_time": "0:31:40"}
60
+ {"current_steps": 500, "total_steps": 624, "eval_loss": 0.1588136851787567, "epoch": 2.3937575030012006, "percentage": 80.13, "elapsed_time": "2:07:52", "remaining_time": "0:31:42"}
61
+ {"current_steps": 510, "total_steps": 624, "loss": 0.0568, "lr": 1.0014926504969535e-06, "epoch": 2.4417767106842736, "percentage": 81.73, "elapsed_time": "2:10:10", "remaining_time": "0:29:05"}
62
+ {"current_steps": 520, "total_steps": 624, "loss": 0.057, "lr": 8.397374938476594e-07, "epoch": 2.489795918367347, "percentage": 83.33, "elapsed_time": "2:12:39", "remaining_time": "0:26:31"}
63
+ {"current_steps": 530, "total_steps": 624, "loss": 0.0562, "lr": 6.910254502818914e-07, "epoch": 2.53781512605042, "percentage": 84.94, "elapsed_time": "2:15:02", "remaining_time": "0:23:57"}
64
+ {"current_steps": 540, "total_steps": 624, "loss": 0.0571, "lr": 5.558227567253832e-07, "epoch": 2.5858343337334935, "percentage": 86.54, "elapsed_time": "2:17:39", "remaining_time": "0:21:24"}
65
+ {"current_steps": 550, "total_steps": 624, "loss": 0.0585, "lr": 4.3455329605058436e-07, "epoch": 2.6338535414165665, "percentage": 88.14, "elapsed_time": "2:20:14", "remaining_time": "0:18:52"}
66
+ {"current_steps": 550, "total_steps": 624, "eval_loss": 0.1571720838546753, "epoch": 2.6338535414165665, "percentage": 88.14, "elapsed_time": "2:20:23", "remaining_time": "0:18:53"}
67
+ {"current_steps": 560, "total_steps": 624, "loss": 0.0557, "lr": 3.275972681335421e-07, "epoch": 2.68187274909964, "percentage": 89.74, "elapsed_time": "2:22:48", "remaining_time": "0:16:19"}
68
+ {"current_steps": 570, "total_steps": 624, "loss": 0.0551, "lr": 2.3528999786421758e-07, "epoch": 2.729891956782713, "percentage": 91.35, "elapsed_time": "2:25:13", "remaining_time": "0:13:45"}
69
+ {"current_steps": 580, "total_steps": 624, "loss": 0.0578, "lr": 1.5792088384733174e-07, "epoch": 2.7779111644657863, "percentage": 92.95, "elapsed_time": "2:27:44", "remaining_time": "0:11:12"}
70
+ {"current_steps": 590, "total_steps": 624, "loss": 0.0571, "lr": 9.573249108973281e-08, "epoch": 2.82593037214886, "percentage": 94.55, "elapsed_time": "2:30:00", "remaining_time": "0:08:38"}
71
+ {"current_steps": 600, "total_steps": 624, "loss": 0.0552, "lr": 4.891979051886153e-08, "epoch": 2.8739495798319328, "percentage": 96.15, "elapsed_time": "2:32:29", "remaining_time": "0:06:05"}
72
+ {"current_steps": 600, "total_steps": 624, "eval_loss": 0.15641489624977112, "epoch": 2.8739495798319328, "percentage": 96.15, "elapsed_time": "2:32:38", "remaining_time": "0:06:06"}
73
+ {"current_steps": 610, "total_steps": 624, "loss": 0.058, "lr": 1.762954771655001e-08, "epoch": 2.9219687875150058, "percentage": 97.76, "elapsed_time": "2:37:50", "remaining_time": "0:03:37"}
74
+ {"current_steps": 620, "total_steps": 624, "loss": 0.0548, "lr": 1.959862784577937e-09, "epoch": 2.969987995198079, "percentage": 99.36, "elapsed_time": "2:40:11", "remaining_time": "0:01:02"}
75
+ {"current_steps": 624, "total_steps": 624, "epoch": 2.9891956782713085, "percentage": 100.0, "elapsed_time": "2:43:44", "remaining_time": "0:00:00"}
trainer_state.json ADDED
@@ -0,0 +1,573 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 2.9891956782713085,
6
+ "eval_steps": 50,
7
+ "global_step": 624,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.04801920768307323,
14
+ "grad_norm": 3.922592356078373,
15
+ "learning_rate": 1.4285714285714286e-06,
16
+ "loss": 0.4239,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 0.09603841536614646,
21
+ "grad_norm": 1.048139141700484,
22
+ "learning_rate": 3.015873015873016e-06,
23
+ "loss": 0.3052,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 0.14405762304921968,
28
+ "grad_norm": 0.8538085575650297,
29
+ "learning_rate": 4.603174603174604e-06,
30
+ "loss": 0.2483,
31
+ "step": 30
32
+ },
33
+ {
34
+ "epoch": 0.19207683073229292,
35
+ "grad_norm": 0.7208408522041903,
36
+ "learning_rate": 6.1904761904761914e-06,
37
+ "loss": 0.2312,
38
+ "step": 40
39
+ },
40
+ {
41
+ "epoch": 0.24009603841536614,
42
+ "grad_norm": 0.6048748741273176,
43
+ "learning_rate": 7.77777777777778e-06,
44
+ "loss": 0.1988,
45
+ "step": 50
46
+ },
47
+ {
48
+ "epoch": 0.24009603841536614,
49
+ "eval_loss": 0.2022361308336258,
50
+ "eval_runtime": 8.7258,
51
+ "eval_samples_per_second": 15.471,
52
+ "eval_steps_per_second": 1.948,
53
+ "step": 50
54
+ },
55
+ {
56
+ "epoch": 0.28811524609843936,
57
+ "grad_norm": 0.6463317852210085,
58
+ "learning_rate": 9.365079365079366e-06,
59
+ "loss": 0.1855,
60
+ "step": 60
61
+ },
62
+ {
63
+ "epoch": 0.33613445378151263,
64
+ "grad_norm": 0.7181148124716302,
65
+ "learning_rate": 9.99717787871887e-06,
66
+ "loss": 0.1874,
67
+ "step": 70
68
+ },
69
+ {
70
+ "epoch": 0.38415366146458585,
71
+ "grad_norm": 0.6510412334843603,
72
+ "learning_rate": 9.979943117513265e-06,
73
+ "loss": 0.1732,
74
+ "step": 80
75
+ },
76
+ {
77
+ "epoch": 0.43217286914765907,
78
+ "grad_norm": 0.6418219788086171,
79
+ "learning_rate": 9.947095408534483e-06,
80
+ "loss": 0.1798,
81
+ "step": 90
82
+ },
83
+ {
84
+ "epoch": 0.4801920768307323,
85
+ "grad_norm": 0.5909891764283862,
86
+ "learning_rate": 9.898737734799134e-06,
87
+ "loss": 0.1671,
88
+ "step": 100
89
+ },
90
+ {
91
+ "epoch": 0.4801920768307323,
92
+ "eval_loss": 0.17217175662517548,
93
+ "eval_runtime": 8.7329,
94
+ "eval_samples_per_second": 15.459,
95
+ "eval_steps_per_second": 1.947,
96
+ "step": 100
97
+ },
98
+ {
99
+ "epoch": 0.5282112845138055,
100
+ "grad_norm": 0.5490095561563818,
101
+ "learning_rate": 9.835021705636201e-06,
102
+ "loss": 0.1672,
103
+ "step": 110
104
+ },
105
+ {
106
+ "epoch": 0.5762304921968787,
107
+ "grad_norm": 0.5919523660281423,
108
+ "learning_rate": 9.756147081366673e-06,
109
+ "loss": 0.1661,
110
+ "step": 120
111
+ },
112
+ {
113
+ "epoch": 0.6242496998799519,
114
+ "grad_norm": 0.5553308704075627,
115
+ "learning_rate": 9.66236114702178e-06,
116
+ "loss": 0.1678,
117
+ "step": 130
118
+ },
119
+ {
120
+ "epoch": 0.6722689075630253,
121
+ "grad_norm": 0.5921710123949474,
122
+ "learning_rate": 9.55395793706341e-06,
123
+ "loss": 0.1604,
124
+ "step": 140
125
+ },
126
+ {
127
+ "epoch": 0.7202881152460985,
128
+ "grad_norm": 0.6297463589804964,
129
+ "learning_rate": 9.43127731353729e-06,
130
+ "loss": 0.1581,
131
+ "step": 150
132
+ },
133
+ {
134
+ "epoch": 0.7202881152460985,
135
+ "eval_loss": 0.15990422666072845,
136
+ "eval_runtime": 8.7401,
137
+ "eval_samples_per_second": 15.446,
138
+ "eval_steps_per_second": 1.945,
139
+ "step": 150
140
+ },
141
+ {
142
+ "epoch": 0.7683073229291717,
143
+ "grad_norm": 0.5203866298970263,
144
+ "learning_rate": 9.294703900549096e-06,
145
+ "loss": 0.1608,
146
+ "step": 160
147
+ },
148
+ {
149
+ "epoch": 0.8163265306122449,
150
+ "grad_norm": 0.5100770394064513,
151
+ "learning_rate": 9.14466587840408e-06,
152
+ "loss": 0.162,
153
+ "step": 170
154
+ },
155
+ {
156
+ "epoch": 0.8643457382953181,
157
+ "grad_norm": 0.5455220998770557,
158
+ "learning_rate": 8.981633641190779e-06,
159
+ "loss": 0.1566,
160
+ "step": 180
161
+ },
162
+ {
163
+ "epoch": 0.9123649459783914,
164
+ "grad_norm": 0.48213355422073995,
165
+ "learning_rate": 8.806118322017525e-06,
166
+ "loss": 0.1486,
167
+ "step": 190
168
+ },
169
+ {
170
+ "epoch": 0.9603841536614646,
171
+ "grad_norm": 0.49326253901345773,
172
+ "learning_rate": 8.61867019052535e-06,
173
+ "loss": 0.1513,
174
+ "step": 200
175
+ },
176
+ {
177
+ "epoch": 0.9603841536614646,
178
+ "eval_loss": 0.1510133445262909,
179
+ "eval_runtime": 8.7214,
180
+ "eval_samples_per_second": 15.479,
181
+ "eval_steps_per_second": 1.949,
182
+ "step": 200
183
+ },
184
+ {
185
+ "epoch": 1.0048019207683074,
186
+ "grad_norm": 1.2575485399779591,
187
+ "learning_rate": 8.41987692770139e-06,
188
+ "loss": 0.1452,
189
+ "step": 210
190
+ },
191
+ {
192
+ "epoch": 1.0528211284513807,
193
+ "grad_norm": 0.49974946529817293,
194
+ "learning_rate": 8.210361783401491e-06,
195
+ "loss": 0.1086,
196
+ "step": 220
197
+ },
198
+ {
199
+ "epoch": 1.1008403361344539,
200
+ "grad_norm": 0.5909720459940095,
201
+ "learning_rate": 7.990781622358535e-06,
202
+ "loss": 0.1098,
203
+ "step": 230
204
+ },
205
+ {
206
+ "epoch": 1.148859543817527,
207
+ "grad_norm": 0.5289140362803703,
208
+ "learning_rate": 7.76182486480253e-06,
209
+ "loss": 0.1042,
210
+ "step": 240
211
+ },
212
+ {
213
+ "epoch": 1.1968787515006003,
214
+ "grad_norm": 0.5570818538274178,
215
+ "learning_rate": 7.524209328148995e-06,
216
+ "loss": 0.1104,
217
+ "step": 250
218
+ },
219
+ {
220
+ "epoch": 1.1968787515006003,
221
+ "eval_loss": 0.15508781373500824,
222
+ "eval_runtime": 8.7041,
223
+ "eval_samples_per_second": 15.51,
224
+ "eval_steps_per_second": 1.953,
225
+ "step": 250
226
+ },
227
+ {
228
+ "epoch": 1.2448979591836735,
229
+ "grad_norm": 0.5285122850283916,
230
+ "learning_rate": 7.278679976522279e-06,
231
+ "loss": 0.1024,
232
+ "step": 260
233
+ },
234
+ {
235
+ "epoch": 1.2929171668667467,
236
+ "grad_norm": 0.5198193011546469,
237
+ "learning_rate": 7.026006585169467e-06,
238
+ "loss": 0.1067,
239
+ "step": 270
240
+ },
241
+ {
242
+ "epoch": 1.34093637454982,
243
+ "grad_norm": 0.5433402449341223,
244
+ "learning_rate": 6.766981327087271e-06,
245
+ "loss": 0.1106,
246
+ "step": 280
247
+ },
248
+ {
249
+ "epoch": 1.3889555822328932,
250
+ "grad_norm": 0.591675736467511,
251
+ "learning_rate": 6.502416289428282e-06,
252
+ "loss": 0.1027,
253
+ "step": 290
254
+ },
255
+ {
256
+ "epoch": 1.4369747899159664,
257
+ "grad_norm": 0.49602130865146205,
258
+ "learning_rate": 6.233140927473033e-06,
259
+ "loss": 0.1068,
260
+ "step": 300
261
+ },
262
+ {
263
+ "epoch": 1.4369747899159664,
264
+ "eval_loss": 0.14931099116802216,
265
+ "eval_runtime": 8.7136,
266
+ "eval_samples_per_second": 15.493,
267
+ "eval_steps_per_second": 1.951,
268
+ "step": 300
269
+ },
270
+ {
271
+ "epoch": 1.4849939975990396,
272
+ "grad_norm": 0.5205229035157025,
273
+ "learning_rate": 5.959999464150101e-06,
274
+ "loss": 0.1043,
275
+ "step": 310
276
+ },
277
+ {
278
+ "epoch": 1.5330132052821128,
279
+ "grad_norm": 0.5213512259437505,
280
+ "learning_rate": 5.683848243257181e-06,
281
+ "loss": 0.1058,
282
+ "step": 320
283
+ },
284
+ {
285
+ "epoch": 1.581032412965186,
286
+ "grad_norm": 0.5252416105681743,
287
+ "learning_rate": 5.40555304468122e-06,
288
+ "loss": 0.1035,
289
+ "step": 330
290
+ },
291
+ {
292
+ "epoch": 1.6290516206482593,
293
+ "grad_norm": 0.479035517749776,
294
+ "learning_rate": 5.125986370034862e-06,
295
+ "loss": 0.1032,
296
+ "step": 340
297
+ },
298
+ {
299
+ "epoch": 1.6770708283313325,
300
+ "grad_norm": 0.47617986914412186,
301
+ "learning_rate": 4.846024707219149e-06,
302
+ "loss": 0.1006,
303
+ "step": 350
304
+ },
305
+ {
306
+ "epoch": 1.6770708283313325,
307
+ "eval_loss": 0.14417614042758942,
308
+ "eval_runtime": 8.7246,
309
+ "eval_samples_per_second": 15.473,
310
+ "eval_steps_per_second": 1.949,
311
+ "step": 350
312
+ },
313
+ {
314
+ "epoch": 1.725090036014406,
315
+ "grad_norm": 0.5665748104077248,
316
+ "learning_rate": 4.566545782488554e-06,
317
+ "loss": 0.1019,
318
+ "step": 360
319
+ },
320
+ {
321
+ "epoch": 1.773109243697479,
322
+ "grad_norm": 0.5167955766147265,
323
+ "learning_rate": 4.2884258086335755e-06,
324
+ "loss": 0.0976,
325
+ "step": 370
326
+ },
327
+ {
328
+ "epoch": 1.8211284513805523,
329
+ "grad_norm": 0.5042866007823615,
330
+ "learning_rate": 4.012536737908288e-06,
331
+ "loss": 0.1003,
332
+ "step": 380
333
+ },
334
+ {
335
+ "epoch": 1.8691476590636253,
336
+ "grad_norm": 0.5495519515030309,
337
+ "learning_rate": 3.7397435283153795e-06,
338
+ "loss": 0.0991,
339
+ "step": 390
340
+ },
341
+ {
342
+ "epoch": 1.9171668667466988,
343
+ "grad_norm": 0.4815628493479367,
344
+ "learning_rate": 3.4709014318193298e-06,
345
+ "loss": 0.1029,
346
+ "step": 400
347
+ },
348
+ {
349
+ "epoch": 1.9171668667466988,
350
+ "eval_loss": 0.14110355079174042,
351
+ "eval_runtime": 8.7254,
352
+ "eval_samples_per_second": 15.472,
353
+ "eval_steps_per_second": 1.948,
354
+ "step": 400
355
+ },
356
+ {
357
+ "epoch": 1.9651860744297718,
358
+ "grad_norm": 0.5378667536535642,
359
+ "learning_rate": 3.2068533129896273e-06,
360
+ "loss": 0.1035,
361
+ "step": 410
362
+ },
363
+ {
364
+ "epoch": 2.009603841536615,
365
+ "grad_norm": 0.4775497724990149,
366
+ "learning_rate": 2.948427006480528e-06,
367
+ "loss": 0.0912,
368
+ "step": 420
369
+ },
370
+ {
371
+ "epoch": 2.057623049219688,
372
+ "grad_norm": 0.6087554919185391,
373
+ "learning_rate": 2.696432721632082e-06,
374
+ "loss": 0.059,
375
+ "step": 430
376
+ },
377
+ {
378
+ "epoch": 2.1056422569027613,
379
+ "grad_norm": 0.466286415886031,
380
+ "learning_rate": 2.4516605023294626e-06,
381
+ "loss": 0.0567,
382
+ "step": 440
383
+ },
384
+ {
385
+ "epoch": 2.1536614645858343,
386
+ "grad_norm": 0.5402767853471913,
387
+ "learning_rate": 2.2148777500843125e-06,
388
+ "loss": 0.0617,
389
+ "step": 450
390
+ },
391
+ {
392
+ "epoch": 2.1536614645858343,
393
+ "eval_loss": 0.1581123322248459,
394
+ "eval_runtime": 8.7373,
395
+ "eval_samples_per_second": 15.451,
396
+ "eval_steps_per_second": 1.946,
397
+ "step": 450
398
+ },
399
+ {
400
+ "epoch": 2.2016806722689077,
401
+ "grad_norm": 0.5420430891075357,
402
+ "learning_rate": 1.9868268181037186e-06,
403
+ "loss": 0.0584,
404
+ "step": 460
405
+ },
406
+ {
407
+ "epoch": 2.2496998799519807,
408
+ "grad_norm": 0.4884067898678699,
409
+ "learning_rate": 1.768222683889757e-06,
410
+ "loss": 0.058,
411
+ "step": 470
412
+ },
413
+ {
414
+ "epoch": 2.297719087635054,
415
+ "grad_norm": 0.49574699167442754,
416
+ "learning_rate": 1.5597507076664187e-06,
417
+ "loss": 0.0588,
418
+ "step": 480
419
+ },
420
+ {
421
+ "epoch": 2.345738295318127,
422
+ "grad_norm": 0.484031285869081,
423
+ "learning_rate": 1.362064483661617e-06,
424
+ "loss": 0.0555,
425
+ "step": 490
426
+ },
427
+ {
428
+ "epoch": 2.3937575030012006,
429
+ "grad_norm": 0.49868843579846434,
430
+ "learning_rate": 1.1757837909808628e-06,
431
+ "loss": 0.0584,
432
+ "step": 500
433
+ },
434
+ {
435
+ "epoch": 2.3937575030012006,
436
+ "eval_loss": 0.1588136851787567,
437
+ "eval_runtime": 8.722,
438
+ "eval_samples_per_second": 15.478,
439
+ "eval_steps_per_second": 1.949,
440
+ "step": 500
441
+ },
442
+ {
443
+ "epoch": 2.4417767106842736,
444
+ "grad_norm": 0.505195377790213,
445
+ "learning_rate": 1.0014926504969535e-06,
446
+ "loss": 0.0568,
447
+ "step": 510
448
+ },
449
+ {
450
+ "epoch": 2.489795918367347,
451
+ "grad_norm": 0.48055085037067635,
452
+ "learning_rate": 8.397374938476594e-07,
453
+ "loss": 0.057,
454
+ "step": 520
455
+ },
456
+ {
457
+ "epoch": 2.53781512605042,
458
+ "grad_norm": 0.48774374180423785,
459
+ "learning_rate": 6.910254502818914e-07,
460
+ "loss": 0.0562,
461
+ "step": 530
462
+ },
463
+ {
464
+ "epoch": 2.5858343337334935,
465
+ "grad_norm": 0.506896175562434,
466
+ "learning_rate": 5.558227567253832e-07,
467
+ "loss": 0.0571,
468
+ "step": 540
469
+ },
470
+ {
471
+ "epoch": 2.6338535414165665,
472
+ "grad_norm": 0.4794928604324473,
473
+ "learning_rate": 4.3455329605058436e-07,
474
+ "loss": 0.0585,
475
+ "step": 550
476
+ },
477
+ {
478
+ "epoch": 2.6338535414165665,
479
+ "eval_loss": 0.1571720838546753,
480
+ "eval_runtime": 8.7061,
481
+ "eval_samples_per_second": 15.506,
482
+ "eval_steps_per_second": 1.953,
483
+ "step": 550
484
+ },
485
+ {
486
+ "epoch": 2.68187274909964,
487
+ "grad_norm": 0.4447078585594017,
488
+ "learning_rate": 3.275972681335421e-07,
489
+ "loss": 0.0557,
490
+ "step": 560
491
+ },
492
+ {
493
+ "epoch": 2.729891956782713,
494
+ "grad_norm": 0.5074668455489234,
495
+ "learning_rate": 2.3528999786421758e-07,
496
+ "loss": 0.0551,
497
+ "step": 570
498
+ },
499
+ {
500
+ "epoch": 2.7779111644657863,
501
+ "grad_norm": 0.4931479047936826,
502
+ "learning_rate": 1.5792088384733174e-07,
503
+ "loss": 0.0578,
504
+ "step": 580
505
+ },
506
+ {
507
+ "epoch": 2.82593037214886,
508
+ "grad_norm": 0.4945483234224741,
509
+ "learning_rate": 9.573249108973281e-08,
510
+ "loss": 0.0571,
511
+ "step": 590
512
+ },
513
+ {
514
+ "epoch": 2.8739495798319328,
515
+ "grad_norm": 0.49071499326291995,
516
+ "learning_rate": 4.891979051886153e-08,
517
+ "loss": 0.0552,
518
+ "step": 600
519
+ },
520
+ {
521
+ "epoch": 2.8739495798319328,
522
+ "eval_loss": 0.15641489624977112,
523
+ "eval_runtime": 8.7188,
524
+ "eval_samples_per_second": 15.484,
525
+ "eval_steps_per_second": 1.95,
526
+ "step": 600
527
+ },
528
+ {
529
+ "epoch": 2.9219687875150058,
530
+ "grad_norm": 0.4898132970657211,
531
+ "learning_rate": 1.762954771655001e-08,
532
+ "loss": 0.058,
533
+ "step": 610
534
+ },
535
+ {
536
+ "epoch": 2.969987995198079,
537
+ "grad_norm": 0.44485985772895487,
538
+ "learning_rate": 1.959862784577937e-09,
539
+ "loss": 0.0548,
540
+ "step": 620
541
+ },
542
+ {
543
+ "epoch": 2.9891956782713085,
544
+ "step": 624,
545
+ "total_flos": 170865984536576.0,
546
+ "train_loss": 0.1184023514103431,
547
+ "train_runtime": 9824.5273,
548
+ "train_samples_per_second": 4.068,
549
+ "train_steps_per_second": 0.064
550
+ }
551
+ ],
552
+ "logging_steps": 10,
553
+ "max_steps": 624,
554
+ "num_input_tokens_seen": 0,
555
+ "num_train_epochs": 3,
556
+ "save_steps": 200,
557
+ "stateful_callbacks": {
558
+ "TrainerControl": {
559
+ "args": {
560
+ "should_epoch_stop": false,
561
+ "should_evaluate": false,
562
+ "should_log": false,
563
+ "should_save": true,
564
+ "should_training_stop": true
565
+ },
566
+ "attributes": {}
567
+ }
568
+ },
569
+ "total_flos": 170865984536576.0,
570
+ "train_batch_size": 2,
571
+ "trial_name": null,
572
+ "trial_params": null
573
+ }
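A quick sanity check on the throughput figures above: with `train_samples_per_second` ≈ 4.068 and `train_steps_per_second` ≈ 0.064, each optimizer step consumed about 4.068 / 0.064 ≈ 64 samples. Since `train_batch_size` is 2 per device, the effective global batch size was roughly 64, presumably via gradient accumulation and/or data parallelism (an inference from these logs, not a recorded setting).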
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:50a815ff58781d61b959ea17727b10ac4468438720e9aa6ed2816dcedcc5f19f
3
+ size 7672
vocab.json ADDED
The diff for this file is too large to render. See raw diff