Update README.md
README.md
CHANGED
@@ -2622,7 +2622,7 @@ model-index:
 ## Intended Usage & Model Info
 
 `jina-embedding-b-en-v2` is an English, monolingual **embedding model** supporting **8192 sequence length**.
-It is based on a Bert architecture (
+It is based on a Bert architecture (JinaBert) that supports the symmetric bidirectional variant of [ALiBi](https://arxiv.org/abs/2108.12409) to allow longer sequence length.
 The backbone `jina-bert-b-en-v2` is pretrained on the C4 dataset.
 The model is further trained on Jina AI's collection of more than 400 millions of sentence pairs and hard negatives.
 These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
@@ -2631,18 +2631,18 @@ The embedding model was trained using 512 sequence length, but extrapolates to 8
 This makes our model useful for a range of use cases, especially when processing long documents is needed, including long document retrieval, semantic textual similarity, text reranking, recommendation, RAG and LLM-based generative search,...
 
 With a standard size of 137 million parameters, the model enables fast inference while delivering better performance than our small model. It is recommended to use a single GPU for inference.
-Additionally, we provide the following embedding models
+Additionally, we provide the following embedding models:
 
-### V1 (Based on T5)
+### V1 (Based on T5, 512 Seq)
 
 - [`jina-embedding-s-en-v1`](https://huggingface.co/jinaai/jina-embedding-s-en-v1): 35 million parameters.
 - [`jina-embedding-b-en-v1`](https://huggingface.co/jinaai/jina-embedding-b-en-v1): 110 million parameters.
 - [`jina-embedding-l-en-v1`](https://huggingface.co/jinaai/jina-embedding-l-en-v1): 330 million parameters.
 
-### V2 (Based on JinaBert)
+### V2 (Based on JinaBert, 8k Seq)
 
-- [`jina-embedding-s-en-v2`](https://huggingface.co/jinaai/jina-embedding-s-en-v2): 33 million parameters
-- [`jina-embedding-b-en-v2`](https://huggingface.co/jinaai/jina-embedding-b-en-v2): 137 million parameters
+- [`jina-embedding-s-en-v2`](https://huggingface.co/jinaai/jina-embedding-s-en-v2): 33 million parameters **(you are here)**.
+- [`jina-embedding-b-en-v2`](https://huggingface.co/jinaai/jina-embedding-b-en-v2): 137 million parameters.
 - [`jina-embedding-l-en-v2`](https://huggingface.co/jinaai/jina-embedding-l-en-v2): 435 million parameters.
 
 ## Data & Parameters
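
Reviewer note on the changed line at 2625: the README now says JinaBert uses the symmetric bidirectional variant of ALiBi. As a rough illustration of what that phrase usually means, here is a minimal sketch in plain Python. It assumes the per-head slopes `2^(-8h/n)` from the linked ALiBi paper and a bias of `-m * |i - j|` applied in both directions. The function names `alibi_slopes` and `symmetric_alibi_bias` are hypothetical and are not taken from the JinaBert code, which may differ in detail.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes 2^(-8h/n) for h = 1..n (assumed, per the ALiBi paper)."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        # Assumption: this sketch only covers power-of-two head counts.
        raise ValueError("num_heads must be a positive power of two")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def symmetric_alibi_bias(seq_len: int, num_heads: int) -> list[list[list[float]]]:
    """Return bias[head][i][j] = -slope * |i - j|.

    The bias is added to attention scores before the softmax. Using the
    absolute distance makes it symmetric, so tokens on both sides of the
    query are penalised equally (no causal mask).
    """
    return [
        [[m * -abs(i - j) for j in range(seq_len)] for i in range(seq_len)]
        for m in alibi_slopes(num_heads)
    ]


bias = symmetric_alibi_bias(seq_len=4, num_heads=8)
print(bias[0][0])  # penalties seen by query position 0 in the first head
```

Because the bias depends only on the distance between positions and not on learned position embeddings, the same formula can be evaluated for any `seq_len`, which is the property the README relies on when it describes training at 512 tokens and running at longer lengths.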