---
title: DmxPerplexity
emoji: π
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.7.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
- evaluate
- metric
description: >-
  Perplexity metric implemented by d-Matrix.
  Perplexity (PPL) is one of the most common metrics for evaluating language models.
  It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base `e`.
  For more information, see https://huggingface.co/docs/transformers/perplexity
---
# Metric Card for Perplexity

## Metric Description

Perplexity metric implemented by d-Matrix.
Perplexity (PPL) is one of the most common metrics for evaluating language models.
It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base `e`.
For more information, see https://huggingface.co/docs/transformers/perplexity
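
To make the definition concrete, here is a minimal sketch of the computation itself, using made-up per-token log-likelihoods (the metric derives these from a real model):

```python
>>> import math
>>> log_likelihoods = [-2.1, -0.4, -3.3]  # hypothetical per-token log-likelihoods
>>> loss = -sum(log_likelihoods) / len(log_likelihoods)  # average negative log-likelihood
>>> round(math.exp(loss), 4)  # perplexity: e raised to the loss
6.9125
```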
## How to Use

At minimum, this metric requires a model and reference texts as inputs.
```python
>>> import evaluate
>>> perplexity = evaluate.load("dmx_perplexity", module_type="metric")
>>> input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
>>> results = perplexity.compute(model='distilgpt2', references=input_texts)
>>> print(results)
{'loss': 4.993086338043213, 'perplexity': 147.390625}
```
### Inputs
- **model** (`str` or `AutoModelForCausalLM`): the model used for calculating perplexity, given either as a model identifier or as a preloaded model object (see the sketch after this list).
- **references** (`list` of `str`): input text, where each text snippet is one list entry.
- **device** (`str`): device to run on; defaults to `'cuda'` when available.
- **max_length** (`int`): maximum sequence length; defaults to 2048.
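
Since **model** also accepts a preloaded `AutoModelForCausalLM`, a call might look like the following sketch; the keyword arguments mirror the inputs listed above, and the specific values are illustrative:

```python
>>> import evaluate
>>> from transformers import AutoModelForCausalLM
>>> perplexity = evaluate.load("dmx_perplexity", module_type="metric")
>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")  # a model object instead of a name
>>> results = perplexity.compute(
...     model=model,
...     references=["lorem ipsum", "Happy Birthday!", "Bienvenue"],
...     device="cpu",      # override the default device selection
...     max_length=1024,   # truncate sequences longer than this
... )
```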
### Output Values
- **loss** (`float`): the average negative log-likelihood of the model's predictions on the reference texts.
- **perplexity** (`float`): the exponentiated loss; it measures the model's uncertainty when predicting the text, and lower perplexity indicates better model performance.

Output Example(s):

```python
{'loss': 4.993086338043213, 'perplexity': 147.390625}
```

This metric outputs a dictionary containing the loss and the perplexity score.
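
Because perplexity is the loss exponentiated with base `e`, the two reported values are consistent with each other; any discrepancy beyond a few decimal places comes from reduced floating-point precision in the reported perplexity:

```python
>>> import math
>>> round(math.exp(4.993086338043213), 2)  # e**loss recovers the reported perplexity
147.39
```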
### Examples

```python
>>> import evaluate
>>> from datasets import load_dataset
>>> perplexity = evaluate.load("dmx_perplexity", module_type="metric")
>>> input_texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"][:10]
>>> results = perplexity.compute(model='distilgpt2', references=input_texts)
>>> print(list(results.keys()))
['loss', 'perplexity']
>>> print(results['loss'])
3.8299286365509033
>>> print(results['perplexity'])
46.05925369262695
```
## Citation(s)

https://huggingface.co/docs/transformers/perplexity