---
title: DmxPerplexity
emoji: 🌖
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.41.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
- evaluate
- metric
description: >-
  Perplexity metric implemented by d-Matrix. Perplexity (PPL) is one of the
  most common metrics for evaluating language models. It is defined as the
  exponentiated average negative log-likelihood of a sequence, calculated with
  exponent base `e`. Note that this metric is intended for Causal Language
  Models; the perplexity calculation is only correct if the model uses Cross
  Entropy Loss. For more information, see
  https://huggingface.co/docs/transformers/perplexity
---
Metric Card for Perplexity
Metric Description
Perplexity metric implemented by d-Matrix.
Perplexity (PPL) is one of the most common metrics for evaluating language models.
It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base e.
Note that this metric is intended for Causal Language Models; the perplexity calculation is only correct if the model uses Cross Entropy Loss.
For more information, see https://huggingface.co/docs/transformers/perplexity
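Concretely, for a tokenized sequence X = (x_1, ..., x_t), this corresponds to the standard formulation from the linked Transformers guide, where p_θ denotes the model's predicted token probabilities:

$$\mathrm{PPL}(X) = \exp\left(-\frac{1}{t}\sum_{i=1}^{t}\log p_\theta(x_i \mid x_{<i})\right)$$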
How to Use
At minimum, this metric requires the model and references as inputs.
>>> import evaluate
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = ["lorem ipsum", "Happy Birthday!", "Bienvenue"]
>>> results = perplexity.compute(model='distilgpt2', references=input_texts)
>>> print(results)
{'loss': 4.993086338043213, 'perplexity': 147.390625}
Inputs
- model (Union[str, AutoModelForCausalLM]): model used for calculating Perplexity
- references (list of str): input text, each separate text snippet is one list entry
- device (str): device to run on, defaults to 'cuda' when available
- max_length (int): maximum sequence length, defaults to 2048
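As a sketch of how the optional arguments listed above might be passed (the values here are purely illustrative; the defaults are as stated):

>>> results = perplexity.compute(
...     model="distilgpt2",
...     references=input_texts,
...     device="cpu",       # force CPU instead of the default 'cuda' when available
...     max_length=1024,    # cap the sequence length below the 2048 default
... )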
Output Values
- loss (float): the loss of the model predictions compared to the reference
- perplexity (float): measures the uncertainty of a model predicting text. Model performance is better when perplexity is lower.
Output Example(s):
{'loss': 4.993086338043213, 'perplexity': 147.390625}
This metric outputs a dictionary containing the loss and the perplexity score.
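Because perplexity is the exponentiated loss (base e, as described above), the two values in the example output are consistent and can be cross-checked directly:

>>> import math
>>> round(math.exp(results['loss']), 2)
147.39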
Examples
>>> import evaluate
>>> from datasets import load_dataset
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> perplexity = evaluate.load("d-matrix/dmx_perplexity", module_type="metric")
>>> input_texts = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"][:10]
>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
>>> results = perplexity.compute(model=model, references=input_texts)
>>> print(list(results.keys()))
['loss', 'perplexity']
>>> print(results['loss'])
3.9706921577453613
>>> print(results['perplexity'])
53.021217346191406
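For reference, a minimal sketch of how similar numbers can be reproduced by hand with Transformers, continuing the session above (this follows the linked perplexity guide rather than the metric's actual implementation; it truncates to a single 1024-token context window and ignores the metric's internal batching/striding, so the values may differ slightly):

>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
>>> encodings = tokenizer("\n\n".join(input_texts), return_tensors="pt")
>>> input_ids = encodings.input_ids[:, :1024]  # keep within distilgpt2's context window
>>> with torch.no_grad():
...     loss = model(input_ids, labels=input_ids).loss  # mean cross-entropy over shifted tokens
>>> manual_results = {'loss': loss.item(), 'perplexity': torch.exp(loss).item()}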