NLTK: corpus-level BLEU vs sentence-level BLEU score
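I have imported nltk in Python to calculate BLEU scores on Ubuntu. I understand how the sentence-level BLEU score works, but I don't understand how the corpus-level BLEU score works.
Below is my code for the corpus-level BLEU score: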
import nltk
hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.corpus_bleu([reference], [hypothesis], weights = [1])
print(BLEUscore)
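For some reason, the BLEU score is 0 for the above code. I was expecting a corpus-level BLEU score of at least 0.5.
Here is my code for the sentence-level BLEU score: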
import nltk
hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis, weights = [1])
print(BLEUscore)
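Here the sentence-level BLEU score is 0.71, which is what I expect once the brevity penalty and the missing word "a" are taken into account. However, I don't understand how the corpus-level BLEU score works.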
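The 0.71 figure can be reproduced by hand. The following is a minimal sketch of unigram-only BLEU (modified precision times brevity penalty) for this one sentence pair; it is written from the metric's definition rather than copied from NLTK, and the variable names are illustrative only.

```python
import math
from collections import Counter

hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']

# Modified unigram precision: clip each hypothesis count by the reference count.
hyp_counts = Counter(hypothesis)
ref_counts = Counter(reference)
clipped = sum(min(count, ref_counts[token]) for token, count in hyp_counts.items())
precision = clipped / len(hypothesis)  # 3 matches out of 3 hypothesis tokens

# Brevity penalty: exp(1 - r/c) when the hypothesis is not longer than the reference.
c, r = len(hypothesis), len(reference)
brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)

unigram_bleu = brevity_penalty * precision
print(round(unigram_bleu, 4))  # 0.7165
```

Every hypothesis token appears in the reference, so the precision is 1 and the whole score comes from the brevity penalty exp(1 - 4/3).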
Any help would be appreciated.
TL;DR:
>>> import nltk
>>> hypothesis = ['This', 'is', 'cat']
>>> reference = ['This', 'is', 'a', 'cat']
>>> references = [reference] # list of references for 1 sentence.
>>> list_of_references = [references] # list of references for all sentences in corpus.
>>> list_of_hypotheses = [hypothesis] # list of hypotheses that corresponds to list of references.
>>> nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
0.6025286104785453
>>> nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
0.6025286104785453
(Note: you have to pull the latest version of NLTK from the develop branch in order to get a stable version of the BLEU score implementation.)
In long:
Actually, if there is only one reference and one hypothesis in your whole corpus, both corpus_bleu() and sentence_bleu() should return the same value, as shown in the example above.
In the code, we see that sentence_bleu is actually a thin wrapper around corpus_bleu:
def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    return corpus_bleu([references], [hypothesis], weights, smoothing_function)
And if we look at the parameters of sentence_bleu:
def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    """
    :param references: reference sentences
    :type references: list(list(str))
    :param hypothesis: a hypothesis sentence
    :type hypothesis: list(str)
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The sentence-level BLEU score.
    :rtype: float
    """
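The references input of sentence_bleu is a list(list(str)).
So if you have a sentence string, e.g. "This is a cat", you have to tokenize it to get a list of strings, ["This", "is", "a", "cat"]. Since multiple references are allowed, the input has to be a list of lists of strings. For example, if you have a second reference, "This is a feline", your input to sentence_bleu() would be: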
references = [ ["This", "is", "a", "cat"], ["This", "is", "a", "feline"] ]
hypothesis = ["This", "is", "cat"]
sentence_bleu(references, hypothesis)
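As a minimal sketch of the tokenization step described above, the snippet below builds the same references and hypothesis from raw strings. The tokenize helper is hypothetical and uses plain whitespace splitting, which is an assumption; a real pipeline would probably use a proper tokenizer.

```python
def tokenize(sentence):
    """Naive whitespace tokenizer (hypothetical helper, for illustration only)."""
    return sentence.split()

reference_strings = ["This is a cat", "This is a feline"]
hypothesis_string = "This is cat"

references = [tokenize(s) for s in reference_strings]  # list(list(str))
hypothesis = tokenize(hypothesis_string)               # list(str)

print(references)  # [['This', 'is', 'a', 'cat'], ['This', 'is', 'a', 'feline']]
print(hypothesis)  # ['This', 'is', 'cat']
```

The resulting references and hypothesis have the shapes that the sentence_bleu docstring above asks for.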
When it comes to the list_of_references parameter of corpus_bleu(), it is basically a list of whatever sentence_bleu() takes as references:
def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                smoothing_function=None):
    """
    :param references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The corpus-level BLEU score.
    :rtype: float
    """
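The nesting requirement can be checked without calling NLTK at all. The sketch below compares the question's argument, [reference], with the expected [[reference]]. The nesting_depth helper is hypothetical, and the explanation of the 0 score is an inference from the documented types, not something traced through the NLTK source.

```python
def nesting_depth(obj):
    """Count list levels above the first non-list item (hypothetical helper)."""
    depth = 0
    while isinstance(obj, list) and obj:
        depth += 1
        obj = obj[0]
    return depth

reference = ['This', 'is', 'a', 'cat']

wrong = [reference]    # list(list(str)): one level short of what corpus_bleu expects
right = [[reference]]  # list(list(list(str))): one list of references per hypothesis

print(nesting_depth(wrong), nesting_depth(right))  # 2 3

# With the wrong nesting, the "references" paired with the first hypothesis
# are the plain strings 'This', 'is', 'a', 'cat'. Iterating over a string
# yields characters, so no word-level token can match.
chars_seen = [list(ref) for ref in wrong[0]]
print(chars_seen[0])  # ['T', 'h', 'i', 's']
```

Under that reading, the hypothesis tokens 'This', 'is' and 'cat' are compared against single characters, which would explain the corpus-level score of 0 in the question.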
Besides looking at the doctests within nltk/translate/bleu_score.py, you can also take a look at the unit tests in nltk/test/unit/translate/test_bleu_score.py to see how to use each of the components within bleu_score.py.
By the way, since sentence_bleu is imported as bleu in nltk.translate.__init__.py (https://github.com/nltk/nltk/blob/develop/nltk/translate/__init__.py#L21), using
from nltk.translate import bleu
would be the same as:
from nltk.translate.bleu_score import sentence_bleu
and in code:
>>> from nltk.translate import bleu
>>> from nltk.translate.bleu_score import sentence_bleu
>>> from nltk.translate.bleu_score import corpus_bleu
>>> bleu == sentence_bleu
True
>>> bleu == corpus_bleu
False
Let's take a look:
>>> help(nltk.translate.bleu_score.corpus_bleu)
Help on function corpus_bleu in module nltk.translate.bleu_score:
corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=None)
Calculate a single corpus-level BLEU score (aka. system-level BLEU) for all
the hypotheses and their respective references.
Instead of averaging the sentence level BLEU scores (i.e. marco-average
precision), the original BLEU metric (Papineni et al. 2002) accounts for
the micro-average precision (i.e. summing the numerators and denominators
for each hypothesis-reference(s) pairs before the division).
...
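You're in a better position than me to understand the description of the algorithm, so I won't try to "explain" it to you. If the docstring does not clear things up enough, take a look at the source itself. Or find it locally: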
>>> nltk.translate.bleu_score.__file__
'.../lib/python3.4/site-packages/nltk/translate/bleu_score.py'
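The micro- versus macro-average distinction in the docstring quoted above can be illustrated with unigram precision alone. This is a simplified sketch, not NLTK's implementation: it ignores the brevity penalty and higher-order n-grams. The second sentence pair and the helper name clipped_unigram_counts are made up for the example.

```python
from collections import Counter

def clipped_unigram_counts(references, hypothesis):
    """Return (clipped unigram matches, hypothesis length) for one sentence."""
    hyp_counts = Counter(hypothesis)
    max_ref_counts = Counter()
    for ref in references:
        for token, count in Counter(ref).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)
    matches = sum(min(count, max_ref_counts[token])
                  for token, count in hyp_counts.items())
    return matches, len(hypothesis)

list_of_references = [
    [['This', 'is', 'a', 'cat']],
    [['the', 'dog', 'barks', 'loudly', 'at', 'night']],  # invented second pair
]
hypotheses = [
    ['This', 'is', 'cat'],
    ['a', 'dog', 'barks', 'softly', 'at', 'noon'],
]

pairs = [clipped_unigram_counts(refs, hyp)
         for refs, hyp in zip(list_of_references, hypotheses)]

# Micro-average: sum numerators and denominators first, then divide once.
micro = sum(m for m, _ in pairs) / sum(n for _, n in pairs)
# Macro-average: divide per sentence, then average the ratios.
macro = sum(m / n for m, n in pairs) / len(pairs)

print(round(micro, 4), macro)  # 0.6667 0.75
```

The two sentences contribute 3/3 and 3/6, so the micro-average is 6/9 while the macro-average is (1 + 0.5) / 2. That gap is why a corpus-level score generally differs from the mean of sentence-level scores once there is more than one sentence.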