
NLTK: corpus-level BLEU vs sentence-level BLEU score

I have imported nltk in Python to compute BLEU scores on Ubuntu. I understand how sentence-level BLEU scoring works, but I don't understand how corpus-level BLEU scoring works.

Below is my code for the corpus-level BLEU score:

import nltk

hypothesis = ['This', 'is', 'cat'] 
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.corpus_bleu([reference], [hypothesis], weights = [1])
print(BLEUscore)

For some reason, the BLEU score from the code above is 0. I expected the corpus-level BLEU score to be at least 0.5.

And here is my code for the sentence-level BLEU score:

import nltk

hypothesis = ['This', 'is', 'cat'] 
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis, weights = [1])
print(BLEUscore)

Here, the sentence-level BLEU score is 0.71, which I expect, taking into account the brevity penalty and the missing word "a". However, I don't understand how the corpus-level BLEU score works.
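
For reference, here is the quick arithmetic behind that 0.71 expectation (unigram precision times the brevity penalty):

import math

p1 = 3 / 3                # unigram precision: 'This', 'is', 'cat' all appear in the reference
bp = math.exp(1 - 4 / 3)  # brevity penalty: reference length 4, hypothesis length 3
print(bp * p1)            # 0.7165..., i.e. the ~0.71 expected above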

Any help would be appreciated.

TL;DR

>>> import nltk
>>> hypothesis = ['This', 'is', 'cat'] 
>>> reference = ['This', 'is', 'a', 'cat']
>>> references = [reference] # list of references for 1 sentence.
>>> list_of_references = [references] # list of references for all sentences in corpus.
>>> list_of_hypotheses = [hypothesis] # list of hypotheses that corresponds to list of references.
>>> nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
0.6025286104785453
>>> nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
0.6025286104785453

(Note: You have to pull the latest version of NLTK on the develop branch in order to get a stable version of the BLEU score implementation.)


In Long

Actually, if there is only one reference and one hypothesis in your whole corpus, both corpus_bleu() and sentence_bleu() should return the same value, as shown in the example above.

In the code, we see that sentence_bleu is actually a duck type of corpus_bleu:

def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    return corpus_bleu([references], [hypothesis], weights, smoothing_function)

And if we look at the parameters of sentence_bleu:

def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    """
    :param references: reference sentences
    :type references: list(list(str))
    :param hypothesis: a hypothesis sentence
    :type hypothesis: list(str)
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The sentence-level BLEU score.
    :rtype: float
    """

The references input for sentence_bleu is a list(list(str)).

So if you have a sentence string, e.g. "This is a cat", you have to tokenize it to get a list of strings, ["This", "is", "a", "cat"], and because BLEU allows multiple references, the references argument must be a list of lists of strings. E.g. if you have a second reference, "This is a feline", your input to sentence_bleu() would be:

references = [ ["This", "is", "a", "cat"], ["This", "is", "a", "feline"] ]
hypothesis = ["This", "is", "cat"]
sentence_bleu(references, hypothesis)
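
As a side note, here is a minimal sketch of going from raw sentence strings to those token lists. Plain str.split() is enough for this toy example; for real text you would use a proper tokenizer such as nltk.word_tokenize (which needs the punkt data installed):

references = ["This is a cat".split(), "This is a feline".split()]
hypothesis = "This is cat".split()
print(references)  # [['This', 'is', 'a', 'cat'], ['This', 'is', 'a', 'feline']]
print(hypothesis)  # ['This', 'is', 'cat']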

When it comes to the corpus_bleu() list_of_references parameter, it is basically a list of whatever sentence_bleu() takes as references:

def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                smoothing_function=None):
    """
    :param references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The corpus-level BLEU score.
    :rtype: float
    """

Other than looking at the doctest within nltk/translate/bleu_score.py, you can also take a look at the unittest in nltk/test/unit/translate/test_bleu_score.py to see how each of the components within bleu_score.py is used.

BTW, since sentence_bleu is imported as bleu in nltk.translate.__init__.py ( https://github.com/nltk/nltk/blob/develop/nltk/translate/__init__.py#L21 ), using

from nltk.translate import bleu 

is the same as:

from nltk.translate.bleu_score import sentence_bleu

and in code:

>>> from nltk.translate import bleu
>>> from nltk.translate.bleu_score import sentence_bleu
>>> from nltk.translate.bleu_score import corpus_bleu
>>> bleu == sentence_bleu
True
>>> bleu == corpus_bleu
False

Let's take a look:

>>> help(nltk.translate.bleu_score.corpus_bleu)
Help on function corpus_bleu in module nltk.translate.bleu_score:

corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=None)
    Calculate a single corpus-level BLEU score (aka. system-level BLEU) for all 
    the hypotheses and their respective references.  

    Instead of averaging the sentence level BLEU scores (i.e. macro-average
    precision), the original BLEU metric (Papineni et al. 2002) accounts for 
    the micro-average precision (i.e. summing the numerators and denominators
    for each hypothesis-reference(s) pairs before the division).
    ...
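
To make the micro- vs. macro-average distinction concrete, here is a small sketch on made-up data. It pools the modified-precision numerators and denominators across two hypothesis-reference pairs, which is what corpus_bleu does per n-gram order, and contrasts that with averaging the per-sentence precisions:

from fractions import Fraction
from nltk.translate.bleu_score import modified_precision

# Two hypothetical hypothesis-reference pairs.
pairs = [
    (['This', 'is', 'cat'],          [['This', 'is', 'a', 'cat']]),
    (['It', 'is', 'sunny', 'today'], [['It', 'is', 'a', 'sunny', 'day']]),
]

# Micro-average (corpus_bleu's way): sum numerators and denominators
# across all pairs first, then divide once.
num = den = 0
for hyp, refs in pairs:
    p = modified_precision(refs, hyp, n=1)
    num += p.numerator
    den += p.denominator
print(Fraction(num, den))  # (3 + 3) / (3 + 4) = 6/7, the pooled unigram precision

# Macro-average: divide per pair, then average the ratios.
macro = sum(modified_precision(refs, hyp, n=1) for hyp, refs in pairs) / len(pairs)
print(macro)               # (3/3 + 3/4) / 2 = 7/8, a different number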

You are in a better position to understand the description of the algorithm than I am, so I won't try to "explain" it to you. If the docstring doesn't clear things up enough, take a look at the source itself. Or find it locally:

>>> nltk.translate.bleu_score.__file__
'.../lib/python3.4/site-packages/nltk/translate/bleu_score.py'
