Accuracy score in PyTorch LSTM

Question

I have been running this LSTM tutorial on the wikigold.conll NER data set.

training_data contains a list of tuples of sequences and tags, for example:

training_data = [
    ("They also have a song called \" wake up \"".split(), ["O", "O", "O", "O", "O", "O", "I-MISC", "I-MISC", "I-MISC", "I-MISC"]),
    ("Major General John C. Scheidt Jr.".split(), ["O", "O", "I-PER", "I-PER", "I-PER", "I-PER"])
]

And I wrote this function:

def predict(indices):
    """Gets a list of indices of training_data, and returns a list of predicted lists of tags"""
    for index in indices:
        inputs = prepare_sequence(training_data[index][0], word_to_ix)
        tag_scores = model(inputs)
        # argmax over the tag dimension gives the predicted tag index per word
        values, target = torch.max(tag_scores, 1)
        yield target

This way I can get the predicted labels for specific indices in the training data.

However, how do I evaluate the accuracy score across all training data?

Accuracy being the number of words correctly classified across all sentences divided by the total word count.
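To make that definition concrete, here is a minimal pure-Python sketch of token-level accuracy. The function name token_accuracy and the toy tag lists are illustrative assumptions, not part of the tutorial or the data set.

```python
def token_accuracy(y_true, y_pred):
    """Correctly tagged words across all sentences divided by the total word count."""
    correct = 0
    total = 0
    for gold, pred in zip(y_true, y_pred):
        if len(gold) != len(pred):
            raise ValueError("each sentence needs one predicted tag per word")
        # Count the positions where the predicted tag equals the gold tag
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)
    return correct / total


# Hypothetical gold and predicted tags for two five-word sentences
gold = [["O", "O", "I-PER", "I-PER", "I-PER"],
        ["O", "O", "O", "I-MISC", "I-MISC"]]
pred = [["O", "O", "I-PER", "O", "I-PER"],
        ["O", "O", "O", "I-MISC", "O"]]

acc = token_accuracy(gold, pred)
print("Accuracy: {:.2f}".format(acc))
```

Here 8 of the 10 words match, so the sketch reports 0.80.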

This is what I came up with, which is extremely slow and ugly:

y_pred = list(predict(range(len(training_data))))
y_true = [t for s, t in training_data]
c = 0
s = 0
for i in range(len(training_data)):
    n = len(y_true[i])
    # super ugly and inefficient
    s += (sum(sum(list(y_true[i].view(-1, n) == y_pred[i].view(-1, n).data))))
    c += n

print('Training accuracy:{a}'.format(a=float(s)/c))

How can this be done efficiently in PyTorch?

PS: I've been trying to use sklearn's accuracy_score, without success.

Answer 1

I would use numpy so as not to iterate over the list in pure Python.

The results are the same, but it runs much faster:

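import numpy as np

def accuracy_score(y_true, y_pred):
    # Flatten the per-sentence predictions and labels into one array each
    y_pred = np.concatenate(tuple(y_pred))
    y_true = np.concatenate(tuple([[t for t in y] for y in y_true])).reshape(y_pred.shape)
    # Fraction of positions where the prediction equals the label
    return (y_true == y_pred).sum() / float(len(y_true))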

And this is how to use it:

# original code:
y_pred = list(predict(range(len(training_data))))
y_true = [t for s, t in training_data]
# numpy accuracy score
print(accuracy_score(y_true, y_pred))
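The same idea can also stay entirely in PyTorch. The sketch below assumes that both the predictions and the gold tags are available as 1-D tensors of tag indices, one tensor per sentence. The name token_accuracy and the toy tensors are hypothetical and only illustrate the approach.

```python
import torch


def token_accuracy(pred_seqs, true_seqs):
    """Fraction of words whose predicted tag index equals the gold tag index."""
    # Join the per-sentence 1-D tensors into one long tensor each
    y_pred = torch.cat(list(pred_seqs))
    y_true = torch.cat(list(true_seqs))
    # Element-wise comparison, then count the matches in a single tensor op
    correct = (y_pred == y_true).sum().item()
    return correct / y_true.numel()


# Toy tag indices for two sentences (hypothetical values)
y_pred = [torch.tensor([0, 0, 1, 1]), torch.tensor([0, 2, 2])]
y_true = [torch.tensor([0, 0, 1, 0]), torch.tensor([0, 2, 2])]

acc = token_accuracy(y_pred, y_true)
print("Training accuracy: {:.3f}".format(acc))
```

When only evaluating, wrapping the calls to the model in torch.no_grad() avoids building the autograd graph, which usually saves time and memory.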

Answer 2

You may use sklearn's accuracy_score like this:

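from sklearn.metrics import accuracy_score

# argmax over the last dimension gives the predicted tag index per word
values, target = torch.max(tag_scores, -1)
accuracy = accuracy_score(train_y, target)
print("\nTraining accuracy is %d%%" % (accuracy*100))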
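Note that accuracy_score compares flat, equal-length label sequences, so per-sentence tag lists of different lengths have to be flattened first. A minimal sketch of that step, assuming the tags are held as plain Python lists (the helper name flatten and the toy data are illustrative):

```python
from itertools import chain


def flatten(sequences):
    """Turn a list of per-sentence tag lists into one flat list of tags."""
    return list(chain.from_iterable(sequences))


# Hypothetical gold and predicted tags for two sentences of different length
y_true = [["O", "I-PER", "I-PER"], ["O", "I-MISC"]]
y_pred = [["O", "I-PER", "O"], ["O", "I-MISC"]]

flat_true = flatten(y_true)
flat_pred = flatten(y_pred)
print(flat_true)
print(flat_pred)

# With scikit-learn installed, the flat lists can then be scored directly:
# from sklearn.metrics import accuracy_score
# accuracy_score(flat_true, flat_pred)
```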

2 answers

Answer 1: score 4, accepted, 2017-05-23 09:33:20
Answer 2: score 0, 2018-09-06 15:02:39
