
Automatic Topic Labeling Evaluation Metric

I am working on a topic labeling problem for a large dataset of research papers. The goal is to assign each paper a few relevant labels.

I have two questions.

I know topic modeling can be done in a variety of ways, such as LDA and NMF, but how can I then extract candidate labels from the resulting topics?

Also, assuming I have extracted a set of labels, how can I estimate their accuracy mathematically? Is there a metric that measures, say, how much of the information in a document a label explains, or something along those lines? How can I evaluate my labels without a large group of humans doing qualitative analysis?

The simplest way is to use the top k words of each topic as its labels. More sophisticated methods first generate candidate labels and then rank them. Several papers address this topic:

  1. Aletras, Nikolaos, and Mark Stevenson. "Labelling topics using unsupervised graph-based methods." ACL. 2014.
  2. Bhatia, Shraey, Jey Han Lau, and Timothy Baldwin. "Automatic labelling of topics with neural embeddings." COLING. 2016.
  3. Hingmire, Swapnil, et al. "Document classification by topic labeling." SIGIR. 2013.
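
As a rough illustration of the top-k approach mentioned above, here is a minimal sketch. It assumes each topic is already available as a mapping from word to weight (for example, one row of an LDA or NMF topic-word matrix); the function name `top_k_labels` and the sample weights are made up for illustration.

```python
from typing import Dict, List


def top_k_labels(topic_word_weights: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k highest-weighted words of one topic as its labels."""
    # Sort by descending weight; break ties alphabetically so the result is stable.
    ranked = sorted(topic_word_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical topic-word weights, standing in for the output of a topic model.
topics = [
    {"neural": 0.31, "network": 0.27, "training": 0.12, "layer": 0.09, "data": 0.05},
    {"protein": 0.29, "gene": 0.22, "cell": 0.18, "expression": 0.11, "data": 0.04},
]

labels = [top_k_labels(topic, k=3) for topic in topics]
for topic_id, topic_labels in enumerate(labels):
    print(topic_id, topic_labels)
```

A paper can then inherit the labels of its most probable topics. Note that top-k words are often generic (such as "data" in the sample), which is one reason the candidate generation and ranking methods exist.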

All of the papers cited above include sections on how to evaluate the labels.
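
When a method produces a ranked list of candidate labels, one possible way to score it is a ranking metric such as nDCG (normalized discounted cumulative gain). The sketch below is only an illustration, not the exact procedure of any of the papers: it assumes that relevance ratings for the candidates are available for at least a sample of topics, and the names `dcg` and `ndcg_at_k` are chosen here for illustration.

```python
import math
from typing import List


def dcg(ratings: List[float]) -> float:
    """Discounted cumulative gain of ratings given in ranked order."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(ratings))


def ndcg_at_k(ratings_in_ranked_order: List[float], k: int) -> float:
    """nDCG@k: DCG of the system's ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(ratings_in_ranked_order, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    if ideal_dcg == 0:
        return 0.0  # no relevant candidate at all
    return dcg(ratings_in_ranked_order[:k]) / ideal_dcg


# Hypothetical ratings (0 = bad label, 3 = very good label) for the candidate
# labels of one topic, listed in the order the labeling method ranked them.
system_ranking_ratings = [2.0, 3.0, 0.0, 1.0]
print(ndcg_at_k(system_ranking_ratings, k=3))
```

A score of 1.0 means the method ranked the candidates in the same order as the ratings would. This still depends on some rated sample, so it reduces the amount of human judgment needed rather than removing it.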
