
Summarization-Text rank algorithm

Original post: 2020-07-04 16:15:14 | Tags: python, machine-learning, nlp, bert-language-model, textrank

What are the advantages of using the TextRank algorithm for summarization over BERT summarization? Even though both can be used as extractive summarization methods, is there any particular advantage to TextRank?

1 answer

TextRank implementations tend to be lightweight and can run fast even with limited memory, while transformer models such as BERT tend to be rather large and require lots of memory. While the TinyML community has done outstanding work on techniques to make DL models run within limited resources, TextRank may still have a resource advantage for some use cases.
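To give a sense of how little machinery is involved, here is a minimal sketch of TextRank-style extractive summarization using only the Python standard library. It is an illustration under simplifying assumptions, not the code of any particular TextRank package: the sentence splitter is naive, similarity is plain word overlap normalized by log sentence lengths (computed over unique words), and the ranking uses the normalized PageRank form with a `(1 - damping) / n` term. The function names (`split_sentences`, `tokenize`, `similarity`, `textrank`, `summarize`) are made up for this example.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Set of lowercase word tokens; no stemming or stop-word removal."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a, b):
    """Word overlap between two token sets, normalized by log lengths."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:  # both sentences contain a single unique word
        return 0.0
    return len(a & b) / denom


def textrank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over an n x n weight matrix."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    """Return the k highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Sentence graph: edge weight is the similarity of the two sentences.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = textrank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, k=3)` returns three sentences from `text`. There is no model to load and no training step, and for a document of n sentences the dense graph costs O(n^2) memory, which is where the resource advantage comes from.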

Some TextRank implementations can be "directed" by adding semantic relations, which one can consider as a priori structure that enriches the graph used, or in some cases as a means of incorporating human-in-the-loop approaches. Those can provide advantages over supervised learning models which have been trained purely on data. Even so, there are similar efforts for DL in general (e.g., variations on the theme of transfer learning) from which transformers may benefit.
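One way to picture "directing" the ranking is a biased (personalized) PageRank, where the random-jump distribution favors nodes that a person or an external knowledge source has marked as important. The sketch below is an assumption about how such steering could work in principle, not a description of how any specific TextRank implementation does it; the function name `biased_rank` and the `bias` parameter are hypothetical.

```python
def biased_rank(weights, bias, damping=0.85, iterations=50):
    """PageRank by power iteration with a non-uniform teleport distribution.

    weights: n x n matrix of non-negative edge weights (the sentence graph).
    bias:    n non-negative numbers; larger values pull rank toward that node.
    """
    n = len(weights)
    total = sum(bias)
    teleport = [b / total for b in bias]  # normalize the bias to sum to 1
    out_sum = [sum(row) for row in weights]
    scores = list(teleport)
    for _ in range(iterations):
        scores = [
            (1 - damping) * teleport[i]
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

With a uniform `bias` this reduces to ordinary PageRank. Raising the bias on a few nodes (for example, sentences that mention terms a reviewer flagged) shifts rank toward them and their neighbors without retraining anything.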

Another potential benefit is that TextRank approaches tend to be more transparent, while transformer models can be challenging in terms of explainability. There are tools that help greatly, but this concern becomes important in the context of model bias and fairness, data ethics, regulatory compliance, and so on.

Based on personal experience, while I'm the lead committer for one of the popular TextRank open source implementations, I only use its extractive summarization features for use cases where a "cheap and fast" solution is needed. Otherwise I'd recommend considering more sophisticated approaches to summarization. For example, I recommend keeping watch on the ongoing research by the author of TextRank, Rada Mihalcea, and her graduate students at U Michigan.

In terms of comparing "Which text summarization methods work better?", I'd point toward work on abstractive summarization, particularly recent work by John Bohannon et al. at Primer. For excellent examples, check the "Daily Briefings" of CV19 research, which their team generates using natural language understanding, knowledge graphs, abstractive summarization, etc. Amy Heineike discusses their approach in "Machines for unlocking the deluge of COVID-19 papers, articles, and conversations".