Given a word, can we get all of its possible lemmas using spaCy?

Question

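The input word is standalone, not part of a sentence, but I would like to get all of its possible lemmas, as if the word appeared in different sentences with every possible POS tag. I would also like to get the lookup version of the word's lemma.

Why am I doing this?

I have extracted the lemmas from all my documents and also counted the number of dependency links between lemmas. I did both using en_core_web_sm. Now, given an input word, I would like to return the lemmas that are most frequently linked to all possible lemmas of the input word.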

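So in short, I want to replicate the behavior of token.lemma_ for the input word across all possible POS tags, to stay consistent with the lemma links I have already counted.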
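To make the goal concrete, here is a minimal sketch of the final lookup step. It assumes the link counts are stored in a `Counter` keyed by lemma pairs; the counts, the `most_linked` helper, and the treatment of links as undirected are all hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical link counts: (lemma_a, lemma_b) -> number of dependency links.
# In practice these would come from parsing the documents with en_core_web_sm.
link_counts = Counter({
    ("watch", "movie"): 5,
    ("watch", "wear"): 2,
    ("movie", "see"): 3,
})


def most_linked(candidate_lemmas, link_counts, top_n=3):
    """Return the lemmas most often linked to any of the candidate lemmas."""
    candidates = set(candidate_lemmas)
    totals = Counter()
    for (a, b), n in link_counts.items():
        # Links are treated as undirected: credit the partner of whichever side matches.
        if a in candidates:
            totals[b] += n
        if b in candidates:
            totals[a] += n
    return totals.most_common(top_n)


print(most_linked({"watch"}, link_counts))
```

The missing piece is the set of candidate lemmas for the input word, which is what the question asks about.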

Answer 1

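I found it difficult to get lemmas and inflections directly out of spaCy without first constructing an example sentence to give it context. That was not ideal, so I looked further and found that LemmInflect does this very well.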

> from lemminflect import getAllLemmas, getAllInflections

> getAllLemmas('watches')
{'NOUN': ('watch',), 'VERB': ('watch',)}

> getAllInflections('watch')
{'NN': ('watch',), 'NNS': ('watches', 'watch'), 'VB': ('watch',), 'VBD': ('watched',), 'VBG': ('watching',), 'VBZ': ('watches',),  'VBP': ('watch',)}
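If only the set of distinct lemmas is needed, regardless of POS, the dictionary returned by `getAllLemmas` can be flattened. The sketch below is one way to do that; `flatten_lemmas` and `candidate_lemmas` are hypothetical helper names, and the LemmInflect import is deferred so the flattening step can be tried without the library installed.

```python
def flatten_lemmas(lemmas_by_pos):
    """Collapse a {POS: (lemma, ...)} mapping into a sorted list of unique lemmas."""
    unique = set()
    for lemmas in lemmas_by_pos.values():
        unique.update(lemmas)
    return sorted(unique)


def candidate_lemmas(word):
    """Look up every lemma LemmInflect returns for `word`, across all POS tags."""
    # Imported lazily so flatten_lemmas can be used without LemmInflect installed.
    from lemminflect import getAllLemmas
    return flatten_lemmas(getAllLemmas(word))


# The dictionary below is the getAllLemmas('watches') result quoted above.
print(flatten_lemmas({"NOUN": ("watch",), "VERB": ("watch",)}))
```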

Answer 2

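spaCy is not designed for this; it is built for analyzing text, not generating it.

The linked library looks good, but if you want to stick with spaCy or need languages other than English, you can look at spacy-lookups-data, which is the raw data used for lemmas. Typically there is a dictionary for each part of speech that lets you look up the lemma of a word.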
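As a rough illustration of the per-POS lookup idea, the sketch below queries every table for a word and collects the hits. The `tables` dictionary is toy data with an assumed `{POS: {word: lemma}}` shape; the actual files in spacy-lookups-data may be organized differently, so the loading step is deliberately left out.

```python
# Toy per-POS lemma tables standing in for real lookup data (hypothetical entries).
tables = {
    "NOUN": {"leaves": "leaf", "watches": "watch"},
    "VERB": {"leaves": "leave", "watches": "watch"},
}


def lemmas_by_pos(word, tables):
    """Return {POS: lemma} for every table that has an entry for `word`."""
    return {pos: table[word] for pos, table in tables.items() if word in table}


print(lemmas_by_pos("leaves", tables))
```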

Answer 3

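To get alternative lemmas, I am trying a combination of spaCy's rule_lemmatize and the spaCy lookup data. rule_lemmatize may produce more than one valid lemma, whereas the lookup data only offers one lemma for a given word (in the files I have inspected). There are, however, cases where the lookup data produces a lemma and rule_lemmatize does not.

My examples are for Spanish: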

import json
import pathlib

import spacy
import spacy_lookups_data

# text = "fui"
text = "seguid"
# text = "contenta"
print("Input text: \t\t" + text)

# Find lemmas using rules:
nlp = spacy.load("es_core_news_sm")
lemmatizer = nlp.get_pipe("lemmatizer")
doc = nlp(text)
rule_lemmas = lemmatizer.rule_lemmatize(doc[0])
print("Lemmas using rules: " + ", ".join(rule_lemmas))

# Find lemma using lookup:
lookups_path = pathlib.Path(spacy_lookups_data.__file__).parent.resolve() / "data" / "es_lemma_lookup.json"
# Read the lookup table as UTF-8 and close the file when done.
with open(lookups_path, "r", encoding="utf-8") as file_object:
    lookup = json.load(file_object)
# Fall back to the word itself if the lookup table has no entry for it.
print("Lemma from lookup: \t" + lookup.get(text, text))

Output:

Input text:         fui        # I went; I was (two verbs with same form in this tense)
Lemmas using rules: ir, ser    # to go, to be (both possible lemmas returned)
Lemma from lookup:  ser        # to be

Input text:         seguid     # Follow! (imperative)
Lemmas using rules: seguid     # Follow! (lemma not returned) 
Lemma from lookup:  seguir     # to follow

Input text:         contenta   # (it) satisfies (verb); contented (adjective) 
Lemmas using rules: contentar  # to satisfy (verb but not adjective lemma returned)
Lemma from lookup:  contento   # contented (adjective, lemma form)
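The two sources can then be merged into one candidate list. The sketch below is a plain order-preserving union; `merge_lemmas` is a hypothetical helper, and the inputs are the values from the output above rather than live spaCy calls.

```python
def merge_lemmas(rule_lemmas, lookup_lemma=None):
    """Union rule-based and lookup lemmas, preserving order and dropping duplicates."""
    merged = []
    for lemma in [*rule_lemmas, lookup_lemma]:
        # Skip a missing lookup result and anything already collected.
        if lemma and lemma not in merged:
            merged.append(lemma)
    return merged


print(merge_lemmas(["ir", "ser"], "ser"))
print(merge_lemmas(["seguid"], "seguir"))
print(merge_lemmas(["contentar"], "contento"))
```

Note that the unchanged form "seguid" stays in the result: a simple union cannot tell a rule fallback apart from a word that really is its own lemma.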

3 answers

Answer 1
4 accepted 2021-06-01 22:11:49

Answer 2
1 2021-07-20 07:27:42

Answer 3
0 2021-08-01 17:19:36
