How can I make large additions to textstem's lexicon in R?
I have a large body of free-text survey comments that I'm attempting to analyze. I used the textstem package to perform lemmatization, but after looking at the unique tokens it identified I'd like to make further adjustments. For example, it identified "abuses", "abused", and "abusing" as the lemma "abuse", but it left "abusive" untouched... I'd like to change that to "abuse" as well.
I found this post, which described how to add to the lexicon on a piecemeal basis, such as
lemmas <- lexicon::hash_lemmas[token=="abusive",lemma:="abuse"]
lemmatize_strings(words, dictionary = lemmas)
but in my case I'll have a data frame with several hundred token/lemma pairs. How can I quickly add them all to lexicon::hash_lemmas?
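For reference, lexicon::hash_lemmas is just a two-column data.table (token and lemma), and my replacement pairs follow the same layout, along these lines (the values here are only illustrative):

my_pairs <- data.frame(
  token = c("abusive", "abusively"),
  lemma = c("abuse", "abuse")
)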
duh...
library(readr)

new_lemmas <- read_csv("newLemmas.csv")  # custom pairs, columns: token, lemma
big_lemmas <- rbind(lexicon::hash_lemmas, new_lemmas)
big_lemmas <- big_lemmas[!duplicated(big_lemmas$token)]  # keep first row per token (hash_lemmas entries win ties)
then use big_lemmas as the dictionary in lemmatize_strings()
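For example, a minimal sketch, assuming words holds the survey comments (the sample string below is just illustrative):

library(textstem)

words <- "The comments described abusive behavior and repeated abuses"
lemmatize_strings(words, dictionary = big_lemmas)
# with the custom entry, "abusive" should now come back as "abuse" as well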