
How can I make large additions to textstem's lexicon in R?

I have a large body of free-text survey comments that I'm trying to analyze. I used the textstem package to perform lemmatization, but after looking at the unique tokens it produced, I'd like to make further adjustments. For example, it mapped "abuses", "abused", and "abusing" to the lemma "abuse" but left "abusive" untouched; I'd like to map that to "abuse" as well.

I found this post describing how to add to the lexicon one entry at a time, for example:

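library(data.table)
library(textstem)
lemmas <- lexicon::hash_lemmas[token == "abusive", lemma := "abuse"]  # updates an existing "abusive" row only; adds no new row
lemmatize_strings(words, dictionary = lemmas)  # `words` is the character vector of comments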
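The `:=` form only rewrites rows that already exist in the lexicon; a filter that matches no row leaves the table unchanged. A minimal base-R sketch of that behaviour, using a small hypothetical two-row data frame in place of lexicon::hash_lemmas (the data.table update is assumed to behave the same way for this case):

```r
# Hypothetical two-row stand-in for lexicon::hash_lemmas
dict <- data.frame(token = c("abuses", "abused"),
                   lemma = c("abuse", "abuse"),
                   stringsAsFactors = FALSE)

# Base-R analogue of dict[token == "abusive", lemma := "abuse"]:
# no row has token "abusive", so this assignment changes nothing
dict$lemma[dict$token == "abusive"] <- "abuse"
rows_after_update <- nrow(dict)   # still 2

# Appending the pair is what actually adds it to the lexicon
dict <- rbind(dict, data.frame(token = "abusive", lemma = "abuse",
                               stringsAsFactors = FALSE))
rows_after_append <- nrow(dict)   # now 3
```

So for tokens that textstem leaves untouched, the new pairs have to be appended rather than updated in place.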

But in my case I'll have a data frame with several hundred token/lemma pairs. How can I quickly add them all to lexicon::hash_lemmas?

duh...

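library(data.table)
library(readr)
new_lemmas <- read_csv("newLemmas.csv")  # needs the columns "token" and "lemma"
# new pairs go first so unique() keeps them over existing entries for the same token
big_lemmas <- rbindlist(list(new_lemmas, lexicon::hash_lemmas), use.names = TRUE)
big_lemmas <- unique(big_lemmas, by = "token")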

then use big_lemmas as the dictionary: lemmatize_strings(words, dictionary = big_lemmas)
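A minimal base-R sketch of the merge-and-deduplicate step that runs without textstem or lexicon. The two data frames are small hypothetical stand-ins for lexicon::hash_lemmas and newLemmas.csv, and lookup_lemma() is a hypothetical helper that mimics word-by-word dictionary replacement; it is not textstem's implementation. It shows why argument order matters: duplicated() keeps the first occurrence of each token, so the new pairs go first when they are meant to override existing mappings.

```r
# Hypothetical stand-in for lexicon::hash_lemmas (same two columns)
base_lemmas <- data.frame(
  token = c("abuses", "abused", "abusing", "data"),
  lemma = c("abuse", "abuse", "abuse", "datum"),
  stringsAsFactors = FALSE
)

# Hypothetical user additions, e.g. the contents of newLemmas.csv
new_lemmas <- data.frame(
  token = c("abusive", "data"),
  lemma = c("abuse", "data"),   # "data" overrides the existing mapping
  stringsAsFactors = FALSE
)

# New pairs first, then drop later duplicates of the same token
merge_lemmas <- function(new, base) {
  combined <- rbind(new, base)
  combined[!duplicated(combined$token), ]   # base R needs the comma
}

big_lemmas <- merge_lemmas(new_lemmas, base_lemmas)

# Word-level lookup: words missing from the dictionary are returned unchanged
lookup_lemma <- function(words, dict) {
  idx <- match(words, dict$token)
  ifelse(is.na(idx), words, dict$lemma[idx])
}

result <- lookup_lemma(c("abusive", "abused", "data", "kindness"), big_lemmas)
# "abuse" "abuse" "data" "kindness"
```

Swapping the rbind() arguments would keep "data" -> "datum" and silently discard the override, which is easy to miss with several hundred pairs.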

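Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.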

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM