
How can I make large additions to textstem's lexicon in R?

I have a large body of free-text survey comments that I'm attempting to analyze. I used the textstem package to perform lemmatization, but after looking at the unique tokens it identified I'd like to make further adjustments. For example, it mapped "abuses", "abused", and "abusing" to the lemma "abuse", but it left "abusive" untouched. I'd like to change that to "abuse" as well.

I found this post, which describes how to add entries to the lexicon on a piecemeal basis, for example:

library(textstem)
# update a single token's lemma using data.table syntax (hash_lemmas is a data.table)
lemmas <- lexicon::hash_lemmas[token == "abusive", lemma := "abuse"]
lemmatize_strings(words, dictionary = lemmas)

but in my case I'll have a data frame with several hundred token/lemma pairs. How can I quickly add them all to lexicon::hash_lemmas?
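For reference, a minimal sketch of what such a table might look like (hypothetical values and name; the token/lemma column names match lexicon::hash_lemmas):

# hypothetical custom pairs with the same columns as lexicon::hash_lemmas
custom_pairs <- data.frame(
  token = c("abusive", "exploitative"),
  lemma = c("abuse", "exploit")
)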

duh...

library(readr)

new_lemmas <- read_csv("newLemmas.csv")                  # custom token/lemma pairs
big_lemmas <- rbind(lexicon::hash_lemmas, new_lemmas)    # append to the stock lexicon
big_lemmas <- big_lemmas[!duplicated(big_lemmas$token)]  # drop repeated tokens, keeping the first occurrence

Then use big_lemmas as the dictionary.
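A minimal usage sketch, assuming comments is the character vector of free-text survey responses (a hypothetical name):

library(textstem)
# lemmatize the survey text with the expanded dictionary
lemmatize_strings(comments, dictionary = big_lemmas)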

