I have a large body of free-text survey comments that I'm trying to analyze. I used the textstem package to perform lemmatization, but after reviewing the unique tokens it produced I'd like to make further adjustments. For example, it reduced "abuses", "abused", and "abusing" to the lemma "abuse", but it left "abusive" untouched; I'd like to map that to "abuse" as well.
I found a post that describes how to add entries to the lexicon piecemeal, such as
lemmas <- lexicon::hash_lemmas[token=="abusive",lemma:="abuse"]
lemmatize_strings(words, dictionary = lemmas)
but in my case I'll have a dataframe with several hundred token/lemma pairs. How can I add them all to lexicon::hash_lemmas at once?
duh...
library(readr)  # for read_csv()

# custom pairs, same two columns as lexicon::hash_lemmas: token, lemma
new_lemmas <- read_csv("newLemmas.csv")
big_lemmas <- rbind(lexicon::hash_lemmas, new_lemmas)
# duplicated() keeps the first occurrence of each token, so with this
# rbind order the stock lemmas win any ties; rbind new_lemmas first if
# your pairs should override existing entries
big_lemmas <- big_lemmas[!duplicated(big_lemmas$token), ]
then use big_lemmas as the dictionary in lemmatize_strings().
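For reference, the merge-and-dedup step can be sketched with plain data frames and no packages. The toy hash_lemmas below is a stand-in mimicking the layout of the real lexicon::hash_lemmas (a token column and a lemma column), and an inline data frame stands in for the CSV of custom pairs; the token/lemma values are invented for illustration:

```r
# Toy stand-in for lexicon::hash_lemmas; the real table has the same
# two columns, token and lemma.
hash_lemmas <- data.frame(
  token = c("abused", "abuses", "abusing"),
  lemma = c("abuse",  "abuse",  "abuse"),
  stringsAsFactors = FALSE
)

# Stand-in for read_csv("newLemmas.csv"): one brand-new token and one
# that clashes with a stock entry.
new_lemmas <- data.frame(
  token = c("abusive", "abusing"),
  lemma = c("abuse",   "misuse"),
  stringsAsFactors = FALSE
)

# Bind the custom pairs FIRST: duplicated() keeps the first occurrence
# of each token, so on a clash the custom lemma survives.
big_lemmas <- rbind(new_lemmas, hash_lemmas)
big_lemmas <- big_lemmas[!duplicated(big_lemmas$token), ]

big_lemmas
# 4 rows: "abusive" was added, and "abusing" now maps to "misuse"
# because the custom row won the tie.
```

The same ordering logic applies when binding onto the real lexicon::hash_lemmas: whichever table comes first in the rbind wins any token clashes.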