I have a large body of free-text survey comments that I'm trying to analyze. I used the textstem package to perform lemmatization, but after reviewing the unique tokens it produced I'd like to make further adjustments. For example, it reduced "abuses", "abused", and "abusing" to the lemma "abuse", but it left "abusive" untouched; I'd like to map that to "abuse" as well.
I found a post that describes how to add entries to the lexicon piecemeal, such as
lemmas <- lexicon::hash_lemmas[token=="abusive",lemma:="abuse"]
lemmatize_strings(words, dictionary = lemmas)
but in my case I'll have a dataframe with several hundred token/lemma pairs. How can I add them all to lexicon::hash_lemmas at once?
duh...
library(readr)  # for read_csv()

# custom pairs, same two columns as lexicon::hash_lemmas: token, lemma
new_lemmas <- read_csv("newLemmas.csv")
big_lemmas <- rbind(lexicon::hash_lemmas, new_lemmas)
# duplicated() keeps the first occurrence of each token, so with this
# rbind order the stock lemmas win any ties; rbind new_lemmas first if
# your pairs should override existing entries
big_lemmas <- big_lemmas[!duplicated(big_lemmas$token), ]
then use big_lemmas as the dictionary in lemmatize_strings().
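For reference, the merge-and-dedup step can be sketched with plain data frames and no packages. The toy hash_lemmas below is a stand-in mimicking the layout of the real lexicon::hash_lemmas (a token column and a lemma column), and an inline data frame stands in for the CSV of custom pairs; the token/lemma values are invented for illustration:

```r
# Toy stand-in for lexicon::hash_lemmas; the real table has the same
# two columns, token and lemma.
hash_lemmas <- data.frame(
  token = c("abused", "abuses", "abusing"),
  lemma = c("abuse",  "abuse",  "abuse"),
  stringsAsFactors = FALSE
)

# Stand-in for read_csv("newLemmas.csv"): one brand-new token and one
# that clashes with a stock entry.
new_lemmas <- data.frame(
  token = c("abusive", "abusing"),
  lemma = c("abuse",   "misuse"),
  stringsAsFactors = FALSE
)

# Bind the custom pairs FIRST: duplicated() keeps the first occurrence
# of each token, so on a clash the custom lemma survives.
big_lemmas <- rbind(new_lemmas, hash_lemmas)
big_lemmas <- big_lemmas[!duplicated(big_lemmas$token), ]

big_lemmas
# 4 rows: "abusive" was added, and "abusing" now maps to "misuse"
# because the custom row won the tie.
```

The same ordering logic applies when binding onto the real lexicon::hash_lemmas: whichever table comes first in the rbind wins any token clashes.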