Stemming a text column in a dataframe with R

I have a data frame that I load like this:

# Load lexicon
Lexicon_DF <- read.csv("LexiconFrancais.csv", header = FALSE, sep = ";")

The contents of "LexiconFrancais.csv" look like this:

French Translation (Google Translate);Positive;Negative
un dos;0;0
abaque;0;0
abandonner;0;1
abandonné;0;1
abandon;0;1
se calmer;0;0
réduction;0;0
abba;1;0
abbé;0;0
abréger;0;0
abréviation;0;0



> Lexicon_DF
                                         V1       V2       V3
1     French Translation (Google Translate) Positive Negative
2                                    un dos        0        0
3                                    abaque        0        0
4                                abandonner        0        1
5                                 abandonné        0        1
6                                   abandon        0        1
7                                 se calmer        0        0
8                                 réduction        0        0
9                                      abba        1        0
10                                     abbé        0        0
11                                  abréger        0        0
12                              abréviation        0        0

I am trying to stem the first column of the data frame. To do this I ran:

Lexicon_DF <- SnowballC::wordStem(Lexicon_DF[[1]], language = 'fr')

But after this command, Lexicon_DF contains only the stemmed first column; the other two columns have disappeared.

> Lexicon_DF <- SnowballC::wordStem(Lexicon_DF[[1]], language = 'fr')
> Lexicon_DF
   [1] "French Translation (Google Translate)" "un dos"                                "abaqu"                                
   [4] "abandon"                               "abandon"                               "abandon"                              
   [7] "se calm"                               "réduct"                                "abba"                                 
  [10] "abbé"                                  "abreg"                                 "abrévi" 

How can I do the stemming without losing the other two columns?

Thank you.

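You are replacing the entire Lexicon_DF object with the output of wordStem, which is a plain character vector, so the other two columns are lost.

Assign the result back to the first column only:

Lexicon_DF$V1 <- SnowballC::wordStem(Lexicon_DF[[1]], language = 'fr')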
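To see that the other columns survive, here is a minimal sketch. It assumes the SnowballC package is installed, and it uses a small hand-made data frame as a stand-in for the real lexicon (the three rows are illustrative, not the actual file):

```r
# Hypothetical stand-in for the lexicon, using the same V1/V2/V3 layout
Lexicon_DF <- data.frame(
  V1 = c("abandonner", "abandon", "abaque"),
  V2 = c(0, 0, 0),
  V3 = c(1, 1, 0),
  stringsAsFactors = FALSE
)

# Assign the stems back into column V1 only; V2 and V3 are left untouched
Lexicon_DF$V1 <- SnowballC::wordStem(Lexicon_DF$V1, language = "fr")

# Still a data frame with three columns
str(Lexicon_DF)
```

If you read the file with header = TRUE instead, the first column will no longer be called V1. In that case indexing by position, e.g. Lexicon_DF[[1]] <- SnowballC::wordStem(Lexicon_DF[[1]], language = "fr"), should work the same way.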
