
How to extract multi-word units from a column using R?

My data looks like this:

Topic            Measure
climate change   reduce emissions
pandemic         vaccination
charity          call for donations

Now I would like to extract all multi-word units (MWUs) from each column, i.e.:

topic_mwu <- c("climate change")

measure_mwu <- c("reduce emissions", "call for donations")

Is there a function in R to extract these MWUs automatically? Basically I only have to identify the entries that contain at least one whitespace character, so I am thinking of a regex hack.
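A minimal sketch of that regex idea, assuming the data sits in a data frame named `dt` with character columns `Topic` and `Measure` (taken from the table above). `grepl("\\s", x)` is `TRUE` for every entry that contains at least one whitespace character; applying it column by column keeps the units separated per column, as in `topic_mwu` and `measure_mwu`. The helper `extract_mwu` is a hypothetical name chosen for illustration, not a built-in function.

```r
# Example data rebuilt from the table above (assumed to be a data frame)
dt <- data.frame(
  Topic = c("climate change", "pandemic", "charity"),
  Measure = c("reduce emissions", "vaccination", "call for donations"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the distinct entries of one column that contain whitespace
extract_mwu <- function(x) {
  x <- trimws(as.character(x))  # drop leading/trailing blanks so they don't count
  unique(x[grepl("\\s", x)])    # "\\s" matches any whitespace character
}

# lapply() over a data frame visits each column; the result is a named list
mwu <- lapply(dt, extract_mwu)

topic_mwu <- mwu$Topic
measure_mwu <- mwu$Measure
print(topic_mwu)
print(measure_mwu)
```

Keeping the result as a named list means extra columns are handled without changing the code; `mwu[["SomeColumn"]]` gives the units for any column.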

I would very much appreciate your help!

The code below should work:

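#your data frame (same columns and values as in the question)
dt <- data.frame(
  Topic = c("climate change", "pandemic", "charity"),
  Measure = c("reduce emissions", "vaccination", "call for donations"),
  stringsAsFactors = FALSE
)

#flatten all columns into one character vector
terms <- unlist(dt, use.names = FALSE)

#if the table is very big, you can use unique() to remove duplicates
terms <- unique(terms)

#get the MWUs: entries that split into more than one word on " "
terms[lengths(strsplit(terms, split = " ")) > 1]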

Is this what you were looking for?
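One caveat with splitting on a single space: `strsplit()` returns an empty first element when a string starts with the separator, so an entry such as `" pandemic"` is counted as two words. A minimal sketch of a more robust count, assuming the input may contain stray leading or repeated spaces (the vector `x` below is made-up example data), trims the ends with `trimws()` and splits on runs of whitespace (`"\\s+"`). The word count also lets you filter by length, e.g. only units of three or more words.

```r
# Made-up input with messy spacing, for illustration only
x <- c(" pandemic", "climate  change", "vaccination", "call for donations ")

# Naive count: split on a single space (a leading space yields an empty token)
naive <- lengths(strsplit(x, split = " "))

# Robust count: trim both ends, then split on runs of whitespace
n_words <- lengths(strsplit(trimws(x), split = "\\s+"))

x[naive > 1]     # wrongly includes " pandemic"
x[n_words > 1]   # only the real multi-word units
x[n_words >= 3]  # e.g. only units of three or more words
```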
