My data looks like this:
Topic | Measure |
---|---|
climate change | reduce emissions |
pandemic | vaccination |
charity | call for donations |
Now I would like to extract all multi-word units (MWU) within one column, i.e.:
topic_mwu <- c("climate change")
measure_mwu <- c("reduce emissions", "call for donations")
Is there a function in R to extract these MWU automatically? Basically, I only have to identify the entries that contain at least one whitespace character, so I am thinking of a regex hack.
I would very much appreciate your help!
The code below should work:
# Example data as a character matrix (stands in for your data frame)
dt <- matrix(c("reduce emission", "call for donations", "pandemic", "climate change", "donations", "charity"), ncol = 2)
# Flatten it into a character vector
dt <- as.vector(dt)
# If the table is very big, unique() removes duplicates
dt <- unique(dt)
# Get the MWU: keep entries that split into more than one piece on a space
dt[lengths(strsplit(dt, split = " ")) > 1]
Is this what you were looking for?
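Since the question asks for one result per column and mentions a regex, here is a minimal sketch of that variant. It assumes the data is a data frame with character columns named Topic and Measure, as in the table in the question. The data frame `df` and the helper `extract_mwu` are made up for illustration.

```r
# Hypothetical data frame mirroring the table in the question
df <- data.frame(
  Topic   = c("climate change", "pandemic", "charity"),
  Measure = c("reduce emissions", "vaccination", "call for donations"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the unique entries that contain at least one
# whitespace character after leading/trailing whitespace is trimmed
extract_mwu <- function(x) {
  x <- trimws(x)
  unique(x[grepl("[[:space:]]", x)])
}

topic_mwu   <- extract_mwu(df$Topic)    # "climate change"
measure_mwu <- extract_mwu(df$Measure)  # "reduce emissions" "call for donations"
```

Trimming first means an entry such as "pandemic " with a trailing space is not counted as a multi-word unit, and `grepl()` returns FALSE for NA, so missing values are dropped.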