I have received data like this:
tree_uses <- c("Food Fuel Land_benefits Medicines","Food","Food","Food Fuel","Food Fuel","Food")
The factor levels for each observation are separated by white space. I need to convert this into a data frame with one row per observation and one column per "real" factor level.
So for the above data it would look as follows:
ID Food Fuel Land_benefits Medicines ....
 1    1    1             1         1
 2    1    0             0         0
 3    1    0             0         0
 4    1    1             0         0
 5    1    1             0         0
 6    1    0             0         0
...
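The first step of this conversion, splitting each string into its levels and collecting the distinct "real" levels, can be sketched in base R as below. It assumes the levels are always separated by exactly one space; the names `tokens` and `real_levels` are illustrative and not from the question.

```r
tree_uses <- c("Food Fuel Land_benefits Medicines", "Food", "Food",
               "Food Fuel", "Food Fuel", "Food")

# Split each observation on the single space between its levels;
# fixed = TRUE treats " " literally rather than as a regular expression
tokens <- strsplit(tree_uses, " ", fixed = TRUE)
str(tokens)

# The "real" factor levels are the distinct tokens over all observations,
# kept in order of first appearance
real_levels <- unique(unlist(tokens))
print(real_levels)
```

Each element of `tokens` is a character vector holding the levels of one observation, and `real_levels` supplies the column names of the target data frame.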
I found that this works:
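split_factor_cols <- function(x) {
  # Split each observation on the spaces separating its levels
  temp1 <- strsplit(as.character(x), " ")
  # The distinct "real" levels become the columns
  factor_names <- unique(unlist(temp1))
  zz <- length(factor_names)
  df <- data.frame(matrix(NA, nrow = length(x), ncol = zz))
  names(df) <- factor_names
  for (i in seq_len(zz)) {
    # Count exact matches; charmatch() would also accept partial (prefix)
    # matches, e.g. a token "Fuel" would be counted under a level "Fuelwood"
    df[, i] <- vapply(temp1, function(y) sum(y == factor_names[i]), integer(1))
  }
  return(df)
}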
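The same counts can be built without preallocating a data frame and filling it column by column. The sketch below uses nested `vapply()` calls with exact (`==`) matching; `split_factor_cols_vec` is a hypothetical name, and single-space separators are assumed as in the question.

```r
tree_uses <- c("Food Fuel Land_benefits Medicines", "Food", "Food",
               "Food Fuel", "Food Fuel", "Food")

# Hypothetical loop-free variant of split_factor_cols()
split_factor_cols_vec <- function(x) {
  tokens <- strsplit(as.character(x), " ", fixed = TRUE)
  factor_names <- unique(unlist(tokens))
  # For each level, count exact matches within every observation.
  # vapply() fills the values level by level (column-major), so reshaping
  # with matrix(nrow = n) also works when there is only one observation.
  counts <- vapply(factor_names,
                   function(f) vapply(tokens, function(y) sum(y == f), integer(1)),
                   integer(length(tokens)))
  counts <- matrix(counts, nrow = length(tokens),
                   dimnames = list(NULL, factor_names))
  as.data.frame(counts)
}

res <- split_factor_cols_vec(tree_uses)
print(res)
```

Exact matching matters when one level is a prefix of another: with hypothetical levels "Fuel" and "Fuelwood", `split_factor_cols_vec(c("Fuel Fuelwood", "Fuelwood"))` gives `Fuel` = 1, 0 and `Fuelwood` = 1, 1.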
Does anyone know a more convenient function for this?
Using the tm package:
library(tm)
d <- VCorpus(VectorSource(tree_uses))
dtm <- DocumentTermMatrix(d)  # default control lowercases terms and drops terms shorter than 3 characters
# inspect(dtm)
as.matrix(dtm)
#     Terms
# Docs food fuel land_benefits medicines
#    1    1    1             1         1
#    2    1    0             0         0
#    3    1    0             0         0
#    4    1    1             0         0
#    5    1    1             0         0
#    6    1    0             0         0
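If adding tm as a dependency is undesirable, a base-R sketch using `table()` gives the same counts. Unlike `DocumentTermMatrix()`, it keeps the original capitalisation, and it adds the `ID` column from the question's expected layout. It assumes single-space separators; `tokens`, `long` and `wide` are illustrative names.

```r
tree_uses <- c("Food Fuel Land_benefits Medicines", "Food", "Food",
               "Food Fuel", "Food Fuel", "Food")

tokens <- strsplit(tree_uses, " ", fixed = TRUE)

# Long format: one row per (observation, level) pair
long <- data.frame(ID = rep(seq_along(tokens), lengths(tokens)),
                   level = unlist(tokens))

# Cross-tabulate; fixing the ID levels keeps observations without any
# tokens as all-zero rows instead of dropping them
tab <- table(factor(long$ID, levels = seq_along(tokens)), long$level)

# Turn the contingency table into a wide data frame and prepend the ID column
wide <- as.data.frame.matrix(tab)
wide <- cbind(ID = seq_along(tokens), wide)
rownames(wide) <- NULL
print(wide)
```

Like `DocumentTermMatrix()`, `table()` counts repeated tokens, so a level listed twice in one observation gives 2 rather than 1.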