Splitting a vector of lists of factors into dataframe with column for each factor level

Question

I have received data like:

tree_uses <- c("Food Fuel Land_benefits Medicines","Food","Food","Food Fuel","Food Fuel","Food")

Each observation holds several factor levels separated by white space. I need to convert this into a data frame with one row per observation and one column per "real" factor level.

For the data above, it would look as follows:

ID   Food   Fuel  Land_benefits  Medicines ....
1      1      1        1             1
2      1      0        0             0
3      1      0        0             0
4      1      1        0             0
5      1      1        0             0
6      1      0        0             0
...

Answer 1

I found that this works:

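split_factor_cols <- function(x) {
    # Split each observation on spaces into its individual levels
    levels_by_obs <- strsplit(as.character(x), " ")
    factor_names <- unique(unlist(levels_by_obs))
    # One row per observation, one column per distinct level
    df <- data.frame(matrix(NA, nrow = length(x), ncol = length(factor_names)))
    names(df) <- factor_names

    for (i in seq_along(factor_names)) {
        # 1 if the level occurs in the observation, 0 otherwise.
        # %in% matches exactly; charmatch() would also accept partial matches.
        df[, i] <- vapply(levels_by_obs,
                          function(y) as.integer(factor_names[i] %in% y),
                          integer(1))
    }
    df
}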

Perhaps someone knows a convenient function?
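
Without an extra package, the same table can probably be built in a few lines of base R. The following is only a sketch, assuming levels are separated by one or more spaces and that a 0/1 presence indicator (not a count) is wanted. The names parts, lvls, indicators and result are made up for illustration.

```r
tree_uses <- c("Food Fuel Land_benefits Medicines", "Food", "Food",
               "Food Fuel", "Food Fuel", "Food")

# Split each observation on runs of spaces into its individual levels
parts <- strsplit(tree_uses, " +")

# All distinct levels, in order of first appearance
lvls <- unique(unlist(parts))

# One integer row per observation: 1 if the level is present, 0 otherwise
indicators <- do.call(rbind, lapply(parts, function(p) as.integer(lvls %in% p)))
colnames(indicators) <- lvls

# Add an ID column to match the layout asked for in the question
result <- data.frame(ID = seq_along(tree_uses), indicators)
```

Because %in% compares whole strings, a level such as "Food" is not confused with a longer level that merely starts with "Food".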

Answer 2

Using the tm package (note that the terms are converted to lower case by default, as the output shows):

library(tm)

d <- VCorpus(VectorSource(tree_uses))
dtm <- DocumentTermMatrix(d)

# inspect(dtm)

as.matrix(dtm)
#     Terms
# Docs food fuel land_benefits medicines
#    1    1    1             1         1
#    2    1    0             0         0
#    3    1    0             0         0
#    4    1    1             0         0
#    5    1    1             0         0
#    6    1    0             0         0
