Reshaping data - is this an operation for tidyr::spread?

Question

I'm trying to reshape a data frame so that each unique value in a column becomes a binary column.

I've been provided data that looks like this:

df <- data.frame(id = c(1,1,2),
                 value = c(200,200,1000),
                 feature = c("A","B","C"))

print(df)

##id,value,feature
##1,200,A
##1,200,B
##2,1000,C

I'm trying to reshape it into this:

##trying to get here
##id,value,A,B,C
##1,200,1,1,0
##2,1000,0,0,1

spread(df,id,feature) fails because ids repeat.

I want to reshape the data to facilitate modeling - I'm trying to predict value from the presence or absence of features.

Answer 1

There is a way to do it with tidyr::spread, though: add a helper column that is always equal to one and spread on that.

library(dplyr)
library(tidyr)

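# Add a helper column of 1s, then spread it across the feature values;
# combinations that do not occur are filled with 0.
mutate(df, v = 1) %>%
  spread(feature, v, fill = 0)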

  id value A B C
1  1   200 1 1 0
2  2  1000 0 0 1
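
In tidyr 1.0.0 and later, spread is superseded by pivot_wider, so the same helper-column idea can be sketched as follows. This is a minimal sketch, not part of the original answer; the helper column name v is arbitrary, and values_fill is written as a named list because that form is accepted by every tidyr version that has pivot_wider.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(id = c(1, 1, 2),
                 value = c(200, 200, 1000),
                 feature = c("A", "B", "C"))

wide <- df %>%
  mutate(v = 1) %>%                           # helper column, always 1
  pivot_wider(names_from = feature,           # one new column per feature
              values_from = v,                # cell value when the feature is present
              values_fill = list(v = 0))      # cell value when it is absent

print(wide)
```

The result has the same columns (id, value, A, B, C) as the spread version above.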

Answer 2

As I said in my earlier comment, you can use dcast from the reshape2 package. spread on its own works well for data that has already been processed or is consistent with tidy data principles. Your "spreading" is a little different (and more complicated), unless of course you combine spread with other functions.

library(reshape2)
# One row per id/value pair, one column per feature; length counts the matching rows
dcast(df, id + value ~ ..., length)

  id value A B C
1  1   200 1 1 0
2  2  1000 0 0 1
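
Note that length counts rows, so a feature that appears twice for the same id would give 2 rather than 1. If strictly binary columns are needed, one option is a custom aggregation function. This is a sketch under assumptions: the duplicated row in df2 is hypothetical data added only to show the difference, and the formula is spelled out as id + value ~ feature with value.var = "value" to make explicit what the call above infers.

```r
library(reshape2)

# Hypothetical variant of the question's data: feature "A" is listed twice for id 1
df2 <- data.frame(id = c(1, 1, 1, 2),
                  value = c(200, 200, 200, 1000),
                  feature = c("A", "A", "B", "C"))

# Counting rows: column A holds 2 for id 1
counts <- dcast(df2, id + value ~ feature, value.var = "value",
                fun.aggregate = length)

# Presence/absence: 1 if at least one row exists, 0 otherwise
# (applied to an empty vector this returns 0, which dcast uses as the fill value)
present <- function(x) as.integer(length(x) > 0)
binary <- dcast(df2, id + value ~ feature, value.var = "value",
                fun.aggregate = present)

print(counts)
print(binary)
```

With the original df from the question no feature repeats within an id, so both versions give the same table.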

2 answers

solution1
6 2015-08-01 16:36:36

solution2
4 ACCEPTED 2015-08-01 16:35:08