Reshaping data - is this an operation for tidyr::spread?

I'm trying to reshape a data frame so that each unique value in a column becomes a binary column.

I've been provided data that looks like this:

df <- data.frame(id = c(1,1,2),
                 value = c(200,200,1000),
                 feature = c("A","B","C"))

print(df)

##id,value,feature
##1,200,A
##1,200,B
##2,1000,C

I'm trying to reshape it into this:

##trying to get here
##id,value,A,B,C
##1,200,1,1,0
##2,1000,0,0,1

spread(df, id, feature) fails because the ids repeat: rows 1 and 2 share the same id and value, so spread cannot tell them apart.

I want to reshape the data to facilitate modeling - I'm trying to predict value from the presence or absence of features.

There is a way to do it with tidyr::spread, though: add a helper variable that is always equal to one, spread it, and fill the gaps with 0.

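library(dplyr)
library(tidyr)

# add a helper column v = 1 marking "feature present", then spread the
# features into columns, filling absent id/feature combinations with 0
mutate(df, v = 1) %>%
  spread(feature, v, fill = 0)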

  id value A B C
1  1   200 1 1 0
2  2  1000 0 0 1
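Since tidyr 1.0.0, spread() has been superseded by pivot_wider(). A minimal sketch of the same helper-column trick with pivot_wider(), assuming a recent tidyr release; `wide` and `v` are illustrative names, not part of the original answer:

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 1, 2),
                 value = c(200, 200, 1000),
                 feature = c("A", "B", "C"))

wide <- df %>%
  mutate(v = 1) %>%                  # helper column: 1 marks "feature present"
  pivot_wider(names_from = feature,  # one new column per distinct feature
              values_from = v,       # cells are filled from the helper column
              values_fill = 0)       # features an id lacks become 0

print(wide)
```

pivot_wider() keeps id and value as the row identifiers automatically, because they are the only columns not named in names_from or values_from.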

As I mentioned in my earlier comment, you can use dcast from the reshape2 package. spread works well for data that has already been processed and follows tidy data principles; your "spreading" is a little different (and more complicated), because it has to count how often each feature occurs per id. That is, unless you combine spread with other functions, as in the helper-variable approach above.

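library(reshape2)
# `...` stands for every column not otherwise used in the formula (here, feature);
# length counts the rows in each cell, so combinations that never occur become 0
dcast(df, id + value ~ ..., length)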
  id value A B C
1  1   200 1 1 0
2  2  1000 0 0 1
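If you would rather avoid extra packages, a base-R sketch along the same lines uses table() to cross-tabulate ids against features and then joins each id's value back on. It assumes, as in the example data, that every id has exactly one value; `counts`, `indicators` and `wide` are illustrative names. Comparing the counts with 0 also keeps the result binary if an id happens to list the same feature twice, whereas the length-based dcast above would return 2 in that case.

```r
df <- data.frame(id = c(1, 1, 2),
                 value = c(200, 200, 1000),
                 feature = c("A", "B", "C"))

# Cross-tabulate: one row per id, one column per feature, cells = counts
counts <- table(df$id, df$feature)

# Turn counts into 0/1 indicators so a repeated feature still gives 1
indicators <- as.data.frame.matrix(1L * (counts > 0))
indicators$id <- as.numeric(rownames(indicators))

# Attach each id's value (assumed constant per id) and join on id
wide <- merge(unique(df[c("id", "value")]), indicators, by = "id")
print(wide)
```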
