I'm trying to reshape a data frame so that each unique value in a column becomes a binary column.
I've been provided data that looks like this:
df <- data.frame(id = c(1,1,2),
value = c(200,200,1000),
feature = c("A","B","C"))
print(df)
##id,value,feature
##1,200,A
##1,200,B
##2,1000,C
I'm trying to reshape it into this:
##trying to get here
##id,value,A,B,C
##1,200,1,1,0
##2,1000,0,0,1
spread(df,id,feature)
fails because ids repeat.
I want to reshape the data to facilitate modeling - I'm trying to predict value from the presence or absence of features.
There is a way to do it with tidyr::spread, though, using a helper variable that is always equal to one.
library(dplyr)
library(tidyr)
mutate(df,v=1) %>%
spread(feature,v,fill=0)
id value A B C
1 1 200 1 1 0
2 2 1000 0 0 1
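For what it's worth, spread has since been superseded in tidyr by pivot_wider, and the same helper-variable trick carries over. Here is a minimal sketch, assuming tidyr 1.1.0 or later (so that values_fill accepts a plain scalar); the column name v and the result name wide are just illustrative choices, not anything from the question.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(id = c(1, 1, 2),
                 value = c(200, 200, 1000),
                 feature = c("A", "B", "C"))

wide <- df %>%
  mutate(v = 1) %>%                    # helper column: 1 marks "feature present"
  pivot_wider(names_from = feature,    # one new column per distinct feature
              values_from = v,
              values_fill = 0)         # id/feature pairs that never occur become 0

print(wide)
```

Since id and value are not named in names_from or values_from, pivot_wider treats them as the identifying columns, so the two rows for id 1 should collapse into one.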
As in my previous comment: you have to use dcast from the reshape2 package, because spread works well for data that have already been processed and/or are consistent with tidy data principles. Your "spreading" is a little different (and more complicated), unless of course you combine spread with other functions.
library(reshape2)
dcast(df, id + value ~ ..., length)
id value A B C
1 1 200 1 1 0
2 2 1000 0 0 1