What is a correct and concise way to reshape a factor column in data like this:
login has_profile_in
1234 Facebook
1234 LinkedIn
1235 VK
into a matrix like this:
login Facebook LinkedIn VK
1234 1 1 0
1235 0 0 1
using a tidyr pipeline?
Edit: I know some regular ways of doing this, e.g. with reshape2:
dcast(df, login ~ has_profile_in, fun.aggregate = length)
and that there are other ways as well. My question is how to do it the tidyr way, so that the operation fits into a general pipeline-based workflow.
You can use aggregate
aggregate(has_profile_in ~ login, df, table)
# login has_profile_in.Facebook has_profile_in.LinkedIn has_profile_in.VK
#1 1234 1 1 0
#2 1235 0 0 1
You can rename the columns using setNames to make the output more readable
setNames(aggregate(has_profile_in ~ login, df, table), c("Login", ""))
# Login .Facebook .LinkedIn .VK
#1 1234 1 1 0
#2 1235 0 0 1
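One detail worth spelling out about the aggregate approach: the result has only two columns, the second being a matrix of counts, which is why the printed names carry a prefix. The following is a minimal sketch, assuming has_profile_in is stored as a factor (with a character column, table would return vectors of different lengths per login). The names agg and flat are illustrative, not from the answer.

```r
# Sample data from the question; has_profile_in must be a factor here
df <- data.frame(
  login = c(1234, 1234, 1235),
  has_profile_in = factor(c("Facebook", "LinkedIn", "VK"))
)

# aggregate() stores the per-login table() results in a single matrix column
agg <- aggregate(has_profile_in ~ login, df, table)

# do.call(data.frame, ...) expands that matrix column into ordinary columns
flat <- do.call(data.frame, agg)

# Replace the prefixed names with the plain factor levels
names(flat) <- c("login", levels(df$has_profile_in))
flat
```

After flattening, flat is a regular data frame with one column per profile type, which is easier to pass on to other functions.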
As the OP requested a tidyr method:
library(dplyr)
library(tidyr)
df1 %>%
mutate(Count = 1) %>%
spread(has_profile_in, Count, fill = 0)
# login Facebook LinkedIn VK
#1 1234 1 1 0
#2 1235 0 0 1
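In tidyr 1.0.0 and later, spread is superseded by pivot_wider. A sketch of the same pipeline with the newer function follows, assuming a tidyr version that provides pivot_wider. The data frame is rebuilt here so the snippet runs on its own, and the name wide is illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df1 <- data.frame(
  login = c(1234, 1234, 1235),
  has_profile_in = c("Facebook", "LinkedIn", "VK")
)

wide <- df1 %>%
  mutate(Count = 1) %>%                      # indicator value for each observed pair
  pivot_wider(
    names_from  = has_profile_in,            # one new column per profile type
    values_from = Count,
    values_fill = list(Count = 0)            # missing login/profile pairs become 0
  )

wide
```

The named-list form of values_fill is used because it is accepted by both the initial and later pivot_wider releases. The result is a tibble with the columns login, Facebook, LinkedIn and VK.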
The shortest option would be table
as.data.frame.matrix(+(table(df1)!=0))
# Facebook LinkedIn VK
#1234 1 1 0
#1235 0 0 1
Or using data.table
library(data.table)
dcast(setDT(df1), login~has_profile_in, function(x) +(length(x)!=0))
# login Facebook LinkedIn VK
#1: 1234 1 1 0
#2: 1235 0 0 1
NOTE: dcast would be the fastest of these options.
You can use
model.matrix(~yourFactor+0)
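Note that model.matrix returns one row per observation, not one row per login, so an extra step is needed to get the matrix from the question. A minimal sketch, assuming the column names from the question and using base R's rowsum to collapse the rows (the names ind and res are illustrative):

```r
# Sample data from the question
df <- data.frame(
  login = c(1234, 1234, 1235),
  has_profile_in = factor(c("Facebook", "LinkedIn", "VK"))
)

# One indicator column per factor level; "+ 0" drops the intercept column
ind <- model.matrix(~ has_profile_in + 0, data = df)

# Default column names are prefixed with the variable name; use the levels instead
colnames(ind) <- levels(df$has_profile_in)

# Sum the indicator rows within each login to get one row per login
res <- rowsum(ind, group = df$login)
res
```

The result is a numeric matrix with the logins as row names. If a login can list the same profile more than once, the sums become counts, and something like +(res != 0) would turn them back into 0/1 flags.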