With these data
d <- data.frame(time=1:5, side=c("r","r","r","l","l"), val = c(1,2,1,2,1))
d
time side val
1 1 r 1
2 2 r 2
3 3 r 1
4 4 l 2
5 5 l 1
We can spread to a tidy dataframe like this:
library(tidyverse)
d %>% spread(side,val)
Which gives:
time l r
1 1 NA 1
2 2 NA 2
3 3 NA 1
4 4 2 NA
5 5 1 NA
But say we have more than one val
for a given time/side. For example:
d <- data.frame(time=c(1:5,5), side=c("r","r","r","l","l","l"), val = c(1,2,1,2,1,2))
time side val
1 1 r 1
2 2 r 2
3 3 r 1
4 4 l 2
5 5 l 1
6 5 l 2
Now this won't work because of duplicated values:
d %>% spread(side,val)
Error: Duplicate identifiers for rows (5, 6)
Is there an efficient way to force this behavior (or alternative). The output would be eg
time l r
1 1 NA 1
2 2 NA 2
3 3 NA 1
4 4 2 NA
5 5 1, 2 NA
The data.table
/ reshape2
equivalent of tidyr::spread
is dcast
. It has a more complicated syntax than spread
, but it's more flexible. To accomplish your task we can use the below chunk.
We use the formula to 'spread' side
by time
(filling with the values in the val
column), provide the fill
value of NA
, and specify we want to list
elements together when aggregation is needed per value of time
.
library(data.table)
d <- data.table(time=c(1:5,5),
side=c("r","r","r","l","l","l"),
val = c(1,2,1,2,1,2))
data.table::dcast(d, time ~ side,
value.var='val',
fill=NA,
fun.aggregate=list)
#OUTPUT
# time l r
# 1: 1 NA 1
# 2: 2 NA 2
# 3: 3 NA 1
# 4: 4 2 NA
# 5: 5 1,2 NA
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.