I am currently working on a Multistate Analysis dataset in "long" form (one row for each individual's observation; each individual is repeatedly measured up to 5 times).
The idea is that each individual can recurrently transition across the levels of the time-varying state variable s = 1, 2, 3, 4
. All the other variables that I have (here cohort
) are fixed within any given id
.
After some analyses, I need to reshape the dataset in "wide" form, according to the specific sequence of visited states. Here is an example of the initial long data:
dat <- read.table(text = "
id cohort s
1 1 2
1 1 2
1 1 1
1 1 4
2 3 1
2 3 1
2 3 3
3 2 1
3 2 2
3 2 3
3 2 3
3 2 4",
header=TRUE)
The final "wide" dataset should take into account the specific individual sequence of visited states, recorded into the newly created variables s1
, s2
, s3
, s4
, s5
, where s1
is the first state visited by the individual and so on.
According to the above example, the wide dataset looks like:
id cohort s1 s2 s3 s4 s5
1 1 2 2 1 4 0
2 3 1 1 3 0 0
3 2 1 2 3 3 4
I tried to use reshape()
, and also to focus on transposing s
, but without the intended result. Actually, my knowledge of the R functions is quite limited.. Can you give any suggestion? Thanks.
Thank you all for your help, I have a related question if I can. Especially when each individual is observed for a long time and there are few transitions across states, it is very useful to reshape the initial sample dat
in this alternative way:
id cohort s1 s2 s3 s4 s5 dur1 dur2 dur3 dur4 dur5
1 1 2 1 4 0 0 2 1 1 0 0
2 3 1 3 0 0 0 2 1 0 0 0
3 2 1 2 3 4 0 1 1 2 1 0
In practice now s1
- s5
are the distinct visited states, and dur1
- dur5
the time spent in each respective distinct visited state.
Can you please give a hand for reaching this data structure? I believe it is necessary to create all the dur
- and s
- variables in an intermediate sample before using reshape()
. Otherwise maybe it is possible to directly adopt -reshape2-
?
dat <- read.table(text = "
id cohort s
1 1 2
1 1 2
1 1 1
1 1 4
2 3 1
2 3 1
2 3 3
3 2 1
3 2 2
3 2 3
3 2 3
3 2 4",
header=TRUE)
df <- data.frame(
dat,
period = sequence(rle(dat$id)$lengths)
)
wide <- reshape(df, v.names = "s", idvar = c("id", "cohort"),
timevar = "period", direction = "wide")
wide[is.na(wide)] = 0
wide
Gives:
id cohort s.1 s.2 s.3 s.4 s.5
1 1 1 2 2 1 4 0
5 2 3 1 1 3 0 0
8 3 2 1 2 3 3 4
then using the following line gives your names:
names(wide) <- c('id','cohort', paste('s', seq_along(1:5), sep=''))
# id cohort s1 s2 s3 s4 s5
# 1 1 1 2 2 1 4 0
# 5 2 3 1 1 3 0 0
# 8 3 2 1 2 3 3 4
If you use sep=''
in the wide
statement you do not have to rename the variables:
wide <- reshape(df, v.names = "s", idvar = c("id", "cohort"),
timevar = "period", direction = "wide", sep='')
I suspect there are ways to avoid creating the period
variable and avoid replacing NA
directly in the wide
statement, but I have not figured those out yet.
ok...
library(plyr)
library(reshape2)
dat2 <- ddply(dat,.(id,cohort), function(x)
data.frame(s=x$s,name=paste0("s",seq_along(x$s))))
dat2 <- ddply(dat2,.(id,cohort), function(x)
dcast(x, id + cohort ~ name, value.var= "s" ,fill= 0)
)
dat2[is.na(dat2)] <- 0
dat2
# id cohort s1 s2 s3 s4 s5
# 1 1 1 2 2 1 4 0
# 2 2 3 1 1 3 0 0
# 3 3 2 1 2 3 3 4
This seem right? I admit the first ddply
is hardly elegant.
Try this:
library(reshape2)
dat$seq <- ave(dat$id, dat$id, FUN = function(x) paste0("s", seq_along(x)))
dat.s <- dcast(dat, id + cohort ~ seq, value.var = "s", fill = 0)
which gives this:
> dat.s
id cohort s1 s2 s3 s4 s5
1 1 1 2 2 1 4 0
2 2 3 1 1 3 0 0
3 3 2 1 2 3 3 4
If you did not mind using just 1, 2, ..., 5 as column names then you could shorten the ave
line to just:
dat$seq <- ave(dat$id, dat$id, FUN = seq_along)
Regarding the second question that was added later try this:
library(plyr)
dur.fn <- function(x) {
r <- rle(x$s)$length
data.frame(id = x$id[1], dur.value = r, dur.seq = paste0("dur", seq_along(r)))
}
dat.dur.long <- ddply(dat, .(id), dur.fn)
dat.dur <- dcast(dat.dur.long, id ~ dur.seq, c, value.var = "dur.value", fill = 0)
cbind(dat.s, dat.dur[-1])
which gives:
id cohort s1 s2 s3 s4 s5 dur1 dur2 dur3 dur4
1 1 1 2 2 1 4 0 2 1 1 0
2 2 3 1 1 3 0 0 2 1 0 0
3 3 2 1 2 3 3 4 1 1 2 1
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.