I have a dataset (DF) that looks like what I have below:
ID DOB Age Outcome
1 1/01/80 18 1
1 1/01/80 18 0
2 1/02/81 17 1
2 1/02/81 17 0
3 1/03/70 28 1
I want to reshape the data to wide format, so that I have one row per ID. Since DOB and Age are the same within each ID, I want each of them to remain a single column in the new data frame, with the Outcome values spread across multiple columns, as below:
ID DOB Age Outcome.1 Outcome.2
1 1/01/80 18 1 0
2 1/02/81 17 1 0
3 1/03/70 28 1 NA
I have tried using tidyr and reshape, but I can't seem to get the data into this format. For example, when I use the code:
spread(DF, key=ID, value = Outcome)
I get an error saying that I have duplicate identifiers for rows. Is there a way to get the data into the format I would like?
Thanks.
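One solution uses the tidyverse. The idea is to add a row-number column so that each row has a unique identifier. After that, there are different ways to apply spread. Note that the columns in this version are named after the outcome value (Outcome.0, Outcome.1) rather than after its position within the ID.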
df <- read.table(text = "ID DOB Age Outcome
1 1/01/80 18 1
1 1/01/80 18 0
2 1/02/81 17 1
2 1/02/81 17 0
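3 1/03/70 28 1", header = TRUE, stringsAsFactors = FALSE)
library(tidyverse)
# Tag each row with a unique number, then spread on the outcome value
df %>%
  mutate(rownum = row_number(), Outcome = paste("Outcome", Outcome, sep = ".")) %>%
  spread(Outcome, rownum) %>%
  # Replace the row numbers with the outcome value each column stands for
  mutate(Outcome.0 = ifelse(!is.na(Outcome.0), 0, NA)) %>%
  mutate(Outcome.1 = ifelse(!is.na(Outcome.1), 1, NA))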
# Result:
# ID DOB Age Outcome.0 Outcome.1
#1 1 1/01/80 18 0 1
#2 2 1/02/81 17 0 1
#3 3 1/03/70 28 NA 1
The dcast function is meant for this kind of thing.
library(reshape2)
dcast(DF, ID + DOB + Age ~ Outcome, value.var = "Outcome")
You could use tidyr and dplyr:
library(dplyr)
library(tidyr)
DF %>%
  group_by(ID) %>%
  # Number the outcomes within each ID: Outcome.1, Outcome.2, ...
  mutate(OutcomeID = paste0('Outcome.', row_number())) %>%
  spread(OutcomeID, Outcome)
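In tidyr 1.0.0 and later, spread is superseded by pivot_wider, so the same idea can probably be written as below. This is only a sketch: it assumes tidyr >= 1.0.0 is installed, it rebuilds the question's DF with read.table so that it runs on its own, and the names OutcomeID and wide are just illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
DF <- read.table(text = "ID DOB Age Outcome
1 1/01/80 18 1
1 1/01/80 18 0
2 1/02/81 17 1
2 1/02/81 17 0
3 1/03/70 28 1", header = TRUE, stringsAsFactors = FALSE)

wide <- DF %>%
  group_by(ID) %>%
  # Position of each outcome within its ID: 1, 2, ...
  mutate(OutcomeID = row_number()) %>%
  ungroup() %>%
  # One column per position; IDs with fewer outcomes are filled with NA
  pivot_wider(names_from = OutcomeID,
              values_from = Outcome,
              names_prefix = "Outcome.")

print(wide)
```

Numbering the outcomes within each ID is what removes the duplicate identifiers: every (ID, OutcomeID) pair is unique, so each cell of the wide table receives at most one value.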