I have a dataset that contains many rows and 28 columns. I need unique combinations of the subject ID and coc# columns, and the data that might be removed placed into extra columns. I might not be explaining this very well, so I will show my example:
ID DOB address name date seen txdone coc#
1 1/08/1997 4blelan bob sager 19/05/2002 1125 45555
1 1/08/1997 4blelan bob sager 19/05/2002 1200 45555
1 1/08/1997 4blelan bob sager 20/06/2003 2000 46666
1 1/08/1997 4blelan bob sager 20/06/2003 1222 46666
2 5/09/1956 55lala Jim reads 19/05/2002 1125 55544
2 5/09/1956 55lala Jim reads 19/05/2002 1111 55544
2 5/09/1956 55lala Jim reads 1/06/2002 1111 55544
2 5/09/1956 55lala Jim reads 2/07/2002 1353 56678
Transformed into this:
ID DOB address name dateseen1 txdone1 coc#1 dateseen2 txdone2 coc#2 dateseen3 txdone3 coc#3
1 1/08/1997 4blelan bob sager 19/05/2002 1125 45555 19/05/2002 1200 45555
1 1/08/1997 4blelan bob sager 20/06/2003 2000 46666 20/06/2003 1222 46666
2 5/09/1956 55lala Jim reads 19/05/2002 1125 55544 19/05/2002 1111 55544 1/06/2002 1111 55544
2 5/09/1956 55lala Jim reads 2/07/2002 1353 56678
The reason for this is so I can search for 1125 in txdone but also get the other work that was carried out in that COC in one line. Looking at it now, I wouldn't even need multiple columns of coc, just the one, but you get the idea (maybe).
I am very open to doing things differently if I am going about this backwards. However, I am limited to using R and Excel.
In R, the package reshape2 should do the job. Try
library(reshape2)
melt(your_data_frame, id.vars=c("ID", "DOB", "address", "name"))
(You can play around with id.vars and measure.vars to get the exact reshaping you want. Note that melt() only gets you to the long format; you would then cast it back to wide with dcast().)
You will need something that numbers the rows within each ID/coc combination, so each row gets a unique "id" inside its group. Here's a solution:
library(splitstackshape) ## For `getanID()`
library(reshape2) ## For `melt()` and `dcast()`
idvars <- c("ID", "DOB", "address", "name", "coc")
mydf2 <- getanID(mydf, idvars)
dfL <- melt(mydf2, id.vars=c(idvars, ".id"))
dcast(dfL, ID + DOB + address + name + coc ~ variable + .id)
# ID DOB address name coc date.seen_1 date.seen_2 date.seen_3 txdone_1 txdone_2 txdone_3
# 1 1 1/08/1997 4blelan bob sager 45555 19/05/2002 19/05/2002 <NA> 1125 1200 <NA>
# 2 1 1/08/1997 4blelan bob sager 46666 20/06/2003 20/06/2003 <NA> 2000 1222 <NA>
# 3 2 5/09/1956 55lala Jim reads 55544 19/05/2002 19/05/2002 1/06/2002 1125 1111 1111
# 4 2 5/09/1956 55lala Jim reads 56678 2/07/2002 <NA> <NA> 1353 <NA> <NA>
You can rearrange the column orders later if you need to.
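If you want each visit's columns side by side, here is a minimal base R sketch of that reordering. It assumes the column names shown in the dcast() output above (hard-coded here so the snippet runs on its own); `cols`, `is_wide`, `suffix` and `new_order` are just illustrative names.

```r
# Column names as they appear in the dcast() output above.
cols <- c("ID", "DOB", "address", "name", "coc",
          "date.seen_1", "date.seen_2", "date.seen_3",
          "txdone_1", "txdone_2", "txdone_3")

# Columns that carry a numeric "_k" suffix are the spread-out ones.
is_wide <- grepl("_[0-9]+$", cols)

# Pull out the suffix and sort by it; ties keep their original order,
# so date.seen_k stays ahead of txdone_k.
suffix <- as.integer(sub(".*_", "", cols[is_wide]))
new_order <- c(cols[!is_wide], cols[is_wide][order(suffix)])

new_order
```

With the wide data frame stored in a variable (say `wide`, a hypothetical name), `wide[new_order]` would then return the columns in that order.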
Alternatively, without melting to a long format first, after you create "mydf2", use reshape() from base R (and as a bonus, the columns are in the order you want).
reshape(mydf2, direction = "wide", idvar=idvars, timevar=".id")
# ID DOB address name coc date.seen.1 txdone.1 date.seen.2 txdone.2 date.seen.3 txdone.3
# 1 1 1/08/1997 4blelan bob sager 45555 19/05/2002 1125 19/05/2002 1200 <NA> NA
# 3 1 1/08/1997 4blelan bob sager 46666 20/06/2003 2000 20/06/2003 1222 <NA> NA
# 5 2 5/09/1956 55lala Jim reads 55544 19/05/2002 1125 19/05/2002 1111 1/06/2002 1111
# 8 2 5/09/1956 55lala Jim reads 56678 2/07/2002 1353 <NA> NA <NA> NA
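To tie this back to the goal in the question (find 1125 in txdone and see everything else done under that COC), here is a rough base R sketch. It rebuilds the sample data so it runs standalone, and it numbers the rows with ave() instead of getanID() purely to avoid the package dependency. The names `wide`, `tx_cols`, `hit`, `cocs` and `long_hits` are my own, not from the question.

```r
mydf <- read.table(text = 'ID DOB address name "date seen" txdone coc
1 1/08/1997 4blelan "bob sager" 19/05/2002 1125 45555
1 1/08/1997 4blelan "bob sager" 19/05/2002 1200 45555
1 1/08/1997 4blelan "bob sager" 20/06/2003 2000 46666
1 1/08/1997 4blelan "bob sager" 20/06/2003 1222 46666
2 5/09/1956 55lala "Jim reads" 19/05/2002 1125 55544
2 5/09/1956 55lala "Jim reads" 19/05/2002 1111 55544
2 5/09/1956 55lala "Jim reads" 1/06/2002 1111 55544
2 5/09/1956 55lala "Jim reads" 2/07/2002 1353 56678', header = TRUE)

idvars <- c("ID", "DOB", "address", "name", "coc")

# Row counter within each ID/coc combination.
X <- do.call(paste, mydf[idvars])
mydf$.id <- ave(seq_along(X), X, FUN = seq_along)

wide <- reshape(mydf, direction = "wide", idvar = idvars, timevar = ".id")

# Look for 1125 in any of the txdone.* columns; NA cells count as "no match".
tx_cols <- grep("^txdone\\.", names(wide), value = TRUE)
hit <- rowSums(wide[tx_cols] == 1125, na.rm = TRUE) > 0
wide[hit, ]   # one line per COC that includes 1125 (coc 45555 and 55544)

# Same question answered without reshaping: keep every row of any COC
# that has a 1125. This assumes a coc number is never shared by two subjects.
cocs <- unique(mydf$coc[mydf$txdone == 1125])
long_hits <- mydf[mydf$coc %in% cocs, ]
```

If the one-line-per-COC layout is only needed for this kind of lookup, the long-format filter at the end may be enough on its own.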
This is based on mydf being defined as:
mydf <- read.table(text = 'ID DOB address name "date seen" txdone coc
1 1/08/1997 4blelan "bob sager" 19/05/2002 1125 45555
1 1/08/1997 4blelan "bob sager" 19/05/2002 1200 45555
1 1/08/1997 4blelan "bob sager" 20/06/2003 2000 46666
1 1/08/1997 4blelan "bob sager" 20/06/2003 1222 46666
2 5/09/1956 55lala "Jim reads" 19/05/2002 1125 55544
2 5/09/1956 55lala "Jim reads" 19/05/2002 1111 55544
2 5/09/1956 55lala "Jim reads" 1/06/2002 1111 55544
2 5/09/1956 55lala "Jim reads" 2/07/2002 1353 56678', header = TRUE)
If you don't want to install "splitstackshape" just for getanID (I promise I won't be offended), you can generate your .id variable manually as follows (which is essentially what getanID does anyway):
X <- do.call(paste, mydf[idvars])
mydf$.id <- ave(seq_along(X), X, FUN = seq_along) ## integer counter within each group
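To see what that counter looks like, here is a small sketch using only the two columns that actually vary between groups in the sample data (ID and coc). The data frame `key` is a stand-in I made up for illustration. Passing seq_along(X) as the first argument to ave() is my choice, so that the result comes back as integers rather than character strings.

```r
# Only the grouping columns from the example rows.
key <- data.frame(
  ID  = c(1, 1, 1, 1, 2, 2, 2, 2),
  coc = c(45555, 45555, 46666, 46666, 55544, 55544, 55544, 56678)
)

# One string per row identifying its group, then count rows within each group.
X <- do.call(paste, key)
key$.id <- ave(seq_along(X), X, FUN = seq_along)

key$.id       # 1 2 1 2 1 2 3 1
max(key$.id)  # 3, i.e. the wide result needs three sets of columns
```

The largest .id is the number of date.seen/txdone column sets the reshaped data will end up with.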