I have a dataframe of student enrollment records (transactions) that are in long format.
Sample:
ID Date Type
123 2/1/14 Entry
123 2/5/14 Exit
123 3/1/14 Entry
123 4/4/14 Exit
234 3/2/14 Entry
234 3/20/14 Exit
234 4/3/14 Entry
I need to convert it to wide format by matching each Entry record with its Exit record.
Sample:
ID Entry.Date Exit.Date
123 2/1/14 2/5/14
123 3/1/14 4/4/14
234 3/2/14 3/20/14
234 4/3/14
There's nothing inherent in the data that I can use to key together the starting record with the ending record. It's simply ordered by student and then date. Some records are open ended (no matching exit record).
I'm looking at some of the conversion functions such as reshape but don't know if/how I can use those to convert to wide format and limit it to the date range pair. Would you recommend one of those or should I pursue something less elegant? Thanks!
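Here's one way using data.table. The idea is to group by ID and Type and add a column that numbers the records within each group, so that the n-th Entry and the n-th Exit of a student get the same number and form a pair. This assumes the data is sorted by ID and date and that each student's Entry and Exit records alternate, which is the case in your sample: a final Entry with no Exit (the open-ended record for 234) is simply left unmatched. If a record were missing in the middle of a student's history, every later pair for that student would be shifted.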
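Because the pairing relies on row order, it may be worth parsing the character dates and sorting explicitly before numbering; sorting the raw strings would put, for example, "10/1/14" before "2/1/14". A minimal sketch, assuming the dates are month/day/two-digit year as in the sample (format "%m/%d/%y"). The rows below are the question's sample, shuffled on purpose, and the name `alternates` is just illustrative:

```r
library(data.table)

# Sample rows from the question, deliberately out of order
dat <- data.table(
  ID   = c(234, 123, 123, 234, 123, 123, 234),
  Date = c("4/3/14", "3/1/14", "2/1/14", "3/2/14", "4/4/14", "2/5/14", "3/20/14"),
  Type = c("Entry", "Entry", "Entry", "Entry", "Exit", "Exit", "Exit")
)

# Assumption: dates are month/day/two-digit year
dat[, Date := as.Date(Date, format = "%m/%d/%y")]

# Sort by student, then by calendar date (not by string)
setorder(dat, ID, Date)

# Sanity check: within each ID, no two consecutive records share a Type
alternates <- dat[, .(ok = all(Type != shift(Type), na.rm = TRUE)), by = ID]
print(alternates)
```

If `ok` is FALSE for some ID, that student's records do not alternate and the per-Type numbering would pair them incorrectly.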
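```r
library(data.table)
# Sample data from the question (dates as character strings);
# with your own data, skip this and start from setDT(dat)
dat <- data.frame(
  ID   = c(123, 123, 123, 123, 234, 234, 234),
  Date = c("2/1/14", "2/5/14", "3/1/14", "4/4/14", "3/2/14", "3/20/14", "4/3/14"),
  Type = c("Entry", "Exit", "Entry", "Exit", "Entry", "Exit", "Entry"),
  stringsAsFactors = FALSE
)
setDT(dat)  ## converts dat to a data.table by reference
# Number Entry rows and Exit rows separately within each ID
dat[, ID2 := seq_len(.N), by = list(ID, Type)]
# dat
# ID Date Type ID2
# 1: 123 2/1/14 Entry 1
# 2: 123 2/5/14 Exit 1
# 3: 123 3/1/14 Entry 2
# 4: 123 4/4/14 Exit 2
# 5: 234 3/2/14 Entry 1
# 6: 234 3/20/14 Exit 1
# 7: 234 4/3/14 Entry 2
```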
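If that alternation cannot be guaranteed, a somewhat more defensive key (a different technique from the per-Type numbering) is a running count of Entry records within each ID, so that every Exit attaches to the latest Entry before it. A minimal sketch, assuming rows are already sorted by ID and date; student 345 and its dates are made-up illustration data in which the first Entry has no Exit. An Exit with no earlier Entry would get pair number 0, and two Exits after the same Entry would collide in `dcast()`, so those cases still need separate handling:

```r
library(data.table)

# Hypothetical data: student 345 has an Entry with no Exit in the middle
dat2 <- data.table(
  ID   = c(345, 345, 345),
  Date = c("1/6/14", "2/3/14", "2/20/14"),
  Type = c("Entry", "Entry", "Exit")
)

# Running count of Entry rows per ID: each Exit inherits the number
# of the most recent Entry that precedes it
dat2[, pair := cumsum(Type == "Entry"), by = ID]

# One row per (ID, pair); the unmatched Entry gets Exit = NA
wide2 <- dcast(dat2, ID + pair ~ Type, value.var = "Date")
print(wide2)
```

With the per-Type numbering, the 2/20/14 Exit would instead have been paired with the 1/6/14 Entry.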
Now cast it to wide format using dcast. reshape2 provides dcast as well, but data.table has its own implementation, which is faster, so I'll use it here.
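```r
# Recent data.table versions dispatch dcast() to the data.table method;
# older ones (e.g. 1.9.0) need dcast.data.table() called directly
dcast(dat, ID + ID2 ~ Type, value.var = "Date")
# ID ID2 Entry Exit
# 1: 123 1 2/1/14 2/5/14
# 2: 123 2 3/1/14 4/4/14
# 3: 234 1 3/2/14 3/20/14
# 4: 234 2 4/3/14 NA
```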
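Since the question asks about `reshape`: once a pair index exists, base R's `stats::reshape()` can do the same cast without data.table. A sketch, assuming a plain data.frame with the ID/Date/Type columns from the question; the pair index is built with `ave()` instead of data.table's `by=`, and the names `enroll` and `pair` are just illustrative:

```r
# Sample data from the question as a plain data.frame
enroll <- data.frame(
  ID   = c(123, 123, 123, 123, 234, 234, 234),
  Date = c("2/1/14", "2/5/14", "3/1/14", "4/4/14", "3/2/14", "3/20/14", "4/3/14"),
  Type = c("Entry", "Exit", "Entry", "Exit", "Entry", "Exit", "Entry"),
  stringsAsFactors = FALSE
)

# Pair index: position of each row among rows with the same ID and Type
enroll$pair <- ave(seq_len(nrow(enroll)), enroll$ID, enroll$Type, FUN = seq_along)

# Long -> wide: one row per (ID, pair), one Date column per Type value
wide <- reshape(enroll, idvar = c("ID", "pair"), timevar = "Type",
                v.names = "Date", direction = "wide")
print(wide)
```

`reshape()` names the new columns Date.Entry and Date.Exit (the `v.names` value, a dot, then the Type value) and fills the missing Exit for 234's second stay with NA. Like the data.table version, it relies on the records being sorted and alternating.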
HTH