I'm unsure of better terminology for my question, so forgive me for the long-winded approach.
I'm trying to use two identifying variables, id and duration, to fill the rows of a matrix whose columns denote half-hour periods (so there should be 6 columns for a 3-hour period) and whose rows hold a given person's activities in those periods. If a person's activities do not fill up their row, a dummy value should be used instead. I've written an example below which should help clarify.
Example: data has 3 columns, id, activity, and duration. id and duration should serve as identifying variables, and activity should serve as the variable in the matrix.
data <- data.frame(id = c(1, 1, 1, 2, 2, 3, 3, 3),
                   activity = c("a", "b", "c", "d", "e", "b", "b", "a"),
                   duration = c(60, 30, 90, 45, 30, 15, 60, 100))
For the example, I used a 3-hour duration, hence the 6 columns in the matrix. The matrix below is the wanted output. DUMMY fills the slots where the total duration of a person's activities falls short of the duration of the matrix. In this example, the total duration is 180 minutes (3 hours * 60), so person 2, whose activity durations sum to 75 minutes (45 + 30), gets DUMMY once the activities covering the first 75 minutes are done.
mat <- t(matrix(c("a", "a", "b", "c", "c", "c",
                  "d", "d", "e", "DUMMY", "DUMMY", "DUMMY",
                  "b", "b", "b", "a", "a", "a"),
                nrow = 6, ncol = 3))
colnames(mat) <- c("0", "30", "60", "90", "120", "150")
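To see which people need DUMMY padding, the per-person totals can be compared with the 180-minute window using base R's tapply(). This is only a sketch on the example data; the names window_min and totals are introduced here for illustration.

```r
# Example data from the question
data <- data.frame(id = c(1, 1, 1, 2, 2, 3, 3, 3),
                   activity = c("a", "b", "c", "d", "e", "b", "b", "a"),
                   duration = c(60, 30, 90, 45, 30, 15, 60, 100))

window_min <- 3 * 60  # minutes covered by the six half-hour columns

# Total minutes of recorded activity per person
totals <- tapply(data$duration, data$id, sum)
# id 1: 180, id 2: 75, id 3: 175

# Minutes at the end of the window not covered by any activity
window_min - totals
```

Person 3 is also 5 minutes short (175), yet the wanted matrix still shows "a" in the 150 column. That appears to mean each column records the activity in progress at the start of its half hour, not whether the whole half hour is covered.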
I'm unsure how to fill the matrix mat above from data. Any help would be appreciated. Please let me know if the question needs to be made clearer.
EDIT: edited output
EDIT2: Added matrix column names
EDIT3: Added info on dummy variable
EDIT4: Would it be easier if I added start and end time instead of duration?
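Start and end times can be derived from duration with a per-person cumulative sum, so either input works. The sketch below assumes activities are listed in chronological order within each id; the columns start and end and the objects ids and res are introduced here for illustration. It then fills the matrix directly by taking, for each half-hour start m, the activity with start <= m < end, which is an alternative to the findInterval() approach in the answer, not the answerer's method.

```r
# Example data from the question
data <- data.frame(id = c(1, 1, 1, 2, 2, 3, 3, 3),
                   activity = c("a", "b", "c", "d", "e", "b", "b", "a"),
                   duration = c(60, 30, 90, 45, 30, 15, 60, 100))

# End time: running total of each person's durations (ave applies cumsum per id)
data$end <- ave(data$duration, data$id, FUN = cumsum)
# Start time: end time minus the activity's own duration
data$start <- data$end - data$duration

ints <- seq(0, by = 30, length.out = 6)  # start minute of each half-hour column
ids <- unique(data$id)

# For each person p and column start m, take the activity with start <= m < end,
# or "DUMMY" when no activity covers m
res <- t(vapply(ids, function(p) {
  d <- data[data$id == p, ]
  vapply(ints, function(m) {
    hit <- d$activity[d$start <= m & m < d$end]
    if (length(hit) == 0) "DUMMY" else as.character(hit[1])
  }, character(1))
}, character(length(ints))))
dimnames(res) <- list(id = ids, ints = ints)
res
```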
One approach is to look up, for each "id", the activity in progress at the start of every 30-minute interval:
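# Start (in minutes) of each of the six half-hour columns
ints = seq(0, by = 30, length.out = 6)
data2 = do.call(rbind,
                lapply(split(data, data$id),
                       function(d) {
                         dur = d$duration
                         # Breakpoints: each activity's start time, plus the total as an upper bound;
                         # an index past the last activity returns NA
                         i = findInterval(ints, c(cumsum(c(0, dur[-length(dur)])), sum(dur)))
                         data.frame(id = d$id[1], ints = ints, activity = d$activity[i])
                       }))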
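To see what the breakpoint vector and findInterval() produce, here is the same computation traced for person 2 alone, with the durations copied from the example data. The names dur, breaks, and i mirror the answer's code; breaks is introduced here for illustration.

```r
ints <- seq(0, by = 30, length.out = 6)  # 0 30 60 90 120 150

dur <- c(45, 30)  # person 2's durations (activities "d" then "e")
# Breakpoints: start time of each activity, plus the total as an upper bound
breaks <- c(cumsum(c(0, dur[-length(dur)])), sum(dur))
breaks  # 0 45 75

# For each half-hour start, the index of the interval it falls in
i <- findInterval(ints, breaks)
i  # 1 1 2 3 3 3

# Index 3 is past the 2 activities, so those slots come out as NA
c("d", "e")[i]
```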
Then reshape the new "data.frame" into an id-by-interval matrix:
tapply(as.character(data2$activity), data2[c("id", "ints")], identity)
# ints
#id 0 30 60 90 120 150
# 1 "a" "a" "b" "c" "c" "c"
# 2 "d" "d" "e" NA NA NA
# 3 "b" "b" "b" "a" "a" "a"
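The tapply() result has NA where the question asks for DUMMY. A minimal sketch, assuming NA only arises from slots past a person's last activity (true for this example data), replaces those cells and compares the result with the question's mat. The pipeline is repeated so the block runs on its own; res is a name introduced here.

```r
data <- data.frame(id = c(1, 1, 1, 2, 2, 3, 3, 3),
                   activity = c("a", "b", "c", "d", "e", "b", "b", "a"),
                   duration = c(60, 30, 90, 45, 30, 15, 60, 100))

# Answer's approach: activity in progress at each half-hour start, per id
ints <- seq(0, by = 30, length.out = 6)
data2 <- do.call(rbind, lapply(split(data, data$id), function(d) {
  dur <- d$duration
  i <- findInterval(ints, c(cumsum(c(0, dur[-length(dur)])), sum(dur)))
  data.frame(id = d$id[1], ints = ints, activity = d$activity[i])
}))
res <- tapply(as.character(data2$activity), data2[c("id", "ints")], identity)

# Slots past the end of a person's activities are NA; fill them with "DUMMY"
res[is.na(res)] <- "DUMMY"

# Wanted output from the question
mat <- t(matrix(c("a", "a", "b", "c", "c", "c",
                  "d", "d", "e", "DUMMY", "DUMMY", "DUMMY",
                  "b", "b", "b", "a", "a", "a"),
                nrow = 6, ncol = 3))
colnames(mat) <- c("0", "30", "60", "90", "120", "150")

# Compare cell values only (res also carries id row names)
all(res == mat)
```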