Counting number of hours between two dates with data.table
I have a data table, dt_stadium_hours:
> dt_stadium_hours
   mon_from_time mon_to_time tue_from_time tue_to_time wed_from_time wed_to_time thu_from_time thu_to_time
1:      7.965174    21.39378      7.965174    21.39378      7.965174    21.39378      7.965174    21.39876
   fri_from_time fri_to_time sat_from_time sat_to_time sun_from_time sun_to_time
1:      7.965174    21.39876      7.942786    21.35149      9.766915    16.91617
I have another table, dt_stadium_closed, which lists all the days on which the stadium was closed:
> dt_stadium_closed
close_date
1: 2017-04-16
2: 2017-04-21
3: 2017-04-22
4: 2017-04-28
5: 2017-05-02
I also have two tables, dt_player_start and dt_player_stop, which give the first time the player started playing and the last time he played. They look like this:
> dt_player_start
played_date start_time day
1: 2017-04-14 1507 Friday
> dt_player_stop
played_date stop_time day
2: 2017-05-05 1842 Friday
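For reproducibility, the four tables above can be recreated roughly as follows. This is only a sketch: the question shows printed output, not the underlying column types, so storing the dates as Date and the clock times as integers is an assumption.

```r
library(data.table)

# Opening and closing times in decimal hours, one column pair per weekday
# (values copied from the printed table)
dt_stadium_hours <- data.table(
  mon_from_time = 7.965174, mon_to_time = 21.39378,
  tue_from_time = 7.965174, tue_to_time = 21.39378,
  wed_from_time = 7.965174, wed_to_time = 21.39378,
  thu_from_time = 7.965174, thu_to_time = 21.39876,
  fri_from_time = 7.965174, fri_to_time = 21.39876,
  sat_from_time = 7.942786, sat_to_time = 21.35149,
  sun_from_time = 9.766915, sun_to_time = 16.91617
)

# Days on which the stadium was closed (assumed to be of class Date)
dt_stadium_closed <- data.table(
  close_date = as.Date(c("2017-04-16", "2017-04-21", "2017-04-22",
                         "2017-04-28", "2017-05-02"))
)

# First and last playing times; HHMM stored as integer (an assumption)
dt_player_start <- data.table(played_date = as.Date("2017-04-14"),
                              start_time = 1507L, day = "Friday")
dt_player_stop  <- data.table(played_date = as.Date("2017-05-05"),
                              stop_time = 1842L, day = "Friday")
```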
I need to calculate the total number of hours this particular player played.
He started playing on 2017-04-14 at 1507 hours, as given in dt_player_start. Since that day is a Friday, the stadium closes at 21.39876 hours, so he has to leave then. The last day on which he played is given in dt_player_stop: he stopped playing on 2017-05-05 at 1842 hours.
The days on which the stadium was closed, listed in dt_stadium_closed, should not be counted.
How can I do this with data.table in R?
A possible approach:
library(data.table)

# create a data.table with opening and closing times by day of the week
dt_open <- dcast(
  melt(dt_stadium_hours, measure.vars = 1:14)[
    , c('day', 'from.to') := tstrsplit(sub('_', '-', variable, fixed = TRUE), split = '-')
  ][, variable := NULL],
  day ~ from.to, value.var = 'value'
)

# create a data.table with all dates in the playing period, excluding closed days
DT <- data.table(dates = seq.Date(dt_player_start$played_date,
                                  dt_player_stop$played_date,
                                  by = 'day'))[!dates %in% dt_stadium_closed$close_date]

# create a day variable with day abbreviations matching 'dt_open'
# (weekdays() is locale-dependent; this assumes English day names)
DT[, day := substr(tolower(weekdays(dates)), 1, 3)]

# join with 'dt_open' on 'day'; the i. prefix refers to the columns of 'dt_open'
DT[dt_open, on = 'day', `:=` (from_time = i.from_time, to_time = i.to_time)]

# convert the decimal-hour values to date-time values
dcols <- c('from_time', 'to_time')
DT[, (dcols) := lapply(.SD, function(x) as.POSIXct(as.numeric(dates)*86400 + x*3600, origin = '1970-01-01', tz = 'GMT')), .SDcols = dcols]

# replace the opening time on the first day with the player's start time
DT[dates == dt_player_start$played_date, from_time := as.POSIXct(paste(dt_player_start$played_date, dt_player_start$start_time), format = '%Y-%m-%d %H%M', tz = 'GMT')]

# replace the closing time on the last day with the player's stop time
DT[dates == dt_player_stop$played_date, to_time := as.POSIXct(paste(dt_player_stop$played_date, dt_player_stop$stop_time), format = '%Y-%m-%d %H%M', tz = 'GMT')]

# calculate hours played by day (units fixed to hours rather than chosen automatically)
DT[, played := difftime(to_time, from_time, units = 'hours')]
This gives the following data.table:
> DT
         dates day           from_time             to_time          played
 1: 2017-04-14 fri 2017-04-14 15:07:00 2017-04-14 21:23:55  6.282093 hours
 2: 2017-04-15 sat 2017-04-15 07:56:34 2017-04-15 21:21:05 13.408704 hours
 3: 2017-04-17 mon 2017-04-17 07:57:54 2017-04-17 21:23:37 13.428606 hours
 4: 2017-04-18 tue 2017-04-18 07:57:54 2017-04-18 21:23:37 13.428606 hours
 5: 2017-04-19 wed 2017-04-19 07:57:54 2017-04-19 21:23:37 13.428606 hours
 6: 2017-04-20 thu 2017-04-20 07:57:54 2017-04-20 21:23:55 13.433586 hours
 7: 2017-04-23 sun 2017-04-23 09:46:00 2017-04-23 16:54:58  7.149255 hours
 8: 2017-04-24 mon 2017-04-24 07:57:54 2017-04-24 21:23:37 13.428606 hours
 9: 2017-04-25 tue 2017-04-25 07:57:54 2017-04-25 21:23:37 13.428606 hours
10: 2017-04-26 wed 2017-04-26 07:57:54 2017-04-26 21:23:37 13.428606 hours
11: 2017-04-27 thu 2017-04-27 07:57:54 2017-04-27 21:23:55 13.433586 hours
12: 2017-04-29 sat 2017-04-29 07:56:34 2017-04-29 21:21:05 13.408704 hours
13: 2017-04-30 sun 2017-04-30 09:46:00 2017-04-30 16:54:58  7.149255 hours
14: 2017-05-01 mon 2017-05-01 07:57:54 2017-05-01 21:23:37 13.428606 hours
15: 2017-05-03 wed 2017-05-03 07:57:54 2017-05-03 21:23:37 13.428606 hours
16: 2017-05-04 thu 2017-05-04 07:57:54 2017-05-04 21:23:55 13.433586 hours
17: 2017-05-05 fri 2017-05-05 07:57:54 2017-05-05 18:42:00 10.734826 hours
Now you can get the sum of the played hours:
> DT[, sum(played)]
Time difference of 205.8624 hours
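As a cross-check, the same total can be reached with plain decimal-hour arithmetic in base R, without going through date-time values. This is a minimal sketch and not part of the approach above: the vectors `open_h` and `close_h` and the helper `hhmm_to_hours` are hypothetical names introduced here, and the values are copied from the question. It looks up the weekday with `format(days, "%u")` (1 = Monday to 7 = Sunday), which avoids relying on the locale-dependent day names returned by `weekdays()`.

```r
# Opening and closing times in decimal hours, ordered Monday to Sunday
open_h  <- c(7.965174, 7.965174, 7.965174, 7.965174, 7.965174, 7.942786, 9.766915)
close_h <- c(21.39378, 21.39378, 21.39378, 21.39876, 21.39876, 21.35149, 16.91617)

closed <- as.Date(c("2017-04-16", "2017-04-21", "2017-04-22",
                    "2017-04-28", "2017-05-02"))

start_date <- as.Date("2017-04-14"); start_hhmm <- 1507
stop_date  <- as.Date("2017-05-05"); stop_hhmm  <- 1842

# Convert a clock time written as HHMM (e.g. 1507) to decimal hours
hhmm_to_hours <- function(hhmm) hhmm %/% 100 + (hhmm %% 100) / 60

# All days in the playing period, minus the closed days
days <- seq(start_date, stop_date, by = "day")
days <- days[!days %in% closed]

# Weekday index: 1 = Monday, ..., 7 = Sunday (locale-independent)
wd <- as.integer(format(days, "%u"))

from <- open_h[wd]
to   <- close_h[wd]

# On the first and last day, use the player's own start and stop times
from[days == start_date] <- hhmm_to_hours(start_hhmm)
to[days == stop_date]    <- hhmm_to_hours(stop_hhmm)

total_hours <- sum(to - from)
total_hours
```

The result should agree with the 205.8624 hours obtained above, up to rounding of the printed input values.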