Counting number of hours between two dates with data.table

I have a data table, dt_stadium_hours, with the opening and closing time (in decimal hours) for each day of the week:

>dt_stadium_hours
   mon_from_time mon_to_time tue_from_time tue_to_time wed_from_time wed_to_time thu_from_time thu_to_time
1:      7.965174    21.39378      7.965174    21.39378      7.965174    21.39378      7.965174    21.39876
   fri_from_time fri_to_time sat_from_time sat_to_time sun_from_time sun_to_time
1:      7.965174    21.39876      7.942786    21.35149      9.766915    16.91617

I have another table, dt_stadium_closed, which lists all the days on which the stadium was closed:

> dt_stadium_closed
    close_date
1:    2017-04-16
2:    2017-04-21
3:    2017-04-22
4:    2017-04-28
5:    2017-05-02 

I have two more tables, dt_player_start and dt_player_stop, which give the first time the player started playing and the last time he played. They look like this:

> dt_player_start 
   played_date  start_time     day
1:   2017-04-14     1507       Friday

> dt_player_stop
   played_date  stop_time      day
2:   2017-05-05     1842       Friday
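
For anyone who wants to reproduce the problem, the four tables above can be rebuilt roughly as follows. This is only a sketch: the question does not show the column classes, so storing the dates as `Date` and the HHMM times as plain numbers is an assumption.

```r
library(data.table)

# Opening hours in decimal hours, one row, two columns per weekday
dt_stadium_hours <- data.table(
  mon_from_time = 7.965174, mon_to_time = 21.39378,
  tue_from_time = 7.965174, tue_to_time = 21.39378,
  wed_from_time = 7.965174, wed_to_time = 21.39378,
  thu_from_time = 7.965174, thu_to_time = 21.39876,
  fri_from_time = 7.965174, fri_to_time = 21.39876,
  sat_from_time = 7.942786, sat_to_time = 21.35149,
  sun_from_time = 9.766915, sun_to_time = 16.91617
)

# Days on which the stadium was closed (assumed to be of class Date)
dt_stadium_closed <- data.table(
  close_date = as.Date(c("2017-04-16", "2017-04-21", "2017-04-22",
                         "2017-04-28", "2017-05-02"))
)

# First and last play times; start_time / stop_time are HHMM numbers (assumed numeric)
dt_player_start <- data.table(played_date = as.Date("2017-04-14"),
                              start_time = 1507, day = "Friday")
dt_player_stop  <- data.table(played_date = as.Date("2017-05-05"),
                              stop_time = 1842, day = "Friday")
```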

I need to calculate the total number of hours that this particular player played.

Here he started playing on 2017-04-14 at 1507 hours, as given in table "dt_player_start". That day is a Friday, so the stadium closes at 21.39876 hours and he has to leave. The last day on which he played is given in "dt_player_stop": he stopped playing on 2017-05-05 at 1842 hours.
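
The two tables use different time encodings: decimal hours for the stadium (21.39876) and HHMM numbers for the player (1507). A small sketch of how the two relate, using two hypothetical helper functions, `dec_to_clock()` and `hhmm_to_dec()`, that are not part of the question:

```r
# Decimal hours -> "HH:MM:SS" string (fractional seconds are dropped)
dec_to_clock <- function(h) {
  secs <- floor(h * 3600)
  sprintf("%02d:%02d:%02d",
          as.integer(secs %/% 3600),
          as.integer((secs %% 3600) %/% 60),
          as.integer(secs %% 60))
}

# HHMM number -> decimal hours
hhmm_to_dec <- function(t) (t %/% 100) + (t %% 100) / 60

dec_to_clock(21.39876)         # "21:23:55"
hhmm_to_dec(1507)              # 15.11667
21.39876 - hhmm_to_dec(1507)   # hours played on the first day, about 6.2821
```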

The days on which the stadium was closed, listed in table "dt_stadium_closed", should not be counted.

How can I do this with data.table in R?

A possible approach:

library(data.table)

# create a data.table with open and close times by day of the week
dt_open <- dcast(melt(dt_stadium_hours,
                      measure.vars = 1:14)[, c('day','from.to') := tstrsplit(sub('_','-',variable,fixed=TRUE), split = '-')
                                           ][, variable := NULL],
                 day ~ from.to, value.var = 'value')

# create a data.table with all the play dates, excluding the closed days
DT <- data.table(dates = seq.Date(dt_player_start$played_date,
                                  dt_player_stop$played_date,
                                  by = 'day'))[!dates %in% dt_stadium_closed$close_date]

# create a day-variable with day-abbreviations similar to 'dt_open'
# (weekdays() is locale-dependent; this assumes English day names)
DT[, day := substr(tolower(weekdays(dates)),1,3)]

# join with 'dt_open' on 'day'
DT[dt_open, on = 'day', `:=` (from_time = i.from_time, to_time = i.to_time)]

# convert hour-values to date-time values
dcols <- c('from_time','to_time')
DT[, (dcols) := lapply(.SD, function(x) as.POSIXct(as.numeric(dates)*86400 + x*3600, origin = '1970-01-01', tz = 'GMT')), .SDcols = dcols]

# replace the first from-time with the player's actual start time
DT[dates == dt_player_start$played_date, from_time := as.POSIXct(paste(dt_player_start$played_date,dt_player_start$start_time), format = '%Y-%m-%d %H%M', tz = 'GMT')]

# replace the last to-time with the player's actual stop time
DT[dates == dt_player_stop$played_date, to_time := as.POSIXct(paste(dt_player_stop$played_date,dt_player_stop$stop_time), format = '%Y-%m-%d %H%M', tz = 'GMT')]

# calculate hours played by day (units fixed to hours)
DT[, played := difftime(to_time, from_time, units = 'hours')]

This gives the following data.table:

> DT
         dates day           from_time             to_time          played
 1: 2017-04-14 fri 2017-04-14 15:07:00 2017-04-14 21:23:55  6.282093 hours
 2: 2017-04-15 sat 2017-04-15 07:56:34 2017-04-15 21:21:05 13.408704 hours
 3: 2017-04-17 mon 2017-04-17 07:57:54 2017-04-17 21:23:37 13.428606 hours
 4: 2017-04-18 tue 2017-04-18 07:57:54 2017-04-18 21:23:37 13.428606 hours
 5: 2017-04-19 wed 2017-04-19 07:57:54 2017-04-19 21:23:37 13.428606 hours
 6: 2017-04-20 thu 2017-04-20 07:57:54 2017-04-20 21:23:55 13.433586 hours
 7: 2017-04-23 sun 2017-04-23 09:46:00 2017-04-23 16:54:58  7.149255 hours
 8: 2017-04-24 mon 2017-04-24 07:57:54 2017-04-24 21:23:37 13.428606 hours
 9: 2017-04-25 tue 2017-04-25 07:57:54 2017-04-25 21:23:37 13.428606 hours
10: 2017-04-26 wed 2017-04-26 07:57:54 2017-04-26 21:23:37 13.428606 hours
11: 2017-04-27 thu 2017-04-27 07:57:54 2017-04-27 21:23:55 13.433586 hours
12: 2017-04-29 sat 2017-04-29 07:56:34 2017-04-29 21:21:05 13.408704 hours
13: 2017-04-30 sun 2017-04-30 09:46:00 2017-04-30 16:54:58  7.149255 hours
14: 2017-05-01 mon 2017-05-01 07:57:54 2017-05-01 21:23:37 13.428606 hours
15: 2017-05-03 wed 2017-05-03 07:57:54 2017-05-03 21:23:37 13.428606 hours
16: 2017-05-04 thu 2017-05-04 07:57:54 2017-05-04 21:23:55 13.433586 hours
17: 2017-05-05 fri 2017-05-05 07:57:54 2017-05-05 18:42:00 10.734826 hours
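
A side note on the `day` column: it is built with `weekdays()`, whose output depends on the session locale, so the three-letter keys may not match the stadium table on a non-English system. A locale-independent sketch, assuming data.table's `wday()` (which numbers the days 1 = Sunday to 7 = Saturday) and a hypothetical lookup vector `day_abbr`:

```r
library(data.table)

# Lookup vector indexed by wday(): 1 = Sunday, ..., 7 = Saturday
day_abbr <- c("sun", "mon", "tue", "wed", "thu", "fri", "sat")

dates <- as.Date(c("2017-04-14", "2017-04-15", "2017-04-16"))
day_abbr[wday(dates)]   # "fri" "sat" "sun"
```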

Now you can get the sum of the played hours:

> DT[, sum(played)]
Time difference of 205.8624 hours
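
One caveat when working with time differences: subtracting two POSIXct values with `-` lets R choose the units automatically, so a short interval may come back in minutes or seconds rather than hours. That matters mainly if the result is later converted with `as.numeric()`. A small illustration with made-up timestamps (not taken from the data above):

```r
a <- as.POSIXct("2017-04-14 15:07:00", tz = "GMT")
b <- as.POSIXct("2017-04-14 15:37:00", tz = "GMT")

b - a                                   # units picked automatically: 30 mins
d <- difftime(b, a, units = "hours")    # units fixed to hours
as.numeric(d)                           # 0.5
```

Fixing the units with `difftime(..., units = "hours")` keeps a plain numeric total meaningful regardless of how short any single day's interval is.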
