R group variables in days calculated from interval between start and end time

I have a data frame as follows:

tmpdf <- data.frame(licensePlate = c("Y80901", "Y80901", "Y80901", "AMG-999", "AMG-999", "W3188", "W3188"),  
starttime= c("2015-09-18 09:55", "2015-09-18 23:00", "2015-09-20 15:00", "2015-09-17 15:42", "2015-09-21 09:22", "2015-09-17 09:00", "2015-09-21 14:00"),
endtime = c("2015-09-18 17:55", "2015-09-20 11:00", "2015-09-21 12:00",  "2015-09-18 13:00",  "2015-09-21 14:22", "2015-09-21 12:00", "2015-09-21 16:00"))
    tmpdf
      licensePlate        starttime          endtime
    1       Y80901 2015-09-18 09:55 2015-09-18 17:55
    2       Y80901 2015-09-18 23:00 2015-09-20 11:00
    3       Y80901 2015-09-20 15:00 2015-09-21 12:00
    4      AMG-999 2015-09-17 15:42 2015-09-18 13:00
    5      AMG-999 2015-09-21 09:22 2015-09-21 14:22
    6        W3188 2015-09-17 09:00 2015-09-21 12:00
    7        W3188 2015-09-21 14:00 2015-09-21 16:00

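I want to calculate how many hours each licensePlate was used per day over the last n days (for example, the last 5 days, from September 17 to September 21). My expected result is as follows: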

   Period            LicensePlate        Used Time   

1 2015-09-17         Y80901              0
2 2015-09-17         AMG-999             8.3     
3 2015-09-17         W3188               15
4 2015-09-18         Y80901              9
5 2015-09-18         AMG-999             13
6 2015-09-18         W3188               24
7 2015-09-19         Y80901              24
8 2015-09-19         AMG-999             0
9 2015-09-19         W3188               24
10 2015-09-20        Y80901              20
11 2015-09-20        AMG-999             0
12 2015-09-20        W3188               24
13 2015-09-21        Y80901              12
14 2015-09-21        AMG-999             5
15 2015-09-21        W3188               14

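I think dplyr / data.table and lubridate can be used to get my result, and I may need to measure the time period in days, but I don't know how to cut within the interval between start and end time when the start and end are in different rows.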

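Here's something to get you started. It is almost your desired output, except that it doesn't show a licensePlate that has no usage in a given period.

The first step is to convert your dates to a valid POSIXct class, then expand the data to the per-minute level (probably the most costly part of this solution), and aggregate by licensePlate and Period while summing the result. I'm not using as.Date here because it handles POSIXct values badly for times between 00:00 and 1 AM.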

library(data.table)
setDT(tmpdf)[, `:=`(starttime = as.POSIXct(starttime), endtime = as.POSIXct(endtime))]
res <- tmpdf[, .(licensePlate, Period = seq(starttime, endtime, by = "1 min")), by = 1:nrow(tmpdf)]
res[, .(Used_Time = round(.N/60L, 1L)), keyby = .(Period = substr(Period, 1L, 10L), licensePlate)]
#         Period licensePlate Used_Time
#  1: 2015-09-17      AMG-999       8.3
#  2: 2015-09-17        W3188      15.0
#  3: 2015-09-18      AMG-999      13.0
#  4: 2015-09-18        W3188      24.0
#  5: 2015-09-18       Y80901       9.0
#  6: 2015-09-19        W3188      24.0
#  7: 2015-09-19       Y80901      24.0
#  8: 2015-09-20        W3188      24.0
#  9: 2015-09-20       Y80901      20.0
# 10: 2015-09-21      AMG-999       5.0
# 11: 2015-09-21        W3188      14.0
# 12: 2015-09-21       Y80901      12.0
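
As a complement to the per-minute expansion above, here is a minimal base R sketch of a different technique: computing the overlap between each usage interval and each calendar day directly. It is not the method used in the answer above. It assumes all timestamps can be treated as UTC, and `overlap_hours` and `grid` are names made up for this illustration. Because it starts from every (day, plate) combination, days with no usage come out as 0 instead of being dropped.

```r
# Sample data from the question, parsed as UTC (an assumption of this sketch)
tmpdf <- data.frame(
  licensePlate = c("Y80901", "Y80901", "Y80901", "AMG-999", "AMG-999", "W3188", "W3188"),
  starttime = c("2015-09-18 09:55", "2015-09-18 23:00", "2015-09-20 15:00", "2015-09-17 15:42",
                "2015-09-21 09:22", "2015-09-17 09:00", "2015-09-21 14:00"),
  endtime = c("2015-09-18 17:55", "2015-09-20 11:00", "2015-09-21 12:00", "2015-09-18 13:00",
              "2015-09-21 14:22", "2015-09-21 12:00", "2015-09-21 16:00"),
  stringsAsFactors = FALSE
)
tmpdf$starttime <- as.POSIXct(tmpdf$starttime, format = "%Y-%m-%d %H:%M", tz = "UTC")
tmpdf$endtime   <- as.POSIXct(tmpdf$endtime,   format = "%Y-%m-%d %H:%M", tz = "UTC")

# Every day in the observed range and every plate, one row per combination
days   <- seq(as.Date(min(tmpdf$starttime)), as.Date(max(tmpdf$endtime)), by = "day")
plates <- unique(tmpdf$licensePlate)
grid <- data.frame(
  Period       = rep(days, times = length(plates)),
  licensePlate = rep(plates, each = length(days)),
  stringsAsFactors = FALSE
)

# Hours of overlap between one plate's intervals and one calendar day (hypothetical helper)
overlap_hours <- function(day, plate) {
  rows <- tmpdf[tmpdf$licensePlate == plate, ]
  day_start <- as.numeric(day) * 86400   # midnight UTC, in seconds since the epoch
  day_end   <- day_start + 86400
  secs <- pmin(as.numeric(rows$endtime), day_end) -
          pmax(as.numeric(rows$starttime), day_start)
  sum(pmax(secs, 0)) / 3600              # negative overlap means no usage that day
}

grid$UsedTime <- vapply(
  seq_len(nrow(grid)),
  function(i) overlap_hours(grid$Period[i], grid$licensePlate[i]),
  numeric(1)
)
grid <- grid[order(grid$Period), ]
print(grid)
```

This does one small computation per (day, plate) pair rather than creating a row per minute, so it should scale better for long intervals, at the cost of being less compact than the data.table version.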

Take a deep breath. Here is my solution.

Initialising the data

tmpdf <- data.frame(licensePlate = c("Y80901", "Y80901", "Y80901", "AMG-999", "AMG-999", "W3188", "W3188"),  
                starttime= c("2015-09-18 09:55", "2015-09-18 23:00", "2015-09-20 15:00", "2015-09-17 15:42", "2015-09-21 09:22", "2015-09-17 09:00", "2015-09-21 14:00"),
                endtime = c("2015-09-18 17:55", "2015-09-20 11:00", "2015-09-21 12:00",  "2015-09-18 13:00",  "2015-09-21 14:22", "2015-09-21 12:00", "2015-09-21 16:00"))

# converting to POSIXct for better date/time handling
tmpdf$starttime <- as.POSIXct(tmpdf$starttime, tz = "GMT")
tmpdf$endtime <- as.POSIXct(tmpdf$endtime, tz = "GMT")

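Data preparation

To perform the required operation, the complete usage data has to be converted into daily usage data. So I wrote the following functions to prepare the data in the required format.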

library(lubridate)  # provides hours(), minutes() and seconds()

# splits each multi-day usage row into two rows at the first midnight
splitToTwo <- function(usage) {
  newList <- NULL

  # seq_len() avoids iterating over 1:0 when there is nothing to split
  for (i in seq_len(nrow(usage))) {
    tmp <- usage[i, ]

    # set the end time of the first split to 23:59:59 on the start day
    usage[i, ]$endtime <- as.Date(usage[i, ]$starttime) + hours(23) + minutes(59) + seconds(59)

    # set the start time of the second split to 00:00:01 on the next day
    tmp$starttime <- usage[i, ]$endtime + seconds(2)

    # add both halves to the result
    tmp <- rbind(tmp, usage[i, ])
    newList <- rbind(newList, tmp)
  }
  return(newList)
}


# recursive function: keeps splitting usage rows in two until every row
# starts and ends on the same day
setDailyUsage <- function(tmpdf) {

  # subset of rows where the usage spans more than one day
  multiDay <- tmpdf[as.Date(tmpdf$endtime) - as.Date(tmpdf$starttime) > 0, ]

  # keep only the rows where the usage started and ended on the same day
  tmpdf <- tmpdf[as.Date(tmpdf$endtime) - as.Date(tmpdf$starttime) == 0, ]

  # split the rows whose usage spans more than one day
  splitRows <- splitToTwo(multiDay)

  # add the split rows back to the same-day rows
  tmpdf <- rbind(tmpdf, splitRows)

  # recurse while any row still spans more than one day
  if (nrow(tmpdf[as.Date(tmpdf$endtime) - as.Date(tmpdf$starttime) > 0, ]) > 0) {
    tmpdf <- setDailyUsage(tmpdf)
  }

  return(tmpdf)
}

Prepared data

Our prepared data:

preparedData <- setDailyUsage(tmpdf)
    licensePlate           starttime             endtime
1         Y80901 2015-09-18 09:55:00 2015-09-18 17:55:00
5        AMG-999 2015-09-21 09:22:00 2015-09-21 14:22:00
7          W3188 2015-09-21 14:00:00 2015-09-21 16:00:00
21        Y80901 2015-09-18 23:00:00 2015-09-18 23:59:59
3         Y80901 2015-09-21 00:00:01 2015-09-21 12:00:00
31        Y80901 2015-09-20 15:00:00 2015-09-20 23:59:59
4        AMG-999 2015-09-18 00:00:01 2015-09-18 13:00:00
41       AMG-999 2015-09-17 15:42:00 2015-09-17 23:59:59
61         W3188 2015-09-17 09:00:00 2015-09-17 23:59:59
2         Y80901 2015-09-20 00:00:01 2015-09-20 11:00:00
211       Y80901 2015-09-19 00:00:01 2015-09-19 23:59:59
611        W3188 2015-09-18 00:00:01 2015-09-18 23:59:59
612        W3188 2015-09-19 00:00:01 2015-09-19 23:59:59
6          W3188 2015-09-21 00:00:01 2015-09-21 12:00:00
613        W3188 2015-09-20 00:00:01 2015-09-20 23:59:59

Data manipulation

Now we create a new DF that represents the data in the required format. Initially its UsedTime column holds only zeros.

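# fix the unit explicitly so that the later division by 60 always yields hours
preparedData$duration <- difftime(preparedData$endtime, preparedData$starttime, units = "mins")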
noOfUniquePlates <- length(unique(preparedData$licensePlate))
Period <- rep(seq(from=(min(as.Date(preparedData$starttime))),to=(max(as.Date(preparedData$starttime))), by="day"),noOfUniquePlates)
noOfUniqueDays <- length(unique(Period))
LicensePlate <- rep(unique(preparedData$licensePlate),each=noOfUniqueDays)
UsedTime <- 0

newDF <- data.frame(Period,LicensePlate,UsedTime)

Now, for each row of newDF, a simple mapply call searches the preparedData df for the matching usage data.

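# total duration of all rows for plate l that start and end on day p
findUsage <- function(p, l) {
  sum(preparedData[as.Date(preparedData$starttime) == p & as.Date(preparedData$endtime) == p & preparedData$licensePlate == l, ]$duration)
}
newDF$UsedTime <- mapply(findUsage, newDF$Period, newDF$LicensePlate)
# durations are in minutes for this data, so divide by 60 to get hours
newDF$UsedTime <- newDF$UsedTime / 60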

    > newDF[with(newDF,order(Period)),]
       Period LicensePlate  UsedTime
1  2015-09-17       Y80901  0.000000
6  2015-09-17      AMG-999  8.299722
11 2015-09-17        W3188 14.999722
2  2015-09-18       Y80901  8.999722
7  2015-09-18      AMG-999 12.999722
12 2015-09-18        W3188 23.999444
3  2015-09-19       Y80901 23.999444
8  2015-09-19      AMG-999  0.000000
13 2015-09-19        W3188 23.999444
4  2015-09-20       Y80901 19.999444
9  2015-09-20      AMG-999  0.000000
14 2015-09-20        W3188 23.999444
5  2015-09-21       Y80901 11.999722
10 2015-09-21      AMG-999  5.000000
15 2015-09-21        W3188 13.999722
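
The values above fall slightly short of whole hours (for example 23.999444 instead of 24) because each split ends one piece at 23:59:59 and starts the next at 00:00:01, so two seconds are dropped per midnight. As a rough sketch of one way around that, the following base R function cuts an interval exactly at each midnight without recursion. It is an alternative technique, not the code used in this answer; `split_by_day` is a hypothetical helper, and it assumes POSIXct inputs with `start < end` and day boundaries taken in UTC.

```r
# split_by_day() is a hypothetical helper written for this illustration.
# Assumes start < end, both POSIXct; day boundaries are taken in UTC.
split_by_day <- function(start, end) {
  s <- as.numeric(start)  # seconds since the epoch
  e <- as.numeric(end)

  # number of midnights strictly between start and end
  n <- max(0, ceiling(e / 86400) - floor(s / 86400) - 1)

  # the midnights themselves, in seconds since the epoch
  cuts <- (floor(s / 86400) + seq_len(n)) * 86400

  # consecutive pieces share their boundary, so no time is lost
  starts <- c(s, cuts)
  ends   <- c(cuts, e)

  data.frame(
    starttime = as.POSIXct(starts, origin = "1970-01-01", tz = "UTC"),
    endtime   = as.POSIXct(ends,   origin = "1970-01-01", tz = "UTC"),
    hours     = (ends - starts) / 3600
  )
}

# Second usage row of Y80901 from the question
pieces <- split_by_day(as.POSIXct("2015-09-18 23:00", tz = "UTC"),
                       as.POSIXct("2015-09-20 11:00", tz = "UTC"))
print(pieces)
```

Note that a piece ending exactly at midnight carries the next day's date in its endtime, so grouping should be done on the date of starttime only.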

I had to limit the explanation to keep the answer short. If you need any clarification, let me know in the comments.

You're right, plyr can be used to solve this problem. One possible solution:

library(plyr)  # provides ddply() and .()

# convert the date/time columns to date/time values in R
tmpdf$starttime <- as.POSIXct(tmpdf$starttime)
tmpdf$endtime <- as.POSIXct(tmpdf$endtime)

# group by start date and plate, and compute each interval's length in days
# note: an interval is attributed entirely to its start date; multi-day intervals are not split
newdf <- ddply(tmpdf, .(as.Date(starttime), licensePlate), function(df) {
  df$diffdays <- as.double(difftime(df$endtime, df$starttime, units = 'days'))
  df
})

# If you want to only have the Period, LicensePlate, and Used Time columns remaining:
newdf <- subset(newdf, select = c(1, 2, 5))
colnames(newdf) <- c('Period', 'LicensePlate', 'UsedTime')

Hope this helps!

Try this - does it help?

tmpdf <- data.frame(licensePlate = c("Y80901", "Y80901", "Y80901", "AMG-999", "AMG-999", "W3188", "W3188"),  
                    starttime= c("2015-09-18 09:55", "2015-09-18 23:00", "2015-09-20 15:00", "2015-09-17 15:42", "2015-09-21 09:22", "2015-09-17 09:00", "2015-09-21 14:00"),
                    endtime = c("2015-09-18 17:55", "2015-09-20 11:00", "2015-09-21 12:00",  "2015-09-18 13:00",  "2015-09-21 14:22", "2015-09-21 12:00", "2015-09-21 16:00"))

tmpdf
str(tmpdf)
library(lubridate)
tmpdf$starttime=ymd_hm(paste(tmpdf$starttime))
tmpdf$endtime=ymd_hm(paste(tmpdf$endtime))
tmpdf$Period=day(tmpdf$starttime)
tmpdf$diff=difftime(tmpdf$endtime,tmpdf$starttime)
tmpdf

