Getting a cumulative count with plyr in R

Question

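I have a data frame with about 70,000 rows, and I am trying to get a count that depends on date-time variables. I have been using plyr for the rest of my analysis, but I can't get it to work for this. My data frame looks like this: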

Create.Date.Time        Service         Closing.Date.Time
1   2013-06-01 12:59:00 AV              2013-06-01 13:59:00
2   2013-06-02 07:56:00 SERVICE684793   2013-06-02 08:59:00
3   2013-06-02 09:39:00 SERVICE684793   2013-06-03 12:01:00
4   2013-06-02 14:14:00 SERVICE684796   2013-06-02 14:55:00
5   2013-06-02 17:20:00 SERVICE684797   2013-06-03 12:06:00
6   2013-06-03 07:20:00 SERVICE684793   2013-06-03 07:39:00
7   2013-06-03 08:02:00 SERVICE684839   2013-06-03 12:09:00
8   2013-06-03 08:04:00 SERVICE684841   2013-06-04 08:05:00
9   2013-06-03 08:04:00 SERVICE684841   2013-06-05 08:06:00
10  2013-06-03 08:08:00 SERVICE684841   2013-06-03 08:08:00

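My goal is to get, for each Create.Date.Time, the number of observations that were already closed. I don't want to use a for loop, because that would take forever. I would like to use plyr, and the function that matters is:

Count the number of observations where

Closing.Date.Time <= Create.Date.Time

for each Service and each Create.Date.Time.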

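My starting point is ddply(df, .(Service, Create.Date.Time), ...), but I am having trouble with the function, because the value depends on my Create.Date.Time and I don't know how to write that down. Can someone help me?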

I would like to end up with a data frame like this:

 Service        Create.Date.Time      Num.Closed
  AV            2013-06-01 12:59:00      0
  SERVICE684793 2013-06-02 07:56:00      0
  SERVICE684793 2013-06-02 09:39:00      1
  SERVICE684793 2013-06-03 07:20:00      1
  SERVICE684796 2013-06-02 14:14:00      0
  SERVICE684797 2013-06-02 17:20:00      0
  SERVICE684839 2013-06-03 08:02:00      0
  SERVICE684841 2013-06-03 08:04:00      0
  SERVICE684841 2013-06-03 08:04:00      0
  SERVICE684841 2013-06-03 08:08:00      3

Answer 1

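I am not quite sure how the data.frame you want to end up with relates to the question you ask, since that result is not what you describe. If nothing else, could you write the loop you would use?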

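If you want (as you wrote) to:

Count the number of observations where

Closing.Date.Time <= Create.Date.Time

for each Service and each Create.Date.Time, a good approach is to use the data.table package. In that case your data is: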

       Create.Date.Time       Service   Closing.Date.Time
 1: 2013-06-01 12:59:00            AV 2013-06-01 13:59:00
 2: 2013-06-02 07:56:00 SERVICE684793 2013-06-02 08:59:00
 3: 2013-06-02 09:39:00 SERVICE684793 2013-06-03 12:01:00
 4: 2013-06-02 14:14:00 SERVICE684796 2013-06-02 14:55:00
 5: 2013-06-02 17:20:00 SERVICE684797 2013-06-03 12:06:00
 6: 2013-06-03 07:20:00 SERVICE684793 2013-06-03 07:39:00
 7: 2013-06-03 08:02:00 SERVICE684839 2013-06-03 12:09:00
 8: 2013-06-03 08:04:00 SERVICE684841 2013-06-04 08:05:00
 9: 2013-06-03 08:04:00 SERVICE684841 2013-06-05 08:06:00
10: 2013-06-03 08:08:00 SERVICE684841 2013-06-03 08:08:00

The dates and times are in POSIXct format.

Then:

dt[, sum(Closing.Date.Time <= Create.Date.Time ), by = c('Service', 'Create.Date.Time')]

results in

         Service    Create.Date.Time V1
1:            AV 2013-06-01 12:59:00  0
2: SERVICE684793 2013-06-02 07:56:00  0
3: SERVICE684793 2013-06-02 09:39:00  0
4: SERVICE684796 2013-06-02 14:14:00  0
5: SERVICE684797 2013-06-02 17:20:00  0
6: SERVICE684793 2013-06-03 07:20:00  0
7: SERVICE684839 2013-06-03 08:02:00  0
8: SERVICE684841 2013-06-03 08:04:00  0
9: SERVICE684841 2013-06-03 08:08:00  1

which is what you described.

Cheers.
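If the goal is instead closer to the table shown in the question, the count for a row has to look at every closing time in the same Service, not only at the row's own. A minimal sketch with plyr, starting from the ddply idea in the question. The helper name count_closed is made up for illustration, and the sample timestamps are rebuilt as UTC epoch seconds, which is an assumption (the comparison itself does not depend on the time zone):

```r
library(plyr)

# Sample data rebuilt from epoch seconds; UTC is an assumption for reproducibility.
df <- data.frame(
  Create.Date.Time = as.POSIXct(
    c(1370105940, 1370174160, 1370180340, 1370196840, 1370208000,
      1370258400, 1370260920, 1370261040, 1370261040, 1370261280),
    origin = "1970-01-01", tz = "UTC"),
  Service = c("AV", "SERVICE684793", "SERVICE684793", "SERVICE684796",
              "SERVICE684797", "SERVICE684793", "SERVICE684839",
              "SERVICE684841", "SERVICE684841", "SERVICE684841"),
  Closing.Date.Time = as.POSIXct(
    c(1370109540, 1370177940, 1370275260, 1370199300, 1370275560,
      1370259540, 1370275740, 1370347500, 1370433960, 1370261280),
    origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row of one Service's rows, count how many
# records of that Service have a closing time at or before the row's
# creation time.
count_closed <- function(g) {
  g$Num.Closed <- vapply(
    seq_len(nrow(g)),
    function(i) sum(g$Closing.Date.Time <= g$Create.Date.Time[i]),
    integer(1)
  )
  g
}

# Split by Service only; the creation time is handled inside the helper.
res <- ddply(df, .(Service), count_closed)
res[, c("Service", "Create.Date.Time", "Num.Closed")]
```

On the sample data this gives 0, 1, 1 for SERVICE684793, but 1 rather than 3 for the last SERVICE684841 row, because only one of that service's records is closed by that row's creation time.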

Answer 2

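I did not fully understand the question, because in some cases the expected output shown differs from the output I get. Assuming that is just a typo:

Data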

 df <-   structure(list(Create.Date.Time = structure(c(1370105940, 1370174160, 
 1370180340, 1370196840, 1370208000, 1370258400, 1370260920, 1370261040, 
 1370261040, 1370261280), class = c("POSIXct", "POSIXt"), tzone = ""), 
 Service = c("AV", "SERVICE684793", "SERVICE684793", "SERVICE684796", 
"SERVICE684797", "SERVICE684793", "SERVICE684839", "SERVICE684841", 
"SERVICE684841", "SERVICE684841"), Closing.Date.Time = structure(c(1370109540, 
1370177940, 1370275260, 1370199300, 1370275560, 1370259540, 
1370275740, 1370347500, 1370433960, 1370261280), class = c("POSIXct", 
"POSIXt"), tzone = "")), .Names = c("Create.Date.Time", "Service", 
"Closing.Date.Time"), row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10"), class = "data.frame")

Extract the time of day from the POSIXct columns

library(lubridate)

dfNew <- within(df, {
  # time of day as seconds since midnight (the date part is dropped)
  Createtime  <- period_to_seconds(hms(strftime(Create.Date.Time, "%H:%M:%S")))
  Closingtime <- period_to_seconds(hms(strftime(Closing.Date.Time, "%H:%M:%S")))
})

dfNew <- dfNew[order(dfNew$Service), ]  # optional: only affects row order

Using data.table

library(data.table)
setDT(dfNew)[,Num.Closed := cumsum(unlist(lapply(1:.N, function(i) sum(Closingtime[1:i] <=Createtime[i])))),
   by=Service][,c(2,1,6), with=FALSE] 
#              Service    Create.Date.Time Num.Closed
 #1:            AV 2013-06-01 12:59:00          0
 #2: SERVICE684793 2013-06-02 07:56:00          0
 #3: SERVICE684793 2013-06-02 09:39:00          1
 #4: SERVICE684793 2013-06-03 07:20:00          1
 #5: SERVICE684796 2013-06-02 14:14:00          0
 #6: SERVICE684797 2013-06-02 17:20:00          1
 #7: SERVICE684839 2013-06-03 08:02:00          0
 #8: SERVICE684841 2013-06-03 08:04:00          0
 #9: SERVICE684841 2013-06-03 08:04:00          0
#10: SERVICE684841 2013-06-03 08:08:00          3
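
With around 70,000 rows, the lapply over 1:.N above does work that grows quadratically with the size of each group. A possible alternative, sketched here in base R and assuming that full timestamps rather than time of day should be compared: sort each Service's closing times once and let findInterval count how many fall at or before each creation time. The variable names and the UTC time zone are illustrative assumptions.

```r
# Same sample data, rebuilt from epoch seconds; UTC is an assumption.
df <- data.frame(
  Create.Date.Time = as.POSIXct(
    c(1370105940, 1370174160, 1370180340, 1370196840, 1370208000,
      1370258400, 1370260920, 1370261040, 1370261040, 1370261280),
    origin = "1970-01-01", tz = "UTC"),
  Service = c("AV", "SERVICE684793", "SERVICE684793", "SERVICE684796",
              "SERVICE684797", "SERVICE684793", "SERVICE684839",
              "SERVICE684841", "SERVICE684841", "SERVICE684841"),
  Closing.Date.Time = as.POSIXct(
    c(1370109540, 1370177940, 1370275260, 1370199300, 1370275560,
      1370259540, 1370275740, 1370347500, 1370433960, 1370261280),
    origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)

num_closed <- integer(nrow(df))

# Loop over Services (not over rows): one sort and one findInterval per group.
for (idx in split(seq_len(nrow(df)), df$Service)) {
  closed <- sort(as.numeric(df$Closing.Date.Time[idx]))
  # findInterval(x, closed) is the number of elements of `closed` that are <= x
  num_closed[idx] <- findInterval(as.numeric(df$Create.Date.Time[idx]), closed)
}

df$Num.Closed <- num_closed
df[, c("Service", "Create.Date.Time", "Num.Closed")]
```

Because this compares full date-times, it gives 0 for SERVICE684797 and 1 for the last SERVICE684841 row, which differs from the time-of-day output above.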
