Getting cumulative count with plyr in R
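I have a data frame with about 70,000 rows, and I am trying to get a count that depends on a date-time variable. I have been using plyr for my other analyses, but I cannot get it to work for this one. My data frame looks like this: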
Create.Date.Time Service Closing.Date.Time
1 2013-06-01 12:59:00 AV 2013-06-01 13:59:00
2 2013-06-02 07:56:00 SERVICE684793 2013-06-02 08:59:00
3 2013-06-02 09:39:00 SERVICE684793 2013-06-03 12:01:00
4 2013-06-02 14:14:00 SERVICE684796 2013-06-02 14:55:00
5 2013-06-02 17:20:00 SERVICE684797 2013-06-03 12:06:00
6 2013-06-03 07:20:00 SERVICE684793 2013-06-03 07:39:00
7 2013-06-03 08:02:00 SERVICE684839 2013-06-03 12:09:00
8 2013-06-03 08:04:00 SERVICE684841 2013-06-04 08:05:00
9 2013-06-03 08:04:00 SERVICE684841 2013-06-05 08:06:00
10 2013-06-03 08:08:00 SERVICE684841 2013-06-03 08:08:00
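My goal is to get, for each Create.Date.Time, the number of observations that are already closed. I don't want to use a for loop because it would take forever. I would like to use plyr; what matters for the function is:
Count the number of observations with
Closing.Date.Time <= Create.Date.Time
for each Service,
for each Create.Date.Time of that Service.
My starting point is ddply(df, .(Service, Create.Date.Time), ...), but I am having trouble with the function, because the value depends on Create.Date.Time and I don't know how to write that down. Can someone help me?
I would like to end up with a data frame like this: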
Service Create.Date.Time Num.Closed
AV 2013-06-01 12:59:00 0
SERVICE684793 2013-06-02 07:56:00 0
SERVICE684793 2013-06-02 09:39:00 1
SERVICE684793 2013-06-03 07:20:00 1
SERVICE684796 2013-06-02 14:14:00 0
SERVICE684797 2013-06-02 17:20:00 0
SERVICE684839 2013-06-03 08:02:00 0
SERVICE684841 2013-06-03 08:04:00 0
SERVICE684841 2013-06-03 08:04:00 0
SERVICE684841 2013-06-03 08:08:00 3
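I'm not quite sure how the data.frame you want to end up with relates to the question you asked, since that result is not what you described. If there were no other option, could you write the loop you would use?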
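The loop this comment asks about could be sketched as follows. It is only one possible reading of the question: for every row, count the rows of the same Service whose Closing.Date.Time is at or before that row's Create.Date.Time, comparing full timestamps. The `tz = "UTC"` setting is an assumption made for reproducibility and is not stated in the question.

```r
# Sample data from the question; tz = "UTC" is an assumption.
df <- data.frame(
  Create.Date.Time = as.POSIXct(c(
    "2013-06-01 12:59:00", "2013-06-02 07:56:00", "2013-06-02 09:39:00",
    "2013-06-02 14:14:00", "2013-06-02 17:20:00", "2013-06-03 07:20:00",
    "2013-06-03 08:02:00", "2013-06-03 08:04:00", "2013-06-03 08:04:00",
    "2013-06-03 08:08:00"), tz = "UTC"),
  Service = c("AV", "SERVICE684793", "SERVICE684793", "SERVICE684796",
              "SERVICE684797", "SERVICE684793", "SERVICE684839",
              "SERVICE684841", "SERVICE684841", "SERVICE684841"),
  Closing.Date.Time = as.POSIXct(c(
    "2013-06-01 13:59:00", "2013-06-02 08:59:00", "2013-06-03 12:01:00",
    "2013-06-02 14:55:00", "2013-06-03 12:06:00", "2013-06-03 07:39:00",
    "2013-06-03 12:09:00", "2013-06-04 08:05:00", "2013-06-05 08:06:00",
    "2013-06-03 08:08:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# For each row, count the rows of the same Service whose closing time
# is at or before this row's creation time (full date-time comparison).
df$Num.Closed <- 0L
for (i in seq_len(nrow(df))) {
  same_service <- df$Service == df$Service[i]
  df$Num.Closed[i] <- sum(same_service &
                            df$Closing.Date.Time <= df$Create.Date.Time[i])
}

print(df[, c("Service", "Create.Date.Time", "Num.Closed")])
```

On this reading the last row gets 1 rather than the 3 shown in the desired table above, because only one SERVICE684841 observation has closed by 2013-06-03 08:08:00. The loop is quadratic in the number of rows, so it is meant to pin down the intended logic, not to run on 70,000 rows.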
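If you want (as you wrote) to:
Count the number of observations with
Closing.Date.Time <= Create.Date.Time
for each Service and
each Create.Date.Time,
then a good approach is to use the data.table package. In that case your data is: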
Create.Date.Time Service Closing.Date.Time
1: 2013-06-01 12:59:00 AV 2013-06-01 13:59:00
2: 2013-06-02 07:56:00 SERVICE684793 2013-06-02 08:59:00
3: 2013-06-02 09:39:00 SERVICE684793 2013-06-03 12:01:00
4: 2013-06-02 14:14:00 SERVICE684796 2013-06-02 14:55:00
5: 2013-06-02 17:20:00 SERVICE684797 2013-06-03 12:06:00
6: 2013-06-03 07:20:00 SERVICE684793 2013-06-03 07:39:00
7: 2013-06-03 08:02:00 SERVICE684839 2013-06-03 12:09:00
8: 2013-06-03 08:04:00 SERVICE684841 2013-06-04 08:05:00
9: 2013-06-03 08:04:00 SERVICE684841 2013-06-05 08:06:00
10: 2013-06-03 08:08:00 SERVICE684841 2013-06-03 08:08:00
where the dates and times are in POSIXct format.
Then:
dt[, sum(Closing.Date.Time <= Create.Date.Time), by = c('Service', 'Create.Date.Time')]
results in
Service Create.Date.Time V1
1: AV 2013-06-01 12:59:00 0
2: SERVICE684793 2013-06-02 07:56:00 0
3: SERVICE684793 2013-06-02 09:39:00 0
4: SERVICE684796 2013-06-02 14:14:00 0
5: SERVICE684797 2013-06-02 17:20:00 0
6: SERVICE684793 2013-06-03 07:20:00 0
7: SERVICE684839 2013-06-03 08:02:00 0
8: SERVICE684841 2013-06-03 08:04:00 0
9: SERVICE684841 2013-06-03 08:08:00 1
which is what you described.
Cheers.
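To make the behaviour of the grouped sum in the answer above easier to check, here is a small reproducible sketch using only the SERVICE684793 and SERVICE684841 rows. The `tz = "UTC"` setting and the column names `Num.Closed` and `N` are choices made for this illustration. Within each (Service, Create.Date.Time) group, every row is compared only with its own closing time, so a row counts only when it closes at or before its own creation.

```r
library(data.table)

# Subset of the question's data; tz = "UTC" is an assumption.
dt <- data.table(
  Create.Date.Time = as.POSIXct(c(
    "2013-06-02 07:56:00", "2013-06-02 09:39:00", "2013-06-03 07:20:00",
    "2013-06-03 08:04:00", "2013-06-03 08:04:00", "2013-06-03 08:08:00"),
    tz = "UTC"),
  Service = c("SERVICE684793", "SERVICE684793", "SERVICE684793",
              "SERVICE684841", "SERVICE684841", "SERVICE684841"),
  Closing.Date.Time = as.POSIXct(c(
    "2013-06-02 08:59:00", "2013-06-03 12:01:00", "2013-06-03 07:39:00",
    "2013-06-04 08:05:00", "2013-06-05 08:06:00", "2013-06-03 08:08:00"),
    tz = "UTC")
)

# Row-wise comparison inside each group, then one summary row per group.
# N records how many input rows fell into the group.
res <- dt[, .(Num.Closed = sum(Closing.Date.Time <= Create.Date.Time),
              N = .N),
          by = .(Service, Create.Date.Time)]

print(res)
```

The two rows created at 2013-06-03 08:04:00 collapse into a single group, which is why the result has one row fewer than the input.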
I didn't fully understand the question, because in some cases the expected output shown differs from the output I get. If that is just a typo:
df <- structure(list(Create.Date.Time = structure(c(1370105940, 1370174160,
1370180340, 1370196840, 1370208000, 1370258400, 1370260920, 1370261040,
1370261040, 1370261280), class = c("POSIXct", "POSIXt"), tzone = ""),
Service = c("AV", "SERVICE684793", "SERVICE684793", "SERVICE684796",
"SERVICE684797", "SERVICE684793", "SERVICE684839", "SERVICE684841",
"SERVICE684841", "SERVICE684841"), Closing.Date.Time = structure(c(1370109540,
1370177940, 1370275260, 1370199300, 1370275560, 1370259540,
1370275740, 1370347500, 1370433960, 1370261280), class = c("POSIXct",
"POSIXt"), tzone = "")), .Names = c("Create.Date.Time", "Service",
"Closing.Date.Time"), row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10"), class = "data.frame")
Extract the time of day from the POSIXct class:
library(lubridate)
dfNew <- within(df, {
Createtime <- period_to_seconds(hms(strftime(Create.Date.Time, "%H:%M:%S")))
Closingtime <- period_to_seconds(hms(strftime(Closing.Date.Time, "%H:%M:%S")))})
dfNew <- dfNew[order(dfNew$Service), ]  # optional: sort by Service
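If lubridate is not available, the same seconds-since-midnight value can be sketched in base R via POSIXlt. The two sample timestamps and `tz = "UTC"` are assumptions chosen for illustration.

```r
# Two sample timestamps; tz = "UTC" is an assumption.
x <- as.POSIXct(c("2013-06-02 07:56:00", "2013-06-03 12:01:00"), tz = "UTC")

# POSIXlt exposes the clock fields directly; the date part is ignored.
lt <- as.POSIXlt(x)
secs <- lt$hour * 3600 + lt$min * 60 + lt$sec

print(secs)
```

Because the date is discarded, a closing time on a later day can compare as earlier than a creation time, which affects the counts computed from these values.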
Using data.table:
library(data.table)
setDT(dfNew)[,Num.Closed := cumsum(unlist(lapply(1:.N, function(i) sum(Closingtime[1:i] <=Createtime[i])))),
by=Service][,c(2,1,6), with=FALSE]
# Service Create.Date.Time Num.Closed
#1: AV 2013-06-01 12:59:00 0
#2: SERVICE684793 2013-06-02 07:56:00 0
#3: SERVICE684793 2013-06-02 09:39:00 1
#4: SERVICE684793 2013-06-03 07:20:00 1
#5: SERVICE684796 2013-06-02 14:14:00 0
#6: SERVICE684797 2013-06-02 17:20:00 1
#7: SERVICE684839 2013-06-03 08:02:00 0
#8: SERVICE684841 2013-06-03 08:04:00 0
#9: SERVICE684841 2013-06-03 08:04:00 0
#10: SERVICE684841 2013-06-03 08:08:00 3
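Since the question asks for plyr, here is a hedged ddply sketch of a variant that compares full timestamps instead of time of day. It splits by Service only and counts, for each row, how many rows in that Service closed at or before the row's creation. The helper name `count_closed`, the `tz = "UTC"` setting, and the use of a subset of the rows are assumptions for this illustration, not part of the answer above.

```r
library(plyr)

# Subset of the question's data; tz = "UTC" is an assumption.
df <- data.frame(
  Service = c("SERVICE684793", "SERVICE684793", "SERVICE684793",
              "SERVICE684841", "SERVICE684841", "SERVICE684841"),
  Create.Date.Time = as.POSIXct(c(
    "2013-06-02 07:56:00", "2013-06-02 09:39:00", "2013-06-03 07:20:00",
    "2013-06-03 08:04:00", "2013-06-03 08:04:00", "2013-06-03 08:08:00"),
    tz = "UTC"),
  Closing.Date.Time = as.POSIXct(c(
    "2013-06-02 08:59:00", "2013-06-03 12:01:00", "2013-06-03 07:39:00",
    "2013-06-04 08:05:00", "2013-06-05 08:06:00", "2013-06-03 08:08:00"),
    tz = "UTC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: d holds all rows of one Service.
count_closed <- function(d) {
  d$Num.Closed <- vapply(
    seq_len(nrow(d)),
    # Count closings in this Service at or before row i's creation time.
    function(i) sum(d$Closing.Date.Time <= d$Create.Date.Time[i]),
    integer(1)
  )
  d[, c("Service", "Create.Date.Time", "Num.Closed")]
}

res <- ddply(df, .(Service), count_closed)
print(res)
```

With full timestamps the last SERVICE684841 row gets 1, whereas the time-of-day comparison above gives 3. Which of the two matches the intended meaning depends on how the question is read.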