How to subset and find time difference between two data points using bike station data
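I am experimenting with bike station data and have a for loop that extracts the bikes whose next trip starts at a different station from the one where the previous trip ended. It then rearranges the stop time and start time to show the operator's movement of the bike (from where it stopped to where it next started), along with the difftime, or time difference, between that stop and the next start.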
Sample data
            starttime            stoptime start.station.id end.station.id bikeid
1 2017-01-16 13:08:18 2017-01-16 13:28:13             3156            466      1
2 2017-01-10 19:10:31 2017-01-10 19:16:02              422           3090      1
3 2017-01-04 08:47:42 2017-01-04 08:57:10              507            442      1
4 2017-01-12 18:08:33 2017-01-12 18:36:09              546           3151      2
5 2017-01-21 09:52:13 2017-01-21 10:21:07             3243            212      2
6 2017-01-26 05:46:18 2017-01-26 05:49:13              470            168      2
My code
raw_data = test
unique_id = unique(raw_data$bikeid)
output1 <- data.frame("bikeid"= integer(0), "end.station.id"= integer(0), "start.station.id" = integer(0), "diff.time" = numeric(0), "stoptime" = character(),"starttime" = character(), stringsAsFactors=FALSE)
for (bikeid in unique_id)
{
  onebike <- raw_data[ which(raw_data$bikeid== bikeid), ]
  onebike$starttime <- strptime(onebike$starttime, "%Y-%m-%d %H:%M:%S", tz = "EST")
  onebike <- onebike[order(onebike$starttime, decreasing = FALSE),]
  if(nrow(onebike) >=2 ){
    for(i in 2:nrow(onebike )) {
      print(onebike)
      if(is.integer(onebike[i-1,"end.station.id"]) & is.integer(onebike[i,"start.station.id"]) &
         onebike[i-1,"end.station.id"] != onebike[i,"start.station.id"]){
        diff_time <- as.double(difftime(strptime(onebike[i,"starttime"], "%Y-%m-%d %H:%M:%S", tz = "EST"),
                                        strptime(onebike[i-1,"stoptime"], "%Y-%m-%d %H:%M:%S", tz = "EST")
                                        ,units = "secs"))
        new_row <- c(bikeid, onebike[i-1,"end.station.id"], onebike[i,"start.station.id"], diff_time, as.character(onebike[i-1,"stoptime"]), as.character(onebike[i,"starttime"]))
        output1[nrow(output1) + 1,] = new_row
      }
    }
  }
}
Output
  bikeid end.station.id start.station.id diff.time            stoptime           starttime
1      1            442              422    555201 2017-01-04 08:57:10 2017-01-10 19:10:31
2      1           3090             3156    496336 2017-01-10 19:16:02 2017-01-16 13:08:18
3      2           3151             3243    746164 2017-01-12 18:36:09 2017-01-21 09:52:13
4      2            212              470    415511 2017-01-21 10:21:07 2017-01-26 05:46:18
5      3           3112              351   1587161 2017-01-12 08:58:42 2017-01-30 17:51:23
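However, on a large data set this for loop takes a long time. Is there a way to use dplyr or data.table to speed up this loop, or to rearrange the data in a way that avoids the loop? Any explanation or suggestion would be much appreciated.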
Sample data (dput output)
structure(list(starttime = structure(c(1484572098, 1484075431,
1483519662, 1484244513, 1484992333, 1485409578, 1484210616, 1483727948,
1485798683), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
stoptime = structure(c(1484573293, 1484075762, 1483520230,
1484246169, 1484994067, 1485409753, 1484211522, 1483729024,
1485799997), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
start.station.id = c(3156L, 422L, 507L, 546L, 3243L, 470L,
439L, 309L, 351L), end.station.id = c(466L, 3090L, 442L,
3151L, 212L, 168L, 3112L, 439L, 433L), bikeid = c(1, 1, 1,
2, 2, 2, 3, 3, 3)), .Names = c("starttime", "stoptime", "start.station.id",
"end.station.id", "bikeid"), row.names = c(NA, -9L), class = "data.frame")
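Here is one way. I called your data foo. First, sort the data by bikeid and starttime. Then, for each bikeid, create two new columns (next.start.station.id and next.start.time) using lead(), and use difftime() to find the time difference between stoptime and next.start.time. After that, remove the rows where end.station.id and next.start.station.id are the same. Finally, arrange the columns as needed.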
library(dplyr)
foo %>%
  arrange(bikeid, starttime) %>% # if necessary, arrange(bikeid, starttime, stoptime)
  group_by(bikeid) %>%
  mutate(next.start.station.id = lead(start.station.id),
         next.start.time = lead(starttime),
         diff.time = difftime(next.start.time, stoptime, units = "secs")) %>%
  filter(end.station.id != next.start.station.id) %>%
  select(bikeid, end.station.id, next.start.station.id, diff.time, stoptime, next.start.time)
  bikeid end.station.id next.start.station.id diff.time stoptime            next.start.time
   <dbl>          <int>                 <int> <time>    <dttm>              <dttm>
1   1.00            442                   422 555201    2017-01-04 08:57:10 2017-01-10 19:10:31
2   1.00           3090                  3156 496336    2017-01-10 19:16:02 2017-01-16 13:08:18
3   2.00           3151                  3243 746164    2017-01-12 18:36:09 2017-01-21 09:52:13
4   2.00            212                   470 415511    2017-01-21 10:21:07 2017-01-26 05:46:18
5   3.00           3112                   351 1587161   2017-01-12 08:58:42 2017-01-30 17:51:23
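Since the question also mentions data.table, the same idea can probably be written there with shift(type = "lead") in place of lead(). The following is only a sketch under a few assumptions: foo is rebuilt from the dput output in the question so that the block runs on its own, and the names dt and moved are made up for this illustration. It has not been timed on a large data set.

```r
library(data.table)

# Sample data rebuilt from the dput() output in the question (assumed to match it)
foo <- data.frame(
  starttime = as.POSIXct(c(1484572098, 1484075431, 1483519662, 1484244513,
                           1484992333, 1485409578, 1484210616, 1483727948,
                           1485798683), origin = "1970-01-01", tz = "UTC"),
  stoptime = as.POSIXct(c(1484573293, 1484075762, 1483520230, 1484246169,
                          1484994067, 1485409753, 1484211522, 1483729024,
                          1485799997), origin = "1970-01-01", tz = "UTC"),
  start.station.id = c(3156L, 422L, 507L, 546L, 3243L, 470L, 439L, 309L, 351L),
  end.station.id = c(466L, 3090L, 442L, 3151L, 212L, 168L, 3112L, 439L, 433L),
  bikeid = c(1, 1, 1, 2, 2, 2, 3, 3, 3)
)

dt <- as.data.table(foo)          # hypothetical name; a copy, so foo is left untouched
setorder(dt, bikeid, starttime)   # sort trips chronologically within each bike

# shift(type = "lead") pulls the next trip's values up by one row, per bike
dt[, `:=`(next.start.station.id = shift(start.station.id, type = "lead"),
          next.start.time = shift(starttime, type = "lead")),
   by = bikeid]

# Gap in seconds between this trip's stop and the next trip's start
dt[, diff.time := as.numeric(difftime(next.start.time, stoptime, units = "secs"))]

# Keep only rows where the bike was moved between trips;
# the last trip of each bike has no next trip (NA) and is dropped
moved <- dt[!is.na(next.start.station.id) & end.station.id != next.start.station.id,
            .(bikeid, end.station.id, next.start.station.id, diff.time,
              stoptime, next.start.time)]
print(moved)
```

On the sample data this should give the same five rows as the dplyr version. Here diff.time is stored as a plain number of seconds rather than a difftime object, which is a choice made for this sketch, not something the answer requires.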