Trying to loop through a dataframe

I am trying to calculate the total activity time for a driver using GPS data. I've written a loop that is intended to calculate the difference in time between each pair of consecutive points in a dataframe over the range of values, summing it as it goes.

However, the final output is much smaller than expected, on the order of seconds instead of hundreds of hours, which leads me to believe that it is only looping a few times or not summing the values correctly. My programming knowledge is mostly from Python. Am I implementing this idea correctly in R, or could I write it better? My data looks something like this:

  DriveNo       Date.and.Time Latitude Longitude
1     264 2014-02-01 12:12:05 41.91605  12.37186
2     264 2014-02-01 12:12:05 41.91605  12.37186
3     264 2014-02-01 12:12:12 41.91607  12.37221
4     264 2014-02-01 12:12:27 41.91619  12.37365
5     264 2014-02-01 12:12:42 41.91627  12.37490
6     264 2014-02-01 12:12:57 41.91669  12.37610

Is there a way I can save the result of each iteration to a list so that I could analyse where in the range of values a problem might be occurring?

datelist = taxi_264$Date.and.Time
dlstandard = as.POSIXlt(datelist)
diffsum = 0
for (i in range(1:83193))
{
  diff = difftime(dlstandard[i], dlstandard[(i+1)], units = "secs")
  diffsum = diffsum + diff
}

You could avoid the loop by using the lead() function from dplyr:

library(dplyr)

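# lead() shifts the vector forward by one; the last element has no successor and becomes NA
times <- as.POSIXct(dlstandard)
diff <- difftime(lead(times), times, units = "secs")
diffsum <- sum(diff, na.rm = TRUE)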

Note that the above is a vectorized way of solving your problem, which is usually the way to go when using R.
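A vectorized result also keeps one value per pair of consecutive rows, which makes it possible to check where in the data unusual gaps occur. Here is a minimal sketch using base R's diff() on a small made-up sample; the sample values and the 10-second threshold are assumptions for illustration only, not taken from the full dataset:

```r
# Hypothetical sample standing in for taxi_264$Date.and.Time
times <- as.POSIXct(c("2014-02-01 12:12:05", "2014-02-01 12:12:05",
                      "2014-02-01 12:12:12", "2014-02-01 12:12:27",
                      "2014-02-01 12:12:42", "2014-02-01 12:12:57"),
                    tz = "UTC")

# diff() returns later-minus-earlier gaps between consecutive timestamps
gaps <- as.numeric(diff(times), units = "secs")

# Positions of the gaps longer than an (arbitrary) 10-second threshold
long_gaps <- which(gaps > 10)

total_secs <- sum(gaps)
```

Inspecting gaps or long_gaps should show which rows contribute most to the total.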

You can try:

diffsum <- as.numeric(sum(difftime(tail(dlstandard, -1), 
                                   head(dlstandard, -1), units = 'secs')))

This gives diffsum as the sum of the differences between consecutive timestamps, in seconds.
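As for why the original loop produced such a small total: in R, range() returns the minimum and maximum of its argument, so range(1:83193) is c(1, 83193) rather than a sequence, and the loop most likely ran only twice. The following is a rough sketch of a corrected loop that also stores each step's difference, as the question asks. It uses a small made-up sample in place of taxi_264, and the name step_secs is introduced here for illustration:

```r
# Hypothetical sample standing in for taxi_264$Date.and.Time
times <- as.POSIXct(c("2014-02-01 12:12:05", "2014-02-01 12:12:05",
                      "2014-02-01 12:12:12", "2014-02-01 12:12:27",
                      "2014-02-01 12:12:42", "2014-02-01 12:12:57"),
                    tz = "UTC")

# range() yields only the two endpoints, not every index
endpoints <- range(1:6)

n <- length(times)
step_secs <- numeric(n - 1)      # one slot per consecutive pair
for (i in seq_len(n - 1)) {      # stop at n - 1 so times[i + 1] exists
  step_secs[i] <- as.numeric(difftime(times[i + 1], times[i], units = "secs"))
}

diffsum <- sum(step_secs)
```

Note that the later timestamp goes first in difftime() so the differences come out positive; the original loop subtracted in the opposite order.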
