Using R to find difference between two visits
I have a data frame like this:
vehicleId visitDate taskName
123 1/1/2013 Change Battery
456 1/1/2013 Wiper Blades Changed
123 1/2/2013 Tire Pressure Check
123 1/3/2013 Tire Rotation
456 3/1/2013 Tire Pressure Check
What I would like to get is:
vehicleId visitDate timeBetweenVisits(hrs)
123 1/1/2013 24
123 1/2/2013 672
456 1/1/2013 48
Any ideas on how to do this in R?
Loading and converting the data:
## data now comma-separated as you have fields containing whitespace
R> res <- read.csv(text="
vehicleId, visitDate, taskName
123, 1/1/2013, Change Battery
456, 1/1/2013, Wiper Blades Changed
123, 1/2/2013, Tire Pressure Check
123, 1/3/2013, Tire Rotation
456, 3/1/2013, Tire Pressure Check", stringsAsFactors=FALSE)
R> res$visitDate <- as.Date(res$visitDate, "%m/%d/%Y") ## now in Date format
Looking at it:
R> res
vehicleId visitDate taskName
1 123 2013-01-01 Change Battery
2 456 2013-01-01 Wiper Blades Changed
3 123 2013-01-02 Tire Pressure Check
4 123 2013-01-03 Tire Rotation
5 456 2013-03-01 Tire Pressure Check
R>
Date calculations:
R> res[3,"visitDate"] - res[1,"visitDate"]
Time difference of 1 days
R> as.numeric(res[3,"visitDate"] - res[1,"visitDate"])
[1] 1
R> difftime(res[3,"visitDate"],res[1,"visitDate"], units="hours")
Time difference of 24 hours
R> as.numeric(difftime(res[3,"visitDate"],res[1,"visitDate"], units="hours"))
[1] 24
R>
Vectorized:
R> as.numeric(difftime(res[2:nrow(res),"visitDate"],
+ res[1:(nrow(res)-1),"visitDate"], units="hours"))
[1] 0 24 24 1368
R>
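You can of course also assign this to a new column. You will probably also want to subset by vehicle ID first, since the differences above are taken between consecutive rows regardless of which vehicle they belong to.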
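One way to do both steps at once is sketched below. It assumes the gap should be attached to the earlier visit of each pair (as in the question's desired output) and that a vehicle's last visit gets NA. The column name hoursToNextVisit is made up for illustration, and the data frame is rebuilt directly so the block runs on its own.

```r
# Rebuild the example data so this block is self-contained
res <- data.frame(
  vehicleId = c(123, 456, 123, 123, 456),
  visitDate = as.Date(c("1/1/2013", "1/1/2013", "1/2/2013",
                        "1/3/2013", "3/1/2013"), format = "%m/%d/%Y"),
  taskName  = c("Change Battery", "Wiper Blades Changed",
                "Tire Pressure Check", "Tire Rotation",
                "Tire Pressure Check"),
  stringsAsFactors = FALSE
)

# Sort so that each vehicle's visits are in chronological order
res <- res[order(res$vehicleId, res$visitDate), ]

# Within each vehicle: days until the next visit, converted to hours.
# The last visit of each vehicle has no successor, so it gets NA.
res$hoursToNextVisit <- ave(
  as.numeric(res$visitDate), res$vehicleId,
  FUN = function(d) c(diff(d), NA) * 24
)

res[, c("vehicleId", "visitDate", "hoursToNextVisit")]
```

For the example data this gives 24 and 24 hours for vehicle 123 and 1416 hours (59 days) for vehicle 456. ave() returns a vector aligned with the rows of res, which is why the result can be assigned straight into a new column.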
Using res from @Dirk's answer, here is a by expression that does the job:
by(res, res$vehicleId, FUN=function(d)
{
data.frame(vehicleId=head(d$vehicleId, -1),
visitDate=head(d$visitDate, -1),
tbv=diff(d$visitDate))
}
)
## res$vehicleId: 123
## vehicleId visitDate tbv
## 1 123 2013-01-01 1 days
## 2 123 2013-01-02 1 days
## ----------------------------------------------------------------------------------------------
## res$vehicleId: 456
## vehicleId visitDate tbv
## 1 456 2013-01-01 59 days
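The by() result is a list with one data frame per vehicle, and tbv is a difftime in days. If a single data frame with the gap in hours is wanted, as in the question, one possible variation is to convert inside the function and bind the pieces with do.call(rbind, ...). This is only a sketch: the names pieces, out and hrs are illustrative, and the sort inside the function is an added assumption that rows may not already be in date order.

```r
# Minimal copy of the example data (taskName is not needed here)
res <- data.frame(
  vehicleId = c(123, 456, 123, 123, 456),
  visitDate = as.Date(c("2013-01-01", "2013-01-01", "2013-01-02",
                        "2013-01-03", "2013-03-01"))
)

pieces <- by(res, res$vehicleId, FUN = function(d) {
  d <- d[order(d$visitDate), ]          # make sure visits are chronological
  data.frame(
    vehicleId = head(d$vehicleId, -1),  # drop the last visit: no successor
    visitDate = head(d$visitDate, -1),
    hrs       = as.numeric(diff(d$visitDate), units = "hours")
  )
})

# Stack the per-vehicle pieces into one data frame
out <- do.call(rbind, pieces)
rownames(out) <- NULL
out
```

Converting to a plain number before binding avoids having to combine difftime columns across groups.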