Apply function to each column of dataframe
I have the following (very large) dataframe:
   id        epoch
1   0 1.141194e+12
2   1 1.142163e+12
3   2 1.142627e+12
4   2 1.142627e+12
5   3 1.142665e+12
6   3 1.142665e+12
7   4 1.142823e+12
8   5 1.143230e+12
9   6 1.143235e+12
10  6 1.143235e+12
For every unique ID, I now want to get the difference between its maximum and minimum time (epoch timestamp). Some IDs occur many more times than in the example above, in case that is relevant. I haven't worked much with R yet and tried the following:
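get_duration = function(id) {
  # Span between the latest and earliest timestamp for this id, divided by 1000
  maxTime = max(df$epoch[which(df$id == id)])
  minTime = min(df$epoch[which(df$id == id)])
  return ((maxTime - minTime) / 1000)
}
# The function must be defined before it is called; one row per unique id
unique_ids = data.frame(as.numeric(unique(df$id)))
differences = apply(unique_ids, 1, get_duration)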
It works, but is incredibly slow. What would be a faster approach?
A couple of approaches. In base R:
tapply(df$epoch,df$id,function(x) (max(x)-min(x))/1000)
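To see what the base R call returns, here is a minimal sketch on a small made-up data frame (the values are invented for illustration and are not the asker's data). It also shows `aggregate()` as an alternative that returns a data frame rather than a named array; using `diff(range(x))` in place of `max(x) - min(x)` is just a stylistic choice.

```r
# Hypothetical sample data (values invented for illustration)
df <- data.frame(
  id    = c(0, 1, 1, 2, 2, 2),
  epoch = c(1000, 2000, 5000, 3000, 3500, 9000)
)

# diff(range(x)) is equivalent to max(x) - min(x)
res <- tapply(df$epoch, df$id, function(x) diff(range(x)) / 1000)
res

# aggregate() gives the same numbers as a data frame with columns id and epoch
agg <- aggregate(epoch ~ id, data = df, FUN = function(x) diff(range(x)) / 1000)
agg
```

The `tapply` result is indexed by the id values as names, so `res["2"]` looks up the duration for id 2.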
With data.table:
require(data.table)
setDT(df)
df[,list(d=(max(epoch)-min(epoch))/1000),by=id]
This can be done easily in dplyr:
require(dplyr)
df %>% group_by(id) %>% summarize(diff=(max(epoch)-min(epoch))/1000)
Inside the function, apply the filter by id just once and reuse the result:
subset = df$epoch[which(df$id == id)]
maxTime = max(subset)
minTime = min(subset)
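Put together, the fragment above might look like the following sketch. The sample data is made up for illustration, the variable is named `epochs` rather than `subset` to avoid shadowing the base function, and `vapply` stands in for the original `apply` call; none of these choices come from the answer itself.

```r
# Hypothetical sample data (values invented for illustration)
df <- data.frame(
  id    = c(0, 1, 1, 2, 2, 2),
  epoch = c(1000, 2000, 5000, 3000, 3500, 9000)
)

get_duration <- function(id) {
  # Filter the epoch column once and reuse the result for max and min
  epochs <- df$epoch[df$id == id]
  (max(epochs) - min(epochs)) / 1000
}

ids <- unique(df$id)
# vapply checks that every call returns exactly one numeric value
differences <- vapply(ids, get_duration, numeric(1))
names(differences) <- ids
differences
```

This halves the filtering work per call, but each call still scans the whole id column, so with many unique ids the grouped approaches are likely to remain faster.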