
Apply function to each column of dataframe

I have the following (very large) data frame:

     id         epoch
1     0     1.141194e+12
2     1     1.142163e+12
3     2     1.142627e+12
4     2     1.142627e+12
5     3     1.142665e+12
6     3     1.142665e+12
7     4     1.142823e+12
8     5     1.143230e+12
9     6     1.143235e+12
10    6     1.143235e+12

For every unique ID, I now want to get the difference between its maximum and minimum time (epoch timestamp). In case it is relevant, some IDs occur many more times than in the example above. I haven't worked much with R yet and tried the following:

# Define the helper before it is used.
get_duration = function(id) {
  maxTime = max(df$epoch[which(df$id == id)])
  minTime = min(df$epoch[which(df$id == id)])
  return((maxTime - minTime) / 1000)
}

# One row per distinct id (renamed so it does not shadow base::unique).
unique_ids = data.frame(as.numeric(unique(df$id)))
differences = apply(unique_ids, 1, get_duration)

It works, but is incredibly slow. What would be a faster approach?

A couple of approaches. In base R:

tapply(df$epoch,df$id,function(x) (max(x)-min(x))/1000)
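To see what the base R approach returns, here is a minimal sketch on made-up data. The toy values below are hypothetical, not taken from the question, and the division by 1000 assumes the epochs are in milliseconds. `diff(range(x))` is just another way to write `max(x) - min(x)`.

```r
# Hypothetical toy data (not from the question): epochs in milliseconds.
df <- data.frame(
  id    = c(0, 1, 1, 2, 2, 2),
  epoch = c(1000, 2000, 5000, 3000, 4000, 9000)
)

# For each id, diff(range(x)) equals max(x) - min(x); divide by 1000 for seconds.
durations <- tapply(df$epoch, df$id, function(x) diff(range(x)) / 1000)
print(durations)  # ids 0, 1, 2 give 0, 3, 6
```

The result is a one-dimensional array named by id, so `durations["2"]` looks up a single id.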

With data.table:

require(data.table)
setDT(df)
df[,list(d=(max(epoch)-min(epoch))/1000),by=id]

This can be done easily in dplyr:

require(dplyr)
df %>% group_by(id) %>% summarize(diff=(max(epoch)-min(epoch))/1000)

Filter by id just once and reuse the result:

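get_duration = function(id) {
  # Select this id's epochs once, then reuse the vector for both max and min.
  epochs = df$epoch[df$id == id]
  return((max(epochs) - min(epochs)) / 1000)
}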
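Taking the same idea one step further (this goes beyond what the answer above proposes, so treat it as a sketch): instead of filtering the data frame once per id, `split` can group all epochs in a single pass, and the max/min difference is then computed on each group. The toy data is hypothetical and the epochs are assumed to be in milliseconds.

```r
# Hypothetical toy data (not from the question): epochs in milliseconds.
df <- data.frame(
  id    = c(0, 1, 1, 2, 2, 2),
  epoch = c(1000, 2000, 5000, 3000, 4000, 9000)
)

# One pass over the data: a list holding one epoch vector per id.
epochs_by_id <- split(df$epoch, df$id)

# Same computation as get_duration, applied to the pre-split vectors.
differences <- vapply(
  epochs_by_id,
  function(epochs) (max(epochs) - min(epochs)) / 1000,
  numeric(1)
)
print(differences)
```

This avoids rescanning `df$id` for every distinct id, which is likely where most of the time goes in the original loop.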
