
Apply function to each column of dataframe

I have the following (very large) dataframe:

     id         epoch
1     0     1.141194e+12
2     1     1.142163e+12
3     2     1.142627e+12
4     2     1.142627e+12
5     3     1.142665e+12
6     3     1.142665e+12
7     4     1.142823e+12
8     5     1.143230e+12
9     6     1.143235e+12
10    6     1.143235e+12

For every unique ID, I now want to get the difference between its maximum and minimum time (epoch timestamp). There are IDs with many more occurrences than in the example above, in case it is relevant. I haven't worked much with R yet and tried the following:

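# Define the helper first so it exists when apply() calls it
get_duration = function(id) {
  maxTime = max(df$epoch[which(df$id == id)])
  minTime = min(df$epoch[which(df$id == id)])
  return ((maxTime - minTime) / 1000)
}

# Named unique_ids so the variable does not shadow base::unique
unique_ids = data.frame(as.numeric(unique(df$id)))
differences = apply(unique_ids, 1, get_duration)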

It works, but is incredibly slow. What would be a faster approach?

Here are a couple of approaches. In base R:

tapply(df$epoch, df$id, function(x) (max(x) - min(x)) / 1000)
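As a minimal sketch of how the grouped call behaves, the snippet below runs it on a small data frame. The values are invented for illustration (they are not the asker's data), and it assumes the epochs are in milliseconds, which is what the division by 1000 suggests. `diff(range(x))` is used here as an equivalent spelling of `max(x) - min(x)`.

```r
# Hypothetical sample data: values are made up for illustration only.
# Assumption: epoch is in milliseconds, so dividing by 1000 gives seconds.
df <- data.frame(
  id    = c(0, 1, 1, 2, 2, 2),
  epoch = c(1141194000000,
            1142163000000, 1142163005000,
            1142627000000, 1142627002000, 1142627010000)
)

# Split epoch by id once, then compute (max - min) within each group.
durations <- tapply(df$epoch, df$id, function(x) diff(range(x)) / 1000)

print(durations)
```

The result is a named vector with one entry per id, so `durations["2"]` looks up the duration for id 2. An id that occurs only once gets a duration of 0.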

With data.table:

library(data.table)
setDT(df)  # converts df to a data.table by reference (no copy)
df[, list(d = (max(epoch) - min(epoch)) / 1000), by = id]

This can be done easily in dplyr:

library(dplyr)
df %>% group_by(id) %>% summarize(diff = (max(epoch) - min(epoch)) / 1000)

Inside your function, filter by id just once and reuse the result:

epochs = df$epoch[which(df$id == id)]
maxTime = max(epochs)
minTime = min(epochs)
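Put together, the asker's function with a single filter might look like the sketch below. The sample data is invented for illustration, and `vapply` stands in for the original `apply` call as an assumption about how the function would be driven; neither is from the original post.

```r
# Hypothetical sample data: values are made up for illustration only.
df <- data.frame(
  id    = c(0, 1, 1, 2, 2, 2),
  epoch = c(1141194000000,
            1142163000000, 1142163005000,
            1142627000000, 1142627002000, 1142627010000)
)

get_duration <- function(id) {
  # Filter the epoch column once and reuse it for both max and min.
  epochs <- df$epoch[which(df$id == id)]
  (max(epochs) - min(epochs)) / 1000
}

# Call the function once per unique id; numeric(1) declares the return type.
ids <- unique(df$id)
differences <- vapply(ids, get_duration, numeric(1))
names(differences) <- ids

print(differences)
```

This halves the filtering work per id, but it still scans the whole id column once for every unique id, so a grouped approach is likely to scale better when there are many ids.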


 