Apply function to each column of dataframe

Question

I have the following (very large) dataframe:

     id         epoch
1     0     1.141194e+12
2     1     1.142163e+12
3     2     1.142627e+12
4     2     1.142627e+12
5     3     1.142665e+12
6     3     1.142665e+12
7     4     1.142823e+12
8     5     1.143230e+12
9     6     1.143235e+12
10    6     1.143235e+12

For every unique ID, I want to get the difference between its maximum and minimum time (epoch timestamp). Some IDs occur many more times than in the example above, in case that is relevant. I haven't worked much with R yet and tried the following:

# Define the helper first so it exists when apply() is called
get_duration = function(id) {
  maxTime = max(df$epoch[which(df$id == id)])
  minTime = min(df$epoch[which(df$id == id)])
  return ((maxTime - minTime) / 1000)
}

unique = data.frame(as.numeric(unique(df$id)))
differences = apply(unique, 1, get_duration)

It works, but is incredibly slow. What would be a faster approach?

Answer 1

Here are a couple of approaches. In base R:

tapply(df$epoch, df$id, function(x) (max(x) - min(x)) / 1000)

With data.table:

require(data.table)
setDT(df)
df[,list(d=(max(epoch)-min(epoch))/1000),by=id]
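
To try the base R version end to end, here is a minimal sketch that rebuilds the sample from the question. The epoch values are copied as displayed, so they are rounded, and the division by 1000 assumes they are milliseconds, as the question's code suggests. The names `df` and `durations` are just illustrative.

```r
# Sample data copied from the question (epochs as printed, i.e. rounded)
df <- data.frame(
  id    = c(0, 1, 2, 2, 3, 3, 4, 5, 6, 6),
  epoch = c(1.141194e+12, 1.142163e+12, 1.142627e+12, 1.142627e+12,
            1.142665e+12, 1.142665e+12, 1.142823e+12, 1.143230e+12,
            1.143235e+12, 1.143235e+12)
)

# Group epoch by id and compute (max - min) per group,
# converted from (assumed) milliseconds to seconds
durations <- tapply(df$epoch, df$id, function(x) (max(x) - min(x)) / 1000)

# The result is named by id: one duration per unique id
print(durations)
```

With the rounded values shown in the question, repeated ids carry identical epochs, so every duration comes out as 0; on the real data the unrounded timestamps would presumably differ.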

Answer 2

This can be done easily in dplyr:

require(dplyr)
df %>% group_by(id) %>% summarize(diff=(max(epoch)-min(epoch))/1000)

Answer 3

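Filter by id just once and reuse the result for both the maximum and the minimum:

# Named `epochs` rather than `subset` to avoid masking base::subset
epochs = df$epoch[which(df$id == id)]
maxTime = max(epochs)
minTime = min(epochs)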
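
Put back into the question's `get_duration`, the single-filter idea might look like the sketch below. The toy data is invented for illustration (epochs assumed to be milliseconds), and `vapply` is swapped in for the question's `apply` call because it returns a plain numeric vector.

```r
# Toy data, invented for illustration (epochs assumed to be milliseconds)
df <- data.frame(
  id    = c(0, 1, 1, 2, 2, 2),
  epoch = c(1000, 2000, 5000, 3000, 4000, 9000)
)

# Filter once per id, then reuse the filtered vector for both max and min
get_duration <- function(id) {
  epochs <- df$epoch[df$id == id]
  (max(epochs) - min(epochs)) / 1000
}

ids <- unique(df$id)

# One duration (in seconds) per unique id
differences <- vapply(ids, get_duration, numeric(1))
names(differences) <- ids
print(differences)
```

This halves the filtering work, but it still scans the whole data frame once per unique id, so the grouped approaches in the other answers should scale better on a very large data frame.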

3 answers

Answer 1: 3, ACCEPTED, 2015-12-15 21:50:39
Answer 2: 1, 2015-12-15 21:54:06
Answer 3: -1, 2015-12-15 21:42:34
