I have the following data prepared
Timestamp Weighted_Value SumVal Group
        1           1600    800     1
        2           1000   1000     2
        3           1000   1000     2
        4           1000   1000     2
        5            800    500     3
        6            400    500     3
        7           2000    800     4
        8           1200   1000     4
I want to calculate sum(Weighted_Value) / sum(SumVal) for each group; for example, for Group 3 the result would be (800 + 400) / (500 + 500) = 1.2.
I was using period.apply (from xts) to do that:
period.apply(x4, intervalIndex, function(z) sum(z[,4])/sum(z[,2]))
But it's too slow for my application, so I wanted to ask if someone knows a faster alternative. I already tried ave, but it seems to be even slower.
By the way, my goal is to calculate a time-weighted average, in order to turn an irregular time series into a time series with equidistant intervals.
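For reference, here is a self-contained sketch of roughly what that call looks like. x4 and intervalIndex below are simplified stand-ins built from the example data; my real object has more columns, which is why the snippet above uses z[,4] and z[,2] while this sketch uses columns 1 and 2:
library(xts)

# simplified stand-in for x4: an xts object built from the example data
times <- as.POSIXct("2015-01-01 00:00:00") + 1:8
x4 <- xts(cbind(Weighted_Value = c(1600, 1000, 1000, 1000, 800, 400, 2000, 1200),
                SumVal         = c(800, 1000, 1000, 1000, 500, 500, 800, 1000)),
          order.by = times)

# intervalIndex: endpoints of 2-second buckets (my real index is built differently)
intervalIndex <- endpoints(x4, on = "seconds", k = 2)
period.apply(x4, intervalIndex, function(z) sum(z[, 1]) / sum(z[, 2]))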
Thanks!
library(data.table)
setDT(df)[, sum(Weighted_Value) / sum(SumVal), by = Group]
But I don't see the time series you are referring to; check out library(zoo) for that.
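For the equidistant-interval part, a minimal zoo sketch (the timestamps, values, and 5-second bucket width below are made up for illustration):
library(zoo)

# irregular series: observations at uneven POSIXct times
t <- as.POSIXct("2015-01-01 00:00:00") + c(0, 3, 4, 9, 11, 17)
z <- zoo(c(2, 1, 1, 1.2, 1.6, 1.8), order.by = t)

# map each timestamp to the start of its 5-second bucket, then average per bucket
to_bucket <- function(tt) as.POSIXct(floor(as.numeric(tt) / 5) * 5,
                                     origin = "1970-01-01")
aggregate(z, to_bucket, mean)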
Using rowsum seems to be faster (at least for this small example dataset) than the data.table approach:
sgibb <- function(datframe) {
  # rowsum() sums each column by group; rows come back in sorted group order,
  # so the group labels are taken from its rownames
  ws <- rowsum(datframe$Weighted_Value, datframe$Group)
  ss <- rowsum(datframe$SumVal, datframe$Group)
  data.frame(Group = rownames(ws), Avg = ws / ss)
}
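Called on the example data, this should reproduce the same per-group averages as the other answers:
sgibb(df)
#   Group      Avg
# 1     1 2.000000
# 2     2 1.000000
# 3     3 1.200000
# 4     4 1.777778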
Adding the rowsum approach to @plafort's benchmark:
library(microbenchmark)
library(dplyr)
library(data.table)
microbenchmark(
  Nader = df %>%
    group_by(Group) %>%
    summarise(res = sum(Weighted_Value) / sum(SumVal)),
  Henk = setDT(df)[, sum(Weighted_Value) / sum(SumVal), by = Group],
  plafort = weight.avg(df),
  sgibb = sgibb(df)
)
# Unit: microseconds
# expr min lq mean median uq max neval
# Nader 2179.890 2280.462 2583.8798 2399.0885 2497.6000 6647.236 100
# Henk 648.191 693.519 788.1421 726.0940 751.0810 2386.260 100
# plafort 2638.967 2740.541 2935.4756 2785.7425 2909.4640 5000.652 100
# sgibb 347.125 384.830 442.6447 409.2815 441.8935 2039.563 100
Try using dplyr; it should be faster than base R:
library(dplyr)
df <- read.table(text = "Timestamp Weighted_Value SumVal Group
1 1600 800 1
2 1000 1000 2
3 1000 1000 2
4 1000 1000 2
5 800 500 3
6 400 500 3
7 2000 800 4
8 1200 1000 4" , header = T)
df %>%
  group_by(Group) %>%
  summarise(res = sum(Weighted_Value) / sum(SumVal))
Here's a base R solution. It's not the fastest for larger (500k+) datasets, but it shows what may be happening "under the hood" in the other functions.
weight.avg <- function(datframe) {
  # split the data frame into one piece per Group
  s <- split(datframe, datframe$Group)
  # for each piece, divide the summed Weighted_Value (col 2) by the summed SumVal (col 3)
  avg <- sapply(s, function(x) sum(x[, 2]) / sum(x[, 3]))
  data.frame(Group = names(avg), Avg = avg)
}
weight.avg(df)
Group Avg
1 1 2.000000
2 2 1.000000
3 3 1.200000
4 4 1.777778
The first line of the function splits the data frame by Group. The second applies the formula to each Group. The last creates a new data frame.
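To make that concrete, here is the Group 3 piece on the example data:
s <- split(df, df$Group)
s[["3"]]
#   Timestamp Weighted_Value SumVal Group
# 5         5            800    500     3
# 6         6            400    500     3
sum(s[["3"]][, 2]) / sum(s[["3"]][, 3])  # (800 + 400) / (500 + 500)
# [1] 1.2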
df <- read.table(text = "Timestamp Weighted_Value SumVal Group
1 1600 800 1
2 1000 1000 2
3 1000 1000 2
4 1000 1000 2
5 800 500 3
6 400 500 3
7 2000 800 4
8 1200 1000 4" , header = T)
library(microbenchmark)
library(dplyr)
library(data.table)
microbenchmark(
  Nader = df %>%
    group_by(Group) %>%
    summarise(res = sum(Weighted_Value) / sum(SumVal)),
  Henk = setDT(df)[, sum(Weighted_Value) / sum(SumVal), by = Group],
  plafort = weight.avg(df)
)
Unit: microseconds
expr min lq mean median uq max
Nader 2619.174 2827.0100 3094.5570 2949.976 3107.481 7980.684
Henk 783.186 833.7155 932.5883 888.783 944.640 3275.646
plafort 3550.787 3772.4395 4085.2323 3853.561 3995.869 7595.801
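These timings come from the 8-row example, so they mostly measure per-call overhead. To see how the approaches scale, one option is to rerun the benchmark on a larger synthetic table; the 500k rows and 10k groups below are arbitrary choices, not from the original question:
set.seed(1)
n <- 5e5
big <- data.frame(Timestamp      = seq_len(n),
                  Weighted_Value = runif(n, 0, 2000),
                  SumVal         = runif(n, 1, 1000),
                  Group          = sample(1e4, n, replace = TRUE))

microbenchmark(
  Nader   = big %>%
    group_by(Group) %>%
    summarise(res = sum(Weighted_Value) / sum(SumVal)),
  Henk    = setDT(big)[, sum(Weighted_Value) / sum(SumVal), by = Group],
  plafort = weight.avg(big),
  times   = 10
)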