I am trying to compute sums in R. I have 2 data frames. One has 3 columns (tag, doy.0 = day of year at the beginning, doy.end = day of year at the end). The other has 2 columns (doy, bbb = an amount per day).
For each row of df1, I want the sum of bbb over the days from doy.0 to doy.end.
# creating df1
tag<-c(1:5)
doy.0<-c(200:204)
doy.end<-c(207:211)
df1<-data.frame(tag, doy.0, doy.end)
# creating df2
doy<-c(200:211)
bbb<-c(12,10,18,16,20,11,15,19,25,23,21,20)
df2<-data.frame(doy,bbb)
tag doy.0 doy.end
1 1 200 207
2 2 201 208
3 3 202 209
4 4 203 210
5 5 204 211
doy bbb
1 200 12
2 201 10
3 202 18
4 203 16
5 204 20
6 205 11
7 206 15
8 207 19
9 208 25
10 209 23
11 210 21
12 211 20
So I want an additional column in df1 with the sum of bbb. For example, for tag 1 I want the sum of bbb from doy 200 to doy 207 inclusive (it should be 121 for tag 1, 134 for tag 2, etc.).
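To make the expected value concrete, here is a minimal check for tag 1 only, assuming the windows are inclusive at both ends (the name in_window is illustrative):

```r
# Rebuild the lookup table from the question
doy <- 200:211
bbb <- c(12, 10, 18, 16, 20, 11, 15, 19, 25, 23, 21, 20)
df2 <- data.frame(doy, bbb)

# Inclusive window for tag 1: doy 200 to doy 207
in_window <- df2$doy >= 200 & df2$doy <= 207
sum(in_window)           # 8 days in the window
sum(df2$bbb[in_window])  # 121 = 12 + 10 + 18 + 16 + 20 + 11 + 15 + 19
```

Each later tag shifts the window by one day, so tag 2 drops 12 and adds 25, giving 134.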
I have played around a bit with for loops but couldn't figure it out. I would really appreciate your help! Also, if you can think of a better title for this question, feel free to change it. I don't even know what to call this problem, that's how annoying it is...
Does your sum always cover 8 consecutive bbb values? Then this will work (rollsum() returns nrow(df2) - 8 + 1 = 5 sums, which line up with the 5 rows of df1 by position):
library(dplyr)
library(zoo)
df1 %>%
mutate(newvar = rollsum(df2$bbb, 8))
tag doy.0 doy.end newvar
1 1 200 207 121
2 2 201 208 134
3 3 202 209 147
4 4 203 210 150
5 5 204 211 154
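Because rollsum() matches df1 only by position, it helps to check the assumptions before relying on it. A minimal base R sketch, assuming the data exactly as in the question, that verifies them and then computes the same width-8 rolling sums with cumsum() (a swapped-in technique, not zoo's implementation):

```r
# Example data from the question
df1 <- data.frame(tag = 1:5, doy.0 = 200:204, doy.end = 207:211)
df2 <- data.frame(doy = 200:211,
                  bbb = c(12, 10, 18, 16, 20, 11, 15, 19, 25, 23, 21, 20))

k <- 8  # window width, as in rollsum(df2$bbb, 8)
n_windows <- nrow(df2) - k + 1

# Positional matching is only valid if every window spans k days, df2 has one
# row per consecutive day, and df1 holds one window per start day from the first doy
stopifnot(all(df1$doy.end - df1$doy.0 + 1 == k),
          all(diff(df2$doy) == 1),
          nrow(df1) == n_windows,
          all(df1$doy.0 == df2$doy[seq_len(n_windows)]))

# Rolling sum of width k via cumulative sums: with cs[1] == 0,
# sum(bbb[i:(i + k - 1)]) == cs[i + k] - cs[i]
cs <- c(0, cumsum(df2$bbb))
roll <- cs[seq_len(n_windows) + k] - cs[seq_len(n_windows)]
roll  # 121 134 147 150 154
```

If any of the stopifnot() checks fails, the rolling-sum shortcut no longer applies and one of the join-based answers is the safer choice.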
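A plain base R for loop also works:
df1$sum.bbb <- 0
for (i in seq_len(nrow(df1))) {
  # row positions in df2 of the first and last day of this tag's window
  first <- which(df2$doy == df1$doy.0[i])
  last <- which(df2$doy == df1$doy.end[i])
  df1$sum.bbb[i] <- sum(df2$bbb[first:last])
}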
> df1
tag doy.0 doy.end sum.bbb
1 1 200 207 121
2 2 201 208 134
3 3 202 209 147
4 4 203 210 150
5 5 204 211 154
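The loop above assumes both doy.0 and doy.end appear in df2; if one is missing, which() returns integer(0) and the `:` call fails with "argument of length 0". A minimal alternative sketch (not the original answer) that uses a range condition instead of exact endpoint matches, with vapply() in place of the explicit loop:

```r
df1 <- data.frame(tag = 1:5, doy.0 = 200:204, doy.end = 207:211)
df2 <- data.frame(doy = 200:211,
                  bbb = c(12, 10, 18, 16, 20, 11, 15, 19, 25, 23, 21, 20))

# One sum per row of df1; days are selected by doy.0 <= doy <= doy.end,
# so the endpoints do not have to be present in df2
df1$sum.bbb <- vapply(
  seq_len(nrow(df1)),
  function(i) sum(df2$bbb[df2$doy >= df1$doy.0[i] & df2$doy <= df1$doy.end[i]]),
  numeric(1)
)
df1$sum.bbb  # 121 134 147 150 154
```

A window with no matching days simply gets a sum of 0.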
With base R data frames only:
df1b <- do.call(rbind,
apply(df1,
1,
function(x) data.frame(tag = rep(x["tag"], x["doy.end"] - x["doy.0"] + 1),
doy = x["doy.0"]:x["doy.end"])))
merge(df1, aggregate(bbb ~ tag, merge(df1b, df2), sum))
tag doy.0 doy.end bbb
1 1 200 207 121
2 2 201 208 134
3 3 202 209 147
4 4 203 210 150
5 5 204 211 154
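The apply()/do.call(rbind, ...) step builds a long table with one row per tag and day in its window. A sketch of the same expansion using rep() and Map() instead of apply(), assuming integer days and inclusive windows (the names n_days, long and sums are illustrative):

```r
df1 <- data.frame(tag = 1:5, doy.0 = 200:204, doy.end = 207:211)
df2 <- data.frame(doy = 200:211,
                  bbb = c(12, 10, 18, 16, 20, 11, 15, 19, 25, 23, 21, 20))

# Expand each tag into one row per day of its window
n_days <- df1$doy.end - df1$doy.0 + 1
long <- data.frame(tag = rep(df1$tag, n_days),
                   doy = unlist(Map(seq, df1$doy.0, df1$doy.end)))
nrow(long)  # 40: 5 tags x 8 days

# Attach bbb by doy, sum per tag, then add doy.0 and doy.end back
sums <- aggregate(bbb ~ tag, merge(long, df2), sum)
merge(df1, sums)
```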
And using data.table:
library(data.table)
df1 <- as.data.table(df1)
df2 <- as.data.table(df2)
df1[df2,
on = .(doy.0 <= doy, doy.end >= doy),
allow.cartesian = TRUE][,
.(doy.0 = min(doy.0), doy.end = max(doy.end), bbb = sum(bbb)),
by = .(tag)]
tag doy.0 doy.end bbb
1: 1 200 207 121
2: 2 201 208 134
3: 3 202 209 147
4: 4 203 210 150
5: 5 204 211 154
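The non-equi join pairs each row of df1 with every day of df2 that satisfies doy.0 <= doy & doy.end >= doy. To illustrate that condition (a swapped-in base R technique, not what data.table does internally), a logical indicator matrix built with outer() yields the same pairs, and a matrix product sums bbb per row:

```r
df1 <- data.frame(tag = 1:5, doy.0 = 200:204, doy.end = 207:211)
df2 <- data.frame(doy = 200:211,
                  bbb = c(12, 10, 18, 16, 20, 11, 15, 19, 25, 23, 21, 20))

# in_window[i, j] is TRUE when day df2$doy[j] satisfies the join condition
# doy.0 <= doy & doy.end >= doy for row i of df1
in_window <- outer(df1$doy.0, df2$doy, "<=") & outer(df1$doy.end, df2$doy, ">=")
sum(in_window)  # 40 matching (tag, doy) pairs

# Logical matrix %*% numeric vector sums bbb over each row's matching days
df1$bbb <- as.vector(in_window %*% df2$bbb)
df1
```

The indicator matrix has nrow(df1) x nrow(df2) cells, so this is only sensible for small tables; the join-based answers scale better.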
A tidyverse solution; the loop is hidden in purrr::map:
library(dplyr)
library(purrr)
bind_rows(
  map(
    split(df1, df1$tag),
    function(x) data.frame(tag = x$tag,
                           df2 %>% filter(doy >= x$doy.0 & doy <= x$doy.end) %>% summarise(bbb = sum(bbb)))
  ))
# tag bbb
#1 1 121
#2 2 134
#3 3 147
#4 4 150
#5 5 154
You can use data.table and a non-equi join to do this. If your windows always span the same number of days, the rollsum answer by @Len works well. If the window lengths vary, data.table is a very fast solution.
library(data.table)
# non-equi join: for each row of dt1 (by = .EACHI), sum dt2$bbb over doy.0 <= doy <= doy.end
dt1[, bbb := dt2[dt1, on=.(doy >= doy.0, doy <= doy.end), sum(bbb), by=.EACHI]$V1]
dt1
tag doy.0 doy.end bbb
1: 1 200 207 121
2: 2 201 208 134
3: 3 202 209 147
4: 4 203 210 150
5: 5 204 211 154
data:
tag<-c(1:5)
doy.0<-c(200:204)
doy.end<-c(207:211)
dt1<- data.table(tag, doy.0, doy.end) # data.table instead of data.frame
# creating dt2
doy<-c(200:211)
bbb<-c(12,10,18,16,20,11,15,19,25,23,21,20)
dt2<- data.table(doy,bbb) # data.table instead of data.frame
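For many windows of varying length without data.table, another base R option (a swapped-in technique, not the data.table answer above) is prefix sums plus binary search with findInterval(): after one cumsum(), each window sum is a difference of two cumulative totals. A minimal sketch, assuming df2 is sorted by doy and has no NA values (the names cs, hi and lo are illustrative):

```r
df1 <- data.frame(tag = 1:5, doy.0 = 200:204, doy.end = 207:211)
df2 <- data.frame(doy = 200:211,
                  bbb = c(12, 10, 18, 16, 20, 11, 15, 19, 25, 23, 21, 20))

# findInterval() needs a non-decreasing lookup vector
stopifnot(!is.unsorted(df2$doy))
cs <- c(0, cumsum(df2$bbb))  # cs[m + 1] = sum of the first m bbb values

# hi = number of days with doy <= doy.end; lo = number of days with doy < doy.0
hi <- findInterval(df1$doy.end, df2$doy)
lo <- findInterval(df1$doy.0, df2$doy, left.open = TRUE)

df1$bbb <- cs[hi + 1] - cs[lo + 1]
df1
```

Because the window edges are located by value rather than by exact match, gaps in df2$doy are handled, and an empty window gives 0.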
We could do a fuzzy join and aggregate:
library(fuzzyjoin)
library(dplyr)
fuzzy_join(df1, df2, c(doy.0 = "doy", doy.end = "doy"),
list(`<=`,`>=`)) %>%
group_by(tag,doy.0,doy.end) %>%
summarize_at("bbb",sum) %>%
ungroup
# # A tibble: 5 x 4
# tag doy.0 doy.end bbb
# <int> <int> <int> <dbl>
# 1 1 200 207 121
# 2 2 201 208 134
# 3 3 202 209 147
# 4 4 203 210 150
# 5 5 204 211 154
And a base R translation:
x <- expand.grid(tag= df1$tag,doy = df2$doy)
x <- merge(x,df1,all.x=TRUE)
x <- merge(x,df2,all.x=TRUE)
x <- subset(x, doy >= doy.0 & doy <= doy.end)
x <- aggregate(bbb ~ tag, x, sum)
merge(df1,x)
# tag doy.0 doy.end bbb
# 1 1 200 207 121
# 2 2 201 208 134
# 3 3 202 209 147
# 4 4 203 210 150
# 5 5 204 211 154