Complicated sums in R: summing over day ranges defined in another data frame

I am trying to compute sums in R. I have 2 data frames. One consists of 3 columns (tag, doy (= day of year) at beginning, doy at end). The other consists of 2 columns (doy, bbb (= an amount per day)).

Now I want for each row of df1 the sum of bbb from doy.0 to doy.end.

# creating df1
tag<-c(1:5)
doy.0<-c(200:204)
doy.end<-c(207:211)
df1<-data.frame(tag, doy.0, doy.end)

# creating df2
doy<-c(200:211)
bbb<-c(12,10,18,16,20,11,15,19,25,23,21,20)
df2<-data.frame(doy,bbb)

  tag doy.0 doy.end
1   1   200     207
2   2   201     208
3   3   202     209
4   4   203     210
5   5   204     211

  doy bbb
1  200  12
2  201  10
3  202  18
4  203  16
5  204  20
6  205  11
7  206  15
8  207  19
9  208  25
10 209  23
11 210  21
12 211  20

So I want an additional column in df1 with the sum of bbb. For example for tag 1, I want the bbb from doy 200 to doy 207 (it should be 121 for tag 1, 134 for tag 2, etc).

I have played around a bit with for loops but couldn't figure it out. I would really appreciate your help! Also, if you can think of a better title for this question, feel free to change it. I don't even know what to call this problem, that's how annoying it is...

Does your sum always have the pattern that it should be the sum of 8 consecutive 'bbb' - values? Then this will work:

library(dplyr)
library(zoo)    
df1 %>% 
    mutate(newvar = rollsum(df2$bbb, 8))

  tag doy.0 doy.end newvar
1   1   200     207    121
2   2   201     208    134
3   3   202     209    147
4   4   203     210    150
5   5   204     211    154
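
Note that the rolling sum only lines up with df1 under a few assumptions: every window is exactly 8 days long, df2 has one row per consecutive day, and window k starts at row k of df2. A minimal base R sketch for checking this before trusting the result could look like the following (the names `window_len`, `starts_at` and `ok` are only illustrative):

```r
df1 <- data.frame(tag = 1:5, doy.0 = 200:204, doy.end = 207:211)
df2 <- data.frame(doy = 200:211,
                  bbb = c(12, 10, 18, 16, 20, 11, 15, 19, 25, 23, 21, 20))

# Length of each window in days, counting both end points
window_len <- df1$doy.end - df1$doy.0 + 1

# Row of df2 at which each window starts
starts_at <- match(df1$doy.0, df2$doy)

# All conditions must hold for rollsum(df2$bbb, 8) to match df1 row by row
ok <- all(window_len == 8) &&
  all(diff(df2$doy) == 1) &&
  identical(starts_at, seq_len(nrow(df1))) &&
  nrow(df2) - 8 + 1 == nrow(df1)
ok
```

If `ok` is FALSE for your real data, one of the range-based answers further down is the safer choice.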
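
# initialise the result column, then fill it row by row
df1$sum.bbb <- 0

for (i in seq_len(nrow(df1))) {
  # rows of df2 where this tag's window starts and ends
  first <- which(df2$doy == df1$doy.0[i])
  last  <- which(df2$doy == df1$doy.end[i])
  df1$sum.bbb[i] <- sum(df2$bbb[first:last])
}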
> df1
  tag doy.0 doy.end sum.bbb
1   1   200     207     121
2   2   201     208     134
3   3   202     209     147
4   4   203     210     150
5   5   204     211     154
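
The same idea can be written without the explicit loop. This is just a sketch using base R's `mapply`, and it filters df2 by the doy values themselves instead of by row positions, so it does not depend on df2 being sorted:

```r
df1 <- data.frame(tag = 1:5, doy.0 = 200:204, doy.end = 207:211)
df2 <- data.frame(doy = 200:211,
                  bbb = c(12, 10, 18, 16, 20, 11, 15, 19, 25, 23, 21, 20))

# For each (start, end) pair, sum bbb over the days inside the window
df1$sum.bbb <- mapply(
  function(start, end) sum(df2$bbb[df2$doy >= start & df2$doy <= end]),
  df1$doy.0, df1$doy.end
)
df1
```

A window that matches no days in df2 would give 0 here, because `sum()` of an empty vector is 0.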

With data.frame:

df1b <- do.call(rbind, 
                apply(df1, 
                      1, 
                      function(x) data.frame(tag = rep(x["tag"], x["doy.end"] - x["doy.0"] + 1), 
                                             doy = x["doy.0"]:x["doy.end"])))

merge(df1, aggregate(bbb ~ tag, merge(df1b, df2), sum))
  tag doy.0 doy.end bbb
1   1   200     207 121
2   2   201     208 134
3   3   202     209 147
4   4   203     210 150
5   5   204     211 154

And using data.table:

library(data.table)
df1 <- as.data.table(df1)
df2 <- as.data.table(df2)

df1[df2, 
    on = .(doy.0 <= doy, doy.end >= doy), 
    allow.cartesian = TRUE][,
      .(doy.0 = min(doy.0), doy.end = max(doy.end), bbb = sum(bbb)),
      by = .(tag)]
   tag doy.0 doy.end bbb
1:   1   200     207 121
2:   2   201     208 134
3:   3   202     209 147
4:   4   203     210 150
5:   5   204     211 154

A solution using the tidyverse, where the loop is hidden in purrr::map:

library(dplyr)  # provides %>%, filter() and summarise()

replyr::replyr_bind_rows(
  purrr::map(
    replyr::replyr_split(df1,"tag"),
    function(x) data.frame(tag=x$tag,
                           df2 %>% filter((doy>=x$doy.0)&(doy<=x$doy.end)) %>% summarise(bbb=sum(bbb)))
))
#  tag bbb
#1   1 121
#2   2 134
#3   3 147
#4   4 150
#5   5 154
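
If you would rather not depend on replyr, a similar result can probably be had with `purrr::map2_dbl` inside `mutate`. This is a sketch under the assumption that dplyr and purrr are installed; `res` is just an illustrative name:

```r
library(dplyr)
library(purrr)

df1 <- data.frame(tag = 1:5, doy.0 = 200:204, doy.end = 207:211)
df2 <- data.frame(doy = 200:211,
                  bbb = c(12, 10, 18, 16, 20, 11, 15, 19, 25, 23, 21, 20))

# map2_dbl walks over doy.0 and doy.end in parallel and returns one number per row
res <- df1 %>%
  mutate(bbb = map2_dbl(doy.0, doy.end,
                        ~ sum(df2$bbb[df2$doy >= .x & df2$doy <= .y])))
res
```

This keeps doy.0 and doy.end in the output, which the replyr version above drops.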

You can use data.table with a non-equi join for this. If every sum covers the same number of days, @Len's answer is very good. If the windows differ in length, data.table is a very fast solution.

library(data.table)

# add sum of bbb to table 1 from table 2
dt1[, bbb := dt2[dt1, on=.(doy >= doy.0, doy <= doy.end), sum(bbb), by=.EACHI]$V1]


dt1
   tag doy.0 doy.end bbb
1:   1   200     207 121
2:   2   201     208 134
3:   3   202     209 147
4:   4   203     210 150
5:   5   204     211 154

data:

tag<-c(1:5)
doy.0<-c(200:204)
doy.end<-c(207:211)
dt1<- data.table(tag, doy.0, doy.end) # data.table instead of data.frame

# creating dt2
doy<-c(200:211)
bbb<-c(12,10,18,16,20,11,15,19,25,23,21,20)
dt2<- data.table(doy,bbb) # data.table instead of data.frame

We could do a fuzzy join and aggregate:

library(fuzzyjoin)
library(dplyr)
fuzzy_join(df1, df2, c(doy.0 = "doy", doy.end = "doy"),
           list(`<=`,`>=`)) %>%
  group_by(tag,doy.0,doy.end) %>%
  summarize_at("bbb",sum) %>%
  ungroup

# # A tibble: 5 x 4
#     tag doy.0 doy.end   bbb
#   <int> <int>   <int> <dbl>
# 1     1   200     207   121
# 2     2   201     208   134
# 3     3   202     209   147
# 4     4   203     210   150
# 5     5   204     211   154

And a base R translation:

x <- expand.grid(tag= df1$tag,doy = df2$doy)
x <- merge(x,df1,all.x=TRUE)
x <- merge(x,df2,all.x=TRUE)
x <- subset(x, doy >= doy.0 & doy <= doy.end)
x <- aggregate(bbb ~ tag, x, sum)
merge(df1,x)
#   tag doy.0 doy.end bbb
# 1   1   200     207 121
# 2   2   201     208 134
# 3   3   202     209 147
# 4   4   203     210 150
# 5   5   204     211 154
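
The `expand.grid` step builds every tag/day combination, which can get large. As a different technique (not taken from the answers above), cumulative sums avoid that. This sketch assumes df2 has exactly one row per day and that every doy.0 and doy.end appears in df2; `cs`, `i0` and `i1` are illustrative names:

```r
df1 <- data.frame(tag = 1:5, doy.0 = 200:204, doy.end = 207:211)
df2 <- data.frame(doy = 200:211,
                  bbb = c(12, 10, 18, 16, 20, 11, 15, 19, 25, 23, 21, 20))

# Sort by day so that cumulative sums follow calendar order
df2 <- df2[order(df2$doy), ]

# cs[k + 1] holds the sum of the first k values of bbb
cs <- c(0, cumsum(df2$bbb))

i0 <- match(df1$doy.0, df2$doy)    # row of the first day in each window
i1 <- match(df1$doy.end, df2$doy)  # row of the last day in each window

# Sum over rows i0..i1 is the difference of two cumulative sums
df1$bbb <- cs[i1 + 1] - cs[i0]
df1
```

For tag 1 this is cs[9] - cs[1] = 121 - 0 = 121, matching the expected value from the question.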
