
R: How to use the aggregate() function to sum one column grouped by the values of another column?

Hi, I have a problem with the aggregate() function. My data looks like this:

  transect_id year day month      LST precipitation
1       TR001 2010 191     4 30.62083        0.0000
2       TR001 2010 191     4 30.62083        0.0003
3       TR001 2010 191     5 30.62083        0.0001
4       TR001 2010 191     7 30.62083        0.0000
5       TR001 2010 191     7 30.62083        0.0000
6       TR001 2011 191     7 30.62083        0.0007

I want to sum the precipitation for each quarter of each year, that is, for months 1-3, 4-6, 7-9 and 10-12 of every year (2010-2013 in my case), and add a column for the result. I figured that I should use the mutate() function from the plyr package and then do something like

weather_gam.mutated<-mutate(weather_gam, precipitation.spring=aggregate(precipitation by = list(Category=year)))

But what do I do about the months? I simply can't figure it out. I tried things like by = list(Category = month == 1), but obviously that is not what it takes. Basically I am trying to do what SUMIFS(F1:Fx, B1:Bx = "2010", D1:Dx = "1", D1:Dx = "2", D1:Dx = "3") would do in Excel, except that I hope that by setting

by = list(Category=year)

it will automatically sum within each year, so I don't need to do it manually for every year. I would really appreciate any help, including a completely different approach.

Here is a solution with dplyr and lubridate. The idea is to use lubridate's quarter function to find out which quarter each month belongs to: create the Quarter column, group by Quarter, and compute the sum of precipitation for each group.

library(lubridate)
library(dplyr)
df$month <- month(df$month)
df %>% mutate(Quarter = quarter(month)) %>% group_by(Quarter) %>% mutate(SumPre = sum(precipitation))

Source: local data frame [6 x 8]
Groups: Quarter

  transect_id year day month      LST precipitation Quarter SumPre
1       TR001 2010 191     4 30.62083         0e+00       2  4e-04
2       TR001 2010 191     4 30.62083         3e-04       2  4e-04
3       TR001 2010 191     5 30.62083         1e-04       2  4e-04
4       TR001 2010 191     7 30.62083         0e+00       3  7e-04
5       TR001 2010 191     7 30.62083         0e+00       3  7e-04
6       TR001 2011 191     7 30.62083         7e-04       3  7e-04
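Note that the output above pools July 2010 and July 2011 into one group, because the grouping is by Quarter only, while the question asks for sums per quarter of each year. A minimal base R sketch of the per-year variant follows. It assumes the quarter can be derived from the integer month with integer division (my own substitution for lubridate's quarter(), to keep the example free of package dependencies) and uses ave() to add the group sum as a new column:

```r
# Sample data copied from the question (only the columns needed here).
df <- data.frame(
  year          = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L),
  month         = c(4L, 4L, 5L, 7L, 7L, 7L),
  precipitation = c(0, 3e-04, 1e-04, 0, 0, 7e-04)
)

# Months 1-3 -> quarter 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
df$Quarter <- (df$month - 1L) %/% 3L + 1L

# ave() returns one value per row: the group sum repeated within each
# year/quarter combination, similar to group_by() followed by mutate().
df$SumPre <- ave(df$precipitation, df$year, df$Quarter, FUN = sum)

print(df)
```

With this grouping, the 2010 and 2011 rows of quarter 3 get separate sums instead of sharing one.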

And here is another approach with aggregate:

library(lubridate)
df$month <- month(df$month)
df$Quarter <- quarter(df$month)
aggregate(precipitation ~ Quarter, data = df, sum)
Quarter precipitation
1       2         4e-04
2       3         7e-04
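The aggregate call above also groups by Quarter alone. If one row per year and quarter is wanted, as in the question, the formula can take both grouping variables on its right-hand side. This is a sketch in base R only; the quarter is again computed by integer division rather than with lubridate, and the result name quarterly is just an illustrative choice:

```r
# Sample data copied from the question (only the columns needed here).
df <- data.frame(
  year          = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L),
  month         = c(4L, 4L, 5L, 7L, 7L, 7L),
  precipitation = c(0, 3e-04, 1e-04, 0, 0, 7e-04)
)

# Months 1-3 -> quarter 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
df$Quarter <- (df$month - 1L) %/% 3L + 1L

# One row per observed year/quarter combination, with the summed precipitation.
quarterly <- aggregate(precipitation ~ year + Quarter, data = df, FUN = sum)

print(quarterly)
```

Unlike the mutate-style approaches, this returns a summary table rather than adding a column to the original data; merge(df, quarterly, by = c("year", "Quarter")) would be one way to join it back if the column is needed per row.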

data

df <- structure(list(transect_id = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = "TR001", class = "factor"), year = c(2010L, 2010L, 
2010L, 2010L, 2010L, 2011L), day = c(191L, 191L, 191L, 191L, 
191L, 191L), month = c(4L, 4L, 5L, 7L, 7L, 7L), LST = c(30.62083, 
30.62083, 30.62083, 30.62083, 30.62083, 30.62083), precipitation = c(0, 
3e-04, 1e-04, 0, 0, 7e-04)), .Names = c("transect_id", "year", 
"day", "month", "LST", "precipitation"), row.names = c("1", "2", 
"3", "4", "5", "6"), class = "data.frame")

Use dplyr instead of plyr (d.in below is presumably the input data frame):

library(dplyr)

d.in %>%
    mutate(q=cut(month, c(0,3,6,9,12), labels=c("q1", "q2", "q3", "q4"))) %>%
    group_by(year, q) %>%
    mutate(sum.prec = sum(precipitation))
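To see why the breaks c(0,3,6,9,12) give the right quarters, it may help to look at cut() on its own. By default its intervals are closed on the right, so the bins are (0,3], (3,6], (6,9] and (9,12]. A small base R check, where the variable names months and q are only for illustration:

```r
# All twelve month numbers.
months <- 1:12

# Default right = TRUE: each interval includes its upper bound,
# so month 3 falls in q1 and month 4 in q2.
q <- cut(months, breaks = c(0, 3, 6, 9, 12),
         labels = c("q1", "q2", "q3", "q4"))

print(data.frame(month = months, quarter = q))
```

The lower break has to be 0 rather than 1, otherwise month 1 would fall outside the first interval and become NA.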
