

R: how to use the aggregate() function to sum data from one column if another column has a distinct value?

Hi, I have a problem with the aggregate function. My data looks like this:

  transect_id year day month      LST precipitation
1       TR001 2010 191     4 30.62083        0.0000
2       TR001 2010 191     4 30.62083        0.0003
3       TR001 2010 191     5 30.62083        0.0001
4       TR001 2010 191     7 30.62083        0.0000
5       TR001 2010 191     7 30.62083        0.0000
6       TR001 2011 191     7 30.62083        0.0007

I want to sum the precipitation for each quarter of each year, that is, the precipitation for months 1-3, 4-6, 7-9 and 10-12 of every year (in my case 2010-2013), and add a column for it. I figured I should use the mutate() function from the plyr package and then do something like

weather_gam.mutated <- mutate(weather_gam, precipitation.spring = aggregate(precipitation, by = list(Category = year)))

But what do I do about the months? I simply can't figure it out. I tried things like by = list(Category = month == 1), but obviously that is not what it takes. So basically I'm trying to do what SUMIFS(F1:Fx, B1:Bx, 2010, D1:Dx, ">=1", D1:Dx, "<=3") would do in Excel, except that I hope that by setting

by = list(Category=year)

it will automatically sum over all rows with the same year, so I don't have to do it manually for every year. I would really appreciate any help here, also if you have a completely different idea of how to solve it.
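In base R, a SUMIFS condition maps onto logical indexing, and `tapply()` can compute every year-by-quarter sum in one call, much like a pivot table. This is a minimal sketch, assuming the question's sample rows reduced to the columns used (they only cover months 4-7, so the single-SUMIFS example uses months 4-6); the quarter formula `(month - 1) %/% 3 + 1` and the names `q2_2010` and `sums` are illustrative, not from the question.

```r
# Sample rows from the question, reduced to the columns used here.
df <- data.frame(
  year          = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L),
  month         = c(4L, 4L, 5L, 7L, 7L, 7L),
  precipitation = c(0, 3e-04, 1e-04, 0, 0, 7e-04)
)

# Equivalent of a single SUMIFS: year == 2010 and month in 4-6.
q2_2010 <- sum(df$precipitation[df$year == 2010 & df$month %in% 4:6])
print(q2_2010)

# Quarter number: months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
df$quarter <- (df$month - 1) %/% 3 + 1

# All year x quarter sums at once; combinations with no rows are NA.
sums <- tapply(df$precipitation,
               list(year = df$year, quarter = df$quarter), sum)
print(sums)
```

This gives a summary table rather than a new column; the grouped approaches in the answers attach the sums to each row instead.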

Here is a solution with dplyr and lubridate; the idea is to use the quarter function of lubridate to find out which quarter each month belongs to. Create the Quarter column, group by Quarter and compute the sum of precipitation for each group.

library(dplyr)
library(lubridate)
df$month <- month(df$month)
df %>% mutate(Quarter = quarter(month)) %>% group_by(Quarter) %>% mutate(SumPre = sum(precipitation))

Source: local data frame [6 x 8]
Groups: Quarter

  transect_id year day month      LST precipitation Quarter SumPre
1       TR001 2010 191     4 30.62083         0e+00       2  4e-04
2       TR001 2010 191     4 30.62083         3e-04       2  4e-04
3       TR001 2010 191     5 30.62083         1e-04       2  4e-04
4       TR001 2010 191     7 30.62083         0e+00       3  7e-04
5       TR001 2010 191     7 30.62083         0e+00       3  7e-04
6       TR001 2011 191     7 30.62083         7e-04       3  7e-04
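The pipeline above groups by Quarter only, so the 2011 row is pooled with the two 2010 rows of quarter 3 (all three show SumPre 7e-04). Since the question asks for sums per quarter of each year, year has to be part of the grouping; in dplyr that is `group_by(year, Quarter)`. The same per-row column can be built in base R with `ave()`, which returns a vector as long as the data. A sketch, assuming the question's sample rows and computing quarters arithmetically instead of calling lubridate::quarter():

```r
# Sample rows from the question (only the columns used here).
df <- data.frame(
  year          = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L),
  month         = c(4L, 4L, 5L, 7L, 7L, 7L),
  precipitation = c(0, 3e-04, 1e-04, 0, 0, 7e-04)
)

# Same numbering as lubridate::quarter() for months 1-12.
df$Quarter <- (df$month - 1) %/% 3 + 1

# Sum precipitation within each year/Quarter group and repeat the
# total on every row of that group (like group_by() + mutate()).
df$SumPre <- ave(df$precipitation, df$year, df$Quarter, FUN = sum)
print(df)
```

With year in the grouping, the two 2010 rows of quarter 3 get 0 and the 2011 row keeps 7e-04 on its own.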

And here is another approach with aggregate:

library(lubridate)
df$month <- month(df$month)
df$Quarter <- quarter(df$month)
aggregate(precipitation ~ Quarter, data = df, sum)
  Quarter precipitation
1       2         4e-04
2       3         7e-04
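This formula also sums by quarter only. Adding year to the right-hand side gives one total per year and quarter, and because `aggregate()` collapses the data to one row per group, `merge()` can attach those totals back to the original rows when a column is wanted, as the question asks. A sketch assuming the question's sample rows and arithmetic quarter numbers in place of lubridate; `totals` and `with_totals` are illustrative names:

```r
# Sample rows from the question (only the columns used here).
df <- data.frame(
  year          = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L),
  month         = c(4L, 4L, 5L, 7L, 7L, 7L),
  precipitation = c(0, 3e-04, 1e-04, 0, 0, 7e-04)
)
df$Quarter <- (df$month - 1) %/% 3 + 1

# One row per (year, Quarter) combination present in the data.
totals <- aggregate(precipitation ~ year + Quarter, data = df, FUN = sum)
print(totals)

# Attach the totals to every original row; the summed column is
# named precipitation.sum so it does not clash with the raw values.
with_totals <- merge(df, totals, by = c("year", "Quarter"),
                     suffixes = c("", ".sum"))
print(with_totals)
```

Note that `merge()` returns the rows sorted by the key columns, not in the original order.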

Data:

df <- structure(list(transect_id = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = "TR001", class = "factor"), year = c(2010L, 2010L, 
2010L, 2010L, 2010L, 2011L), day = c(191L, 191L, 191L, 191L, 
191L, 191L), month = c(4L, 4L, 5L, 7L, 7L, 7L), LST = c(30.62083, 
30.62083, 30.62083, 30.62083, 30.62083, 30.62083), precipitation = c(0, 
3e-04, 1e-04, 0, 0, 7e-04)), .Names = c("transect_id", "year", 
"day", "month", "LST", "precipitation"), row.names = c("1", "2", 
"3", "4", "5", "6"), class = "data.frame")

Use dplyr instead of plyr:


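library(dplyr)
# d.in: the data frame from the question
d.in %>%
    # bin months into quarters: (0,3] -> q1, (3,6] -> q2, (6,9] -> q3, (9,12] -> q4
    mutate(q = cut(month, c(0, 3, 6, 9, 12), labels = c("q1", "q2", "q3", "q4"))) %>%
    group_by(year, q) %>%
    # sum within each year/quarter group, keeping one row per observation
    mutate(sum.prec = sum(precipitation))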
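`cut()` uses right-closed intervals by default (`right = TRUE`), so the breaks c(0, 3, 6, 9, 12) produce (0,3], (3,6], (6,9] and (9,12]: month 3 falls in q1, month 4 in q2, and any value outside 1-12 (such as a mistyped 0 or 13) becomes NA. A small base R check of that mapping, using only the breaks and labels from the answer; `q_breaks` and `q_labels` are illustrative names:

```r
months   <- 1:12
q_breaks <- c(0, 3, 6, 9, 12)
q_labels <- c("q1", "q2", "q3", "q4")

# Right-closed bins: (0,3] -> q1, (3,6] -> q2, (6,9] -> q3, (9,12] -> q4.
q <- cut(months, breaks = q_breaks, labels = q_labels)
print(data.frame(month = months, q = q))

# The factor codes agree with the arithmetic quarter number.
print(all(as.integer(q) == (months - 1) %/% 3 + 1))

# Values outside (0, 12] are not assigned to any bin.
print(cut(c(0, 13), breaks = q_breaks, labels = q_labels))
```

Because the labels are a factor, `group_by(year, q)` in the answer keeps the quarters in q1-q4 order.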

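Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM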