R: calculate how many values were used to calculate the mean in an aggregate function

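I have a data frame of daily observations running from 1963 to 2022. I want to calculate the mean of the observations for each month. However, some months do not have data for every day, and some have only a single data point, which skews some of the monthly means. How can I calculate the number of observations used to compute the mean for a given month?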

Head of the data frame

   structure(list(prcp_amt = c(0, 1.8, 6.4, 5.1, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 4.3, 0, 0, 0, 0, 4.6, 0, 0, 0, 0, 0, 0, 0, 0.3, 
4.8, 0, 0, 4.1, 0, 0, 0, 0.3, 3.6, 6.6, 0, 0, 0, 0, 0, 0, 0.8, 
0, 0, 0, 0, 0), ob_date = structure(c(-220838400, -220752000, 
-220665600, -220579200, -220492800, -220406400, -220320000, -220233600, 
-220147200, -220060800, -219974400, -219888000, -219801600, -219715200, 
-219628800, -219542400, -219456000, -219369600, -219283200, -219196800, 
-219110400, -219024000, -218937600, -218851200, -218764800, -218678400, 
-218592000, -218505600, -218419200, -218332800, -218246400, -218160000, 
-218073600, -217987200, -217900800, -217814400, -217728000, -217641600, 
-217555200, -217468800, -217382400, -217296000, -217209600, -217123200, 
-217036800, -216950400, -216864000, -216777600, -216691200, -216604800
), class = c("POSIXct", "POSIXt"), tzone = "GMT")), row.names = c(NA, 
50L), class = "data.frame")

Existing code

# historic monthly rainfall
rainHist$month <- as.numeric(format(rainHist$ob_date, '%m'))
rainHist$year <- as.numeric(format(rainHist$ob_date, '%Y'))
rainHistMean <- aggregate(prcp_amt ~ month + year, rainHist, FUN=mean)
rainHistMean$day <- 01

# requires library(dplyr); the piped data frame is already mutate()'s first argument
rainHistMean <-
  rainHistMean %>%
    mutate(Date=paste(year, month, day, sep='-'))

rainHistMean[['Date']] <- as.POSIXct(rainHistMean[['Date']],
                                     format='%Y-%m-%d',
                                     tz='GMT'
                                     )

Updated code

rainHist$month <- as.numeric(format(rainHist$ob_date, '%m'))
rainHist$year <- as.numeric(format(rainHist$ob_date, '%Y'))
rainHistMean <- aggregate(prcp_amt ~ month + year, rainHist, FUN=function(x) c(mean(x), length(x)))
names(rainHistMean) <- c('month', 'year', 'prcp_amt', 'n')

How can I get the result to have 4 columns instead of 3?

Solution

rainHist$month <- as.numeric(format(rainHist$ob_date, '%m'))
rainHist$year <- as.numeric(format(rainHist$ob_date, '%Y'))
rainHistMean <- aggregate(prcp_amt ~ month + year, rainHist, FUN=function(x) c(mean(x), length(x)))
rainHistMean <- data.frame(rainHistMean[1:2], rainHistMean[[3]])
names(rainHistMean) <- c('month', 'year', 'prcp_amt', 'n')
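The updated code ends up with three columns because, when FUN returns a vector of length greater than one, aggregate() stores all of the returned values in a single matrix-valued column. The sketch below illustrates this on a small made-up data frame (`toy` and its values are hypothetical, not the asker's data) and shows one common way to flatten the matrix column, do.call(data.frame, ...), as an alternative to indexing the columns by position.

```r
# Toy data (hypothetical values, not taken from the question)
toy <- data.frame(
  month    = c(1, 1, 1, 2, 2),
  year     = c(1963, 1963, 1963, 1963, 1963),
  prcp_amt = c(0, 1.5, 3, 2, 4)
)

# FUN returns a named vector of length 2, so aggregate() packs both
# values into one matrix-valued column called prcp_amt
agg <- aggregate(prcp_amt ~ month + year, toy,
                 FUN = function(x) c(mean = mean(x), n = length(x)))

ncol(agg)                # 3: month, year and the matrix column
is.matrix(agg$prcp_amt)  # TRUE

# Flatten the matrix column into ordinary columns
flat <- do.call(data.frame, agg)
names(flat)              # "month" "year" "prcp_amt.mean" "prcp_amt.n"
```

Naming the elements returned by FUN means the flattened columns get readable names, so the separate names() assignment becomes optional.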

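There may be more elegant solutions, but you can use dplyr to group by month and year, then get the count and the mean in summarize (month() and year() come from lubridate):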

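library(dplyr)
library(lubridate)  # provides month() and year()
# df is the data frame shown in the question
df %>% 
  group_by(month(ob_date), year(ob_date)) %>% 
  summarize(mean_prcp = mean(prcp_amt),
            count = n())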

Output:

# # Groups:   month(ob_date) [2]
# `month(ob_date)` `year(ob_date)` mean_prcp count
# <dbl>           <dbl>     <dbl> <int>
# 1                1            1963      0.91    30
# 2                2            1963      0.77    20
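Once a count is available next to each monthly mean, it can be used to deal with the sparsely observed months that motivated the question. The following is a minimal base-R sketch, not part of the original answer: the `monthly` data frame, its third row, and the `min_obs` threshold are all assumptions chosen for illustration.

```r
# Hypothetical monthly summary: mean and number of observations per month
monthly <- data.frame(
  month     = c(1, 2, 3),
  year      = c(1963, 1963, 1963),
  mean_prcp = c(0.91, 0.77, 5.20),
  count     = c(30L, 20L, 1L)
)

# Assumed cutoff: months with fewer observations are treated as unreliable
min_obs <- 15

# Flag each month according to how many observations back its mean
monthly$reliable <- monthly$count >= min_obs

# Keep only the well-observed months for further analysis
monthly_ok <- monthly[monthly$reliable, ]
```

Flagging rather than dropping rows outright keeps the sparse months visible, which may be preferable when plotting the full record.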


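Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.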

 