R calculate how many values used to calculate mean in aggregate function
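I have a data frame of daily observations running from 1963 to 2022. I want to calculate the mean of the observations for each month. However, some months do not have data for every day, and some have only a single data point, which skews some of the monthly means. How can I calculate the number of observations that were used to compute the mean for a given month?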
rainHist <- structure(list(prcp_amt = c(0, 1.8, 6.4, 5.1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 4.3, 0, 0, 0, 0, 4.6, 0, 0, 0, 0, 0, 0, 0, 0.3,
4.8, 0, 0, 4.1, 0, 0, 0, 0.3, 3.6, 6.6, 0, 0, 0, 0, 0, 0, 0.8,
0, 0, 0, 0, 0), ob_date = structure(c(-220838400, -220752000,
-220665600, -220579200, -220492800, -220406400, -220320000, -220233600,
-220147200, -220060800, -219974400, -219888000, -219801600, -219715200,
-219628800, -219542400, -219456000, -219369600, -219283200, -219196800,
-219110400, -219024000, -218937600, -218851200, -218764800, -218678400,
-218592000, -218505600, -218419200, -218332800, -218246400, -218160000,
-218073600, -217987200, -217900800, -217814400, -217728000, -217641600,
-217555200, -217468800, -217382400, -217296000, -217209600, -217123200,
-217036800, -216950400, -216864000, -216777600, -216691200, -216604800
), class = c("POSIXct", "POSIXt"), tzone = "GMT")), row.names = c(NA,
50L), class = "data.frame")
# historic monthly rainfall
library(dplyr)  # provides %>% and mutate()

rainHist$month <- as.numeric(format(rainHist$ob_date, '%m'))
rainHist$year <- as.numeric(format(rainHist$ob_date, '%Y'))
rainHistMean <- aggregate(prcp_amt ~ month + year, rainHist, FUN = mean)
rainHistMean$day <- 1
# build a first-of-month date for each monthly mean
rainHistMean <- rainHistMean %>%
  mutate(Date = paste(year, month, day, sep = '-'))
rainHistMean[['Date']] <- as.POSIXct(rainHistMean[['Date']],
                                     format = '%Y-%m-%d',
                                     tz = 'GMT')
rainHist$month <- as.numeric(format(rainHist$ob_date, '%m'))
rainHist$year <- as.numeric(format(rainHist$ob_date, '%Y'))
rainHistMean <- aggregate(prcp_amt ~ month + year, rainHist, FUN=function(x) c(mean(x), length(x)))
names(rainHistMean) <- c('month', 'year', 'prcp_amt', 'n')
How can I make the result have 4 columns instead of 3?
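rainHist$month <- as.numeric(format(rainHist$ob_date, '%m'))
rainHist$year <- as.numeric(format(rainHist$ob_date, '%Y'))
# FUN returns two values, so aggregate() stores them as one two-column matrix in column 3
rainHistMean <- aggregate(prcp_amt ~ month + year, rainHist, FUN=function(x) c(mean(x), length(x)))
# rebuild the data frame; [[3]] extracts the matrix so it expands into two ordinary columns
rainHistMean <- data.frame(rainHistMean[1:2], rainHistMean[[3]])
names(rainHistMean) <- c('month', 'year', 'prcp_amt', 'n')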
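As a possible variation on the approach above, here is a minimal base R sketch that names the values returned by FUN and then flattens the matrix column with do.call(data.frame, ...). The toy data frame and its values are made up for illustration and stand in for rainHist; the names agg and flat are hypothetical.

```r
# Toy data standing in for rainHist (values are made up for illustration)
toy <- data.frame(
  month    = c(1, 1, 1, 2, 2),
  year     = c(1963, 1963, 1963, 1963, 1963),
  prcp_amt = c(0, 1.5, 3, 2, 4)
)

# FUN returns a named length-2 vector, so aggregate() stores it as one matrix column
agg <- aggregate(prcp_amt ~ month + year, toy,
                 FUN = function(x) c(mean = mean(x), n = length(x)))
ncol(agg)  # 3: month, year, and a two-column matrix called prcp_amt

# do.call(data.frame, ...) expands the matrix column into ordinary columns
flat <- do.call(data.frame, agg)
names(flat)  # "month" "year" "prcp_amt.mean" "prcp_amt.n"
```

Naming the elements inside FUN means the flattened columns carry descriptive suffixes, so a separate names<- call is optional rather than required.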
There may be a more elegant solution, but you can use dplyr to group by month and year, then get the count and the mean in summarize:
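library(dplyr)
library(lubridate)  # month() and year() are assumed to come from lubridate

df %>%  # df is the data frame from the question
  group_by(month(ob_date), year(ob_date)) %>%
  summarize(mean_prcp = mean(prcp_amt),
            count = n())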
Output:
# # Groups:   month(ob_date) [2]
#   `month(ob_date)` `year(ob_date)` mean_prcp count
#              <dbl>           <dbl>     <dbl> <int>
# 1                1            1963      0.91    30
# 2                2            1963      0.77    20
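Since the question is motivated by months with very few observations, the count can also be used to flag those months. The following is only a sketch under stated assumptions: the toy data is made up, month() and year() are taken from lubridate, and the threshold min_obs and the column enough_data are hypothetical names chosen for illustration.

```r
library(dplyr)
library(lubridate)

# Toy daily data (made-up values): three days in Jan 1963, one day in Feb 1963
toy <- data.frame(
  ob_date  = as.POSIXct(c("1963-01-02", "1963-01-03", "1963-01-04", "1963-02-01"),
                        tz = "GMT"),
  prcp_amt = c(0, 1.5, 3, 4)
)

min_obs <- 2  # hypothetical threshold; pick whatever suits the data

monthly <- toy %>%
  # naming the grouping expressions gives clean column names
  group_by(year = year(ob_date), month = month(ob_date)) %>%
  summarize(mean_prcp = mean(prcp_amt), count = n(), .groups = "drop") %>%
  # flag months whose mean rests on at least min_obs observations
  mutate(enough_data = count >= min_obs)
```

Months where enough_data is FALSE could then be dropped with filter(enough_data) or simply marked in a plot, depending on how the sparse months should be treated.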