
How to create a new column with moving averages based on a variable number of rows?

I am trying to write code that creates a new column with moving averages based on 'year'. The number of rows per year varies, and each year has a single unique value that is repeated on every row of that year. I want to calculate the moving averages from these unique values, independent of the number of rows per year.

Just an FYI: I'm very new to R and programming, so if I have left out something you need to understand my problem, please let me know.

For example, the type of data I'm working with looks like this:

df <- data.frame(
  year = c(1702, 1702, 1702, 1702, 1702, 1703, 1703, 1703, 1704, 1704,
           1705, 1705, 1705, 1705, 1706, 1706, 1707, 1707, 1707,
           1708, 1708, 1708, 1708, 1708, 1709, 1709, 1709, 1709, 1709),
  avgtemp = c(5.3, 5.3, 5.3, 5.3, 5.3, 3.9, 3.9, 3.9, 6.12, 6.12,
              4.16, 4.16, 4.16, 4.16, 5.65, 5.65, 3.11, 3.11, 3.11,
              5.17, 5.17, 5.17, 5.17, 5.17, 4.75, 4.75, 4.75, 4.75, 4.75)
)

I found this post, "Moving Average by Unique Date with multiple observations per date", and tried the solution offered there by Mark Peterson, but it doesn't work for me.

I've tried the following code:

rolledavg <-
  df %>%
  group_by(year) %>%
  summarise(rollavg = mean(avgtemp)) %>%
  ungroup() %>%
  arrange(year) %>%
  mutate(ma3temp = rollapply(rollavg, 3, mean,
                             align = "right",
                             partial = TRUE,
                             fill = NA))

I get the following error: "Error in order(year): argument 1 is not a vector".
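The post does not say what caused this error, so the following is only a guess. Base R's order() reports this message when it is handed something that is not a vector, such as a function. That could happen if the `year` column is no longer present when the data reaches arrange() (for example because a different package's summarise() or arrange() is masking the dplyr one) and `year` then resolves to a function of the same name. The sketch below reproduces the message in base R only; the `year` function and the one-row `summary_without_year` data frame are hypothetical stand-ins, not objects from the question.

```r
# Hypothetical stand-in for a function called `year` that some attached
# package might provide (an assumption, not taken from the question).
year <- function(x) x

# Hypothetical summary table that has lost its `year` column.
summary_without_year <- data.frame(rollavg = 4.9)

# `year` is not a column here, so the lookup falls through to the function
# defined above, and order() rejects it.
msg <- tryCatch(
  with(summary_without_year, order(year)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this is what is happening, calling the functions with an explicit namespace prefix (such as `dplyr::summarise()` and `dplyr::arrange()`) would be one way to rule out masking.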

The expected output should be something like this:

[Image: expected output df]

I would appreciate any help I can get. I don't mind working with packages or solutions other than the one above.

Something like this with sapply()?

dat$ra <- sapply(1:nrow(dat), function(n) mean(dat$avgtemp[1:n]))
#    year avgtemp       ra
# 1  1702    5.30 5.300000
# 2  1702    5.30 5.300000
# 3  1702    5.30 5.300000
# 4  1702    5.30 5.300000
# 5  1702    5.30 5.300000
# 6  1703    3.90 5.066667
# 7  1703    3.90 4.900000
# 8  1703    3.90 4.775000
# 9  1704    6.12 4.924444
# 10 1704    6.12 5.044000
# 11 1705    4.16 4.963636
# 12 1705    4.16 4.896667
# 13 1705    4.16 4.840000
# 14 1705    4.16 4.791429
# 15 1706    5.65 4.848667
# 16 1706    5.65 4.898750
# 17 1707    3.11 4.793529
# 18 1707    3.11 4.700000
# 19 1707    3.11 4.616316

Note: If you want just two digits, use round(mean(.), 2).
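Note that the sapply() call above computes a cumulative (expanding) mean over all rows up to the current one, so years with more rows carry more weight. As a minimal sketch, the same numbers can be obtained without a loop using cumsum(); the shortened `dat` below is an illustrative subset of the data, and the names `loop_mean` and `vectorised_mean` are made up for this example.

```r
# Illustrative subset of the data: three years with 5, 3 and 2 rows.
dat <- data.frame(
  year    = rep(c(1702, 1703, 1704), times = c(5, 3, 2)),
  avgtemp = rep(c(5.3, 3.9, 6.12),   times = c(5, 3, 2))
)

# Row-by-row cumulative mean, as in the sapply() version.
loop_mean <- sapply(1:nrow(dat), function(n) mean(dat$avgtemp[1:n]))

# Vectorised equivalent: running sum divided by the running row count.
vectorised_mean <- cumsum(dat$avgtemp) / seq_along(dat$avgtemp)

# Both approaches agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(loop_mean, vectorised_mean)))
```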

Update

Following the update of your question, you may calculate the moving average with filter() from a unique version of your data frame and merge the result with the original data frame.

dat <- merge(dat, transform(unique(dat), ra=filter(avgtemp, rep(1/3, 3), sides=1)))
#    year avgtemp       ra
# 1  1702    5.30       NA
# 2  1702    5.30       NA
# 3  1702    5.30       NA
# 4  1702    5.30       NA
# 5  1702    5.30       NA
# 6  1703    3.90       NA
# 7  1703    3.90       NA
# 8  1703    3.90       NA
# 9  1704    6.12 5.106667
# 10 1704    6.12 5.106667
# 11 1705    4.16 4.726667
# 12 1705    4.16 4.726667
# 13 1705    4.16 4.726667
# 14 1705    4.16 4.726667
# 15 1706    5.65 5.310000
# 16 1706    5.65 5.310000
# 17 1707    3.11 4.306667
# 18 1707    3.11 4.306667
# 19 1707    3.11 4.306667
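The same idea can be spelled out step by step. This is a sketch under a few assumptions: it calls stats::filter() by its full name, because an attached dplyr would otherwise mask filter(); it wraps the result in as.numeric() to drop the time-series class; and it merges on `year` only. The intermediate name `yearly` is introduced here for illustration.

```r
# Same 19 rows as the answer's data, built with rep() for brevity.
dat <- data.frame(
  year    = rep(1702:1707, times = c(5, 3, 2, 4, 2, 3)),
  avgtemp = rep(c(5.3, 3.9, 6.12, 4.16, 5.65, 3.11), times = c(5, 3, 2, 4, 2, 3))
)

# One row per year, sorted so the window runs in chronological order.
yearly <- unique(dat[, c("year", "avgtemp")])
yearly <- yearly[order(yearly$year), ]

# Trailing 3-year mean: sides = 1 uses only the current and past values,
# so the first two years have no complete window and become NA.
yearly$ra <- as.numeric(stats::filter(yearly$avgtemp, rep(1 / 3, 3), sides = 1))

# Attach the per-year result back to every original row.
result <- merge(dat, yearly[, c("year", "ra")], by = "year")
```

Because the average is computed on `yearly`, each year counts once regardless of how many rows it has in `dat`.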

This is also possible with the zoo::rollmean() function.

dat <- merge(dat, transform(unique(dat), ra=c(rep(NA, 3 - 1), zoo::rollmean(avgtemp, 3))))
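Both versions above leave the first two years as NA. The question's attempt used partial = T, which suggests partial windows may be wanted for those years instead. A base-R sketch of that variant follows; `trailing_mean` is a hypothetical helper written for this example, not a function from zoo or any other package.

```r
# Hypothetical helper: mean of the last k values up to position i,
# using a shorter window when fewer than k values are available.
trailing_mean <- function(x, k = 3) {
  vapply(
    seq_along(x),
    function(i) mean(x[max(1, i - k + 1):i]),
    numeric(1)
  )
}

# One value per year (1702 to 1707), taken from the example data.
yearly_temp <- c(5.3, 3.9, 6.12, 4.16, 5.65, 3.11)
ma3 <- trailing_mean(yearly_temp, k = 3)
print(ma3)
```

Here the first year gets its own value and the second year the mean of the first two, while later years match the 3-year results shown above.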

Data

dat <- structure(list(year = c(1702, 1702, 1702, 1702, 1702, 1703, 1703, 
1703, 1704, 1704, 1705, 1705, 1705, 1705, 1706, 1706, 1707, 1707, 
1707), avgtemp = c(5.3, 5.3, 5.3, 5.3, 5.3, 3.9, 3.9, 3.9, 6.12, 
6.12, 4.16, 4.16, 4.16, 4.16, 5.65, 5.65, 3.11, 3.11, 3.11)), row.names = c(NA, 
-19L), class = "data.frame")
