Problems with creating a daily mean with many missing values

I created this data frame, which is very representative of my data (sorry for the long code):

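library(lubridate)

# Hourly time stamps from 1980-01-01 00:00 up to and including 1980-07-01 00:00
datelist = seq(ymd_hms('1980-01-01 00:00:00'), ymd_hms('1980-07-01 00:00:00'), by = '60 mins')

# First block: 4000 rows x 2 columns of random values, with up to 1000 cells set to NA
df = data.frame(replicate(2, sample(0:130, 4000, replace = TRUE)))
nbr_missing <- 1000
y <- data.frame(row = sample(nrow(df), size = nbr_missing, replace = TRUE),
                col = sample(ncol(df), size = nbr_missing, replace = TRUE))

y <- y[!duplicated(y), ]
df[as.matrix(y)] <- NA

# Second block: 369 rows x 2 columns, with up to 500 cells set to NA
df2 = data.frame(replicate(2, sample(0:130, 369, replace = TRUE)))
nbr_missing <- 500
xy <- data.frame(row = sample(nrow(df2), size = nbr_missing, replace = TRUE),
                 col = sample(ncol(df2), size = nbr_missing, replace = TRUE))

xy <- xy[!duplicated(xy), ]
df2[as.matrix(xy)] <- NA

# All-NA padding so both blocks stack to 4000 + 369 = 4369 rows
fill1 = data.frame(matrix(NA, nrow = 4000, ncol = 2))
fill2 = data.frame(matrix(NA, nrow = 369, ncol = 2))

df_new1 = rbind(df, fill2)    # ABC, DEF: data in rows 1-4000, NA afterwards
df_new2 = rbind(fill1, df2)   # GHI, JKL: NA in rows 1-4000, data in rows 4001-4369
df_new = cbind(df_new1, df_new2)

testframe = as.data.frame(cbind(datelist, df_new))
colnames(testframe) = c("Date", "ABC", "DEF", "GHI", "JKL")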

I am having problems calculating a daily mean. I have used this code several times with other data and it always worked well, but here it seems to give me wrong results. Any idea why, and how to solve this?

library(dplyr)
# Label consecutive blocks of 24 rows as one "day" and average every column except Date
testframe1 = testframe %>%
  group_by(group = gl(n()/24, 24)) %>%
  summarise_at(-1, mean, na.rm = TRUE)

For example, column JKL contains only NA in my hourly data for the first day, but the daily mean gives me a number instead of NA!

Here is an example of what I get when using this command:

(Screenshot: hourly data)

(Screenshot: wrong result)
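A likely cause of the stray number, which can be checked directly: seq() keeps its end point, so testframe has 4369 rows (182 days of 24 hours, plus the single hour 1980-07-01 00:00), and gl(n()/24, 24) cannot split that evenly. gl() labels 182 blocks of 24 rows and then recycles its labels from the start to reach the requested length, so the last row ends up in group 1. Because GHI and JKL only hold data in rows 4001 to 4369, that one hour is averaged into "day 1". The sketch rebuilds only the time axis, and it assumes base R POSIXct in UTC instead of lubridate to stay dependency-free:

```r
# Same hourly axis as the question (assumption: base R POSIXct in UTC instead
# of lubridate::ymd_hms). seq() includes the end point 1980-07-01 00:00.
datelist <- seq(as.POSIXct("1980-01-01 00:00:00", tz = "UTC"),
                as.POSIXct("1980-07-01 00:00:00", tz = "UTC"),
                by = "60 mins")

n_rows <- length(datelist)
n_rows        # 4369: 182 full days of 24 hours plus one extra hour
n_rows %% 24  # 1, so the rows do not split evenly into 24-hour blocks

# The grouping factor from the question. n_rows / 24 is 182.04..., so gl()
# builds 182 levels of 24 rows (4368 labels) and recycles from the start
# to fill the remaining position.
grp <- gl(n_rows / 24, 24)
nlevels(grp)  # 182
grp[n_rows]   # level "1": the final hour is grouped with 1980-01-01
```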

I'm not sure what went wrong with the dplyr code, but you could use a by() approach with colMeans() instead, grouping on the calendar date of each time stamp:

# Split the value columns by calendar day and take the column means, ignoring NAs
res <- do.call(rbind, by(testframe[-1], as.Date(testframe$Date), colMeans, na.rm=TRUE))
head(res)
#                 ABC      DEF GHI JKL
# 1980-01-01 74.25000 67.91304 NaN NaN
# 1980-01-02 52.70833 55.33333 NaN NaN
# 1980-01-03 65.37500 79.10000 NaN NaN
# 1980-01-04 48.61905 62.91667 NaN NaN
# 1980-01-05 62.34783 61.40909 NaN NaN
# 1980-01-06 80.38095 64.68182 NaN NaN
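For comparison, the question's dplyr pipeline can be kept if the 24-row blocks are replaced by the same calendar-day key that the by() call uses. A minimal sketch, assuming dplyr 1.0 or later (for across()) and a small hypothetical frame `toy` with 49 hourly rows in which JKL is missing everywhere except the final hour:

```r
library(dplyr)

# Hypothetical toy data: 49 hourly rows (two full days plus the closing
# midnight, like the question's seq() call); JKL is NA except in the last hour.
toy <- data.frame(
  Date = seq(as.POSIXct("1980-01-01 00:00:00", tz = "UTC"),
             as.POSIXct("1980-01-03 00:00:00", tz = "UTC"),
             by = "60 mins"),
  ABC = 1:49,
  JKL = c(rep(NA_real_, 48), 7)
)

daily <- toy %>%
  # Group on the calendar day of each time stamp instead of on row position,
  # so a trailing partial day gets its own group.
  group_by(day = as.Date(Date, tz = "UTC")) %>%
  # Average every value column; days with no data at all give NaN.
  summarise(across(c(ABC, JKL), ~ mean(.x, na.rm = TRUE)))

daily
```

Here the trailing midnight forms its own third group instead of leaking into the first day, and all-missing days come out as NaN, matching the colMeans() result above.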

If you also need the "group" column, you can simply cbind() it:

res2 <- cbind(group=1:nrow(res), res)
head(res2)
#            group      ABC      DEF GHI JKL
# 1980-01-01     1 74.25000 67.91304 NaN NaN
# 1980-01-02     2 52.70833 55.33333 NaN NaN
# 1980-01-03     3 65.37500 79.10000 NaN NaN
# 1980-01-04     4 48.61905 62.91667 NaN NaN
# 1980-01-05     5 62.34783 61.40909 NaN NaN
# 1980-01-06     6 80.38095 64.68182 NaN NaN
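colMeans(..., na.rm = TRUE) reports days without any observations as NaN rather than NA. If NA is preferred (as the question expects) and a Date column is more convenient than row names, the matrix can be post-processed. A minimal sketch on a small hypothetical stand-in, `res_demo`, shaped like `res`:

```r
# Hypothetical stand-in for `res`: a numeric matrix with dates as row names,
# shaped like the do.call(rbind, by(...)) result.
res_demo <- matrix(c(74.25, 52.70833, 67.91304, 55.33333, NaN, NaN),
                   nrow = 2,
                   dimnames = list(c("1980-01-01", "1980-01-02"),
                                   c("ABC", "DEF", "GHI")))

# Days with no observations come back from colMeans(na.rm = TRUE) as NaN;
# turn them into NA so "no data" is reported as missing.
res_demo[is.nan(res_demo)] <- NA

# Move the row-name dates into a real Date column and add a running group number.
res_df <- data.frame(Date = as.Date(rownames(res_demo)),
                     group = seq_len(nrow(res_demo)),
                     res_demo,
                     row.names = NULL)
res_df
```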

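Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.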

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM