R time aggregate for multiple columns
I have a data frame with one time column and 6 data columns, as shown below:
df <- data.frame(structure(list(Time = c(100, 100.1, 100.2, 100.2, 100.3, 100.3, 100.3, 100.4, 100.4, 100.5, 100.5, 100.6, 100.6, 100.7),
x = c(4, NA, 7, NA, 3, 7, NA, 9, NA, 7, NA, 3, NA, 7),
a = c(NA, 7, NA, 9, NA, 9, 7, NA, NA, NA, 9, NA, 5, NA),
j = c(7, NA, 3, 3, NA, NA, 7, NA, NA, 7, 7, NA, NA, 9),
y = c(8, NA, 4, NA, 5, 4, NA, 9, NA, 1, NA, 7, NA, 2),
b = c(NA, 4, NA, 6, NA, 6, 4, NA, NA, NA, 6, NA, 2, NA),
k = c(1, NA, 5, 5, NA, NA, 1, NA, NA, 2, 2, NA, NA, 6)),
.Names = c("Time", "x", "a", "j", "y", "b", "k"),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L)))
Time x a j y b k
100 4 NA 7 8 NA 1
100.1 NA 7 NA NA 4 NA
100.2 7 NA 3 4 NA 5
100.2 NA 9 3 NA 6 5
100.3 3 NA NA 5 NA NA
100.3 7 9 NA 4 6 NA
100.3 NA 7 7 NA 4 1
100.4 9 NA NA 9 NA NA
100.4 NA NA NA NA NA NA
100.5 7 NA 7 1 NA 2
100.5 NA 9 7 NA 6 2
100.6 3 NA NA 7 NA NA
100.6 NA 5 NA NA 2 NA
100.7 7 NA 9 2 NA 6
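I want to aggregate by the Time column. For each time value, the mean of x and y, of a and b, and of j and k must be computed. The output should look like this: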
Time xy_mean ab_mean jk_mean
100
100.1
100.2
100.3
100.4
100.5
100.6
100.7
Please help...
(Please also comment if the question is unclear.)
Based on @Marijn Stevering's comment, this approach is more efficient:
library(dplyr)
df_final <- df %>%
  group_by(Time) %>%
  summarize(av_xy = mean(c(x, y), na.rm = TRUE),
            av_ab = mean(c(a, b), na.rm = TRUE),
            av_jk = mean(c(j, k), na.rm = TRUE))
df_final
## A tibble: 8 x 4
# Time av_xy av_ab av_jk
# <dbl> <dbl> <dbl> <dbl>
#1 100.0 6.00 NaN 4.0
#2 100.1 NaN 5.5 NaN
#3 100.2 5.50 7.5 4.0
#4 100.3 4.75 6.5 4.0
#5 100.4 9.00 NaN NaN
#6 100.5 4.00 7.5 4.5
#7 100.6 5.00 3.5 NaN
#8 100.7 4.50 NaN 7.5
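The NaN entries in this output come from how mean() treats missing values. The following is a minimal base R sketch, not part of the original answer, with illustrative variable names: when every value in a group is NA, na.rm = TRUE leaves an empty vector, and the mean of an empty vector is NaN rather than NA.

```r
# All values missing: removing the NAs leaves an empty vector, whose mean is NaN
all_missing <- mean(c(NA_real_, NA_real_), na.rm = TRUE)

# Without na.rm, a single NA makes the whole mean NA
with_na <- mean(c(4, NA, 8))

# With na.rm = TRUE, only the observed values are averaged
observed <- mean(c(4, NA, 8), na.rm = TRUE)

print(c(all_missing = all_missing, with_na = with_na, observed = observed))
```

If NA is preferred over NaN in the final table, the NaN values can be replaced afterwards, for example by testing with is.nan().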
I know there must be a more direct way, but here is a dplyr approach that takes a few steps:
library(dplyr)
# One summary table per column pair, each grouped by Time
df_xy <- df %>%
  group_by(Time) %>%
  summarise(av_xy = mean(c(x, y), na.rm = TRUE))
df_ab <- df %>%
  group_by(Time) %>%
  summarise(av_ab = mean(c(a, b), na.rm = TRUE))
df_jk <- df %>%
  group_by(Time) %>%
  summarise(av_jk = mean(c(j, k), na.rm = TRUE))
# Join the three summaries on the shared Time column
df_final <- df_xy %>%
  left_join(df_ab, by = "Time") %>%
  left_join(df_jk, by = "Time")
df_final
## A tibble: 8 x 4
# Time av_xy av_ab av_jk
# <dbl> <dbl> <dbl> <dbl>
#1 100.0 6.00 NaN 4.0
#2 100.1 NaN 5.5 NaN
#3 100.2 5.50 7.5 4.0
#4 100.3 4.75 6.5 4.0
#5 100.4 9.00 NaN NaN
#6 100.5 4.00 7.5 4.5
#7 100.6 5.00 3.5 NaN
#8 100.7 4.50 NaN 7.5
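The following code does what you need. It is not pretty: it uses split to break the data.frame into sub-data-frames by Time, then successive *apply calls to compute the results.
To drop NA values, set NA.RM <- TRUE at the top of the code.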
fun <- function(x, y, na.rm = FALSE){
mean(c(x, y), na.rm = na.rm)
}
NA.RM <- FALSE
inx <- seq_along(names(df))[2:4]
res <- lapply(split(df, df$Time), function(DF)
  sapply(inx, function(i) fun(DF[[i]], DF[[i + 3]], NA.RM)))
res <- do.call(rbind, res)
res <- cbind.data.frame(row.names(res), as.data.frame(res))
row.names(res) <- NULL
names(res)[1] <- names(df)[1]
names(res)[2:4] <- sapply(inx, function(i) paste0(names(df)[i], names(df)[i + 3]))
names(res)[2:4] <- paste(names(res)[2:4], "mean", sep = "_")
res
# Time xy_mean ab_mean jk_mean
#1 100 6.0 NA 4.0
#2 100.1 NA 5.5 NA
#3 100.2 NA NA 4.0
#4 100.3 NA NA NA
#5 100.4 NA NA NA
#6 100.5 NA NA 4.5
#7 100.6 NA NA NA
#8 100.7 4.5 NA 7.5
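For comparison, the same split-and-apply idea can be sketched more compactly in base R. This is only an illustration under stated assumptions: the pairs list and its output names are hypothetical choices that are not part of the answer above, the data frame is rebuilt with the column order Time, x, a, j, y, b, k, and na.rm = TRUE is used, so the values should line up with the dplyr output rather than with the NA.RM <- FALSE output shown here.

```r
# Same data as in the question, columns in the order Time, x, a, j, y, b, k
df <- data.frame(
  Time = c(100, 100.1, 100.2, 100.2, 100.3, 100.3, 100.3,
           100.4, 100.4, 100.5, 100.5, 100.6, 100.6, 100.7),
  x = c(4, NA, 7, NA, 3, 7, NA, 9, NA, 7, NA, 3, NA, 7),
  a = c(NA, 7, NA, 9, NA, 9, 7, NA, NA, NA, 9, NA, 5, NA),
  j = c(7, NA, 3, 3, NA, NA, 7, NA, NA, 7, 7, NA, NA, 9),
  y = c(8, NA, 4, NA, 5, 4, NA, 9, NA, 1, NA, 7, NA, 2),
  b = c(NA, 4, NA, 6, NA, 6, 4, NA, NA, NA, 6, NA, 2, NA),
  k = c(1, NA, 5, 5, NA, NA, 1, NA, NA, 2, 2, NA, NA, 6)
)

# Hypothetical lookup: output column name -> the two input columns to pool
pairs <- list(xy_mean = c("x", "y"),
              ab_mean = c("a", "b"),
              jk_mean = c("j", "k"))

# One sub-data-frame per distinct Time value
groups <- split(df, df$Time)

# For each pair, pool both columns within each group and average the non-NA values
means <- lapply(pairs, function(cols)
  unname(vapply(groups,
                function(g) mean(unlist(g[cols]), na.rm = TRUE),
                numeric(1))))

res <- data.frame(Time = as.numeric(names(groups)), means)
res
```

Adding another pair of columns only requires a new entry in pairs; the rest of the code stays unchanged.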