R: Summarising dplyr grouped data while excluding certain rows based on another column

Question

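I want to summarise several columns using all rows except those that have a particular value in a separate grouping column. For example, in the df below, for each cluster I want the medians of A, B, C, D and E computed from the rows that are not assigned to that cluster.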

df = data.frame(cluster = c(1:5, 1:3, 1:2),
                A = rnorm(10, 2),
                B = rnorm(10, 5),
                C = rnorm(10, 0.4),
                D = rnorm(10, 3),
                E = rnorm(10, 1))

library(dplyr)

df %>%
  group_by(cluster) %>%
  # fun_i_need_help_with is a placeholder for the function I am asking about
  summarise_at(toupper(letters[1:5]), funs(m = fun_i_need_help_with(.)))

fun_i_need_help_with would give the equivalent of:

    first row: median(df[which(df$cluster != 1), "A"])
    second row: median(df[which(df$cluster != 2), "A"])
    and so on...

I can do this with a nested for loop, but it runs slowly and doesn't seem like a very R-like solution.

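out <- df  # write to a copy so later medians still use the original values
for (col in toupper(letters[1:5])) {
    for (clust in unique(df$cluster)) {
        # rows of this cluster receive the median of all other clusters' rows
        out[which(df$cluster == clust), col] <-
            median(df[which(df$cluster != clust), col])
    }
}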

Answer 1

A solution using the tidyverse.

set.seed(123)

df = data.frame(cluster = c(1:5, 1:3, 1:2),
                A = rnorm(10, 2),
                B = rnorm(10, 5),
                C = rnorm(10, 0.4),
                D = rnorm(10, 3),
                E = rnorm(10, 1))

library(tidyverse)

# For each cluster, drop its rows and take the column medians of the rest
df2 <- map_dfr(unique(df$cluster),
        ~df %>%
          filter(cluster != .x) %>%
          # pass median directly; funs() is deprecated in newer dplyr releases
          summarize_at(vars(-cluster), median) %>%
          # Label the row with the cluster that was excluded
          mutate(not_cluster = .x))
df2
#          A        B          C        D         E not_cluster
# 1 2.070508 5.110683  0.1820251 3.553918 0.7920827           1
# 2 2.070508 5.400771 -0.6260044 3.688640 0.5333446           2
# 3 1.920165 5.428832 -0.2769652 3.490191 0.8543568           3
# 4 1.769823 5.400771 -0.2250393 3.426464 0.5971152           4
# 5 1.769823 5.400771 -0.3288912 3.426464 0.5971152           5
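
The question asked for something that fits a group_by() + summarise() pipeline. A minimal sketch of that shape follows; it is an alternative to the answer above, not part of it. It assumes dplyr >= 1.0.0, where across(), cur_group() and cur_column() are available. The name loo_median is made up for illustration. The result should hold the same medians as df2, with the excluded cluster in the cluster column.

```r
library(dplyr)

set.seed(123)
df <- data.frame(cluster = c(1:5, 1:3, 1:2),
                 A = rnorm(10, 2),
                 B = rnorm(10, 5),
                 C = rnorm(10, 0.4),
                 D = rnorm(10, 3),
                 E = rnorm(10, 1))

# For each cluster k and each column, take the median over the rows of the
# full data frame whose cluster is not k.
# cur_group() returns the key of the current group; cur_column() returns the
# name of the column that across() is currently processing.
loo_median <- df %>%
  group_by(cluster) %>%
  summarise(across(A:E,
                   ~ median(df[[cur_column()]][df$cluster != cur_group()$cluster])))

loo_median
```

The lambda deliberately ignores the group's own values and indexes into the full df instead, because the median has to come from the rows outside the current group.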

Solution 1
2 accepted 2018-11-15 01:31:23
