
R - dplyr summary over combinations of factors

If I have a simple data frame with two factors (a and b), each with two levels (1 and 2), and one numeric variable (x), how do I get the median of x for each level of a, for each level of b, and for each combination of a and b?

library(dplyr)    
df <- data.frame(a = as.factor(c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)),
   b = as.factor(c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2)),
   x = c(runif(16)))

I've tried various (many) versions of:

df %>%
   group_by_(c("a", "b")) %>%
   summarize(med_rate = median(df$x))

The results should look like this for the median x of each level of factor a:

a median
1 0.58811
2 0.53167

And like this for the median x of each level of factor b:

b median
1 0.60622
2 0.46096

And like this for the median x of each combination of a and b:

a b median
1 1 0.66745
1 2 0.34656
2 1 0.50903
2 2 0.55990

Thanks in advance for any help.

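set.seed(123)  # make your example reproducible
library(data.table)
df <- data.table(a = as.factor(c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)),
             b = as.factor(c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2)),
             x = c(runif(16)))

# median of x for each level of a
df[, .(median = median(x)), by = a]
# median of x for each level of b
df[, .(median = median(x)), by = b]
# median of x for each combination of a and b
df[, .(median = median(x)), by = .(a, b)]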
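For comparison, the same three summaries can be sketched in dplyr. The attempt in the question calls median(df$x), which reads the whole column from the original data frame and so ignores the grouping; referring to the bare column x inside summarise() computes the median per group. This is a minimal sketch that assumes dplyr 1.0.0 or later (for the .groups argument); the names by_a, by_b and by_ab are only illustrative.

```r
library(dplyr)

set.seed(123)  # reproducible draws
df <- data.frame(
  a = factor(rep(1:2, each = 8)),
  b = factor(rep(rep(1:2, each = 4), times = 2)),
  x = runif(16)
)

# Median of x for each level of a
by_a <- df %>%
  group_by(a) %>%
  summarise(median = median(x))

# Median of x for each level of b
by_b <- df %>%
  group_by(b) %>%
  summarise(median = median(x))

# Median of x for each combination of a and b
by_ab <- df %>%
  group_by(a, b) %>%
  summarise(median = median(x), .groups = "drop")
```

Printing by_a, by_b and by_ab gives three tables laid out like the ones in the question; the values depend on the random draws.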

The following is not very elegant, but it creates a single data frame that matches your expected result.

We create three data frames (for a, b, and a*b) and combine them into one.

bind_rows(
  df %>% 
    group_by(a) %>% 
    rename(factor = a) %>% 
    summarize(med_rate = median(x)),
  df %>% 
    group_by(b) %>% 
    rename(factor = b) %>% 
    summarize(med_rate = median(x)),
  df %>% 
    # We create a column for grouping a*b
    mutate(factor = paste(a, b)) %>% 
    group_by(factor) %>% 
    summarize(med_rate = median(x))
)
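If more grouping sets are needed, the repetition can be folded into a small helper. The sketch below is one possible way to do that, not part of the original answer: median_by, grouping_sets, and the output columns grouping and level are hypothetical names, and it assumes dplyr 1.0.0 or later for across() and all_of().

```r
library(dplyr)

set.seed(123)  # reproducible draws
df <- data.frame(
  a = factor(rep(1:2, each = 8)),
  b = factor(rep(rep(1:2, each = 4), times = 2)),
  x = runif(16)
)

# Hypothetical helper: median of x for one set of grouping columns.
# The group labels are pasted into a single `level` column so that
# results for different grouping sets can be stacked.
median_by <- function(data, vars) {
  out <- data %>%
    group_by(across(all_of(vars))) %>%
    summarise(med_rate = median(x), .groups = "drop")
  level <- do.call(paste, unname(lapply(out[vars], as.character)))
  data.frame(
    grouping = paste(vars, collapse = "*"),
    level = level,
    med_rate = out$med_rate
  )
}

# One entry per summary: a, b, and the a*b combination
grouping_sets <- list("a", "b", c("a", "b"))
result <- bind_rows(lapply(grouping_sets, median_by, data = df))
```

Here result has one row per group: two rows for a, two for b, and four for a*b, with the grouping column recording which summary each row belongs to.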
