
Using R dplyr to summarize data

I have this df with 3 columns: year, treatment/control group, and a double variable.

year    T/C      Var
<int>   <int>   <dbl>
1992    0       15.5
1992    0       0.0
1993    0       17.5
1993    0       20.5
1992    1       40.5
1992    1       2.0
1993    1       27.0
1993    1       19.5

What can I do to make a table similar to this where the columns are the treatment/control, rows are the years, and the cells are populated by the mean of var for that group?

        0                       1
    
1992    mean(var(0, 1992))      mean(var(1, 1992))
1993    mean(var(0, 1993))      mean(var(1, 1993))

I tried group_by and summarise like this but I don't know how to make the T/C the columns.

df %>% group_by(year, `T/C`) %>% summarise(across(everything(), list(mean)))

You can use pivot_wider to reshape the data to wide format. Passing values_fn = mean averages the Var values that fall into each year and T.C combination.

library(tidyr)
result <- pivot_wider(df, names_from = T.C, values_from = Var, values_fn = mean)
result

#   year   `0`   `1`
#  <int> <dbl> <dbl>
#1  1992  7.75  21.2
#2  1993 19     23.2

In data.table you could use dcast.

library(data.table)
dcast(setDT(df), year~T.C, value.var = 'Var', fun.aggregate = mean)

data

df <- structure(list(year = c(1992L, 1992L, 1993L, 1993L, 1992L, 1992L, 
1993L, 1993L), T.C = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L), Var = c(15.5, 
0, 17.5, 20.5, 40.5, 2, 27, 19.5)), class = "data.frame", row.names = c(NA, -8L))
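
As a cross-check that needs no extra packages, the same cell means can be computed with base R's tapply. This is only a sketch for verifying the numbers, not part of the original answer; the name means is chosen here for illustration.

```r
# Same data as above, repeated so the snippet runs on its own
df <- data.frame(
  year = c(1992L, 1992L, 1993L, 1993L, 1992L, 1992L, 1993L, 1993L),
  T.C  = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L),
  Var  = c(15.5, 0, 17.5, 20.5, 40.5, 2, 27, 19.5)
)

# Mean of Var for every combination of year (rows) and T.C (columns)
means <- with(df, tapply(Var, list(year = year, T.C = T.C), mean))
means
```

The result is a matrix with the years as row names and the T.C values as column names. The four means are 7.75 and 21.25 for 1992, and 19 and 23.25 for 1993; the tibble output above shows 21.2 and 23.2 only because tibbles print three significant digits.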

Packages:

library('dplyr')
library('tidyr')

Data:

table <- read.table(header=TRUE, text="
year    T/C      Var
1992    0       15.5
1992    0       0.0
1993    0       17.5
1993    0       20.5
1992    1       40.5
1992    1       2.0
1993    1       27.0
1993    1       19.5
")

Another way to solve this is to pipe group_by and summarise into spread from tidyr. It is a bit longer than Ronak Shah's answer, but it gives the same result. Note that read.table converts the column name T/C to T.C.

table <- table %>%
  mutate(T.C = as.character(T.C)) %>%
  group_by(year, T.C) %>%
  summarise(Mean_Var = mean(Var)) %>%
  spread(T.C, Mean_Var)

Result:

# A tibble: 2 x 3
# Groups:   year [2]
   year   `0`   `1`
  <int> <dbl> <dbl>
1  1992  7.75  21.2
2  1993 19     23.2
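
In current tidyr, spread is superseded by pivot_wider, so the same pipeline can also be written as below. This is a sketch, assuming dplyr 1.0.0 or later (for the .groups argument of summarise) and tidyr 1.0.0 or later; the name wide is illustrative and not from the original answer.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, repeated so the snippet runs on its own
df <- data.frame(
  year = c(1992L, 1992L, 1993L, 1993L, 1992L, 1992L, 1993L, 1993L),
  T.C  = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L),
  Var  = c(15.5, 0, 17.5, 20.5, 40.5, 2, 27, 19.5)
)

wide <- df %>%
  group_by(year, T.C) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(Mean_Var = mean(Var), .groups = "drop") %>%
  # one column per T.C value, filled with the group means
  pivot_wider(names_from = T.C, values_from = Mean_Var)
wide
```

Dropping the grouping in summarise means the result no longer carries the "Groups: year [2]" attribute shown above, and the as.character conversion is not needed because pivot_wider builds the column names from the T.C values itself.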
