Divide between groups of rows using group_by

My data is contained in a data.frame:

SYMBOL  variable        value     Sample    IDs   Group
TLR8    MMRF_2613_1_BM  3.186233  Baseline  2613  LessUp
TLR8    MMRF_2613_1_BM  5.471014  Baseline  2613  LessUp
TLR8    MMRF_2613_1_BM  2.917965  Baseline  2613  MostUp
TLR8    MMRF_2613_1_BM  2.147028  Baseline  2613  MostUp
TLR4    MMRF_2613_1_BM  7.497424  Baseline  2613  LessUp
TLR4    MMRF_2613_1_BM  4.16523   Baseline  2613  LessUp
TLR4    MMRF_2613_1_BM  7.136523  Baseline  2613  MostUp
TLR4    MMRF_2613_1_BM  7.96523   Baseline  2613  MostUp

For each SYMBOL, I would like to divide the sum of value over the rows where Group is "MostUp" by the sum of value over the rows where Group is "LessUp".

I believe I could use the group_by function, but I am not sure how to apply it correctly.

Here is an example of my expected output.

SYMBOL  variable        value  Sample    IDs   Group
TLR8    MMRF_2613_1_BM  0.58   Baseline  2613  MostUp_divided_by_LessUp
TLR4    MMRF_2613_1_BM  1.29   Baseline  2613  MostUp_divided_by_LessUp

In addition to calculating the ratios, how would I perform a t-test between the two groups?

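We could first calculate the sum of value for each Group within each SYMBOL, and then divide the 'MostUp' sum by the 'LessUp' sum. The first summarise() drops the last grouping variable (Group), so the second summarise() works within each SYMBOL.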

library(dplyr)

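df %>%
  # one row per SYMBOL and Group, holding the summed value
  group_by(SYMBOL, variable, Sample, IDs, Group) %>%
  summarise(value = sum(value)) %>%
  # Group is no longer a grouping variable here, so this divides within each SYMBOL
  summarise(value = value[Group == 'MostUp'] / value[Group == 'LessUp'])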

#  SYMBOL variable       Sample     IDs value
#  <fct>  <fct>          <fct>    <int> <dbl>
#1 TLR4   MMRF_2613_1_BM Baseline  2613 1.29 
#2 TLR8   MMRF_2613_1_BM Baseline  2613 0.585
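
The expected output in the question also carries a Group column labelled "MostUp_divided_by_LessUp". A minimal sketch of one way to get that shape in a single summarise() follows. It assumes dplyr 1.0.0 or later (for the .groups argument) and rebuilds the sample data with plain character columns instead of factors; the name ratios is only illustrative.

```r
library(dplyr)

# Sample data from the question, rebuilt with character columns (an assumption;
# the original dput() output uses factors)
df <- data.frame(
  SYMBOL   = rep(c("TLR8", "TLR4"), each = 4),
  variable = "MMRF_2613_1_BM",
  value    = c(3.186233, 5.471014, 2.917965, 2.147028,
               7.497424, 4.16523, 7.136523, 7.96523),
  Sample   = "Baseline",
  IDs      = 2613L,
  Group    = rep(c("LessUp", "LessUp", "MostUp", "MostUp"), times = 2),
  stringsAsFactors = FALSE
)

ratios <- df %>%
  group_by(SYMBOL, variable, Sample, IDs) %>%
  # sum each group's values inside the SYMBOL, then take the ratio
  summarise(
    value = sum(value[Group == "MostUp"]) / sum(value[Group == "LessUp"]),
    .groups = "drop"
  ) %>%
  # label the rows the way the expected output does
  mutate(Group = "MostUp_divided_by_LessUp")

ratios
```

Because the ratio is computed directly from the two subsets, this version does not rely on the implicit dropping of the last grouping variable between two summarise() calls.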

To run a t-test between the two groups within each SYMBOL, we can store the result of t.test() in a list column:

df1 <- df %>%
  group_by(SYMBOL, variable, Sample, IDs) %>%
  summarise(value = list(t.test(value[Group == 'MostUp'],
                                value[Group == 'LessUp'])))

df1
# A tibble: 2 x 5
# Groups:   SYMBOL, variable, Sample [2]
#  SYMBOL variable       Sample     IDs value
#  <fct>  <fct>          <fct>    <int> <list>
#1 TLR4   MMRF_2613_1_BM Baseline  2613 <htest>
#2 TLR8   MMRF_2613_1_BM Baseline  2613 <htest>
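
Each element of that list column is an "htest" object, so the statistic and p-value still have to be pulled out. As a hedged sketch, the same test can be summarised straight into numeric columns. The column names t_stat and p_value and the object name tests are illustrative, the data are rebuilt here with character columns, and dplyr 1.0.0 or later is assumed. Note that t.test() defaults to the Welch (unequal variance) test, and with only two observations per group the test has very little power.

```r
library(dplyr)

# Sample data from the question, rebuilt with character columns (an assumption)
df <- data.frame(
  SYMBOL   = rep(c("TLR8", "TLR4"), each = 4),
  variable = "MMRF_2613_1_BM",
  value    = c(3.186233, 5.471014, 2.917965, 2.147028,
               7.497424, 4.16523, 7.136523, 7.96523),
  Sample   = "Baseline",
  IDs      = 2613L,
  Group    = rep(c("LessUp", "LessUp", "MostUp", "MostUp"), times = 2),
  stringsAsFactors = FALSE
)

tests <- df %>%
  group_by(SYMBOL, variable, Sample, IDs) %>%
  summarise(
    # keep the full htest object in a list column, one per SYMBOL
    test = list(t.test(value[Group == "MostUp"], value[Group == "LessUp"])),
    .groups = "drop"
  ) %>%
  mutate(
    # pull the scalar results out of each htest object
    t_stat  = sapply(test, function(x) unname(x$statistic)),
    p_value = sapply(test, function(x) x$p.value)
  )

tests
```

Keeping the test column alongside the extracted values means the confidence interval or degrees of freedom can still be read from the original objects later.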

data

df <- structure(list(SYMBOL = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L), .Label = c("TLR4", "TLR8"), class = "factor"), variable = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "MMRF_2613_1_BM", class = "factor"), 
value = c(3.186233, 5.471014, 2.917965, 2.147028, 7.497424, 
4.16523, 7.136523, 7.96523), Sample = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = "Baseline", class = "factor"), 
IDs = c(2613L, 2613L, 2613L, 2613L, 2613L, 2613L, 2613L, 
2613L), Group = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L
), .Label = c("LessUp", "MostUp"), class = "factor")), 
class = "data.frame", row.names = c(NA, -8L))
