Is there an R function to group data by one variable (column)?

Question

I measured the inhibiting power of bacteria on viruses. I have a data matrix of n rows (individuals) and 4 columns (a, b, c, x). Depending on column x, I would like to classify them as good or bad inhibitors. However, I am not sure how to set a threshold for column x that depends on the other measured columns (a, b, c). Is there any R function that could separate/group my data frame?

Answer 1

In dplyr there is group_by(); it works like this:

library(dplyr)

# Toy data for illustration: grouping column A, numeric column C
df <- data.frame(A = c("x", "x", "y"), C = c(1, 3, 5))

df %>%
  group_by(A) %>%            # df is now grouped by column A
  summarise(Mean = mean(C))  # one row per group of A with the mean of C; all other columns are dropped

df %>%
  group_by(A) %>%
  mutate(Mean = mean(C))     # adds the group mean of C to every row; rows and other columns are kept

If you summarise, you are done, but after group_by() and mutate() the data frame stays grouped, so you have to ungroup() it at some point; otherwise later operations keep running per group.
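A minimal sketch applying this to the question's good/bad labelling. The grouping column `group`, the measurement column `x`, the values, and the 0.5 cut-off are all hypothetical; the pipeline ends with ungroup() and then checks that no grouping is left:

```r
library(dplyr)

# Hypothetical toy data: 'group' stands in for your grouping column,
# 'x' for the measured inhibition values
df <- data.frame(
  group = c("g1", "g1", "g2", "g2"),
  x     = c(0.2, 0.4, 0.7, 0.9)
)

labelled <- df %>%
  group_by(group) %>%
  mutate(group_mean = mean(x),                              # mean of x within each group
         label = if_else(group_mean > 0.5, "Good", "Bad")) %>%  # hypothetical 0.5 threshold
  ungroup()   # remove the grouping so later verbs act on the whole table

group_vars(labelled)   # character(0): no grouping remains
labelled$label         # "Bad" "Bad" "Good" "Good"
```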

Answer 2

Below is a data.table example. The data contain 50 observations (a) spread across 5 groups (Group). Because the values are generated randomly, the numbers you get will differ from the output shown.

Data

library(data.table)

dt = data.table(
  a = runif(50),                                    # 50 uniform random values in [0, 1]
  Group = sample(LETTERS[1:5], 50, replace = TRUE)  # random group label A to E for each row
)

Example 1

First, we can calculate the mean of a for each Group and label the group 'Good' if the mean is above 0.5 and 'Bad' otherwise. Note that this summary does not include the individual values of a.

dt1 = dt[, .(Mean = mean(a)), keyby = Group][, Label := ifelse(Mean > 0.5, 'Good', 'Bad')]

> dt1
   Group      Mean Label
1:     A 0.2982229   Bad
2:     B 0.4102181   Bad
3:     C 0.6201973  Good
4:     D 0.4841881   Bad
5:     E 0.4443718   Bad

Example 2

Similarly to Fnguyen's answer, the following code does not summarise the data per group; it just shows the group mean and Label next to each observation. Because := adds the columns by reference, dt itself is modified as well.

dt2 = dt[, Mean := mean(a), by = Group][, Label := ifelse(Mean > 0.5, 'Good', 'Bad')]

> head(dt2)
           a Group      Mean Label
1: 0.4253110     E 0.4443718   Bad
2: 0.4217955     A 0.2982229   Bad
3: 0.7389260     E 0.4443718   Bad
4: 0.2499628     E 0.4443718   Bad
5: 0.3807705     C 0.6201973  Good
6: 0.2841950     E 0.4443718   Bad
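If the original table should stay untouched, one option is to work on a copy. A minimal sketch with small made-up values (the names `dt_demo` and `dt_demo2` are hypothetical), using data.table's copy() so that the := calls only modify the copy:

```r
library(data.table)

# Made-up example data: two groups with known means (A: 0.3, B: 0.8)
dt_demo <- data.table(a = c(0.2, 0.4, 0.7, 0.9), Group = c("A", "A", "B", "B"))

# copy() returns an independent table, so the chained := calls leave dt_demo as it was
dt_demo2 <- copy(dt_demo)[, Mean := mean(a), by = Group][
  , Label := ifelse(Mean > 0.5, "Good", "Bad")]

names(dt_demo)    # "a" "Group": no new columns on the original
dt_demo2$Label    # "Bad" "Bad" "Good" "Good"
```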

Example 3

Lastly, we can of course apply a condition to create a new column without first calculating a grouped variable. The following example tests a combined condition on columns a and b.

dt3 = data.table(a = runif(100), b = runif(100))

dt3[, abGrThan0.5 := a > 0.5 & b > 0.5]  # TRUE only when both a and b exceed 0.5

> head(dt3)
           a          b abGrThan0.5
1: 0.5132690 0.02104807       FALSE
2: 0.8466798 0.96845916        TRUE
3: 0.5776331 0.79215074        TRUE
4: 0.9740055 0.59381244        TRUE
5: 0.4311248 0.07473373       FALSE
6: 0.2547600 0.09513784       FALSE
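The same row-wise approach extends to the question's own columns a, b, c and x. A minimal sketch with made-up values and a hypothetical rule (a row counts as 'Good' only if x, a, b and c all exceed 0.5); the real cut-offs would have to come from your own measurements:

```r
library(data.table)

# Made-up measurements using the question's column names
dt4 <- data.table(
  a = c(0.9, 0.2, 0.8),
  b = c(0.7, 0.6, 0.1),
  c = c(0.6, 0.9, 0.9),
  x = c(0.8, 0.9, 0.3)
)

# Hypothetical rule: 'Good' only if every one of x, a, b, c is above 0.5
dt4[, Label := ifelse(x > 0.5 & a > 0.5 & b > 0.5 & c > 0.5, "Good", "Bad")]

dt4$Label  # "Good" "Bad" "Bad"
```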

2 answers

Answer 1: votes 1, posted 2019-09-27 09:00:37

Answer 2: votes 0, posted 2019-09-27 09:37:34