Performing a count of each level of a factor grouping by another factor

I would like a data frame output that records the counts for 2 of the 4 levels ("Yes" and "No") of a variable. I can do it by subsetting and filtering on "Yes" or "No", but I feel there must be a better way to do this with dplyr.

null.ta <- dbdata %>%
filter(MutGroup == "Null") %>%
group_by(ICD_Grouping) %>%
summarise(n()) %>%
spread(???????)

The above is roughly what I assume I have to do, but I do not know how to get the spread function to work for this particular variable. I don't mind if all 4 levels are included; I can just drop a couple of columns after the fact.

structure(list(ICD_Grouping = structure(c(50L, 50L, 33L, 33L, 
50L, 50L, 50L, 18L, 21L, 33L, 18L, 18L, 50L, 50L, 50L, 17L, 17L, 
17L, 17L, 17L, 17L, 50L, 50L, 50L, 50L, 18L, 18L, 16L, 50L, 50L, 
50L, 16L, 17L, 50L, 50L, 50L, 16L, 16L, 30L, 50L, 50L, 16L, 18L, 
17L, 50L, 50L, 50L, 50L, 50L, 50L, 21L, 30L, 21L, 18L, 21L, 21L, 
13L, 30L, 50L, 50L, 50L, 50L, 13L, 34L, 33L, 18L, 16L, 16L, 16L, 
16L, 18L, 10L, 34L, 37L, 34L, 34L, 18L, 33L, 33L, 18L, 18L, 37L, 
50L, 30L, 30L, 50L, 50L, 50L, 50L, 50L, 50L, 34L, 34L, 33L, 17L, 
14L, 19L, 33L, 18L, 18L, 18L, 50L, 30L, 30L, 30L, 34L, 18L, 18L, 
18L, 18L, 30L, 30L, 17L, 17L, 33L), .Label = c("", "C01-2", "C03-6", 
"C09-10", "C11", "C15", "C16", "C18-20", "C21", "C22", "C25", 
"C30-31", "C33-34", "C37-39", "C40-41", "C43", "C44", "C45", 
"C47/49", "C48", "C50", "C51", "C53", "C54-55", "C56", "C57-58", 
"C60", "C61", "C62", "C64", "C65-66/68", "C67", "C69", "C70", 
"C71", "C72", "C73", "C74-75", "C76.0", "C76.2", "C76.3", "C80", 
"C81", "C82-86", "C90.0", "C91.0", "C94.3/95", "D04", "D05", 
"D22", "D31", "D33", "D35"), class = "factor"), Immunohistochemistry = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 3L, 3L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 2L, 2L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 
2L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 2L, 4L, 2L, 4L, 4L, 4L, 4L, 3L, 
3L, 4L), .Label = c("", "N/A", "No", "Yes"), class = "factor")), row.names = c(NA, 
-115L), class = "data.frame")

And I would like an output that looks like this:

ICD_Grouping Yes No N/A
C22           2   1   0
C45           7   3   1
C69           4   0   0

That is an example with random data, not this data. I would just like a data frame with the counts of each factor level in Immunohistochemistry by ICD_Grouping.

If I understand correctly, we can do that with base R's table:

table(dbdata)

table shows results for every level of a factor (even a level that is no longer present in the data), so to keep the table reasonably sized, we use droplevels to remove unused levels first:

table(droplevels(dbdata))

            Immunohistochemistry
ICD_Grouping N/A No Yes
      C22      0  0   1
      C33-34   0  0   2
      C37-39   1  0   0
      C43      0  2   7
      C44      1  2   8
      C45      2  0  17
      C47/49   1  0   0
      C50      0  1   4
      C64      0  0  10
      C69      7  0   2
      C70      1  0   6
      C73      0  1   1
      D22      8  0  30
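
To see what droplevels changes, here is a minimal sketch on made-up data. The names toy and toy_sub are hypothetical and are not part of the question's dbdata. After subsetting, the factor still carries a level that no longer occurs, and table keeps an all-zero row for it until the unused level is dropped:

```r
# Toy data (hypothetical): two factor columns.
toy <- data.frame(
  group  = factor(c("a", "a", "b", "c")),
  status = factor(c("Yes", "No", "Yes", "Yes"))
)

# Subsetting removes the rows for "c" but not the level itself.
toy_sub <- toy[toy$group != "c", ]

with_unused    <- table(toy_sub)              # still has an all-zero row for "c"
without_unused <- table(droplevels(toy_sub))  # row "c" is gone

dim(with_unused)
dim(without_unused)
```

Comparing the two dim() results shows the extra row that the unused level contributes.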

A table can be converted to a data.frame with the same structure using as.data.frame.matrix (the %>% pipe assumes magrittr or dplyr is loaded):

table(droplevels(dbdata)) %>%
    as.data.frame.matrix() %>%
    tibble::rownames_to_column('ICD_Grouping')

Or, if you prefer to pipe every step:

dbdata %>%
    droplevels() %>%
    table() %>%
    as.data.frame.matrix() %>%
    tibble::rownames_to_column('ICD_Grouping')

Both give the same data.frame as a result:

   ICD_Grouping N/A No Yes
1           C22   0  0   1
2        C33-34   0  0   2
3        C37-39   1  0   0
4           C43   0  2   7
5           C44   1  2   8
6           C45   2  0  17
7        C47/49   1  0   0
8           C50   0  1   4
9           C64   0  0  10
10          C69   7  0   2
11          C70   1  0   6
12          C73   0  1   1
13          D22   8  0  30
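
The reason for as.data.frame.matrix rather than plain as.data.frame can be sketched on a small made-up table (toy is a hypothetical name, not the question's data). Calling as.data.frame on a table returns the long form, with one row per combination and the counts in a Freq column, whereas as.data.frame.matrix keeps the wide layout shown above:

```r
# Toy data (hypothetical): two factor columns.
toy <- data.frame(
  group  = factor(c("a", "a", "b")),
  status = factor(c("Yes", "No", "Yes"))
)
tab <- table(toy)

# Long form: one row per (group, status) pair, counts in a column named Freq.
long <- as.data.frame(tab)

# Wide form: one row per group, one column per status level,
# with the group labels stored as row names.
wide <- as.data.frame.matrix(tab)

long
wide
```

Because the wide form keeps the grouping variable in the row names, a step such as tibble::rownames_to_column is what turns it into a regular column.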

This form is a proper data frame that can be used in any downstream process, or joined on the ICD_Grouping variable.
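
Since the question asked about dplyr, the same counts can probably also be obtained with dplyr::count followed by tidyr::pivot_wider (which supersedes spread). This is a different technique from the base table approach, shown only as a sketch: it assumes dplyr and a reasonably recent tidyr are installed, and toy is a made-up stand-in for dbdata.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for dbdata (hypothetical values, same column names and levels).
toy <- data.frame(
  ICD_Grouping = factor(c("C43", "C43", "C43", "C45", "C45")),
  Immunohistochemistry = factor(
    c("Yes", "No", "Yes", "Yes", "N/A"),
    levels = c("", "N/A", "No", "Yes")
  )
)

counts <- toy %>%
  # One row per observed (ICD_Grouping, Immunohistochemistry) pair, count in n.
  count(ICD_Grouping, Immunohistochemistry) %>%
  # Spread the levels into columns; combinations that never occur become 0.
  pivot_wider(
    names_from  = Immunohistochemistry,
    values_from = n,
    values_fill = 0
  ) %>%
  # Keep only the two levels of interest (any_of tolerates a missing column).
  select(ICD_Grouping, any_of(c("Yes", "No")))

counts
```

The final select is where the other levels are dropped; removing it keeps every level that occurs in the data.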
