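Performing a count of each level of a factor grouping by another factor
I would like a data frame output that records counts for 2 of the 4 levels of a variable ("Yes" and "No"). I can do this by subsetting and filtering for Yes or No, but I feel there must be a better way using dplyr.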
null.ta <- dbdata %>%
filter(MutGroup == "Null") %>%
group_by(ICD_Grouping) %>%
summarise(n()) %>%
spread(???????)
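The above is roughly what I assume I have to do, but I don't know how to get the spread function to work for this particular variable. I don't mind if all 4 levels are included; I can cut a few columns afterwards.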
structure(list(ICD_Grouping = structure(c(50L, 50L, 33L, 33L,
50L, 50L, 50L, 18L, 21L, 33L, 18L, 18L, 50L, 50L, 50L, 17L, 17L,
17L, 17L, 17L, 17L, 50L, 50L, 50L, 50L, 18L, 18L, 16L, 50L, 50L,
50L, 16L, 17L, 50L, 50L, 50L, 16L, 16L, 30L, 50L, 50L, 16L, 18L,
17L, 50L, 50L, 50L, 50L, 50L, 50L, 21L, 30L, 21L, 18L, 21L, 21L,
13L, 30L, 50L, 50L, 50L, 50L, 13L, 34L, 33L, 18L, 16L, 16L, 16L,
16L, 18L, 10L, 34L, 37L, 34L, 34L, 18L, 33L, 33L, 18L, 18L, 37L,
50L, 30L, 30L, 50L, 50L, 50L, 50L, 50L, 50L, 34L, 34L, 33L, 17L,
14L, 19L, 33L, 18L, 18L, 18L, 50L, 30L, 30L, 30L, 34L, 18L, 18L,
18L, 18L, 30L, 30L, 17L, 17L, 33L), .Label = c("", "C01-2", "C03-6",
"C09-10", "C11", "C15", "C16", "C18-20", "C21", "C22", "C25",
"C30-31", "C33-34", "C37-39", "C40-41", "C43", "C44", "C45",
"C47/49", "C48", "C50", "C51", "C53", "C54-55", "C56", "C57-58",
"C60", "C61", "C62", "C64", "C65-66/68", "C67", "C69", "C70",
"C71", "C72", "C73", "C74-75", "C76.0", "C76.2", "C76.3", "C80",
"C81", "C82-86", "C90.0", "C91.0", "C94.3/95", "D04", "D05",
"D22", "D31", "D33", "D35"), class = "factor"), Immunohistochemistry = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 3L, 3L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 2L, 2L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L,
2L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 2L, 4L, 2L, 4L, 4L, 4L, 4L, 3L,
3L, 4L), .Label = c("", "N/A", "No", "Yes"), class = "factor")), row.names = c(NA,
-115L), class = "data.frame")
I would like an output that looks like this:
ICD_Grouping Yes No N/A
C22 2 1 0
C45 7 3 1
C69 4 0 0
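That is an example with random numbers, not the data above. I want a data frame containing the count of each Immunohistochemistry factor level within each ICD_Grouping.
If I understand correctly, we can do this with the base R table function: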
table(dbdata)
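table will show a result for every factor level (even levels that no longer occur in the data), so to keep the table a reasonable size, we first use droplevels to remove the unused levels: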
table(droplevels(dbdata))
Immunohistochemistry
ICD_Grouping N/A No Yes
C22 0 0 1
C33-34 0 0 2
C37-39 1 0 0
C43 0 2 7
C44 1 2 8
C45 2 0 17
C47/49 1 0 0
C50 0 1 4
C64 0 0 10
C69 7 0 2
C70 1 0 6
C73 0 1 1
D22 8 0 30
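To make the effect of droplevels concrete, here is a minimal sketch on a toy data frame. The names toy, grp and ans are hypothetical and are not part of the asker's dbdata; the point is only that unused levels add all-zero rows and columns to the table until they are dropped.

```r
# Toy data (hypothetical): both factors carry levels that never occur.
toy <- data.frame(
  grp = factor(c("a", "a", "b"), levels = c("a", "b", "c")),
  ans = factor(c("Yes", "No", "Yes"), levels = c("", "N/A", "No", "Yes"))
)

# Cross-tabulating the raw data keeps every declared level,
# so the unused levels "c", "" and "N/A" show up as all-zero entries.
full <- table(toy)

# droplevels() on a data frame drops unused levels from every factor column,
# leaving only the combinations of levels that can actually occur.
trimmed <- table(droplevels(toy))

dim(full)     # rows = levels of grp, columns = levels of ans
dim(trimmed)
```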
The table can be converted to a data.frame with the same structure using:
table(droplevels(dbdata)) %>%
as.data.frame.matrix() %>%
tibble::rownames_to_column('ICD_Grouping')
Or, if you prefer to pipe every step:
dbdata %>%
droplevels() %>%
table() %>%
as.data.frame.matrix() %>%
tibble::rownames_to_column('ICD_Grouping')
Both give the same data.frame result:
ICD_Grouping N/A No Yes
1 C22 0 0 1
2 C33-34 0 0 2
3 C37-39 1 0 0
4 C43 0 2 7
5 C44 1 2 8
6 C45 2 0 17
7 C47/49 1 0 0
8 C50 0 1 4
9 C64 0 0 10
10 C69 7 0 2
11 C70 1 0 6
12 C73 0 1 1
13 D22 8 0 30
This form is a proper data frame that can be used in any downstream process, or joined on the ICD_Grouping variable.
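Since the question asked specifically about dplyr and spread, here is a hedged sketch of one way that route could look. It assumes dplyr and tidyr are installed and uses a small hypothetical toy data frame in place of dbdata; the variable names wide and yes_no are also illustrative. Note that spread is superseded by tidyr::pivot_wider in recent tidyr versions, although it still works.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for dbdata (hypothetical values, same column names and levels).
toy <- data.frame(
  ICD_Grouping = factor(c("C22", "C22", "C45", "C45", "C45")),
  Immunohistochemistry = factor(c("Yes", "No", "Yes", "Yes", "N/A"),
                                levels = c("", "N/A", "No", "Yes"))
)

# count() gives one row per observed combination with its frequency in column n;
# spread() then turns each Immunohistochemistry level into its own column,
# filling combinations that never occur with 0.
wide <- toy %>%
  count(ICD_Grouping, Immunohistochemistry) %>%
  spread(Immunohistochemistry, n, fill = 0)

# Keep only the two levels the question cares about.
yes_no <- wide %>%
  select(ICD_Grouping, Yes, No)
```

The same select step could be applied to the data frame produced by the table approach if only the Yes and No columns are needed.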