I would like a data frame that records the counts of 2 of the 4 levels ("Yes" and "No") of a variable. I can do it by subsetting and filtering on "Yes" or "No", but I feel there must be a better way to do this with dplyr:
null.ta <- dbdata %>%
  filter(MutGroup == "Null") %>%
  group_by(ICD_Grouping) %>%
  summarise(n()) %>%
  spread(???????)
The above is roughly what I assume I have to do, but I do not know how to get the spread function to work for this particular variable. I don't mind if all 4 levels are included; I can just drop a couple of columns afterwards.
structure(list(ICD_Grouping = structure(c(50L, 50L, 33L, 33L,
50L, 50L, 50L, 18L, 21L, 33L, 18L, 18L, 50L, 50L, 50L, 17L, 17L,
17L, 17L, 17L, 17L, 50L, 50L, 50L, 50L, 18L, 18L, 16L, 50L, 50L,
50L, 16L, 17L, 50L, 50L, 50L, 16L, 16L, 30L, 50L, 50L, 16L, 18L,
17L, 50L, 50L, 50L, 50L, 50L, 50L, 21L, 30L, 21L, 18L, 21L, 21L,
13L, 30L, 50L, 50L, 50L, 50L, 13L, 34L, 33L, 18L, 16L, 16L, 16L,
16L, 18L, 10L, 34L, 37L, 34L, 34L, 18L, 33L, 33L, 18L, 18L, 37L,
50L, 30L, 30L, 50L, 50L, 50L, 50L, 50L, 50L, 34L, 34L, 33L, 17L,
14L, 19L, 33L, 18L, 18L, 18L, 50L, 30L, 30L, 30L, 34L, 18L, 18L,
18L, 18L, 30L, 30L, 17L, 17L, 33L), .Label = c("", "C01-2", "C03-6",
"C09-10", "C11", "C15", "C16", "C18-20", "C21", "C22", "C25",
"C30-31", "C33-34", "C37-39", "C40-41", "C43", "C44", "C45",
"C47/49", "C48", "C50", "C51", "C53", "C54-55", "C56", "C57-58",
"C60", "C61", "C62", "C64", "C65-66/68", "C67", "C69", "C70",
"C71", "C72", "C73", "C74-75", "C76.0", "C76.2", "C76.3", "C80",
"C81", "C82-86", "C90.0", "C91.0", "C94.3/95", "D04", "D05",
"D22", "D31", "D33", "D35"), class = "factor"), Immunohistochemistry = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 3L, 3L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 2L, 2L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L,
2L, 4L, 4L, 2L, 4L, 4L, 4L, 4L, 2L, 4L, 2L, 4L, 4L, 4L, 4L, 3L,
3L, 4L), .Label = c("", "N/A", "No", "Yes"), class = "factor")), row.names = c(NA,
-115L), class = "data.frame")
And I would like an output that would look like
ICD_Grouping  Yes  No  N/A
C22             2   1    0
C45             7   3    1
C69             4   0    0
That example uses made-up numbers, not this data. I would just like a data frame with the counts of each factor level in Immunohistochemistry by ICD_Grouping.
If I understand correctly, we can just do that with base R's table:
table(dbdata)
table will show results for every level (even one that is no longer present in the data), so to keep the table reasonably sized, we use droplevels to remove unused levels first:
table(droplevels(dbdata))
              Immunohistochemistry
ICD_Grouping  N/A  No  Yes
C22             0   0    1
C33-34          0   0    2
C37-39          1   0    0
C43             0   2    7
C44             1   2    8
C45             2   0   17
C47/49          1   0    0
C50             0   1    4
C64             0   0   10
C69             7   0    2
C70             1   0    6
C73             0   1    1
D22             8   0   30
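To make the effect of droplevels concrete, here is a minimal sketch on made-up data. The toy data frame and its values are hypothetical and are not the asker's dbdata; only the column names are borrowed from the question. It also shows one way to keep just the "Yes" and "No" columns, assuming both levels occur in the data.

# Hypothetical toy data: both factors carry levels that never occur in the rows
toy <- data.frame(
  ICD_Grouping = factor(c("C22", "C45", "C45"),
                        levels = c("C22", "C45", "C69")),
  Immunohistochemistry = factor(c("Yes", "No", "Yes"),
                                levels = c("", "N/A", "No", "Yes"))
)

full_tab  <- table(toy)              # 3 x 4: keeps the unused "C69", "" and "N/A"
small_tab <- table(droplevels(toy))  # 2 x 2: only levels that actually occur

dim(full_tab)
dim(small_tab)

# Keep only the "Yes" and "No" columns, in that order
yes_no <- small_tab[, c("Yes", "No"), drop = FALSE]
yes_no

Indexing the table by column name is enough when only a couple of levels are of interest, and drop = FALSE keeps the result two-dimensional even if a single column is selected.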
A table can be converted to a data.frame with the same structure using:
table(droplevels(dbdata)) %>%
  as.data.frame.matrix() %>%
  tibble::rownames_to_column('ICD_Grouping')
or, if you prefer to pipe every step:
dbdata %>%
  droplevels() %>%
  table() %>%
  as.data.frame.matrix() %>%
  tibble::rownames_to_column('ICD_Grouping')
Both give the same data.frame as a result:
   ICD_Grouping N/A No Yes
 1          C22   0  0   1
 2       C33-34   0  0   2
 3       C37-39   1  0   0
 4          C43   0  2   7
 5          C44   1  2   8
 6          C45   2  0  17
 7       C47/49   1  0   0
 8          C50   0  1   4
 9          C64   0  0  10
10          C69   7  0   2
11          C70   1  0   6
12          C73   0  1   1
13          D22   8  0  30
This form is a proper data frame that can be used in any downstream process, or joined on the ICD_Grouping variable.
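Since the question asked about dplyr specifically, a tidyverse version might look like the sketch below. It is not part of the original answer. It assumes dplyr and tidyr (1.1.0 or later, for pivot_wider with a scalar values_fill) are installed, and it runs on a hypothetical toy data frame rather than dbdata.

library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for dbdata
toy <- data.frame(
  ICD_Grouping = factor(c("C22", "C45", "C45", "C45", "C45")),
  Immunohistochemistry = factor(c("Yes", "No", "Yes", "Yes", "N/A"),
                                levels = c("", "N/A", "No", "Yes"))
)

wide <- toy %>%
  count(ICD_Grouping, Immunohistochemistry) %>%  # one row per observed combination
  pivot_wider(
    names_from  = Immunohistochemistry,          # one column per level that occurs
    values_from = n,
    values_fill = 0                              # combinations that never occur become 0
  ) %>%
  select(ICD_Grouping, Yes, No)                  # keep only the two levels of interest

wide

With an older tidyr, spread(Immunohistochemistry, n, fill = 0) in place of the pivot_wider call should play the same role, which is the step the question was reaching for. Note that select(ICD_Grouping, Yes, No) will fail if either level is absent from the data.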