Count unique occurrences of factor levels and numeric values with dplyr, on data in a long format
I have data on repeated measurements of 8 patients, each with a varying number of repeated measurements on the same variables. The measured variables are sex, blood pressure (sys_bp), and how many CT scans a person underwent (ct_amount):
library(dplyr)
library(magrittr)
questiondata <- structure(list(id = c(2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4,
4, 7, 7, 8, 8, 8, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 20,
20, 20), time = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 6L, 1L, 2L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 4L), .Label = c("T0", "T1M0", "T1M6",
"T1M12", "T2M0", "FU1"), class = "factor"), sys_bp = c(116, 125.8,
NA, NA, NA, 113.2, NA, NA, NA, NA, 146, NA, NA, NA, NA, NA, NA,
125, NA, NA, 164.5, NA, NA, NA, NA, 150.5, NA, NA, NA, NA, 158,
NA), sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("female", "male"), class = "factor"),
ct_amount = c(4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 2L, 2L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 3L, 3L, 3L)), row.names = c(NA, -32L), class = c("tbl_df",
"tbl", "data.frame"))
questiondata
id time sys_bp sex ct_amount
<dbl> <fct> <dbl> <fct> <int>
1 2 T0 116 female 4
2 2 T1M0 126. female 4
3 2 T1M6 NA female 4
4 2 T1M12 NA female 4
5 3 T0 NA female 5
6 3 T1M0 113. female 5
7 3 T1M6 NA female 5
8 3 T1M12 NA female 5
9 3 T2M0 NA female 5
10 4 T0 NA male 5
11 4 T1M0 146 male 5
12 4 T1M6 NA male 5
13 4 T1M12 NA male 5
14 4 T2M0 NA male 5
15 7 T0 NA female 2
16 7 FU1 NA female 2
17 8 T0 NA female 3
18 8 T1M0 125 female 3
19 8 T2M0 NA female 3
20 13 T0 NA female 5
21 13 T1M0 164. female 5
22 13 T1M6 NA female 5
23 13 T1M12 NA female 5
24 13 T2M0 NA female 5
25 14 T0 NA male 5
26 14 T1M0 150. male 5
27 14 T1M6 NA male 5
28 14 T1M12 NA male 5
29 14 T2M0 NA male 5
30 20 T0 NA female 3
31 20 T1M0 158 female 3
32 20 T1M12 NA female 3
I am trying to count the number of persons who (1) are male/female and (2) had 1/2/3/4/5 CT scans.
So the output would be (1) 6 females and 2 males, and (2) 1 person with 2 CTs, 2 persons with 3 CTs, 1 person with 4 CTs and 4 persons with 5 CTs.
I've tried many combinations of group_by, summarise and count, but can't seem to get it right. Any help?
You can first keep only one row per id (distinct with .keep_all = TRUE keeps the first row for each id, along with all other columns). Then use count to get the output.
library(dplyr)
unique_data <- questiondata %>% distinct(id, .keep_all = TRUE)
unique_data %>% count(sex)
# A tibble: 2 x 2
# sex n
# <fct> <int>
#1 female 6
#2 male 2
unique_data %>% count(ct_amount)
# A tibble: 4 x 2
# ct_amount n
# <int> <int>
#1 2 1
#2 3 2
#3 4 1
#4 5 4
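As a possible alternative to de-duplicating first, the number of distinct ids can be counted within each group using n_distinct. The sketch below is only an illustration: toy is a hypothetical compact stand-in for questiondata that reproduces its id, sex and ct_amount columns (time and sys_bp are left out because the counts do not need them), and sex_counts and ct_counts are names chosen here.

```r
library(dplyr)

# Hypothetical stand-in for questiondata: same id, sex and ct_amount values,
# built compactly with rep(); time and sys_bp are omitted on purpose.
n_rows <- c(4, 5, 5, 2, 3, 5, 5, 3)  # number of measurements per patient
toy <- tibble(
  id = rep(c(2, 3, 4, 7, 8, 13, 14, 20), times = n_rows),
  sex = factor(rep(
    c("female", "female", "male", "female", "female", "female", "male", "female"),
    times = n_rows
  )),
  ct_amount = rep(c(4L, 5L, 5L, 2L, 3L, 5L, 5L, 3L), times = n_rows)
)

# Count distinct patients per sex, without removing rows first
sex_counts <- toy %>%
  group_by(sex) %>%
  summarise(n = n_distinct(id))

# Count distinct patients per number of CT scans
ct_counts <- toy %>%
  group_by(ct_amount) %>%
  summarise(n = n_distinct(id))

sex_counts
ct_counts
```

This keeps the long format intact, which may be convenient if the same pipeline also needs the repeated measurements later on.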
We could use duplicated with filter to keep the first row of each id, then count:
library(dplyr)
questiondata %>%
filter(!duplicated(id)) %>%
count(ct_amount)
# A tibble: 4 x 2
# ct_amount n
# <int> <int>
#1 2 1
#2 3 2
#3 4 1
#4 5 4
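Both distinct(id, .keep_all = TRUE) and filter(!duplicated(id)) keep the first row of each id, so the counts are only meaningful if sex and ct_amount never change within a patient. A minimal sketch of how that could be checked is shown below; toy is a hypothetical example with one deliberately inconsistent patient, and inconsistent, n_sex and n_ct are names made up for this illustration.

```r
library(dplyr)

# Hypothetical data: id 2 has two different ct_amount values on purpose
toy <- tibble(
  id = c(1, 1, 2, 2, 2),
  sex = factor(c("female", "female", "male", "male", "male")),
  ct_amount = c(3L, 3L, 5L, 5L, 4L)
)

# List patients whose sex or ct_amount varies across their rows
inconsistent <- toy %>%
  group_by(id) %>%
  summarise(n_sex = n_distinct(sex), n_ct = n_distinct(ct_amount)) %>%
  filter(n_sex > 1 | n_ct > 1)

inconsistent
```

Running the same check on questiondata should return no rows, since each patient there has a single sex and a single ct_amount across all time points.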