Count unique occurrences of factor levels and numeric values with dplyr, on data in a long format
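I have repeated-measures data for 8 patients, and the number of repeated measurements of the same variables differs between patients. The measured variables are sex, blood pressure (sys_bp), and the number of CT scans a person received (ct_amount):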
library(dplyr)
library(magrittr)
questiondata <- structure(list(id = c(2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4,
4, 7, 7, 8, 8, 8, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 20,
20, 20), time = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 6L, 1L, 2L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 4L), .Label = c("T0", "T1M0", "T1M6",
"T1M12", "T2M0", "FU1"), class = "factor"), sys_bp = c(116, 125.8,
NA, NA, NA, 113.2, NA, NA, NA, NA, 146, NA, NA, NA, NA, NA, NA,
125, NA, NA, 164.5, NA, NA, NA, NA, 150.5, NA, NA, NA, NA, 158,
NA), sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("female", "male"), class = "factor"),
ct_amount = c(4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 2L, 2L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 3L, 3L, 3L)), row.names = c(NA, -32L), class = c("tbl_df",
"tbl", "data.frame"))
questiondata
      id time  sys_bp sex    ct_amount
   <dbl> <fct>  <dbl> <fct>      <int>
 1     2 T0       116 female         4
 2     2 T1M0    126. female         4
 3     2 T1M6      NA female         4
 4     2 T1M12     NA female         4
 5     3 T0        NA female         5
 6     3 T1M0    113. female         5
 7     3 T1M6      NA female         5
 8     3 T1M12     NA female         5
 9     3 T2M0      NA female         5
10     4 T0        NA male           5
11     4 T1M0     146 male           5
12     4 T1M6      NA male           5
13     4 T1M12     NA male           5
14     4 T2M0      NA male           5
15     7 T0        NA female         2
16     7 FU1       NA female         2
17     8 T0        NA female         3
18     8 T1M0     125 female         3
19     8 T2M0      NA female         3
20    13 T0        NA female         5
21    13 T1M0    164. female         5
22    13 T1M6      NA female         5
23    13 T1M12     NA female         5
24    13 T2M0      NA female         5
25    14 T0        NA male           5
26    14 T1M0    150. male           5
27    14 T1M6      NA male           5
28    14 T1M12     NA male           5
29    14 T2M0      NA male           5
30    20 T0        NA female         3
31    20 T1M0     158 female         3
32    20 T1M12     NA female         3
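I am trying to count how many people (1) are male/female and (2) had 1/2/3/4/5 CT scans.
So the output would be (1) 6 females and 2 males, and (2) 1 person with 2 CTs, 2 people with 3 CTs, 1 person with 4 CTs, and 4 people with 5 CTs.
I have tried many combinations of group_by, summarise, and count, but can't seem to get it right. Any help?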
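You can first keep only one row per id (distinct with .keep_all = TRUE keeps the first row for each id along with all other columns), and then use count to get the output.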
library(dplyr)
unique_data <- questiondata %>% distinct(id, .keep_all = TRUE)
unique_data %>% count(sex)
# A tibble: 2 x 2
# sex n
# <fct> <int>
#1 female 6
#2 male 2
unique_data %>% count(ct_amount)
# A tibble: 4 x 2
# ct_amount n
# <int> <int>
#1 2 1
#2 3 2
#3 4 1
#4 5 4
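One caveat with this approach: distinct(id, .keep_all = TRUE) keeps the first row it finds for each id, so the counts are only meaningful if sex and ct_amount never change within a patient. A minimal sketch of how that could be checked is below. The names ids, n_rows, long and per_id are hypothetical; long is a compact rebuild of the id, sex and ct_amount columns from the question, not part of the original answer.

```r
library(dplyr)

# Hypothetical compact rebuild of three columns of the question's data:
# one entry per patient, repeated once per measurement row.
ids    <- c(2, 3, 4, 7, 8, 13, 14, 20)
n_rows <- c(4, 5, 5, 2, 3, 5, 5, 3)

long <- tibble(
  id        = rep(ids, times = n_rows),
  sex       = factor(rep(c("female", "female", "male", "female",
                           "female", "female", "male", "female"),
                         times = n_rows)),
  ct_amount = rep(c(4L, 5L, 5L, 2L, 3L, 5L, 5L, 3L), times = n_rows)
)

# Count how many different values each patient has for sex and ct_amount.
per_id <- long %>%
  group_by(id) %>%
  summarise(n_sex = n_distinct(sex),
            n_ct  = n_distinct(ct_amount))

# If every patient has exactly one value for each variable,
# taking the first row per id loses no information.
stopifnot(all(per_id$n_sex == 1), all(per_id$n_ct == 1))
```

If the stopifnot call raised an error on real data, the rows per patient would disagree and it would be worth deciding which value to keep before counting.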
We can also use filter with duplicated:
library(dplyr)
questiondata %>%
filter(!duplicated(id)) %>%
count(ct_amount)
# A tibble: 4 x 2
#   ct_amount     n
#       <int> <int>
# 1         2     1
# 2         3     2
# 3         4     1
# 4         5     4
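To see why the filter works, it may help to look at base R's duplicated() on its own. The sketch below uses a toy vector (hypothetical values, not the question's data): duplicated() flags every element that has already appeared earlier, so negating it keeps the first occurrence of each id.

```r
# Toy vector of ids (hypothetical values, only for illustration)
ids <- c(2, 2, 3, 3, 3, 4)

# TRUE marks every element that has already appeared earlier in the vector
dup <- duplicated(ids)
dup
# FALSE TRUE FALSE TRUE TRUE FALSE

# Negating the flags keeps the first occurrence of each id
first_seen <- ids[!dup]
first_seen
# 2 3 4
```

Inside filter(), !duplicated(id) produces the same kind of logical vector over the rows of the data frame, so the result should match distinct(id, .keep_all = TRUE).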