Factor ordering with forcats
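I have data that I want to bin and convert to a factor. However, I'm having some trouble understanding what is happening to my factor variable. I'm trying to order the factor variable based on a continuous variable.
I have read up on this, but all the examples I've seen contain only one instance of each factor level, whereas mine contains multiple instances of some factor levels.
Here is the sample data: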
df <- structure(list(Group = c("Grp1", "Grp1", "Grp1", "Grp1", "Grp1",
"Grp1", "Grp1", "Grp2", "Grp2", "Grp2", "Grp2", "Grp2"), Ind = c("A",
"B", "C", "D", "E", "F", "G", "A", "B", "C", "D", "E"), Value = c(0.155903329567489,
0.0582906870761889, 0.180600101489814, 0.26357423622443, 0.0637832368895064,
0.213803701918138, 0.0640447068344333, 0.333501508730367, 0.160676738803951,
0.279178514111584, 0.145767023637501, 0.0808762147165962)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"))
From this data, I created a factor and checked the order of each element.
library(dplyr)
library(forcats)
library(ggplot2)  # cut_interval() comes from ggplot2
df %>%
  group_by(Group) %>%
  mutate(Bin = cut_interval(Value, n = nrow(.))) %>%
  mutate(Order = labels(Bin)) %>%
  ungroup()
# A tibble: 12 x 5
   Group Ind    Value Bin             Order
   <chr> <chr>  <dbl> <fct>           <chr>
 1 Grp1  A     0.156  (0.144,0.161]   1
 2 Grp1  B     0.0583 [0.0583,0.0754] 2
 3 Grp1  C     0.181  (0.178,0.195]   3
 4 Grp1  D     0.264  (0.246,0.264]   4
 5 Grp1  E     0.0638 [0.0583,0.0754] 5
 6 Grp1  F     0.214  (0.212,0.229]   6
 7 Grp1  G     0.0640 [0.0583,0.0754] 7
 8 Grp2  A     0.334  (0.312,0.334]   1
 9 Grp2  B     0.161  (0.144,0.165]   2
10 Grp2  C     0.279  (0.27,0.291]    3
11 Grp2  D     0.146  (0.144,0.165]   4
12 Grp2  E     0.0809 [0.0809,0.102]  5
I then tried to reorder the factor by Value after creating it, but the order does not seem to change.
df %>%
  group_by(Group) %>%
  mutate(Bin = cut_interval(Value, n = nrow(.)),
         Bin = fct_reorder(Bin, Value)) %>%
  mutate(Order = labels(Bin)) %>%
  ungroup()
# A tibble: 12 x 5
   Group Ind    Value Bin             Order
   <chr> <chr>  <dbl> <fct>           <chr>
 1 Grp1  A     0.156  (0.144,0.161]   1
 2 Grp1  B     0.0583 [0.0583,0.0754] 2
 3 Grp1  C     0.181  (0.178,0.195]   3
 4 Grp1  D     0.264  (0.246,0.264]   4
 5 Grp1  E     0.0638 [0.0583,0.0754] 5
 6 Grp1  F     0.214  (0.212,0.229]   6
 7 Grp1  G     0.0640 [0.0583,0.0754] 7
 8 Grp2  A     0.334  (0.312,0.334]   1
 9 Grp2  B     0.161  (0.144,0.165]   2
10 Grp2  C     0.279  (0.27,0.291]    3
11 Grp2  D     0.146  (0.144,0.165]   4
12 Grp2  E     0.0809 [0.0809,0.102]  5
Then I arranged the data by Value before creating the factor, and got the correct order.
df %>%
  arrange(Group, Value) %>%
  group_by(Group) %>%
  mutate(Bin = cut_interval(Value, n = nrow(.))) %>%
  mutate(Order = labels(Bin)) %>%
  ungroup()
# A tibble: 12 x 5
   Group Ind    Value Bin             Order
   <chr> <chr>  <dbl> <fct>           <chr>
 1 Grp1  B     0.0583 [0.0583,0.0754] 1
 2 Grp1  E     0.0638 [0.0583,0.0754] 2
 3 Grp1  G     0.0640 [0.0583,0.0754] 3
 4 Grp1  A     0.156  (0.144,0.161]   4
 5 Grp1  C     0.181  (0.178,0.195]   5
 6 Grp1  F     0.214  (0.212,0.229]   6
 7 Grp1  D     0.264  (0.246,0.264]   7
 8 Grp2  E     0.0809 [0.0809,0.102]  1
 9 Grp2  D     0.146  (0.144,0.165]   2
10 Grp2  B     0.161  (0.144,0.165]   3
11 Grp2  C     0.279  (0.27,0.291]    4
12 Grp2  A     0.334  (0.312,0.334]   5
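So first, why doesn't fct_reorder() do what I want? Second, why are there 7 values in "Grp1" and 5 values in "Grp2"? Given the repeated Bin values in each group, shouldn't there be only 5 and 4, respectively?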
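It is the levels that are reordered, not the elements. According to ?fct_reorder:
.x, .y - The levels of f are reordered so that the values of .fun(.x) (for fct_reorder()) and fun(.x, .y) (for fct_reorder2()) are in ascending order.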
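To make that concrete, here is a minimal sketch on toy data (the vectors f, x and g are made up for illustration and are not part of the question's data). With the default summary function, median, a level that occurs several times is placed according to the median of its values; the elements themselves stay where they are.

```r
library(forcats)

# Hypothetical toy data: level "a" occurs twice, "b" and "c" once each.
f <- factor(c("a", "b", "a", "c"))
x <- c(10, 1, 12, 5)

# Reorder the levels of f by the median of x within each level.
g <- fct_reorder(f, x)

levels(f)        # "a" "b" "c"
levels(g)        # "b" "c" "a"   (medians: b = 1, c = 5, a = 11)
as.character(g)  # "a" "b" "a" "c"   (element order is untouched)
as.integer(g)    # 3 1 3 2   (integer codes follow the new level order)
```

So the effect of fct_reorder() shows up in levels() and in the integer codes, not in the row order of the data frame.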
After arranging by Bin, create Order by dropping the unused levels (droplevels) and converting the result to integer:
library(dplyr)
library(forcats)
library(ggplot2)  # cut_interval() comes from ggplot2
out <- df %>%
  group_by(Group) %>%
  mutate(Bin = cut_interval(Value, n = nrow(.)),
         Bin = fct_reorder(Bin, Value)) %>%
  arrange(as.integer(Bin)) %>%
  # integer codes of the levels actually used within each group
  mutate(Order = as.integer(droplevels(Bin))) %>%
  ungroup()
out
# A tibble: 12 x 5
   Group Ind    Value Bin             Order
   <chr> <chr>  <dbl> <fct>           <int>
 1 Grp1  B     0.0583 [0.0583,0.0754]     1
 2 Grp1  E     0.0638 [0.0583,0.0754]     1
 3 Grp1  G     0.0640 [0.0583,0.0754]     1
 4 Grp1  A     0.156  (0.144,0.161]       2
 5 Grp1  C     0.181  (0.178,0.195]       3
 6 Grp1  F     0.214  (0.212,0.229]       4
 7 Grp1  D     0.264  (0.246,0.264]       5
 8 Grp2  E     0.0809 [0.0809,0.102]      1
 9 Grp2  B     0.161  (0.144,0.165]       2
10 Grp2  D     0.146  (0.144,0.165]       2
11 Grp2  C     0.279  (0.27,0.291]        3
12 Grp2  A     0.334  (0.312,0.334]       4
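This also accounts for the 7 and 5 in the question. As far as I know there is no labels() method for factors, so labels(Bin) falls back to the default method, which returns the element positions as character strings when the vector has no names. It never looks at the levels. A small base R sketch, using a made-up factor f with a repeated level and one unused level:

```r
# Hypothetical factor: "lo" appears twice, level "z" is never used.
f <- factor(c("lo", "hi", "lo", "mid"), levels = c("lo", "mid", "z", "hi"))

labels(f)                  # "1" "2" "3" "4"   (positions, as character)
as.integer(f)              # 1 4 1 2   (level codes, with a gap left by "z")
as.integer(droplevels(f))  # 1 3 1 2   (dense codes once "z" is dropped)
```

With 7 rows in Grp1 and 5 rows in Grp2, labels() therefore counts 1 to 7 and 1 to 5 regardless of how many distinct bins there are.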
Or use match with unique:
df %>%
  group_by(Group) %>%
  mutate(Bin = cut_interval(Value, n = nrow(.)),
         Bin = fct_reorder(Bin, Value)) %>%
  arrange(as.integer(Bin)) %>%
  mutate(Order = match(Bin, unique(Bin))) %>%
  ungroup()
# A tibble: 12 x 5
   Group Ind    Value Bin             Order
   <chr> <chr>  <dbl> <fct>           <int>
 1 Grp1  B     0.0583 [0.0583,0.0754]     1
 2 Grp1  E     0.0638 [0.0583,0.0754]     1
 3 Grp1  G     0.0640 [0.0583,0.0754]     1
 4 Grp1  A     0.156  (0.144,0.161]       2
 5 Grp1  C     0.181  (0.178,0.195]       3
 6 Grp1  F     0.214  (0.212,0.229]       4
 7 Grp1  D     0.264  (0.246,0.264]       5
 8 Grp2  E     0.0809 [0.0809,0.102]      1
 9 Grp2  B     0.161  (0.144,0.165]       2
10 Grp2  D     0.146  (0.144,0.165]       2
11 Grp2  C     0.279  (0.27,0.291]        3
12 Grp2  A     0.334  (0.312,0.334]       4
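The match/unique idiom numbers values by the order in which they first appear, which is why the arrange() step has to come first. A short base R sketch with made-up vectors (sorted and unsorted are hypothetical names):

```r
# Already sorted input: ties share a number and the numbering is dense.
sorted <- c("b", "b", "d", "e", "e", "e")
match(sorted, unique(sorted))      # 1 1 2 3 3 3

# Unsorted input: numbers follow first appearance, not sort order.
unsorted <- c("d", "b", "d", "e")
match(unsorted, unique(unsorted))  # 1 2 1 3
```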
As for fct_reorder() not accomplishing anything, check the levels before and after that step:
> tmp <- df %>%
+   group_by(Group) %>%
+   mutate(Bin = cut_interval(Value, n = nrow(.)))
> tmp %>% pull(Bin) %>% levels
[1] "[0.0583,0.0754]" "(0.0754,0.0925]" "(0.0925,0.11]" "(0.11,0.127]" "(0.127,0.144]" "(0.144,0.161]" "(0.161,0.178]" "(0.178,0.195]" "(0.195,0.212]"
[10] "(0.212,0.229]" "(0.229,0.246]" "(0.246,0.264]" "[0.0809,0.102]" "(0.102,0.123]" "(0.123,0.144]" "(0.144,0.165]" "(0.165,0.186]" "(0.186,0.207]"
[19] "(0.207,0.228]" "(0.228,0.249]" "(0.249,0.27]" "(0.27,0.291]" "(0.291,0.312]" "(0.312,0.334]"
> tmp %>% mutate(Bin = fct_reorder(Bin, Value)) %>% pull(Bin) %>% levels
[1] "[0.0583,0.0754]" "(0.144,0.161]" "(0.178,0.195]" "(0.212,0.229]" "(0.246,0.264]" "(0.0754,0.0925]" "(0.0925,0.11]" "(0.11,0.127]" "(0.127,0.144]"
[10] "(0.161,0.178]" "(0.195,0.212]" "(0.229,0.246]" "[0.0809,0.102]" "(0.102,0.123]" "(0.123,0.144]" "(0.144,0.165]" "(0.165,0.186]" "(0.186,0.207]"
[19] "(0.207,0.228]" "(0.228,0.249]" "(0.249,0.27]" "(0.27,0.291]" "(0.291,0.312]" "(0.312,0.334]"
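A side observation on the levels above: each group contributes 12 levels, which matches the total number of rows rather than the group sizes. With the magrittr pipe, the dot inside mutate() refers to the whole data frame that was piped in, so nrow(.) is 12 in both groups. If the intent was one bin per row of the current group (an assumption about intent, the question does not say), dplyr's n() would give the group size instead. A minimal sketch on made-up data (toy and res are hypothetical names):

```r
library(dplyr)

# Hypothetical toy data: 3 rows in group "a", 2 rows in group "b".
toy <- tibble(g = c("a", "a", "a", "b", "b"), v = 1:5)

res <- toy %>%
  group_by(g) %>%
  mutate(whole = nrow(.),       # the dot is the full data frame: always 5
         per_group = n()) %>%   # n() is the size of the current group
  ungroup()

res$whole      # 5 5 5 5 5
res$per_group  # 3 3 3 2 2
```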