Recode levels of multiple factors to specified range
I have the following data frame:
library(tidyverse)
df <- tibble(a = c(1, 2, 3, 4, 5),
             b = c("Y", "N", "N", "Y", "N"),
             c = c("A", "B", "C", "A", "B"))
# Convert every character column to a factor
# (as.factor is passed directly; funs() is deprecated)
df <- df %>%
  mutate_if(is.character, as.factor)
df
Output:
      a b     c
  <dbl> <fct> <fct>
1     1 Y     A
2     2 N     B
3     3 N     C
4     4 Y     A
5     5 N     B
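I want to recode the levels of all factors (the b and c variables) as integers: if a factor has only two levels, it should be recoded to {0, 1}; otherwise its levels should become {1, 2, 3, ...}. So the output should be: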
      a b     c
  <dbl> <fct> <fct>
1     1 1     1
2     2 0     2
3     3 0     3
4     4 1     1
5     5 0     2
I can recode the variables individually (one by one), but I wonder whether there is a more convenient way.
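df <- df %>%
  mutate_if(
    # b and c are already factors in the question's df, so match
    # factor columns as well as character columns
    function(x) is.character(x) || is.factor(x),
    function(x) {
      # Integer codes follow the sorted level order: 1, 2, 3, ...
      out <- as.integer(as.factor(x))
      # Shift two-level variables down to 0/1
      if (n_distinct(out) == 2) out <- out - 1L
      out
    }
  )
df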
#       a     b     c
#   <dbl> <int> <int>
# 1     1     1     1
# 2     2     0     2
# 3     3     0     3
# 4     4     1     1
# 5     5     0     2
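The result above has integer columns, whereas the output requested in the question shows factor columns. If factors are needed, one possibility is to relabel the levels rather than extract the integer codes. The following is only a base R sketch under that assumption; the helper name recode_levels is hypothetical and not part of any package.

```r
# Hypothetical helper: relabel a factor's levels as integers.
# Two-level factors get 0/1; all others get 1..k.
recode_levels <- function(x) {
  x <- droplevels(as.factor(x))   # ignore unused levels
  k <- nlevels(x)
  levels(x) <- if (k == 2) 0:1 else seq_len(k)
  x
}

# Same data as in the question, built with base R only
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c("Y", "N", "N", "Y", "N"),
                 c = c("A", "B", "C", "A", "B"),
                 stringsAsFactors = TRUE)

# Apply the helper to every factor column, leaving other columns alone
is_fct <- vapply(df, is.factor, logical(1))
df[is_fct] <- lapply(df[is_fct], recode_levels)
str(df)
```

Because the new labels are assigned in level order, "N" becomes 0 and "Y" becomes 1 here, matching the requested output.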
Does this work:
> library(dplyr)
> df %>% mutate(b_fac = match(b,unique(b)) - 1, c_fac = match(c, unique(c))) %>%
+ mutate(b_fac = ifelse(b_fac == 1, 0, 1)) %>% mutate(b_fac = as.factor(b_fac), c_fac = as.factor(c_fac)) %>%
+ select(-2,-3) %>% rename(b = b_fac, c = c_fac)
# A tibble: 5 x 3
      a b     c
  <dbl> <fct> <fct>
1     1 1     1
2     2 0     2
3     3 0     3
4     4 1     1
5     5 0     2
One dplyr option could be:
df %>%
  mutate(across(where(is.factor),
                ~ if (n_distinct(.) == 2) factor(., labels = 0:1) else
                    factor(., labels = 1:n_distinct(.))))
      a b     c
  <dbl> <fct> <fct>
1     1 1     1
2     2 0     2
3     3 0     3
4     4 1     1
5     5 0     2
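One detail worth checking with any of these approaches is which level ends up as 0. By default factor() sorts the levels, so "N" precedes "Y" and is mapped to 0, which happens to be what the question asks for. A small base R sketch of how an explicit level order changes the mapping (the variable names are illustrative only):

```r
b <- c("Y", "N", "N", "Y", "N")

# Default: levels are sorted ("N", "Y"), so "N" maps to 0 and "Y" to 1
codes_default <- as.integer(factor(b)) - 1L

# Explicit level order: listing "Y" first makes "Y" map to 0 instead
codes_y_first <- as.integer(factor(b, levels = c("Y", "N"))) - 1L

print(codes_default)
print(codes_y_first)
```

If a particular level has to be the 0 category, setting levels explicitly before recoding is probably safer than relying on the default ordering.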