Change factor levels in R using a variable for BOTH factor name AND level order in a data frame
I have a large data frame with a lot of columns that are factors. I want to change the factor level order for each factor.
I have a second, lookup data frame that holds the right factor level order for each factor. This means I can look up a factor's order in the lookup data frame using a variable for the factor name, grab the order, and put it in a different variable. So far so good.
Simplified example:
library(tibble) # provides tibble()

d = tibble(
  size = c('small', 'small', 'big', NA)
)
d$size = as.factor(d$size)
levels(d$size) # Not what I want.
proper.order = c('small', 'big') # this comes from somewhere else
I can use proper.order to change one column in d:
d$size = factor(d$size, levels = proper.order)
levels(d$size) # What I want.
I want to refer to the column name (size) using a variable.
This doesn't work:
my.column = 'size'
d[names(d) == my.column] = factor(d[names(d) == my.column], levels = proper.order, exclude = NULL)
levels(d$size) # What I want.
d # Not what I want.
I expect to see the factor levels reordered, and they are. I also expect the factor to keep its values (obviously), but they are all set to NA.
I suspect this is because d[names(d) == my.column] is a tibble, not a factor. But then why do the factor levels change? And how can I reach into the tibble and grab the factor?
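A minimal sketch of the difference between the two kinds of indexing, rebuilding the example data from above. Single brackets return a one-column tibble, while double brackets return the column itself. As far as I can tell, factor() converts its input with as.character(), which for a one-column tibble gives one string describing the whole column; that string matches none of the levels, so the result is a single NA carrying the requested levels, and it is then recycled down the column.

```r
library(tibble)

d <- tibble(size = factor(c("small", "small", "big", NA)))
proper.order <- c("small", "big")
my.column <- "size"

class(d[my.column])    # a tibble: `[` keeps the data-frame wrapper
class(d[[my.column]])  # "factor": `[[` extracts the column itself

# Re-level the extracted column and assign it back by name.
d[[my.column]] <- factor(d[[my.column]], levels = proper.order)

levels(d$size)  # levels now follow proper.order
d$size          # values are kept; the original NA stays NA
```

Using d[[my.column]] on both sides avoids the names(d) == my.column comparison entirely.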
For multiple columns, we can specify them in mutate_at:
library(dplyr)
d %>%
  mutate_at(vars(my.column),
            list(~ factor(., levels = proper.order, exclude = NULL)))
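In dplyr 1.0.0 and later, the scoped verbs such as mutate_at are superseded by across(). A rough equivalent of the call above, assuming a recent dplyr and the same example data, might look like this; all_of() tells the selection that my.column is a character vector of column names rather than a column itself.

```r
library(dplyr)

d <- tibble(size = factor(c("small", "small", "big", NA)))
proper.order <- c("small", "big")
my.column <- "size"

# Select columns by name from a character vector, then re-level each one.
d2 <- d %>%
  mutate(across(all_of(my.column), ~ factor(.x, levels = proper.order)))

levels(d2$size)
```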
Or with fct_relevel from forcats:
library(forcats)
d %>%
  mutate_at(vars(my.column), list(~ fct_relevel(., proper.order)))
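To cover the case in the title, where both the column name and the level order come from variables, here is a sketch in base R that loops over a lookup table. The layout of the lookup data frame is not shown in the question, so the `lookup` tibble below, with its `factor.name` and `level` columns, and the extra `speed` column are hypothetical and only there for illustration.

```r
library(tibble)

d <- tibble(
  size  = factor(c("small", "small", "big", NA)),
  speed = factor(c("slow", "fast", "fast", "slow"))
)

# Hypothetical lookup table: one row per level, listed in the desired order.
lookup <- tibble(
  factor.name = c("size", "size", "speed", "speed"),
  level       = c("small", "big", "slow", "fast")
)

for (my.column in unique(lookup$factor.name)) {
  # Level order for this column, in the order it appears in the lookup.
  proper.order <- lookup$level[lookup$factor.name == my.column]
  d[[my.column]] <- factor(d[[my.column]], levels = proper.order)
}

levels(d$size)
levels(d$speed)
```

Any value that is missing from the lookup for its column would become NA, so it may be worth comparing the existing levels against the lookup before overwriting.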