R: Create a new column based on a list of values from multiple columns
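I want to create a new logical (T/F) column that flags rows where any value from a list appears in any of several columns. For this example I use mtcars and search two columns for two values, but my real problem involves many values across many columns.
I have a working filter using filter_at(), included below, but I have not been able to apply the same logic in mutate: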
library(dplyr)

# there are 7 cars with 6 cyl
mtcars %>%
  filter(cyl == 6)
# there are 2 cars with 19.2 mpg, one with 6 cyl, one with 8
mtcars %>%
  filter(mpg == 19.2)
# there are 8 rows with either.
# these are the rows I want as TRUE
mtcars %>%
  filter(mpg == 19.2 | cyl == 6)
# set the cols to look at
mtcars_cols <- mtcars %>%
  select(matches('^(mp|cy)')) %>%
  names()
# set the values to look at
mtcars_numbs <- c(19.2, 6)
# result is 8 rows with either value in either col.
# this is a successful filter of the data
out1 <- mtcars %>%
  filter_at(vars(mtcars_cols), any_vars(
    . %in% mtcars_numbs
  ))
# shows the set with all 6 cyl cars, plus one 8 cyl car with 19.2 mpg
out1 %>%
  select(mpg, cyl)
# This attempts to apply the filter list to the cols,
# but I only get 6 rows as TRUE
# I tried to change == to %in% but that results in an error
out2 <- mtcars %>%
  mutate(
    myset = rowSums(select(., mtcars_cols) == mtcars_numbs) > 0
  )
# only 6 rows returned
out2 %>%
  filter(myset == T)
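I am not sure why those two rows are skipped. I suspect that rowSums is somehow aggregating them.

If the check should be done pairwise (each column against its corresponding value), map2 may be a better fit: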
library(dplyr)
library(purrr)
map2_df(mtcars_cols, mtcars_numbs, ~
    mtcars %>%
      filter(!! rlang::sym(.x) == .y)) %>%
  distinct()
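The map2 call above returns the matching rows rather than a flag column. As a rough sketch of the same pairwise idea producing a logical column, the following uses only base R; `pair_hits` and `flagged` are illustrative names that do not appear in the original post, and the column and value vectors are written out by hand on the assumption that they are paired by position.

```r
# Columns and values are paired by position: mpg with 19.2, cyl with 6
mtcars_cols  <- c("mpg", "cyl")
mtcars_numbs <- c(19.2, 6)

# One logical vector per (column, value) pair
pair_hits <- Map(function(col, val) mtcars[[col]] == val,
                 mtcars_cols, mtcars_numbs)

# TRUE where at least one pair matched in that row
flagged <- transform(mtcars, myset = Reduce(`|`, pair_hits))

sum(flagged$myset)
#[1] 8
```

Because each column is compared with exactly one scalar, no recycling across columns takes place.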
Note: comparing floating-point numbers with == can cause trouble, because the stored precision may differ and the comparison can return FALSE.
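A small illustration of the floating-point caveat, using values that are not taken from the question: the sum 0.1 + 0.2 is not stored as exactly 0.3, so an exact comparison fails while a tolerance-based one succeeds.

```r
x <- 0.1 + 0.2

# Exact comparison: FALSE, because of binary rounding error
exact_match <- x == 0.3

# Comparison within a numeric tolerance: TRUE
tolerant_match <- isTRUE(all.equal(x, 0.3))

exact_match
tolerant_match
```

Inside a dplyr pipeline, dplyr::near() serves the same purpose as the tolerance-based check.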
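Also note that == only works as intended when the lhs and rhs have the same length, or when the rhs vector has length 1 (in which case it is recycled). If the rhs is longer than 1 but does not match the length of the lhs, its values are recycled in column order: they are matched against successive elements going down each column, not one value per column.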
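To make the column-order recycling concrete, here is a toy data frame (hypothetical, not from the question) in which every row "should" match if the values were applied one per column. The short vector is instead cycled down the rows of each column, so only alternating rows match.

```r
# Toy data: column a is all 1s, column b is all 2s
df   <- data.frame(a = c(1, 1, 1, 1), b = c(2, 2, 2, 2))
vals <- c(1, 2)

# vals is recycled down each column as 1, 2, 1, 2,
# so column a matches on rows 1 and 3, column b on rows 2 and 4
naive <- df == vals
naive

# Hits in column a and column b, as plain logical vectors
hits_a <- as.vector(naive[, "a"])
hits_b <- as.vector(naive[, "b"])
```

This is the same effect that drops rows in the question's rowSums attempt: whether a row is compared against 19.2 or against 6 depends on its position, not on the column.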
We can replicate the values so that the lengths match, and then it should work:
mtcars %>%
  mutate(
    myset = rowSums(select(., mtcars_cols) == mtcars_numbs[col(select(., mtcars_cols))]) > 0
  ) %>%
  pull(myset) %>%
  sum
#[1] 8
In the code above, select is used twice to make the logic easier to follow. Alternatively, we can use rep:
mtcars %>%
  mutate(
    myset = rowSums(select(., mtcars_cols) == rep(mtcars_numbs, each = n())) > 0
  ) %>%
  pull(myset) %>%
  sum
#[1] 8
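The solutions above pair each column with one value. If the goal is instead what the original filter_at(..., any_vars(. %in% mtcars_numbs)) call does, namely any listed value in any listed column, one possible sketch applies %in% column by column. It uses base R only; `hits` and `out` are illustrative names, and the column and value vectors are written out by hand.

```r
mtcars_cols  <- c("mpg", "cyl")
mtcars_numbs <- c(19.2, 6)

# Logical matrix: one column per searched column,
# TRUE where that cell equals any of the listed values
hits <- sapply(mtcars[mtcars_cols], function(column) column %in% mtcars_numbs)

# Flag rows with at least one hit
out <- mtcars
out$myset <- rowSums(hits) > 0

sum(out$myset)
#[1] 8
```

Since %in% tests each element against the whole value set, it avoids the length and recycling issue that affects ==. For mtcars both interpretations give the same 8 rows, but they can differ on other data, for example if a cyl value could also appear in mpg. Recent dplyr versions also offer if_any() for expressing this kind of condition.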