Logical function across multiple columns using the “any” function
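I want to run a logical operation (with multiple conditions) across many columns. I wrote a query that works fine, but I want to shorten the code because I have to write several of these queries.
I tried to shorten the query with `any` and parentheses. The second query runs without errors, but it gives me a different answer. Does the `any` function work across multiple columns?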
Here are my conditions:
Participate | B1 | B2 | B3 | B4 | B5 | Query1 | Query2 |
---|---|---|---|---|---|---|---|
3 | -1 | -1 | -1 | -1 | -1 | Noissue | Noissue |
1 | -1 | 1 | -1 | -1 | 1 | Noissue | Noissue |
1 | -1 | -1 | -1 | -1 | -1 | Issue | Noissue |
2 | -1 | 1 | 1 | -1 | 1 | Noissue | Noissue |
2 | 1 | 1 | 1 | 1 | -1 | Noissue | Noissue |
1 | -99 | -99 | -99 | -99 | -99 | Noissue | Noissue |
I would appreciate it if someone could help me reduce the number of lines of code by using a different function.
Query 1:
mutate(Batch_v1,
       case_when(
         (Batch_v1$B1 == 1 | Batch_v1$B2 == 1 | Batch_v1$B3 == 1 | Batch_v1$B4 == 1 |
            Batch_v1$B5 == 1 | Batch_v1$B6 == 1 | Batch_v1$B7 == 1 | Batch_v1$B8 == 1 |
            Batch_v1$B9 == 1 | Batch_v1$B10 == 1 | Batch_v1$BOth == 1) &
           Batch_v1$Participate %in% c(1, 2, -99) ~ "Noissue",
         (Batch_v1$B1 == -99 | Batch_v1$B2 == -99 | Batch_v1$B3 == -99 | Batch_v1$B4 == -99 |
            Batch_v1$B5 == -99 | Batch_v1$B6 == -99 | Batch_v1$B7 == -99 | Batch_v1$B8 == -99 |
            Batch_v1$B9 == -99 | Batch_v1$B10 == -99 | Batch_v1$BOth == -99) &
           Batch_v1$Participate %in% c(1, 2, -99) ~ "Noissue",
         Batch_v1$Participate == 3 ~ "Noissue",
         TRUE ~ "Issue"))
Query 2:
mutate(Batch_v1,
       case_when(
         (any(Batch_v1[,2:6] == 1)) & Batch_v1$Participate %in% c(1, 2, -99) ~ "Noissue",
         (any(Batch_v1[,2:6] == -99)) & Batch_v1$Participate %in% c(1, 2, -99) ~ "Noissue",
         Batch_v1$Participate == 3 ~ "Noissue",
         TRUE ~ "Issue"))
We can use `across` with `case_when`:
library(dplyr)
df %>%
  mutate(across(B2:B5, ~ case_when(. == 1 & B1 <= 2 ~ "Noissue",
                                   . == -99 & B1 <= 2 ~ "Noissue",
                                   B1 == 3 ~ "Noissue",
                                   TRUE ~ "issue")
  )
  )
Output:
Participate B1 B2 B3 B4 B5 Query1 Query2
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 3 -1 issue issue issue issue Noissue Noissue
2 1 -1 Noissue issue issue Noissue Noissue Noissue
3 1 -1 issue issue issue issue Issue Noissue
4 2 -1 Noissue Noissue issue Noissue Noissue Noissue
5 2 1 Noissue Noissue Noissue issue Noissue Noissue
6 1 -99 Noissue Noissue Noissue Noissue Noissue Noissue
Data:
df <- structure(list(
  Participate = c(3, 1, 1, 2, 2, 1),
  B1 = c(-1, -1, -1, -1, 1, -99),
  B2 = c(-1, 1, -1, 1, 1, -99),
  B3 = c(-1, -1, -1, 1, 1, -99),
  B4 = c(-1, -1, -1, -1, 1, -99),
  B5 = c(-1, 1, -1, 1, -1, -99),
  Query1 = c("Noissue", "Noissue", "Issue", "Noissue", "Noissue", "Noissue"),
  Query2 = c("Noissue", "Noissue", "Noissue", "Noissue", "Noissue", "Noissue")),
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))
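The `across()` call above overwrites `B2:B5` with the labels. If you want to keep the original values, a sketch using `across()`'s `.names` argument writes the labels into new columns instead. The `_flag` suffix is an arbitrary, hypothetical choice, and the labelling logic is copied unchanged from the answer (including its use of `B1` as the participation code).

```r
library(dplyr)

# Same toy data as above (only the columns the example needs)
df <- tibble(
  Participate = c(3, 1, 1, 2, 2, 1),
  B1 = c(-1, -1, -1, -1, 1, -99),
  B2 = c(-1, 1, -1, 1, 1, -99),
  B3 = c(-1, -1, -1, 1, 1, -99),
  B4 = c(-1, -1, -1, -1, 1, -99),
  B5 = c(-1, 1, -1, 1, -1, -99)
)

# .names = "{.col}_flag" creates B2_flag ... B5_flag and leaves B2:B5 untouched
flags <- df %>%
  mutate(across(B2:B5,
                ~ case_when(.x == 1 & B1 <= 2 ~ "Noissue",
                            .x == -99 & B1 <= 2 ~ "Noissue",
                            B1 == 3 ~ "Noissue",
                            TRUE ~ "issue"),
                .names = "{.col}_flag"))
```

The new columns are appended after the existing ones, so `flags` holds both the raw codes and their labels.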
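Whenever we need to apply logical conditions row-wise across many columns, there are usually two main approaches to consider. Both remove the need for `rowwise()`, for `lapply()`/`map()` piped into `Reduce()`/`reduce()`, or for complicated `case_when()` statements: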
1. `rowSums(condition)`
2. `if_any()` / `if_all()`
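`if_all()` is the counterpart of `if_any()`: it is `TRUE` only when the condition holds in every selected column. A minimal sketch on the question's data, flagging rows where every `B` column equals `-1`, written both ways (the column names `all_minus1` and `all_minus1_rs` are hypothetical):

```r
library(dplyr)

dat <- tibble(
  Participate = c(3, 1, 1, 2, 2, 1),
  B1 = c(-1, -1, -1, -1, 1, -99),
  B2 = c(-1, 1, -1, 1, 1, -99),
  B3 = c(-1, -1, -1, 1, 1, -99),
  B4 = c(-1, -1, -1, -1, 1, -99),
  B5 = c(-1, 1, -1, 1, -1, -99)
)

res <- dat %>%
  mutate(
    # TRUE only when all five B columns equal -1
    all_minus1 = if_all(B1:B5, ~ .x == -1),
    # Same test with rowSums(): count the TRUEs per row, compare to the column count
    all_minus1_rs = rowSums(across(B1:B5, ~ .x == -1)) == 5
  )
```

Rows 1 and 3 are the only ones where every `B` value is `-1`. For the `if_any()` equivalent with `rowSums()`, replace `== 5` with `> 0`.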
This problem is best suited to an `if_any()` solution.
Using `if_any()`:
library(dplyr)

Batch_v1 %>%
  mutate(query3 = ifelse(if_any(B2:B5, ~ .x %in% c(-99, 1)) & B1 <= 2,
                         "Noissue",
                         "Issue"))
Using `rowSums()`:
Batch_v1 %>%
  mutate(query3 = ifelse(rowSums(across(B2:B5, ~ .x %in% c(-99, 1))) > 0 & B1 <= 2,
                         "Noissue",
                         "Issue"))
Output:
# A tibble: 6 x 9
Participate B1 B2 B3 B4 B5 Query1 Query2 query3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
1 3 -1 -1 -1 -1 -1 Noissue Noissue Issue
2 1 -1 1 -1 -1 1 Noissue Noissue Noissue
3 1 -1 -1 -1 -1 -1 Issue Noissue Issue
4 2 -1 1 1 -1 1 Noissue Noissue Noissue
5 2 1 1 1 1 -1 Noissue Noissue Noissue
6 1 -99 -99 -99 -99 -99 Noissue Noissue Noissue
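The two snippets above use `B1 <= 2` as the participation test and only scan `B2:B5`, which is why row 1 (`Participate == 3`) comes out as "Issue" while `Query1` says "Noissue". A sketch of the same `if_any()` idea applied to the question's own columns, assuming `Participate` holds the participation code and `B1:B5` are the columns to scan (the real data also has `B6:B10` and `BOth`, which would be added to the selection):

```r
library(dplyr)

Batch_v1 <- tibble(
  Participate = c(3, 1, 1, 2, 2, 1),
  B1 = c(-1, -1, -1, -1, 1, -99),
  B2 = c(-1, 1, -1, 1, 1, -99),
  B3 = c(-1, -1, -1, 1, 1, -99),
  B4 = c(-1, -1, -1, -1, 1, -99),
  B5 = c(-1, 1, -1, 1, -1, -99),
  Query1 = c("Noissue", "Noissue", "Issue", "Noissue", "Noissue", "Noissue")
)

out <- Batch_v1 %>%
  mutate(query3 = case_when(
    # any B column equal to 1 or -99, for participation codes 1, 2 or -99
    if_any(B1:B5, ~ .x %in% c(1, -99)) & Participate %in% c(1, 2, -99) ~ "Noissue",
    # participation code 3 is never an issue
    Participate == 3 ~ "Noissue",
    TRUE ~ "Issue"
  ))
```

With these conditions `query3` reproduces the question's `Query1` column on the sample rows.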
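There are some good answers to similar questions here: "Row-wise logical operations with mutate() and filter() in R", and here: "R - Remove rows from a data frame that contain only zeros in numeric columns, base R and pipe-friendly methods?"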
You could use `rowwise()`:
library(dplyr)
Batch_v1 %>%
  rowwise() %>%
  mutate(
    Query3 = case_when(
      # c_across() collects B1..B5 for the current row
      any(c_across(B1:B5) == 1) & Participate %in% c(1, 2, -99) ~ "Noissue",
      any(c_across(B1:B5) == -99) & Participate %in% c(1, 2, -99) ~ "Noissue",
      Participate == 3 ~ "Noissue",
      TRUE ~ "Issue"
    )
  )
This returns:
# A tibble: 6 x 9
# Rowwise:
Participate B1 B2 B3 B4 B5 Query1 Query2 Query3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
1 3 -1 -1 -1 -1 -1 Noissue Noissue Noissue
2 1 -1 1 -1 -1 1 Noissue Noissue Noissue
3 1 -1 -1 -1 -1 -1 Issue Noissue Issue
4 2 -1 1 1 -1 1 Noissue Noissue Noissue
5 2 1 1 1 1 -1 Noissue Noissue Noissue
6 1 -99 -99 -99 -99 -99 Noissue Noissue Noissue
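Inside `rowwise()`, a bare `B1:B5` in an expression is not a column selection: `B1` and `B5` are each a single number for the current row, so `:` builds the numeric sequence between them. On this sample data that happens to give the same labels, which can hide the problem. `c_across()` is the dplyr helper that gathers a range of columns for the current row. A small sketch comparing the two on the values of row 2 (the names `one_row`, `n_ones_colon` and `n_ones_across` are hypothetical):

```r
library(dplyr)

one_row <- tibble(B1 = -1, B2 = 1, B3 = -1, B4 = -1, B5 = 1)

# Plain R: `:` on two scalars is a sequence from B1 to B5, i.e. -1 0 1
seq_version <- with(one_row, B1:B5)

res <- one_row %>%
  rowwise() %>%
  mutate(
    n_ones_colon  = sum(B1:B5 == 1),           # counts 1s in the sequence -1:1
    n_ones_across = sum(c_across(B1:B5) == 1)  # counts 1s in the five columns
  ) %>%
  ungroup()
```

`n_ones_colon` is 1 (only the end of the sequence matches) while `n_ones_across` is 2 (`B2` and `B5`).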
The main problem with your second query is the call `any(Batch_v1[,2:6] == 1)`.
Let's look at what this comparison returns:
Batch_v1[,2:6] == 1
#> B1 B2 B3 B4 B5
#> [1,] FALSE FALSE FALSE FALSE FALSE
#> [2,] FALSE TRUE FALSE FALSE TRUE
#> [3,] FALSE FALSE FALSE FALSE FALSE
#> [4,] FALSE TRUE TRUE FALSE TRUE
#> [5,] TRUE TRUE TRUE TRUE FALSE
#> [6,] FALSE FALSE FALSE FALSE FALSE
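So `Batch_v1[,2:6] == 1` returns a logical matrix. Applying `any` to this matrix returns a single `TRUE` if any value in it is `TRUE`. That single `TRUE` is then combined with `Batch_v1$Participate %in% c(1,2,-99)`, so the first condition reduces to the participation check alone, which is why row 3 gets "Noissue" in Query2. This is clearly not the behaviour you want. Using `rowwise()` forces `any` to be applied... well... to each row.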
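The difference is easy to reproduce in base R: comparing a data frame with `==` gives a logical matrix, `any()` collapses that whole matrix to one value, while `rowSums(m) > 0` or `apply(m, 1, any)` give one value per row. A minimal sketch on the question's sample values (the names `m`, `whole`, `by_row` and `by_row2` are hypothetical):

```r
Batch_v1 <- data.frame(
  Participate = c(3, 1, 1, 2, 2, 1),
  B1 = c(-1, -1, -1, -1, 1, -99),
  B2 = c(-1, 1, -1, 1, 1, -99),
  B3 = c(-1, -1, -1, 1, 1, -99),
  B4 = c(-1, -1, -1, -1, 1, -99),
  B5 = c(-1, 1, -1, 1, -1, -99)
)

m <- Batch_v1[, 2:6] == 1      # logical matrix, one row per observation

whole   <- any(m)              # a single TRUE for the whole matrix
by_row  <- rowSums(m) > 0      # one TRUE/FALSE per row
by_row2 <- apply(m, 1, any)    # the same per-row result via apply()
```

`by_row` is `FALSE` only for rows 1, 3 and 6, whereas `whole` is just `TRUE`.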
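Note: inside a tidyverse pipe you don't want to use `Batch_v1$B1` when you mean the object currently flowing through the pipe. `Batch_v1$B1` refers to the original `Batch_v1`, without any of the transformations applied earlier in the pipe. In this case there is no real difference, but in general you should not rely on this.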