I would like to run a logical operation (multiple conditions) across many columns. I have written a query that works fine, but I want to shorten my code because I have to write several such queries.
I have tried shortening the query using "any" and "brackets". The second query runs without errors but gives me a different answer. Does the "any" function work on multiple columns?
Here are my conditions, together with the results of both queries:
Participate | B1 | B2 | B3 | B4 | B5 | Query1 | Query2 |
---|---|---|---|---|---|---|---|
3 | -1 | -1 | -1 | -1 | -1 | Noissue | Noissue |
1 | -1 | 1 | -1 | -1 | 1 | Noissue | Noissue |
1 | -1 | -1 | -1 | -1 | -1 | Issue | Noissue |
2 | -1 | 1 | 1 | -1 | 1 | Noissue | Noissue |
2 | 1 | 1 | 1 | 1 | -1 | Noissue | Noissue |
1 | -99 | -99 | -99 | -99 | -99 | Noissue | Noissue |
I would appreciate it if anyone could help me reduce the number of code lines using different functions.
mutate(Batch_v1,
case_when (
((Batch_v1$B1 == 1 | Batch_v1$B2 == 1 | Batch_v1$B3 == 1 | Batch_v1$B4 == 1 | Batch_v1$B5 == 1| Batch_v1$B6 == 1| Batch_v1$B7 == 1|Batch_v1$B8 == 1|Batch_v1$B9 == 1|Batch_v1$B10 == 1|Batch_v1$BOth == 1) &
Batch_v1$Participate %in% c(1,2,-99))~"Noissue",
((Batch_v1$B1 == -99 | Batch_v1$B2 == -99 | Batch_v1$B3 == -99|Batch_v1$B4 == -99 |Batch_v1$B5 == -99|Batch_v1$B6 == -99|Batch_v1$B7 == -99|Batch_v1$B8 == 1|Batch_v1$B9 == -99|Batch_v1$B10 == -99|Batch_v1$BOth == -99) &
Batch_v1$Participate %in% c(1,2,-99))~"Noissue",
Batch_v1$Participate ==3 ~ "Noissue",
TRUE ~ "Issue"))
mutate(Batch_v1,
case_when (
((any(Batch_v1[,2:6] == 1)) & Batch_v1$Participate %in% c(1,2,-99))~ "Noissue",
((any(Batch_v1[,2:6] == -99)) & Batch_v1$Participate %in% c(1,2,-99))~ "Noissue",
Batch_v1$Participate ==3 ~ "Noissue",
TRUE ~ "Issue"))
We could use across with case_when:
library(dplyr)
df %>%
mutate(across(B2:B5, ~case_when(. == 1 & B1 <=2 ~ "Noissue",
. == -99 & B1 <=2 ~ "Noissue",
B1 == 3 ~ "Noissue",
TRUE ~ "issue")
)
)
Output:
Participate B1 B2 B3 B4 B5 Query1 Query2
<dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 3 -1 issue issue issue issue Noissue Noissue
2 1 -1 Noissue issue issue Noissue Noissue Noissue
3 1 -1 issue issue issue issue Issue Noissue
4 2 -1 Noissue Noissue issue Noissue Noissue Noissue
5 2 1 Noissue Noissue Noissue issue Noissue Noissue
6 1 -99 Noissue Noissue Noissue Noissue Noissue Noissue
data:
df <- structure(list(Participate = c(3, 1, 1, 2, 2, 1), B1 = c(-1,
-1, -1, -1, 1, -99), B2 = c(-1, 1, -1, 1, 1, -99), B3 = c(-1,
-1, -1, 1, 1, -99), B4 = c(-1, -1, -1, -1, 1, -99), B5 = c(-1,
1, -1, 1, -1, -99), Query1 = c("Noissue", "Noissue", "Issue",
"Noissue", "Noissue", "Noissue"), Query2 = c("Noissue", "Noissue",
"Noissue", "Noissue", "Noissue", "Noissue")), problems = structure(list(
row = 6L, col = "Query2", expected = "", actual = "embedded null",
file = "'test'"), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame")), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -6L))
Whenever we have to apply logical conditions row-wise across many columns, two main approaches should usually be considered. They avoid the need for rowwise(), for Reduce() (in the alternative with lapply/map %>% Reduce/reduce), and for complex case_when() statements.
1) rowSums(condition)
2) if_any() / if_all()
This question is best suited to a solution with if_any().
With if_any()
Batch_v1 %>% mutate(query3 = ifelse(if_any(B2:B5, ~.x %in% c(-99, 1)) & B1<=2,
"Noissue",
"Issue"))
With rowSums()
Batch_v1 %>% mutate(query3 = ifelse(rowSums(across(B2:B5, ~.x %in% c(-99, 1)))>0 & B1<=2,
"Noissue",
"Issue"))
Output
# A tibble: 6 x 9
Participate B1 B2 B3 B4 B5 Query1 Query2 query3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
1 3 -1 -1 -1 -1 -1 Noissue Noissue Issue
2 1 -1 1 -1 -1 1 Noissue Noissue Noissue
3 1 -1 -1 -1 -1 -1 Issue Noissue Issue
4 2 -1 1 1 -1 1 Noissue Noissue Noissue
5 2 1 1 1 1 -1 Noissue Noissue Noissue
6 1 -99 -99 -99 -99 -99 Noissue Noissue Noissue
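The snippets above use B1 as the participation flag and check B2:B5. With the column names from the question's table (Participate, B1 to B5) and its rule that Participate == 3 is never an issue, the same idea might look like the sketch below. `Batch_demo` and `Query3` are illustrative names, and the data is simply retyped from the question's table.

```r
library(dplyr)

# Hypothetical stand-in for Batch_v1, retyped from the table in the question
Batch_demo <- tibble(
  Participate = c(3, 1, 1, 2, 2, 1),
  B1 = c(-1, -1, -1, -1, 1, -99),
  B2 = c(-1, 1, -1, 1, 1, -99),
  B3 = c(-1, -1, -1, 1, 1, -99),
  B4 = c(-1, -1, -1, -1, 1, -99),
  B5 = c(-1, 1, -1, 1, -1, -99)
)

result <- Batch_demo %>%
  mutate(Query3 = case_when(
    # Participate == 3 is always fine, whatever the B columns hold
    Participate == 3 ~ "Noissue",
    # otherwise at least one B column in this row must be 1 or -99
    Participate %in% c(1, 2, -99) & if_any(B1:B5, ~ .x %in% c(1, -99)) ~ "Noissue",
    TRUE ~ "Issue"
  ))

print(result$Query3)
```

On this data the new column agrees with the Query1 column of the question, so only the third row is flagged as "Issue".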
There are some good answers to similar questions here:
Rowwise logical operations with mutate() and filter() in R
and here:
R - Remove rows from dataframe that contain only zeros in numeric columns, base R and pipe-friendly methods?
You could use
library(dplyr)
Batch_v1 %>%
rowwise() %>%
mutate(
Query3 = case_when(
any(c_across(B1:B5) == 1) & Participate %in% c(1,2,-99) ~ "Noissue",
any(c_across(B1:B5) == -99) & Participate %in% c(1,2,-99) ~ "Noissue",
Participate == 3 ~ "Noissue",
TRUE ~ "Issue"
)
)
which returns
# A tibble: 6 x 9
# Rowwise:
Participate B1 B2 B3 B4 B5 Query1 Query2 Query3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
1 3 -1 -1 -1 -1 -1 Noissue Noissue Noissue
2 1 -1 1 -1 -1 1 Noissue Noissue Noissue
3 1 -1 -1 -1 -1 -1 Issue Noissue Issue
4 2 -1 1 1 -1 1 Noissue Noissue Noissue
5 2 1 1 1 1 -1 Noissue Noissue Noissue
6 1 -99 -99 -99 -99 -99 Noissue Noissue Noissue
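One detail worth checking in row-wise code: inside mutate(), a bare B1:B5 is the base R sequence operator applied to the two values in that row, not a column selection, whereas c_across(B1:B5) gathers the row's values from B1 through B5. The sketch below uses a made-up single row (`one_row` is a hypothetical name) where the only 1 sits in B3 to show the difference.

```r
library(dplyr)

# Hypothetical single row: the only 1 sits in B3
one_row <- tibble(B1 = -1, B2 = -1, B3 = 1, B4 = -1, B5 = -1)

checked <- one_row %>%
  rowwise() %>%
  mutate(
    # -1:-1 is just the number -1, so the 1 in B3 is never seen
    with_colon = any(B1:B5 == 1),
    # c_across() collects B1, B2, B3, B4 and B5 of the current row
    with_c_across = any(c_across(B1:B5) == 1)
  ) %>%
  ungroup()

print(checked$with_colon)
print(checked$with_c_across)
```

On the six rows from the question both forms happen to give the same answer, which makes the difference easy to overlook.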
The main problem with your second code is the function
any(Batch_v1[,2:6] == 1)
Let's take a look at
Batch_v1[,2:6] == 1
#> B1 B2 B3 B4 B5
#> [1,] FALSE FALSE FALSE FALSE FALSE
#> [2,] FALSE TRUE FALSE FALSE TRUE
#> [3,] FALSE FALSE FALSE FALSE FALSE
#> [4,] FALSE TRUE TRUE FALSE TRUE
#> [5,] TRUE TRUE TRUE TRUE FALSE
#> [6,] FALSE FALSE FALSE FALSE FALSE
So Batch_v1[,2:6] == 1 returns a logical matrix with one entry per cell. Applying any to this matrix returns a single TRUE if any of the values in the whole matrix is TRUE, and that one value is then recycled for every row. That's clearly not your desired behaviour. Using rowwise() forces any to be applied... well... per row.
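To make the difference concrete, here is a small base R sketch on made-up data (`m` is a hypothetical three-row table, not the asker's Batch_v1). It contrasts any() over the whole comparison result with a per-row check built from rowSums().

```r
# Toy data mirroring columns B1..B5 from the question
m <- data.frame(
  B1 = c(-1, -1, -1),
  B2 = c(-1,  1, -1),
  B3 = c(-1, -1, -1),
  B4 = c(-1, -1, -1),
  B5 = c(-1,  1, -1)
)

# One answer for the whole table: TRUE, because row 2 contains a 1
any_overall <- any(m == 1)

# One answer per row: count the TRUEs in each row of the logical matrix
any_per_row <- rowSums(m == 1) > 0

print(any_overall)
print(any_per_row)
```

The first result has length one, so combining it with a per-row condition via & applies the same value to every row; the second has one element per row.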
Note: Inside a tidyverse pipe, you don't want to use Batch_v1$B1 if you are referring to the object you are currently working with. Batch_v1$B1 always refers to the original Batch_v1, without any transformations made earlier in the pipe. In this case there is no real difference, but you shouldn't rely on this in general.
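A case where the two do differ, sketched on hypothetical toy data (`demo` and `flag` are illustrative names): once filter() has dropped a row, the bare column name follows the filtered data, while demo$B1 still has the original length and mutate() refuses to recycle it.

```r
library(dplyr)

# Hypothetical toy data
demo <- tibble(Participate = c(3, 1, 2), B1 = c(-1, 1, -99))

# Bare column name: refers to the filtered data flowing through the pipe
ok <- demo %>%
  filter(Participate != 3) %>%
  mutate(flag = B1 == 1)

# demo$B1 still has 3 values, but only 2 rows remain after filter(),
# so mutate() signals a size error, which is caught here
bad <- tryCatch(
  demo %>%
    filter(Participate != 3) %>%
    mutate(flag = demo$B1 == 1),
  error = function(e) "error"
)

print(ok$flag)
print(bad)
```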