Logical function across multiple columns using “any” function

I would like to run a logical operation (multiple conditions) across many columns. I have written a query that works fine, but I want to shorten the code because I have to write several such queries.

I tried shortening the query using any() and brackets. The second query runs without error but gives me a different answer. Does any() work across multiple columns?

Here are my conditions:

  1. If any of the columns B2 to B5 is 1 and B1 <= 2, then "Noissue"
  2. If any of the columns B2 to B5 is -99 and B1 <= 2, then "Noissue"
  3. If B1 == 3, then "Noissue"
  4. Everything else is "Issue"

Participate  B1   B2   B3   B4   B5   Query1   Query2
3            -1   -1   -1   -1   -1   Noissue  Noissue
1            -1    1   -1   -1    1   Noissue  Noissue
1            -1   -1   -1   -1   -1   Issue    Noissue
2            -1    1    1   -1    1   Noissue  Noissue
2             1    1    1    1   -1   Noissue  Noissue
1            -99  -99  -99  -99  -99  Noissue  Noissue

I would appreciate any help reducing the number of lines of code, for example by using different functions.

 mutate(Batch_v1, 
               case_when (
                 ((Batch_v1$B1 == 1 | Batch_v1$B2 == 1 | Batch_v1$B3 == 1 | Batch_v1$B4 == 1 | Batch_v1$B5 == 1 | Batch_v1$B6 == 1 | Batch_v1$B7 == 1 | Batch_v1$B8 == 1 | Batch_v1$B9 == 1 | Batch_v1$B10 == 1 | Batch_v1$BOth == 1) & 
                    Batch_v1$Participate %in% c(1,2,-99)) ~ "Noissue",
                 ((Batch_v1$B1 == -99 | Batch_v1$B2 == -99 | Batch_v1$B3 == -99 | Batch_v1$B4 == -99 | Batch_v1$B5 == -99 | Batch_v1$B6 == -99 | Batch_v1$B7 == -99 | Batch_v1$B8 == -99 | Batch_v1$B9 == -99 | Batch_v1$B10 == -99 | Batch_v1$BOth == -99) & 
                    Batch_v1$Participate %in% c(1,2,-99)) ~ "Noissue",
                 Batch_v1$Participate == 3 ~ "Noissue",
                 TRUE ~ "Issue"))





mutate(Batch_v1, 
   case_when (
     ((any(Batch_v1[,2:6] == 1)) & Batch_v1$Participate %in% c(1,2,-99))~ "Noissue",
     ((any(Batch_v1[,2:6] == -99)) & Batch_v1$Participate %in% c(1,2,-99))~ "Noissue",
     Batch_v1$Participate ==3 ~ "Noissue",
     TRUE ~ "Issue"))

We could use across() with case_when():

library(dplyr)
df %>% 
    mutate(across(B2:B5, ~case_when(. == 1 & B1 <=2 ~ "Noissue",
                                    . == -99 & B1 <=2 ~ "Noissue",
                                    B1 == 3 ~ "Noissue",
                                    TRUE ~ "issue")
                  )
           )

Output:

  Participate    B1 B2      B3      B4      B5      Query1  Query2 
        <dbl> <dbl> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
1           3    -1 issue   issue   issue   issue   Noissue Noissue
2           1    -1 Noissue issue   issue   Noissue Noissue Noissue
3           1    -1 issue   issue   issue   issue   Issue   Noissue
4           2    -1 Noissue Noissue issue   Noissue Noissue Noissue
5           2     1 Noissue Noissue Noissue issue   Noissue Noissue
6           1   -99 Noissue Noissue Noissue Noissue Noissue Noissue

Data:

df <- structure(list(Participate = c(3, 1, 1, 2, 2, 1), B1 = c(-1, 
-1, -1, -1, 1, -99), B2 = c(-1, 1, -1, 1, 1, -99), B3 = c(-1, 
-1, -1, 1, 1, -99), B4 = c(-1, -1, -1, -1, 1, -99), B5 = c(-1, 
1, -1, 1, -1, -99), Query1 = c("Noissue", "Noissue", "Issue", 
"Noissue", "Noissue", "Noissue"), Query2 = c("Noissue", "Noissue", 
"Noissue", "Noissue", "Noissue", "Noissue")), problems = structure(list(
row = 6L, col = "Query2", expected = "", actual = "embedded null", 
file = "'test'"), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame")), class = c("spec_tbl_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -6L))

Whenever we have to apply logical conditions row by row across many columns, two main approaches should usually be considered. They avoid the need for rowwise(), for lapply()/map() followed by Reduce()/reduce(), or for complex case_when() statements.

1) rowSums(condition)
2) if_any() / if_all()

This question is most suited for a solution with if_any().
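For contrast, here is a minimal sketch of how if_any() and if_all() differ. The toy tibble and the column names any_one and all_one are made up for illustration, and the sketch assumes a dplyr version that provides if_any()/if_all() (1.0.4 or later):

```r
library(dplyr)

# Toy data, invented for illustration only
toy <- tibble(B2 = c(1, 1, -1), B3 = c(1, -1, -1))

flags <- toy %>%
  mutate(
    any_one = if_any(B2:B3, ~ .x == 1),  # TRUE if at least one column is 1 in that row
    all_one = if_all(B2:B3, ~ .x == 1)   # TRUE only if every column is 1 in that row
  )

print(flags)
```

Only the first row satisfies if_all(), while the first two rows satisfy if_any(); the question asks for "any of the columns", hence if_any().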

With if_any()

Batch_v1 %>% mutate(query3 = ifelse(if_any(B2:B5, ~.x %in% c(-99, 1)) & B1<=2,
              "Noissue",
              "Issue"))

With rowSums()

Batch_v1 %>% mutate(query3 = ifelse(rowSums(across(B2:B5, ~.x %in% c(-99, 1)))>0 & B1<=2,
                              "Noissue",
                              "Issue"))

Output

# A tibble: 6 x 9
  Participate    B1    B2    B3    B4    B5 Query1  Query2  query3 
        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>   <chr>   <chr>  
1           3    -1    -1    -1    -1    -1 Noissue Noissue Issue  
2           1    -1     1    -1    -1     1 Noissue Noissue Noissue
3           1    -1    -1    -1    -1    -1 Issue   Noissue Issue  
4           2    -1     1     1    -1     1 Noissue Noissue Noissue
5           2     1     1     1     1    -1 Noissue Noissue Noissue
6           1   -99   -99   -99   -99   -99 Noissue Noissue Noissue
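Note that query3 covers only rules 1 and 2, which is why row 1 differs from Query1. The following is a sketch of a variant that also includes rule 3. It rests on two assumptions taken from the question's own code rather than from its rule list: the threshold applies to the Participate column, and the checked columns are B1 to B5. The column name query4 is made up for illustration.

```r
library(dplyr)

# Sample data from the question (Query1/Query2 columns omitted)
df <- tibble(
  Participate = c(3, 1, 1, 2, 2, 1),
  B1 = c(-1, -1, -1, -1, 1, -99),
  B2 = c(-1, 1, -1, 1, 1, -99),
  B3 = c(-1, -1, -1, 1, 1, -99),
  B4 = c(-1, -1, -1, -1, 1, -99),
  B5 = c(-1, 1, -1, 1, -1, -99)
)

res <- df %>%
  mutate(
    query4 = if_else(
      # Rule 3: Participate == 3 is always "Noissue"
      Participate == 3 |
        # Rules 1 and 2: some B column is 1 or -99 and Participate is 1, 2 or -99
        (Participate %in% c(1, 2, -99) & if_any(B1:B5, ~ .x %in% c(1, -99))),
      "Noissue",
      "Issue"
    )
  )

print(res$query4)
```

On the sample data this reproduces the Query1 column.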

There are some good answers to similar questions here:
Rowwise logical operations with mutate() and filter() in R
and here:
R - Remove rows from dataframe that contain only zeros in numeric columns, base R and pipe-friendly methods?

You could use

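library(dplyr)

Batch_v1 %>% 
  rowwise() %>%
  mutate(
    # c_across() collects the values of B1..B5 for the current row;
    # a bare B1:B5 would instead build the number sequence from B1 to B5
    Query3 = case_when(
      any(c_across(B1:B5) == 1)   & Participate %in% c(1,2,-99) ~ "Noissue",
      any(c_across(B1:B5) == -99) & Participate %in% c(1,2,-99) ~ "Noissue",
      Participate == 3                                          ~ "Noissue",
      TRUE                                                      ~ "Issue"
      )
    )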

which returns

# A tibble: 6 x 9
# Rowwise: 
  Participate    B1    B2    B3    B4    B5 Query1  Query2  Query3 
        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>   <chr>   <chr>  
1           3    -1    -1    -1    -1    -1 Noissue Noissue Noissue
2           1    -1     1    -1    -1     1 Noissue Noissue Noissue
3           1    -1    -1    -1    -1    -1 Issue   Noissue Issue  
4           2    -1     1     1    -1     1 Noissue Noissue Noissue
5           2     1     1     1     1    -1 Noissue Noissue Noissue
6           1   -99   -99   -99   -99   -99 Noissue Noissue Noissue

The main problem with your second code is the function

any(Batch_v1[,2:6] == 1)

Let's take a look at

Batch_v1[,2:6] == 1

#>         B1    B2    B3    B4    B5
#> [1,] FALSE FALSE FALSE FALSE FALSE
#> [2,] FALSE  TRUE FALSE FALSE  TRUE
#> [3,] FALSE FALSE FALSE FALSE FALSE
#> [4,] FALSE  TRUE  TRUE FALSE  TRUE
#> [5,]  TRUE  TRUE  TRUE  TRUE FALSE
#> [6,] FALSE FALSE FALSE FALSE FALSE

So Batch_v1[,2:6] == 1 returns a logical matrix with one row per observation. Applying any() to it returns a single TRUE if any value anywhere in that matrix is TRUE. Inside case_when() this single value is recycled to every row, so every row with Participate in c(1, 2, -99) becomes "Noissue", which is why Query2 differs from Query1 in row 3. That's clearly not your desired behaviour. Using rowwise() forces any() to be applied... well... per row.
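The difference can be reproduced on a small example. This base R sketch uses a toy data frame (its name and values are made up for illustration) to contrast any() over the whole comparison with two per-row alternatives:

```r
# Toy data, invented for illustration only
toy <- data.frame(B1 = c(-1, -1, -1), B2 = c(-1, 1, -1))

# any() collapses the whole logical matrix into a single value
whole <- any(toy == 1)

# Per-row alternatives: one logical value per row
per_row_sums  <- unname(rowSums(toy == 1) > 0)
per_row_apply <- unname(apply(toy == 1, 1, any))

print(whole)          # TRUE
print(per_row_sums)   # FALSE TRUE FALSE
print(per_row_apply)  # FALSE TRUE FALSE
```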

Note: Inside a tidyverse pipe, you don't want to use Batch_v1$B1 when you are referring to the object you are currently working with. Batch_v1$B1 refers to the original Batch_v1, without any of the transformations made earlier in the pipe. In this case there is no real difference, but you shouldn't rely on this in general.
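A small sketch of when it does make a difference, using a made-up tibble toy and illustrative column names from_pipe and from_dollar:

```r
library(dplyr)

# Toy data, invented for illustration only
toy <- tibble(x = c(1, 2, 3))

piped <- toy %>%
  mutate(x = x * 10) %>%       # transform x inside the pipe
  mutate(
    from_pipe   = x,           # sees the transformed values
    from_dollar = toy$x        # still sees the original, untransformed toy
  )

print(piped)
```

Here from_pipe holds 10, 20, 30 while from_dollar holds the original 1, 2, 3.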
