How can I efficiently check variables for a particular value in R and flag rows containing it?

I want to create a variable that flags whether one or more of multiple variables has a particular value.

week  Mon  Tues  Weds  Thurs  Fri  Sat
1     jon  jon   jon   jon    mary mary
2     jane jane  jane  jane   jane jane
3     mary mary  mary  mary   mary jane

I want to create a binary variable that flags, for each week, whether Mon, Weds, or Sat of that week is "jon" or "mary". Is there a way to do this without writing a long ifelse statement that checks each variable individually?

week  Mon  Tues  Weds  Thurs  Fri  Sat  flag
1     jon  jon   jon   jon    mary mary 1
2     jane jane  jane  jane   jane jane 0
3     mary mary  mary  mary   mary jane 1

I tried

df %>%
  rowwise() %>%
  mutate(flag = +any(c_across(Mon, Weds, Sat)
  %in% ("jon", "mary")) %>%
  ungroup()

but I get an error

Error: Problem with `mutate()` input `flag`.
x unused arguments (Mon, Weds, Sat)
i Input `flag` is `+...`.
i The error occurred in row 1.

One option applies `%in%` to each row with apply and counts the matches with colSums:

df %>%
  mutate(flag = colSums(apply(cbind(Mon, Weds, Sat), 1, `%in%`, c("jon", "mary"))) > 0)
#   week  Mon Tues Weds Thurs  Fri  Sat  flag
# 1    1  jon  jon  jon   jon mary mary  TRUE
# 2    2 jane jane jane  jane jane jane FALSE
# 3    3 mary mary mary  mary mary jane  TRUE

I think the problem with across is that it's trying to do something to each column, not a summary of sorts of all of them. Let's try purrr::pmap instead:

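library(dplyr)
library(purrr)
df %>%
  # pmap_int() returns an integer vector rather than a list column;
  # c(...) gathers the row's Mon, Weds and Sat values into one vector
  mutate(flag = pmap_int(list(Mon, Weds, Sat),
                         ~ +any(c(...) %in% c("jon", "mary"))))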
#   week  Mon Tues Weds Thurs  Fri  Sat flag
# 1    1  jon  jon  jon   jon mary mary    1
# 2    2 jane jane jane  jane jane jane    0
# 3    3 mary mary mary  mary mary jane    1

A third option (using c_across, as you requested):

df %>%
  rowwise() %>%
  mutate(flag = +any(c_across(c(Mon, Weds, Sat)) %in% c("jon", "mary"))) %>%
  ungroup()
# # A tibble: 3 x 8
#    week Mon   Tues  Weds  Thurs Fri   Sat    flag
#   <int> <chr> <chr> <chr> <chr> <chr> <chr> <int>
# 1     1 jon   jon   jon   jon   mary  mary      1
# 2     2 jane  jane  jane  jane  jane  jane      0
# 3     3 mary  mary  mary  mary  mary  jane      1
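
If the installed dplyr is version 1.0.4 or later, `if_any()` may be a shorter way to express the same test without `rowwise()`. The sketch below rebuilds the sample data so it runs on its own; the version requirement and the name `res` are assumptions about your setup, not part of the question.

```r
library(dplyr)

# Sample data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  week  = 1:3,
  Mon   = c("jon", "jane", "mary"),
  Tues  = c("jon", "jane", "mary"),
  Weds  = c("jon", "jane", "mary"),
  Thurs = c("jon", "jane", "mary"),
  Fri   = c("mary", "jane", "mary"),
  Sat   = c("mary", "jane", "jane")
)

# if_any() applies the predicate to each selected column and ORs the
# results row by row; the unary + converts TRUE/FALSE to 1/0
res <- df %>%
  mutate(flag = +if_any(c(Mon, Weds, Sat), ~ .x %in% c("jon", "mary")))
```

Because the predicate runs once per column instead of once per row, this should scale better than the `rowwise()` version on larger data.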

Instead of using rowwise or looping over the rows, it is more efficient to loop over the columns with map and combine the results with reduce:

library(purrr)
library(dplyr)
df %>%
     mutate(flag = map(select(., Mon, Weds, Sat), `%in%`, c("jon", "mary")) %>%
          reduce(`|`) %>% `+`)
#  week  Mon Tues Weds Thurs  Fri  Sat flag
#1    1  jon  jon  jon   jon mary mary    1
#2    2 jane jane jane  jane jane jane    0
#3    3 mary mary mary  mary mary jane    1

A corresponding option in base R is lapply/Reduce:

df$flag <- +(Reduce(`|`, lapply(df[c('Mon', 'Weds', 'Sat')],
          `%in%`, c("jon", "mary"))))

data

df <- structure(list(week = 1:3, Mon = c("jon", "jane", "mary"), Tues = c("jon", 
"jane", "mary"), Weds = c("jon", "jane", "mary"), Thurs = c("jon", 
"jane", "mary"), Fri = c("mary", "jane", "mary"), Sat = c("mary", 
"jane", "jane")), class = "data.frame", row.names = c(NA, -3L
))
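
If the same check is needed for several sets of columns or values, the lapply/Reduce idea above could be wrapped in a small function. This is only a sketch: `flag_rows()` and its argument names are hypothetical and not part of any package.

```r
# flag_rows() is a hypothetical helper, not part of any package
flag_rows <- function(data, cols, values) {
  # One logical vector per column: is each entry one of `values`?
  hits <- lapply(data[cols], `%in%`, values)
  # OR the vectors together element-wise, then convert TRUE/FALSE to 1/0
  as.integer(Reduce(`|`, hits))
}

# Sample data from the question
df <- data.frame(
  week  = 1:3,
  Mon   = c("jon", "jane", "mary"),
  Tues  = c("jon", "jane", "mary"),
  Weds  = c("jon", "jane", "mary"),
  Thurs = c("jon", "jane", "mary"),
  Fri   = c("mary", "jane", "mary"),
  Sat   = c("mary", "jane", "jane")
)

df$flag     <- flag_rows(df, c("Mon", "Weds", "Sat"), c("jon", "mary"))
df$flag_fri <- flag_rows(df, "Fri", "jane")
```

Passing the column names as a character vector keeps the helper in base R and makes it easy to reuse with a different set of days.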

Here is another base R option using rowSums + Reduce

df$flag <- +(rowSums(
  Reduce(
    `+`,
    lapply(
      c("jon", "mary"),
      `==`,
      df[c("Mon", "Weds", "Sat")]
    )
  )
) > 0)

which gives

  week  Mon Tues Weds Thurs  Fri  Sat flag
1    1  jon  jon  jon   jon mary mary    1
2    2 jane jane jane  jane jane jane    0
3    3 mary mary mary  mary mary jane    1
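
As a quick sanity check, the two base R variants can be compared directly on the sample data. The variable names below (`cols`, `vals`, `flag_or`, `flag_sum`, `same`) are made up for this sketch.

```r
# Sample data from the question
df <- data.frame(
  week  = 1:3,
  Mon   = c("jon", "jane", "mary"),
  Tues  = c("jon", "jane", "mary"),
  Weds  = c("jon", "jane", "mary"),
  Thurs = c("jon", "jane", "mary"),
  Fri   = c("mary", "jane", "mary"),
  Sat   = c("mary", "jane", "jane")
)

cols <- c("Mon", "Weds", "Sat")
vals <- c("jon", "mary")

# Column-wise %in%, combined with element-wise OR
flag_or <- +(Reduce(`|`, lapply(df[cols], `%in%`, vals)))

# One comparison matrix per value, summed, then row totals tested against 0
flag_sum <- +(rowSums(Reduce(`+`, lapply(vals, `==`, df[cols]))) > 0)

# rowSums() keeps row names, so drop names before comparing
same <- identical(unname(flag_or), unname(flag_sum))
```

The two agree here, but they can differ when the columns contain missing values: `%in%` treats `NA` as a non-match, whereas `==` returns `NA`, which `rowSums()` propagates unless `na.rm = TRUE` is set.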
