I want to create a variable that flags whether one or more of several columns has a particular value. My data looks like this:
week Mon Tues Weds Thurs Fri Sat
1 jon jon jon jon mary mary
2 jane jane jane jane jane jane
3 mary mary mary mary mary jane
I want to create a binary variable that flags, for each week, whether Mon, Weds, or Sat of that week is "jon" or "mary". Is there a way to do this without writing a long ifelse() statement that checks each column individually? The desired output is:
week Mon Tues Weds Thurs Fri Sat flag
1 jon jon jon jon mary mary 1
2 jane jane jane jane jane jane 0
3 mary mary mary mary mary jane 1
I tried:
df %>%
  rowwise() %>%
  mutate(flag = +any(c_across(Mon, Weds, Sat)
                     %in% c("jon", "mary"))) %>%
  ungroup()
but I get this error:
Error: Problem with `mutate()` input `flag`.
x unused arguments (Mon, Weds, Sat)
i Input `flag` is `+...`.
i The error occurred in row 1.
One option is to apply %in% over the rows of the selected columns:
library(dplyr)
df %>%
  mutate(flag = colSums(apply(cbind(Mon, Weds, Sat), 1, `%in%`, c("jon", "mary"))) > 0)
# week Mon Tues Weds Thurs Fri Sat flag
# 1 1 jon jon jon jon mary mary TRUE
# 2 2 jane jane jane jane jane jane FALSE
# 3 3 mary mary mary mary mary jane TRUE
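colSums() rather than rowSums() is needed here because apply() with MARGIN = 1 stores each row's result as a column, so the matrix it returns has one column per row of df. A small base R check of that, using the question's data trimmed to the columns involved:

```r
# Question's data, trimmed to the week column and the three days being checked
df <- data.frame(week = 1:3,
                 Mon  = c("jon", "jane", "mary"),
                 Weds = c("jon", "jane", "mary"),
                 Sat  = c("mary", "jane", "jane"))

# Each row yields a length-3 logical vector (one entry per day);
# apply() stores each of those vectors as one *column* of the result
m <- apply(cbind(df$Mon, df$Weds, df$Sat), 1, `%in%`, c("jon", "mary"))
print(m)

# Column j of m therefore describes row j of df: count its matches
matches <- colSums(m)
flag <- matches > 0
print(flag)
```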
I think the problem is that c_across() takes a single column selection, so Mon, Weds and Sat end up as extra, unused arguments instead of one selection. Another way around it is purrr::pmap(), which works row by row:
library(dplyr)
library(purrr)
df %>%
  mutate(flag = pmap_int(list(Mon, Weds, Sat),
                         ~ +any(c(...) %in% c("jon", "mary"))))
# week Mon Tues Weds Thurs Fri Sat flag
# 1 1 jon jon jon jon mary mary 1
# 2 2 jane jane jane jane jane jane 0
# 3 3 mary mary mary mary mary jane 1
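pmap() calls the function once per row, passing that row's Mon, Weds and Sat values as separate arguments, so they have to be gathered back into one vector before the %in% test. In this sketch c(...) does the gathering; unlist(...) is not a substitute, because unlist() would bind the second and third values to its recursive and use.names arguments. Base R's mapply() follows the same row-by-row pattern, and it is swapped in for pmap_int() here so the sketch runs without purrr:

```r
df <- data.frame(week = 1:3,
                 Mon  = c("jon", "jane", "mary"),
                 Weds = c("jon", "jane", "mary"),
                 Sat  = c("mary", "jane", "jane"))

# Called once per row; ... holds that row's Mon, Weds and Sat values
row_flag <- function(...) +any(c(...) %in% c("jon", "mary"))

# USE.NAMES = FALSE stops mapply() from naming the result after the Mon values
df$flag <- mapply(row_flag, df$Mon, df$Weds, df$Sat, USE.NAMES = FALSE)
print(df)
```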
A third option, using c_across() as you asked:
df %>%
rowwise() %>%
mutate(flag = +any(c_across(c(Mon, Weds, Sat)) %in% c("jon", "mary"))) %>%
ungroup()
# # A tibble: 3 x 8
# week Mon Tues Weds Thurs Fri Sat flag
# <int> <chr> <chr> <chr> <chr> <chr> <chr> <int>
# 1 1 jon jon jon jon mary mary 1
# 2 2 jane jane jane jane jane jane 0
# 3 3 mary mary mary mary mary jane 1
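If your dplyr is version 1.0.4 or later (an assumption about your setup), if_any() expresses "at least one of these columns matches" directly and is vectorised over rows, so neither rowwise() nor c_across() is needed. A minimal sketch with the question's data:

```r
library(dplyr)

df <- data.frame(week = 1:3,
                 Mon  = c("jon", "jane", "mary"),
                 Weds = c("jon", "jane", "mary"),
                 Sat  = c("mary", "jane", "jane"))

out <- df %>%
  # TRUE for a row if at least one selected column is "jon" or "mary";
  # unary + turns TRUE/FALSE into 1/0
  mutate(flag = +if_any(c(Mon, Weds, Sat), ~ .x %in% c("jon", "mary")))
print(out)
```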
Instead of using rowwise() or looping over the rows, we can make it more efficient by looping over the columns with map() and combining the results with reduce():
library(purrr)
library(dplyr)
df %>%
  mutate(flag = map(select(., Mon, Weds, Sat), `%in%`, c("jon", "mary")) %>%  # one logical vector per day
           reduce(`|`) %>%  # OR the vectors element-wise
           `+`)             # TRUE/FALSE -> 1/0
# week Mon Tues Weds Thurs Fri Sat flag
#1 1 jon jon jon jon mary mary 1
#2 2 jane jane jane jane jane jane 0
#3 3 mary mary mary mary mary jane 1
A corresponding option in base R is lapply()/Reduce():
df$flag <- +(Reduce(`|`, lapply(df[c('Mon', 'Weds', 'Sat')],
`%in%`, c("jon", "mary"))))
Data used in the examples:
df <- structure(list(week = 1:3, Mon = c("jon", "jane", "mary"), Tues = c("jon",
"jane", "mary"), Weds = c("jon", "jane", "mary"), Thurs = c("jon",
"jane", "mary"), Fri = c("mary", "jane", "mary"), Sat = c("mary",
"jane", "jane")), class = "data.frame", row.names = c(NA, -3L
))
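Both column-wise versions (map()/reduce() and lapply()/Reduce()) build one logical vector per day and then OR those vectors together element-wise, which is why they avoid a per-row loop. The intermediate values can be inspected in base R:

```r
df <- data.frame(week = 1:3,
                 Mon  = c("jon", "jane", "mary"),
                 Weds = c("jon", "jane", "mary"),
                 Sat  = c("mary", "jane", "jane"))

# One logical vector per column: is that day's name "jon" or "mary"?
hits <- lapply(df[c("Mon", "Weds", "Sat")], `%in%`, c("jon", "mary"))
str(hits)

# Reduce() folds the list pairwise: (Mon | Weds) | Sat
any_hit <- Reduce(`|`, hits)

# Unary plus converts TRUE/FALSE to 1L/0L
df$flag <- +any_hit
print(df)
```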
Here is another base R option, using rowSums() + Reduce():
df$flag <- +(rowSums(
Reduce(
`+`,
lapply(
c("jon", "mary"),
`==`,
df[c("Mon", "Weds", "Sat")]
)
)
) > 0)
which gives:
week Mon Tues Weds Thurs Fri Sat flag
1 1 jon jon jon jon mary mary 1
2 2 jane jane jane jane jane jane 0
3 3 mary mary mary mary mary jane 1
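In the rowSums() version, lapply() compares each target name with the whole data frame at once: "jon" == df[...] returns a logical matrix with one row per week and one column per day. Reduce(`+`) adds those matrices, so each cell holds how many target names it matches, and rowSums() then counts the matches per week. Unrolled for the two names:

```r
df <- data.frame(week = 1:3,
                 Mon  = c("jon", "jane", "mary"),
                 Weds = c("jon", "jane", "mary"),
                 Sat  = c("mary", "jane", "jane"))
days <- df[c("Mon", "Weds", "Sat")]

# Comparing a data frame with a single string gives a logical matrix
is_jon  <- "jon" == days
is_mary <- "mary" == days

# What Reduce(`+`, ...) computes for two names: a per-cell match count
counts <- is_jon + is_mary
print(counts)

# Matches per week; a week with at least one match is flagged
per_week <- rowSums(counts)
df$flag <- +(per_week > 0)
print(df)
```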