
Conditionally replace values across multiple columns based on string match in a separate column

I'm trying to conditionally replace values in multiple columns based on a string match in a different column. I'd like to do this in a single line of code using the across() function, but I keep getting errors that don't quite make sense to me. I suspect there is a simple solution, so if anyone could point me in the right direction, that would be fantastic!

library(dplyr)    # mutate(), across(), %>%
library(stringr)  # str_detect()

df <- data.frame("type" = c("Park", "Neighborhood", "Airport", "Park", "Neighborhood", "Neighborhood"),
               "total" = c(34, 56, 75, 89, 21, 56),
               "group_a" = c(30, 26, 45, 60, 3, 46),
               "group_b" = c(4, 30, 30, 29, 18, 10))

# working but not concise
df %>%
  mutate(total = ifelse(str_detect(type, "Park"), NA, total),
         group_a = ifelse(str_detect(type, "Park"), NA, group_a),
         group_b = ifelse(str_detect(type, "Park"), NA, group_b))

  
# concise but not working
df %>% mutate(across(total, group_a, group_b), ifelse(str_detect(type, "Park"), NA, .))

Update

We got a solution that works with my dummy dataset but not with my real data, so here is a small snippet of my real data frame, with the numbers changed and the organization names hidden. When I run df %>% mutate(across(c(Attempts, Canvasses, Completes)), ~ifelse(str_detect(long_name, "park-cemetery"), NA, .)) on these data, I get the following error message:

Error: Problem with mutate() input ..2.
x Input ..2 must be a vector, not a formula object.
i Input ..2 is ~ifelse(str_detect(long_name, "park-cemetery"), NA, .).

This is a small sample of the data that produces this error:

df <- structure(list(Org = c("OrgName", "OrgName", "OrgName", "OrgName", 
"OrgName", "OrgName", "OrgName", "OrgName", "OrgName", "OrgName"
), nCode = c("M34", "R36", "R46", "X29", "M31", "K39", "Q12", 
"Q39", "X41", "K27"), Attempts = c(100, 100, 100, 100, 100, 100, 
100, 100, 100, 100), Canvasses = c(80, 80, 80, 80, 80, 80, 80, 
80, 80, 80), Completes = c(50, 50, 50, 50, 50, 50, 50, 50, 50, 
50), van_nocc_id = c(999, 999, 999, 999, 999, 999, 999, 999, 
999, 999), van_name = c("M-Upper West Side", "SI-Rosebank", "SI-Tottenville", 
"BX-park-cemetery-etc-Bronx", "M-Stuyvesant Town-Cooper Village", 
"BK-Kensington", "Q-Broad Channel", "Q-Lindenwood", "BX-Wakefield", 
"BK-East New York"), boro_short = c("M", "SI", "SI", "BX", "M", 
"BK", "Q", "Q", "BX", "BK"), long_name = c("Upper West Side", 
"Rosebank", "Tottenville", "park-cemetery-etc-Bronx", "Stuyvesant Town-Cooper Village", 
"Kensington", "Broad Channel", "Lindenwood", "Wakefield", "East New York"
)), row.names = c(NA, -10L), class = "data.frame")

Final update

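The curse of the misplaced closing bracket: I was closing across() before the lambda, so the lambda was passed to mutate() as a separate input instead of to across(). Thanks to everyone for your help. The correct solution was df %>% mutate(across(c(Attempts, Canvasses, Completes), ~ifelse(str_detect(long_name, "park-cemetery"), NA, .)))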
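To make the bracket placement concrete, here is a minimal sketch on a two-row toy data frame. The object names toy, broken and fixed and the values are made up for illustration; only the column names come from the sample above. It assumes dplyr (1.0.0 or later) and stringr are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical two-row stand-in for the real data
toy <- data.frame(
  long_name = c("Rosebank", "park-cemetery-etc-Bronx"),
  Attempts  = c(100, 100),
  Completes = c(50, 50)
)

# across() is closed right after the column selection, so the formula
# becomes a second, separate input to mutate(), which expects a vector.
broken <- tryCatch(
  suppressWarnings(
    toy %>% mutate(across(c(Attempts, Completes)),
                   ~ ifelse(str_detect(long_name, "park-cemetery"), NA, .x))
  ),
  error = function(e) "error"
)

# The formula is passed inside across() (as its .fns argument),
# so it is applied to each selected column in turn.
fixed <- toy %>%
  mutate(across(c(Attempts, Completes),
                ~ ifelse(str_detect(long_name, "park-cemetery"), NA, .x)))
```

The only difference between the two calls is where the closing parenthesis of across() sits.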

If you use across(), introduced in dplyr 1.0.0 (and the right tool for this task), you have to specify the function you want to apply inside across() itself. In this case the ifelse(...) call has to be written as a purrr-style lambda, that is, starting with ~. Check the across() documentation, in particular the .cols and .fns arguments.

df %>% 
  mutate(across(c(total, group_a, group_b), ~ifelse(str_detect(type, "Park"), NA, .)))

Output

#           type total group_a group_b
# 1         Park    NA      NA      NA
# 2 Neighborhood    56      26      30
# 3      Airport    75      45      30
# 4         Park    NA      NA      NA
# 5 Neighborhood    21       3      18
# 6 Neighborhood    56      46      10
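As a variation on the lambda above, the sketch below swaps ifelse() for base R's replace() and selects the columns from a character vector with all_of(). This is an alternative formulation, not the answer's own code; the names cols and result are made up for illustration.

```r
library(dplyr)
library(stringr)

df <- data.frame(type    = c("Park", "Neighborhood", "Airport", "Park", "Neighborhood", "Neighborhood"),
                 total   = c(34, 56, 75, 89, 21, 56),
                 group_a = c(30, 26, 45, 60, 3, 46),
                 group_b = c(4, 30, 30, 29, 18, 10))

# Hypothetical helper: column names kept as strings
cols <- c("total", "group_a", "group_b")

# replace(x, index, value) sets x[index] to value and leaves the rest as is;
# .x is the current column, type is looked up in the data frame.
result <- df %>%
  mutate(across(all_of(cols), ~ replace(.x, str_detect(type, "Park"), NA)))
```

Keeping the column names in a vector can be convenient when the same set of columns is reused in several steps.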

Here is a data.table solution. Note that it matches type exactly with ==, rather than by substring as str_detect() does.

library(data.table)
df <- data.frame("type" = c("Park", "Neighborhood", "Airport", "Park", "Neighborhood", "Neighborhood"),
               "total" = c(34, 56, 75, 89, 21, 56),
               "group_a" = c(30, 26, 45, 60, 3, 46),
               "group_b" = c(4, 30, 30, 29, 18, 10))

setDT(df)
df[type == "Park", c("total", "group_a", "group_b") := NA]
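For the real data, where the match is on a substring such as "park-cemetery", the same idea can be sketched with grepl() in the row filter. The table dt and the vector cols below are hypothetical stand-ins for illustration, assuming data.table is installed.

```r
library(data.table)

# Hypothetical three-row stand-in for the real data
dt <- data.table(
  long_name = c("Rosebank", "park-cemetery-etc-Bronx", "Wakefield"),
  Attempts  = c(100, 100, 100),
  Canvasses = c(80, 80, 80)
)

# Columns to blank out; wrapping the name in () on the left of :=
# tells data.table to use the values in cols as column names.
cols <- c("Attempts", "Canvasses")

# grepl() matches on a substring; fixed = TRUE treats the pattern
# as a literal string rather than a regular expression.
dt[grepl("park-cemetery", long_name, fixed = TRUE), (cols) := NA]
```

As with the original answer, := modifies dt by reference, so no reassignment is needed.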

Update: that didn't take long to figure out. The columns need to be placed in a vector with c(), and the function has to go inside across() as a ~ lambda:

# concise AND working!
df %>% mutate(across(c(total, group_a, group_b), ~ifelse(str_detect(type, "Park"), NA, .)))

I had tried this initially but placed the columns in quotes... don't do that:)
