
Create a dummy indicating the presence of a string fragment in any of multiple variables

df <- data.frame (address.1.line = c("apartment 5", "25 spring street", "nice house"), address.2.line = c("london", "new york", "apartment 2"), address.3.line = c("", "", "paris"))

I'm trying to make a function that returns a new column in a data frame. The column should be a dummy variable attached to the original data frame indicating whether any of 3 address-line variables contain a string (or selection of strings).

E.g., in the example above, I want df to have a new variable called "Apartment_dummy" indicating the presence of the string fragment "apartment" in any of the three address lines, so it will take 1 in rows 1 and 3, and zero in row 2. The function therefore needs to take 2 arguments: the name of the new dummy variable to be created, and the corresponding string fragment that needs to be detected in the address variables.

I tried the following. It returns a dummy, but doesn't give the new variable the right name. Also, I feel like there must be a way to do it in a single step. Any ideas? Many thanks!

library(tidyverse)
library(magrittr)  # provides %<>%, which library(tidyverse) does not attach
premises_dummy <- function(varname = NULL, strings = NULL) {
df %<>%    mutate_at(.funs = funs(flagA = str_detect(., strings)), .vars = vars(ends_with(".line"))) %>% 
       mutate(varname = ifelse(rowSums(select(., contains("flagA"))) > 0, 1, 0))
return(df)
}

df <- premises_dummy(varname = 'Apartment_dummy', strings = 'apartment')

A quick data.table solution to it:

library(data.table)
dt <- data.table(df)
search_string <- "apartment"
dt[like(address.1.line, search_string)| 
   like(address.2.line, search_string)| 
   like(address.3.line, search_string), paste0(search_string,".Dummy") := 1]

dt[is.na(get(paste0(search_string,".Dummy"))), paste0(search_string,".Dummy") := 0]
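The two assignments above can probably be collapsed into one step, without naming each address column by hand. The following is only a sketch: the helper names new_col and line_cols are made up for illustration, and it assumes the address columns are exactly those whose names end in ".line".

```r
library(data.table)

df <- data.frame(
  address.1.line = c("apartment 5", "25 spring street", "nice house"),
  address.2.line = c("london", "new york", "apartment 2"),
  address.3.line = c("", "", "paris"),
  stringsAsFactors = FALSE
)
dt <- as.data.table(df)

search_string <- "apartment"
new_col   <- paste0(search_string, ".Dummy")           # hypothetical helper name
line_cols <- grep("\\.line$", names(dt), value = TRUE)  # assumed column naming

# Test each address column separately, OR the results together row-wise,
# then store the result as 0/1 in a column whose name is held in new_col.
dt[, (new_col) := as.integer(Reduce(`|`, lapply(.SD, like, search_string))),
   .SDcols = line_cols]
```

Wrapping the column name in parentheses, as in (new_col), tells data.table to use the value of the variable rather than the literal name.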

A tidyverse option using tidyr::unite and stringr::str_detect:

library(tidyverse)
df %>%
    unite(tmp, remove = FALSE) %>%
    mutate(Apartment_dummy = +str_detect(tmp, "apartment")) %>%
    select(-tmp)
#    address.1.line address.2.line address.3.line Apartment_dummy
#1      apartment 5         london                              1
#2 25 spring street       new york                              0
#3       nice house    apartment 2          paris               1
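To get the two-argument function the question asks for, the pipeline above could be wrapped so that the new column name is supplied as a string. This is a minimal sketch, not necessarily what the answerer had in mind: the data argument is an addition, and it assumes the address columns all end in ".line". The `!!varname :=` form is what lets mutate() use the value of varname as the column name.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- data.frame(
  address.1.line = c("apartment 5", "25 spring street", "nice house"),
  address.2.line = c("london", "new york", "apartment 2"),
  address.3.line = c("", "", "paris"),
  stringsAsFactors = FALSE
)

premises_dummy <- function(data, varname, strings) {
  data %>%
    # Join only the address lines into a temporary column
    unite("tmp", ends_with(".line"), remove = FALSE) %>%
    # !!varname := ... names the new column after the string in varname
    mutate(!!varname := as.integer(str_detect(tmp, strings))) %>%
    select(-tmp)
}

out <- premises_dummy(df, varname = "Apartment_dummy", strings = "apartment")
```

In the original attempt, mutate(varname = ...) creates a column literally called "varname", which is why the name came out wrong.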

A base R solution:

 cols = endsWith(names(df),"line")
 df['Apartment_dummy'] = as.integer(grepl('apartment',do.call(paste,df[cols])))

Now we can write a function that also takes the data to be used as an argument:

premises_dummy = function(data, varname, strings){
   # columns whose names end in "line"
   cols = endsWith(names(data), "line")
   # paste the address lines together row-wise and flag rows matching `strings`
   data[varname] = as.integer(grepl(strings, do.call(paste, data[cols])))
   data
 }
 premises_dummy(df, varname = 'Apartment_dummy', strings = 'apartment')
    address.1.line address.2.line address.3.line Apartment_dummy
1      apartment 5         london                              1
2 25 spring street       new york                              0
3       nice house    apartment 2          paris               1
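The question also mentions a "selection of strings". One way to handle that in base R might be to join the fragments into a single regular expression with "|" and test each address column separately, which also avoids accidental matches across the pasted column boundaries. The function name any_fragment_dummy and its data argument are hypothetical, and the fragments are treated as regular expressions, so special characters would need escaping.

```r
df <- data.frame(
  address.1.line = c("apartment 5", "25 spring street", "nice house"),
  address.2.line = c("london", "new york", "apartment 2"),
  address.3.line = c("", "", "paris"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1 if any "line" column matches any of the fragments
any_fragment_dummy <- function(data, varname, fragments) {
  cols <- endsWith(names(data), "line")
  # c("street", "paris") becomes the pattern "street|paris"
  pattern <- paste(fragments, collapse = "|")
  # One logical vector per address column, combined row-wise with OR
  hits <- Reduce(`|`, lapply(data[cols], function(x) grepl(pattern, x)))
  data[varname] <- as.integer(hits)
  data
}

out <- any_fragment_dummy(df, "Street_or_paris_dummy", c("street", "paris"))
```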
