df <- data.frame(
  address.1.line = c("apartment 5", "25 spring street", "nice house"),
  address.2.line = c("london", "new york", "apartment 2"),
  address.3.line = c("", "", "paris")
)
I'm trying to write a function that adds a new column to a data frame. The column should be a dummy variable, attached to the original data frame, indicating whether any of the 3 address-line variables contains a given string (or one of a selection of strings).
E.g., in the example above, I want df to have a new variable called "Apartment_dummy" indicating the presence of the string fragment "apartment" in any of the three address lines, so it will take 1 in rows 1 and 3, and 0 in row 2. The function therefore needs to take 2 arguments: the name of the new dummy variable to be created, and the corresponding string fragment to be detected in the address variables.
I tried the following. It returns a dummy, but it doesn't give the new variable the right name. Also, I feel like there must be a way to do it in a single step. Any ideas? Many thanks!
library(tidyverse)
library(magrittr)  # provides %<>%, which library(tidyverse) does not attach
premises_dummy <- function(varname = NULL, strings = NULL) {
  df %<>%
    mutate_at(.funs = funs(flagA = str_detect(., strings)), .vars = vars(ends_with(".line"))) %>%
    mutate(varname = ifelse(rowSums(select(., contains("flagA"))) > 0, 1, 0))
  return(df)
}
df <- premises_dummy(varname = 'Apartment_dummy', strings = 'apartment')
A quick data.table solution:
library(data.table)
dt <- data.table(df)
search_string <- "apartment"
dt[like(address.1.line, search_string) |
     like(address.2.line, search_string) |
     like(address.3.line, search_string),
   paste0(search_string, ".Dummy") := 1]
dt[is.na(get(paste0(search_string, ".Dummy"))), paste0(search_string, ".Dummy") := 0]
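The same flag can probably be computed in a single assignment, without a second pass to fill in the zeros. This is only a sketch: the names new_col and addr_cols are mine, and it assumes the address columns are exactly those whose names end in ".line".

```r
library(data.table)

dt <- data.table(
  address.1.line = c("apartment 5", "25 spring street", "nice house"),
  address.2.line = c("london", "new york", "apartment 2"),
  address.3.line = c("", "", "paris")
)

search_string <- "apartment"
new_col   <- paste0(search_string, ".Dummy")             # illustrative name
addr_cols <- grep("\\.line$", names(dt), value = TRUE)   # assumed naming rule

# like() returns one logical vector per address column; Reduce(`|`) ORs them
# together row-wise, and as.integer() turns TRUE/FALSE into 1/0.
dt[, (new_col) := as.integer(Reduce(`|`, lapply(.SD, like, search_string))),
   .SDcols = addr_cols]
```

Wrapping new_col in parentheses on the left of := tells data.table to use the value of the variable as the column name rather than the literal name new_col.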
A tidyverse option using tidyr::unite and stringr::str_detect:
library(tidyverse)
df %>%
  unite(tmp, remove = FALSE) %>%
  mutate(Apartment_dummy = +str_detect(tmp, "apartment")) %>%
  select(-tmp)
#    address.1.line address.2.line address.3.line Apartment_dummy
#1      apartment 5         london                              1
#2 25 spring street       new york                              0
#3       nice house    apartment 2          paris               1
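To turn this into a function with a user-supplied column name, note that mutate(varname = ...) creates a column literally called "varname", which is why the attempt in the question gets the wrong name. One way around that is `!!varname :=`. The sketch below rests on a few assumptions: the function name premises_dummy_tidy and the data argument are mine, the address columns are taken to be those ending in ".line", and the data is assumed not to already contain a column called tmp.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical wrapper around the unite() + str_detect() idea above.
premises_dummy_tidy <- function(data, varname, strings) {
  data %>%
    # Glue the address lines into one temporary column, keeping the originals
    unite("tmp", ends_with(".line"), remove = FALSE) %>%
    # `!!varname :=` uses the *value* of varname as the new column name
    mutate(!!varname := as.integer(str_detect(tmp, strings))) %>%
    select(-tmp)
}

df <- data.frame(
  address.1.line = c("apartment 5", "25 spring street", "nice house"),
  address.2.line = c("london", "new york", "apartment 2"),
  address.3.line = c("", "", "paris")
)

result <- premises_dummy_tidy(df, varname = "Apartment_dummy", strings = "apartment")
```

Since strings is passed to str_detect as a regular expression, several fragments can be matched at once with a pattern such as "apartment|flat".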
A base R solution:
cols = endsWith(names(df),"line")
df['Apartment_dummy'] = as.integer(grepl('apartment',do.call(paste,df[cols])))
Now we can wrap this in a function (which could also take the data itself as an argument rather than relying on the global df):
premises_dummy = function(varname, strings) {
  cols = endsWith(names(df), "line")
  df[varname] = as.integer(grepl(strings, do.call(paste, df[cols])))
  df
}
premises_dummy(varname = 'Apartment_dummy', strings = 'apartment')
    address.1.line address.2.line address.3.line Apartment_dummy
1      apartment 5         london                              1
2 25 spring street       new york                              0
3       nice house    apartment 2          paris               1
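If you would rather not depend on a global df, a variant that takes the data as its first argument might look like the sketch below. The name add_string_dummy and the suffix argument are just illustrative. It also collapses a vector of fragments into one regular expression with "|", which is one way to cover the "selection of strings" case from the question (the fragments are treated as regex patterns).

```r
# Hypothetical helper: the function name and argument names are illustrative.
add_string_dummy <- function(data, varname, strings, suffix = "line") {
  # Address columns are assumed to be those whose names end in `suffix`
  cols <- endsWith(names(data), suffix)
  # Paste the address columns together row by row, then search each row once
  combined <- do.call(paste, data[cols])
  # Several fragments are combined into a single alternation pattern
  data[[varname]] <- as.integer(grepl(paste(strings, collapse = "|"), combined))
  data
}

df <- data.frame(
  address.1.line = c("apartment 5", "25 spring street", "nice house"),
  address.2.line = c("london", "new york", "apartment 2"),
  address.3.line = c("", "", "paris")
)

result <- add_string_dummy(df, varname = "Apartment_dummy", strings = "apartment")
```

Because the function returns a modified copy, the original df is left unchanged unless you assign the result back to it.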