Optimization of R Data.table combination with for loop function

Question

I have an 'Agency_Reference' table containing a column 'agency_lookup' with 200 string entries such as:

alpha
beta
gamma, etc.

I have a data frame 'TEST' with a million rows containing a 'Campaign' column with entries such as:

Alpha_xt2010
alpha_xt2014
Beta_xt2016, etc.

I want to loop through each entry in the reference table, find which string is present in each Campaign entry, and store it in a new agency_identifier column in the table.

My current code is below and is slow to execute. I would appreciate guidance on how to optimize it, and I would like to learn how to do it the data.table way.

 Agency_Reference <- data.frame(agency_lookup = c('alpha','beta','gamma','delta','zeta'))
 TEST <- data.frame(Campaign = c('alpha_xt123','ALPHA345','Beta_xyz_34','BETa_testing','code_delta_'))
 # Initialise the result column; rows with no matching agency stay NA
 TEST$agency_identifier <- NA_character_
 for (agency in as.vector(Agency_Reference$agency_lookup)) {
   # Case-insensitive substring test; keep the previous value where there is no match
   TEST$agency_identifier <- ifelse(grepl(tolower(agency), tolower(TEST$Campaign)),
                                    agency, TEST$agency_identifier)
 }

Expected output:

Campaign        agency_identifier
alpha_xt123     alpha
ALPHA345        alpha
Beta_xyz_34     beta
BETa_testing    beta
code_delta_     delta

Answer 1

Try

TEST <- data.frame(Campaign = c('alpha_xt123','ALPHA345','Beta_xyz_34','BETa_testing','code_delta_'))

pattern = tolower(c('alpha','Beta','gamma','delta','zeta'))

TEST$agency_identifier <- sub(pattern = paste0('.*(', paste(pattern, collapse = '|'), ').*'),
                              replacement = '\\1',
                              x = tolower(TEST$Campaign))
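Since the question asks for a data.table version, here is a minimal sketch of the same single-regex idea using `:=` to add the column by reference. It assumes the data.table package is installed; the extra row 'no_agency_here' and the helper names (agency_regex, lowered, hit, out) are illustrative and not from the original post. Unlike sub(), which returns the whole lowercased string when nothing matches, this variant leaves unmatched rows as NA.

```r
library(data.table)

Agency_Reference <- data.table(agency_lookup = c("alpha", "beta", "gamma", "delta", "zeta"))
TEST <- data.table(Campaign = c("alpha_xt123", "ALPHA345", "Beta_xyz_34",
                                "BETa_testing", "code_delta_", "no_agency_here"))

# One alternation pattern built from the lookup table: "alpha|beta|gamma|delta|zeta"
agency_regex <- paste(tolower(Agency_Reference$agency_lookup), collapse = "|")

# Add the column by reference; regexpr() locates the leftmost match in each string
TEST[, agency_identifier := {
  lowered <- tolower(Campaign)
  hit <- regexpr(agency_regex, lowered)
  out <- rep(NA_character_, .N)
  # regmatches() returns only the matched substrings, in row order
  out[hit != -1L] <- regmatches(lowered, hit)
  out
}]

print(TEST)
```

One difference to keep in mind: if a campaign name contains more than one agency string, this sketch keeps the leftmost one in the string, whereas the loop in the question keeps whichever matching agency comes last in the reference table.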

Answer 2

This will not answer your question per se, but from what I understand you want to dissect the Campaign column and do something with the values it provides.

Take a look at Tidy data, more specifically the part "Multiple variables stored in one column". I think you'll make some great progress using tidyr::separate. That way you don't have to use a for-loop.
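As a rough illustration of that suggestion, the sketch below splits Campaign at its first underscore with tidyr::separate. It assumes the tidyr package is installed and that the agency is the leading token of the campaign name; the column names prefix and rest are made up for this example.

```r
library(tidyr)

TEST <- data.frame(Campaign = c("alpha_xt123", "ALPHA345", "Beta_xyz_34",
                                "BETa_testing", "code_delta_"),
                   stringsAsFactors = FALSE)

# Split at the first underscore only: extra = "merge" keeps any later
# underscores inside `rest`, fill = "right" pads rows with no underscore
split_test <- separate(TEST, Campaign, into = c("prefix", "rest"),
                       sep = "_", remove = FALSE, extra = "merge", fill = "right")

# Normalise case so 'Beta' and 'BETa' compare equal
split_test$prefix <- tolower(split_test$prefix)

print(split_test)
```

With the sample data this gives the prefixes alpha, alpha345, beta, beta and code, so it only recovers the agency when the name is a clean first token. Rows such as 'ALPHA345' and 'code_delta_' would still need a pattern match against the lookup table.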

2 answers

solution1
1 ACCEPTED 2016-07-28 10:49:25

solution2
0 2016-07-28 07:42:23
