
Matching columns to an ID string and assigning a new value in a new column

I have this data:

USDfirms <- c("GOOG", "BABA", "0071.TW")
TWRfirms <- c("3231.TW")
JPYfirms <- c("7752.T")

I am trying to use the grepl function to create a new column. If the ticker in df matches a firm in one of the three vectors above, the new column should get the corresponding value: for example, a ticker matching 3231.TW should be assigned TWRmatch, a ticker matching GOOG should be assigned USDmatch, and so on.

The ticker values are not always an exact match. For example, the ticker 3231 is not identical to 3231.TW, which is why I want to use grepl so that the .TW suffix is ignored when matching.

df <- structure(list(symbol = c("3231.TW", "3231.TW", "3231.TW", "3231.TW", 
"7752.T", "7752.T", "7752.T", "7752.T", "GOOG", "GOOG", "GOOG", 
"GOOG", "BABA", "BABA", "BABA", "BABA"), ticker = c("3231", "3231", 
"3231", "3231", "7752", "7752", "7752", "7752", "GOOG", "GOOG", 
"GOOG", "GOOG", "BABA", "BABA", "BABA", "BABA"), country = c("TW", 
"TW", "TW", "TW", "T", "T", "T", "T", NA, NA, NA, NA, NA, NA, 
NA, NA), year = c(2017L, 2016L, 2015L, 2014L, 2018L, 2017L, 2016L, 
2015L, 2017L, 2016L, 2015L, 2014L, 2018L, 2017L, 2016L, 2015L
)), .Names = c("symbol", "ticker", "country", "year"), row.names = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 123L, 124L, 125L, 126L, 127L, 128L, 
129L, 130L), class = "data.frame")

EDIT:

This call doesn't seem to work:

ifelse(grepl(USDfirms, df$ticker), "yes", "no")

I have also tried:

df$match <- ifelse(USDfirms %in% x$ticker, "yes", "no")

Which just gives me a yes for everything.
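
Both attempts run the comparison in the wrong direction. grepl() expects a single pattern, so passing the whole USDfirms vector as the pattern does not test every firm, and %in% tests exact equality and returns one value per firm rather than one per row. A minimal sketch of the reversed direction, using each ticker as the pattern and the firm vector as the text to search (the short ticker vector is assumed sample data mirroring the question):

```r
# Assumed sample data, mirroring the question
USDfirms <- c("GOOG", "BABA", "0071.TW")
ticker   <- c("3231", "7752", "GOOG", "BABA")

# Use each ticker as the pattern and the firm vector as the text to search.
# fixed = TRUE treats the ticker as a literal substring rather than a regex.
usd_match <- vapply(ticker,
                    function(tk) any(grepl(tk, USDfirms, fixed = TRUE)),
                    logical(1), USE.NAMES = FALSE)

ifelse(usd_match, "yes", "no")
```

This yields one "yes"/"no" per ticker, which is the shape needed for a new column in df.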

Not a perfect solution, but a brute-force method is a nested lapply / sapply. This is a double loop: for every ticker we go over every element of firm_list (defined below), check whether the ticker matches any firm in that element, and if it does, return the name of that element.

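# For each ticker, collect the name of every firm_list element that contains a match
df$firms <- unlist(lapply(df$ticker, function(x)
    unlist(sapply(seq_along(firm_list), function(y) {
        # grepl() uses the ticker as a pattern, so "3231" matches "3231.TW"
        if (any(grepl(x, unlist(firm_list[y]))))
            names(firm_list[y])
    }))))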

df

#     symbol ticker country year    firms
#1   3231.TW   3231      TW 2017 TWRfirms
#2   3231.TW   3231      TW 2016 TWRfirms
#3   3231.TW   3231      TW 2015 TWRfirms
#4   3231.TW   3231      TW 2014 TWRfirms
#5    7752.T   7752       T 2018 JPYfirms
#6    7752.T   7752       T 2017 JPYfirms
#7    7752.T   7752       T 2016 JPYfirms
#8    7752.T   7752       T 2015 JPYfirms
#123    GOOG   GOOG    <NA> 2017 USDfirms
#124    GOOG   GOOG    <NA> 2016 USDfirms
#125    GOOG   GOOG    <NA> 2015 USDfirms
#126    GOOG   GOOG    <NA> 2014 USDfirms
#127    BABA   BABA    <NA> 2018 USDfirms
#128    BABA   BABA    <NA> 2017 USDfirms
#129    BABA   BABA    <NA> 2016 USDfirms
#130    BABA   BABA    <NA> 2015 USDfirms

Here firm_list holds all the firms in a single named list, so that they are easy to check:

firm_list <- list(USDfirms = c("GOOG", "BABA", "0071.TW"), 
                  TWRfirms = c("3231.TW"), 
                  JPYfirms = c("7752.T"))
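
One caveat of the nested-loop version: a ticker that matches no firm returns NULL, which unlist() drops, so the result can end up shorter than the number of rows in df. A minimal sketch that returns NA for unmatched tickers instead; the helper name match_firm_group and the ticker "XXXX" are hypothetical and only serve the illustration:

```r
firm_list <- list(USDfirms = c("GOOG", "BABA", "0071.TW"),
                  TWRfirms = c("3231.TW"),
                  JPYfirms = c("7752.T"))

# Hypothetical helper: name of the first group containing the ticker, else NA
match_firm_group <- function(tk, groups) {
  hit <- vapply(groups, function(g) any(grepl(tk, g, fixed = TRUE)), logical(1))
  if (any(hit)) names(groups)[which(hit)[1]] else NA_character_
}

tickers <- c("3231", "7752", "GOOG", "XXXX")  # "XXXX" is a made-up non-match
firms <- vapply(tickers, match_firm_group, character(1),
                groups = firm_list, USE.NAMES = FALSE)
firms
```

Because vapply() always returns exactly one value per ticker, the result can be assigned to a data frame column without a length mismatch.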

Alternatively, it is shorter and more convenient to create a lookup data frame and then match against it and extract the group name:

ref_df <- data.frame(firms = unlist(firm_list), 
           names = rep(names(firm_list), lengths(firm_list)))

df$firms <- ref_df$names[sapply(df$ticker, function(x) grep(x, ref_df$firms))]


df
#     symbol ticker country year    firms
#1   3231.TW   3231      TW 2017 TWRfirms
#2   3231.TW   3231      TW 2016 TWRfirms
#3   3231.TW   3231      TW 2015 TWRfirms
#4   3231.TW   3231      TW 2014 TWRfirms
#5    7752.T   7752       T 2018 JPYfirms
#6    7752.T   7752       T 2017 JPYfirms
#7    7752.T   7752       T 2016 JPYfirms
#8    7752.T   7752       T 2015 JPYfirms
#123    GOOG   GOOG    <NA> 2017 USDfirms
#124    GOOG   GOOG    <NA> 2016 USDfirms
#125    GOOG   GOOG    <NA> 2015 USDfirms
#126    GOOG   GOOG    <NA> 2014 USDfirms
#127    BABA   BABA    <NA> 2018 USDfirms
#128    BABA   BABA    <NA> 2017 USDfirms
#129    BABA   BABA    <NA> 2016 USDfirms
#130    BABA   BABA    <NA> 2015 USDfirms
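
Substring matching with grep() can also produce false positives, since a short ticker would match any firm that merely contains it. If the tickers are always the symbol with the exchange suffix removed (an assumption based on the sample data), an exact match on the stripped symbol may be safer. A sketch under that assumption; the column names firm, group and base, and the ticker "71", are made up for illustration:

```r
firm_list <- list(USDfirms = c("GOOG", "BABA", "0071.TW"),
                  TWRfirms = c("3231.TW"),
                  JPYfirms = c("7752.T"))

# Lookup table: one row per firm, plus the symbol with everything
# from the first "." onwards removed
lookup <- data.frame(firm  = unlist(firm_list, use.names = FALSE),
                     group = rep(names(firm_list), lengths(firm_list)),
                     stringsAsFactors = FALSE)
lookup$base <- sub("\\..*$", "", lookup$firm)

tickers <- c("3231", "7752", "GOOG", "BABA", "71")  # "71" is a made-up ticker
groups <- lookup$group[match(tickers, lookup$base)]
groups
```

match() is vectorised and returns NA where there is no exact match, so "71" is not confused with "0071.TW" and the result always has one entry per ticker.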
