I have this data:
USDfirms <- c("GOOG", "BABA", "0071.TW")
TWRfirms <- c("3231.TW")
JPYfirms <- c("7752.T")
I am trying to use the grepl function to create a new column. If ticker in the df data matches a firm in one of the three string vectors above, a value should be assigned: TWRmatch if it matches the firm 3231.TW, USDmatch if it matches the firm GOOG, etc.
The ticker values might not always be a perfect fit, i.e. the ticker 3231 is not an exact match for 3231.TW, which is why I want to use grepl to ignore the .TW when matching.
df <- structure(list(symbol = c("3231.TW", "3231.TW", "3231.TW", "3231.TW",
"7752.T", "7752.T", "7752.T", "7752.T", "GOOG", "GOOG", "GOOG",
"GOOG", "BABA", "BABA", "BABA", "BABA"), ticker = c("3231", "3231",
"3231", "3231", "7752", "7752", "7752", "7752", "GOOG", "GOOG",
"GOOG", "GOOG", "BABA", "BABA", "BABA", "BABA"), country = c("TW",
"TW", "TW", "TW", "T", "T", "T", "T", NA, NA, NA, NA, NA, NA,
NA, NA), year = c(2017L, 2016L, 2015L, 2014L, 2018L, 2017L, 2016L,
2015L, 2017L, 2016L, 2015L, 2014L, 2018L, 2017L, 2016L, 2015L
)), .Names = c("symbol", "ticker", "country", "year"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 123L, 124L, 125L, 126L, 127L, 128L,
129L, 130L), class = "data.frame")
EDIT:
This function doesn't seem to work:
ifelse(grepl(USDfirms, df$ticker), "yes", "no")
I have also tried:
df$match <- ifelse(USDfirms %in% x$ticker, "yes", "no")
Which just gives me a yes for everything.
Not a perfect solution, but a brute-force method is a nested lapply / sapply. This is a double loop: for every ticker we go over every element of firm_list, check whether the ticker is present in any of that element's firms, and if it is we extract the name of that list element.
df$firms <- unlist(lapply(df$ticker, function(x)
  unlist(sapply(seq_along(firm_list), function(y) {
    if (any(grepl(x, unlist(firm_list[y]))))
      names(firm_list[y])
  }))))
df
# symbol ticker country year firms
#1 3231.TW 3231 TW 2017 TWRfirms
#2 3231.TW 3231 TW 2016 TWRfirms
#3 3231.TW 3231 TW 2015 TWRfirms
#4 3231.TW 3231 TW 2014 TWRfirms
#5 7752.T 7752 T 2018 JPYfirms
#6 7752.T 7752 T 2017 JPYfirms
#7 7752.T 7752 T 2016 JPYfirms
#8 7752.T 7752 T 2015 JPYfirms
#123 GOOG GOOG <NA> 2017 USDfirms
#124 GOOG GOOG <NA> 2016 USDfirms
#125 GOOG GOOG <NA> 2015 USDfirms
#126 GOOG GOOG <NA> 2014 USDfirms
#127 BABA BABA <NA> 2018 USDfirms
#128 BABA BABA <NA> 2017 USDfirms
#129 BABA BABA <NA> 2016 USDfirms
#130 BABA BABA <NA> 2015 USDfirms
Here firm_list holds all the firms in a named list, so that they are easy to check; define it before running the code above.
firm_list <- list(USDfirms = c("GOOG", "BABA", "0071.TW"),
                  TWRfirms = c("3231.TW"),
                  JPYfirms = c("7752.T"))
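One caveat with the nested loop: a ticker that matches nothing yields NULL, which unlist drops, so the result can come out shorter than the data frame. A minimal sketch of a guarded variant follows. The helper name match_group and the sample tickers (including the made-up "XXXX") are assumptions for illustration, not part of the original code.

```r
firm_list <- list(USDfirms = c("GOOG", "BABA", "0071.TW"),
                  TWRfirms = c("3231.TW"),
                  JPYfirms = c("7752.T"))

# Hypothetical helper: name of the first group containing the ticker,
# or NA when no group contains it.
match_group <- function(ticker, groups) {
  # fixed = TRUE treats the ticker as a literal string, not a regex
  hit <- vapply(groups,
                function(firms) any(grepl(ticker, firms, fixed = TRUE)),
                logical(1))
  if (any(hit)) names(groups)[which(hit)[1]] else NA_character_
}

tickers <- c("3231", "GOOG", "XXXX")  # "XXXX" is a made-up unmatched ticker
result <- vapply(tickers, match_group, character(1),
                 groups = firm_list, USE.NAMES = FALSE)
print(result)
```

Because vapply always returns exactly one value per ticker, the result should have the same length as the input and could be assigned to df$firms directly.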
Alternatively, it is more convenient and shorter to create a lookup data frame and then match against it and extract the group name.
ref_df <- data.frame(firms = unlist(firm_list),
                     names = rep(names(firm_list), lengths(firm_list)))
df$firms <- ref_df$names[sapply(df$ticker, function(x) grep(x, ref_df$firms))]
df
# symbol ticker country year firms
#1 3231.TW 3231 TW 2017 TWRfirms
#2 3231.TW 3231 TW 2016 TWRfirms
#3 3231.TW 3231 TW 2015 TWRfirms
#4 3231.TW 3231 TW 2014 TWRfirms
#5 7752.T 7752 T 2018 JPYfirms
#6 7752.T 7752 T 2017 JPYfirms
#7 7752.T 7752 T 2016 JPYfirms
#8 7752.T 7752 T 2015 JPYfirms
#123 GOOG GOOG <NA> 2017 USDfirms
#124 GOOG GOOG <NA> 2016 USDfirms
#125 GOOG GOOG <NA> 2015 USDfirms
#126 GOOG GOOG <NA> 2014 USDfirms
#127 BABA BABA <NA> 2018 USDfirms
#128 BABA BABA <NA> 2017 USDfirms
#129 BABA BABA <NA> 2016 USDfirms
#130 BABA BABA <NA> 2015 USDfirms
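Since grep does substring matching, a short ticker could in principle hit an unrelated firm (a ticker "71" would match "0071.TW"). If the firm codes always follow a "base.suffix" pattern, which is an assumption here, one possible variant is to strip the suffix once and do an exact lookup with match. The column name base and the sample tickers are made up for illustration.

```r
firm_list <- list(USDfirms = c("GOOG", "BABA", "0071.TW"),
                  TWRfirms = c("3231.TW"),
                  JPYfirms = c("7752.T"))

# Lookup table: one row per firm together with the name of its group
ref_df <- data.frame(firms = unlist(firm_list, use.names = FALSE),
                     names = rep(names(firm_list), lengths(firm_list)),
                     stringsAsFactors = FALSE)

# Assumed convention: everything after the last dot is an exchange suffix
ref_df$base <- sub("\\.[^.]*$", "", ref_df$firms)

tickers <- c("3231", "7752", "GOOG", "BABA", "71")  # "71" should not match
groups <- ref_df$names[match(tickers, ref_df$base)]
print(groups)
```

match returns NA for tickers without an exact counterpart, so unmatched rows end up as NA rather than causing an error or a false hit.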