
Case-insensitive search of a list in R

Can I search a character list for a string where I don't know how the string is cased? Or more generally, I'm trying to reference a column in a dataframe, but I don't know exactly how the columns are cased. My thought was to search names(myDataFrame) in a case-insensitive manner to return the proper casing of the column.

I would suggest the grep() function and some of its additional arguments that make it a pleasure to use.

grep("stringofinterest",names(dataframeofinterest),ignore.case=TRUE,value=TRUE)

Without the argument value=TRUE, you get only a vector of the index positions where the matches occurred.
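Note that grep() treats its first argument as a regular expression, so an unanchored pattern also matches longer names that merely contain it. A minimal sketch of the difference, using a made-up data frame (df and its column names are assumptions for illustration only):

```r
# Hypothetical data frame with inconsistently cased column names
df <- data.frame(ID = 1:3,
                 Species = c("a", "b", "c"),
                 SpeciesCode = c("x", "y", "z"))

# Unanchored pattern: every name containing "species", ignoring case
partial <- grep("species", names(df), ignore.case = TRUE, value = TRUE)
# partial is c("Species", "SpeciesCode")

# Anchored pattern: only the name that is exactly "species", ignoring case
exact <- grep("^species$", names(df), ignore.case = TRUE, value = TRUE)
# exact is "Species"

# Use the recovered name to reference the column
species_col <- df[[exact]]
```

If the string of interest may contain regular expression metacharacters such as a dot, anchoring like this needs escaping first; the tolower()/match() approach avoids that issue.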

Assuming that no two variable names differ only in case, you can search for your all-lowercase variable name in tolower(names(myDataFrame)):

match("b", tolower(c("A","B","C")))
[1] 2

This will produce only exact matches, but that is probably desirable in this case.
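Applied to the original question, the same idea can be wrapped in a small helper that returns the properly cased column name. This is only a sketch; find_col is a hypothetical name, not a function from any package, and it uses the built-in iris data set for illustration:

```r
# Hypothetical helper: return the actual column name that matches `name`,
# ignoring case. match() returns the first match, or NA if there is none.
find_col <- function(df, name) {
  idx <- match(tolower(name), tolower(names(df)))
  if (is.na(idx)) {
    stop("no column matching '", name, "'")
  }
  names(df)[idx]
}

find_col(iris, "SPECIES")
# [1] "Species"

# Reference the column through the recovered name
sepal_length <- iris[[find_col(iris, "sepal.length")]]
```

Stopping on NA is a design choice here; without it, indexing with a missing name would fail later with a less obvious message.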

With the stringr package, you can modify the pattern with one of the built-in modifier functions (see ?modifiers). For example, since we are matching a fixed string (no special regular expression characters) but want to ignore case, we can do

library(stringr)
str_detect(colnames(iris), fixed("species", ignore_case=TRUE))

Or you can use the (?i) case-insensitive modifier inside the regular expression:

str_detect(colnames(iris), "(?i)species")

The searchable package was created to allow various types of searching within objects:

library(searchable)
l <- list( a=1, b=2, c=3 )
sl <- searchable(l)        # make the list "searchable"
sl <- ignore.case(sl)      # turn on case insensitivity

> sl['B']
$b
[1] 2

It works with lists and vectors and does a lot more than simple case-insensitive matching.

For anyone using this with %in%, simply apply tolower() to the right-hand side (or to both sides), like so:

"b" %in% c("a", "B", "c")
# [1] FALSE

tolower("b") %in% tolower(c("a", "B", "c"))
# [1] TRUE
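If this comparison is needed in many places, it could be wrapped in a custom infix operator. A minimal sketch; %in_ci% is a made-up name, not part of base R or any package:

```r
# Hypothetical case-insensitive variant of %in%
`%in_ci%` <- function(x, table) {
  tolower(x) %in% tolower(table)
}

"b" %in_ci% c("a", "B", "c")
# [1] TRUE

c("A", "z") %in_ci% c("a", "B", "c")
# [1]  TRUE FALSE
```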

If you want to search for one set of strings in another set of strings, case insensitively, you could try:

s1 <- c("a", "b")
s2 <- c("B", "C")
matches <- s1[toupper(s1) %in% toupper(s2)]
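The expression above returns the matching elements as they are cased in s1. If the casing used in s2 is wanted instead (as when recovering the real column names of a data frame), match() can supply the positions. A small sketch under that assumption; idx and found are illustrative names:

```r
s1 <- c("a", "b")
s2 <- c("B", "C")

# Position in s2 of each element of s1, ignoring case (NA where absent)
idx <- match(toupper(s1), toupper(s2))
# idx is c(NA, 1)

# Drop the misses and pull the elements as they are cased in s2
found <- s2[idx[!is.na(idx)]]
# found is "B"
```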

Another way of achieving this is to use str_which(string, pattern) from the stringr package:

library("stringr")
str_which(string = tolower(colnames(iris)), pattern = "species")

