
Extracting elements from a string

Let's say I have the following datasets where the columns are structured as follows.

df1 <- data.frame(Date = rnorm(5),
                  "United States) New York (NY" = rnorm(5),
                  "United States) Chicago (Illinois" = rnorm(5),
                  "United States) Denver (Colorado" = rnorm(5),
                  "United States) Seattle (Washington" = rnorm(5),
                  "United States) Minneapolis (Minnesota" = rnorm(5),
                  check.names = FALSE)
df1

df2 <- data.frame(Date = rnorm(5),
                  "New York (New York, United States)" = rnorm(5),
                  "Phoenix (Arizona, United States)" = rnorm(5),
                  "Chicago (Illinois, United States)" = rnorm(5),
                  "Los Angeles (California, United States)" = rnorm(5),
                  check.names = FALSE)
df2

As you can see, each column is meant to represent a city, but the column names are awkward to work with because the city is buried among the country and state. I was wondering if anyone could help me figure out how to extract the city name from the column name string.

I could keep a dictionary of cities and do a string match, but I've had little luck with that. I also assumed there'd be a way to do this with str_split (from stringr), but my attempt below doesn't work:

sapply(str_split(names(df1),")"), 2)

Of course, I'm sure there's a gsub solution too, but I'm a little inept when it comes to regular expressions.

Ultimately, I just want the actual city names as the column names:

New York, Chicago, Denver, Seattle, Minneapolis

You can use gsub. Give this a try on the first data frame:

gsub(".*[)] (.*) [(].*", "\\1", names(df1)[-1])
# [1] "New York"    "Chicago"     "Denver"      "Seattle"     "Minneapolis"
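
To see what each part of that pattern does, the same extraction can be sketched as two plain sub calls: one that strips the leading "United States) " and one that strips the trailing " (State". The vector x below is a hypothetical stand-in for names(df1)[-1], used only to keep the sketch self-contained.

```r
# Hypothetical sample of df1-style column names (not the full data frame).
x <- c("United States) New York (NY", "United States) Chicago (Illinois")

# Step 1: drop everything up to and including the ") " after the country.
no_prefix <- sub(".*[)] ", "", x)

# Step 2: drop the " (" and everything that follows it.
city <- sub(" [(].*", "", no_prefix)
city
# [1] "New York" "Chicago"
```

Writing the brackets as [)] and [(] puts them inside a character class, so they match literal parentheses without needing backslash escapes.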

For the second data frame, a minor adjustment to the first regular expression works:

gsub("(.*) [(].*", "\\1", names(df2)[-1])
# [1] "New York"    "Phoenix"     "Chicago"     "Los Angeles"

Combining these two into one for both sets of names:

nms <- c(names(df1)[-1], names(df2)[-1])
gsub("(.*[)] |)(.*) [(].*", "\\2", nms)
# [1] "New York"    "Chicago"     "Denver"      "Seattle"     "Minneapolis"
# [6] "New York"    "Phoenix"     "Chicago"     "Los Angeles"
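
Since the goal in the question is to have the city as the actual column name, the combined pattern can be assigned back to names(). This is a minimal sketch: the helper name extract_city is hypothetical (it is not part of the original answer), and the data frames are cut down to two city columns each to keep the example short.

```r
# Hypothetical helper wrapping the combined pattern from above.
extract_city <- function(x) gsub("(.*[)] |)(.*) [(].*", "\\2", x)

# Reduced versions of the two data frames from the question.
df1 <- data.frame(Date = rnorm(5),
                  "United States) New York (NY" = rnorm(5),
                  "United States) Chicago (Illinois" = rnorm(5),
                  check.names = FALSE)
df2 <- data.frame(Date = rnorm(5),
                  "New York (New York, United States)" = rnorm(5),
                  "Phoenix (Arizona, United States)" = rnorm(5),
                  check.names = FALSE)

# Rename every column except the first (Date) in place.
names(df1)[-1] <- extract_city(names(df1)[-1])
names(df2)[-1] <- extract_city(names(df2)[-1])

names(df1)
# [1] "Date"     "New York" "Chicago"
names(df2)
# [1] "Date"     "New York" "Phoenix"
```

A name that does not match the pattern, such as "Date", is returned unchanged by gsub, so skipping the first column with [-1] is a precaution and not a requirement.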

