
R - Extract info after nth occurrence of a character from the right of string

I've seen many variations of extracting with gsub, but they mostly deal with extracting from left to right or after a single occurrence. I want to match from right to left: count four occurrences of - starting from the end of the string, and extract everything between the 3rd and 4th occurrence.

For example:

string                       outcome
here-are-some-words-to-try   some
a-b-c-d-e-f-g-h-i            f

Here are a few references I've tried using:

x = c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")
sapply(x, function(strings){
    ind = unlist(gregexpr(pattern = "-", text = strings))
    if (length(ind) < 4){NA}
    else{substr(strings, ind[length(ind) - 3] + 1, ind[length(ind) - 2] - 1)}
})
#here-are-some-words-to-try          a-b-c-d-e-f-g-h-i 
#                    "some"                        "f" 
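
The same index arithmetic can be wrapped in a small helper that takes the occurrence count as an argument. This is only a sketch: the name `between_from_right` and its parameters `n` and `sep` are made up here for illustration and are not part of the answer above.

```r
# Hypothetical helper: return the text between the n-th and (n-1)-th
# occurrence of `sep`, counting occurrences from the right.
between_from_right <- function(strings, n = 4, sep = "-") {
  vapply(strings, function(s) {
    # positions of every literal separator in the string (-1 if none)
    ind <- unlist(gregexpr(pattern = sep, text = s, fixed = TRUE))
    k <- length(ind)
    if (ind[1] == -1 || k < n) return(NA_character_)
    start <- ind[k - n + 1] + 1
    # for n == 1 there is no separator to the right, so run to the end
    end <- if (n == 1) nchar(s) else ind[k - n + 2] - 1
    substr(s, start, end)
  }, character(1), USE.NAMES = FALSE)
}

x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", "too-short")
between_from_right(x)         # "some" "f" NA
between_from_right(x, n = 2)  # "to" "h" NA
```

Strings with fewer than `n` separators give `NA`, matching the `length(ind) < 4` check in the original.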

You could use

([^-]+)(?:-[^-]+){3}$

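See a demo on regex101.com. The capture group ([^-]+) matches one run of non-dash characters, and (?:-[^-]+){3}$ requires exactly three more dash-separated fields between that run and the end of the string.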


In R this could be

library(dplyr)
library(stringr)

df <- data.frame(
  string = c('here-are-some-words-to-try', 'a-b-c-d-e-f-g-h-i', ' no dash in here'),
  stringsAsFactors = FALSE
)

# column 2 of the str_match() result holds the first capture group
df <- df %>%
  mutate(outcome = str_match(string, '([^-]+)(?:-[^-]+){3}$')[,2])
df

And yields

                      string outcome
1 here-are-some-words-to-try    some
2          a-b-c-d-e-f-g-h-i       f
3            no dash in here    <NA>
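
If loading dplyr and stringr is not desirable, the same pattern can probably be applied with base R alone. The sketch below assumes `regexec()` with `perl = TRUE` and `regmatches()` are acceptable stand-ins for `str_match()`; the helper name `fourth_from_right` is invented for this example.

```r
pattern <- "([^-]+)(?:-[^-]+){3}$"

# Hypothetical helper: first capture group of `pattern`, or NA if no match.
fourth_from_right <- function(strings) {
  # each list element is c(full match, group 1), or character(0) if no match
  m <- regmatches(strings, regexec(pattern, strings, perl = TRUE))
  vapply(m, function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
         character(1), USE.NAMES = FALSE)
}

x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", " no dash in here")
fourth_from_right(x)  # "some" "f" NA
```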

How about splitting your string? Something like:

string <- "here-are-some-words-to-try"

# separate all words
val <- strsplit(string, "-")[[1]]

# reverse the order
val <- rev(val)

# take the 4th element
val[4]

# And using a dataframe
library(tidyverse)
tibble(string = c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")) %>%
  mutate(outcome = map_chr(string, function(s) rev(strsplit(s, "-")[[1]])[4]))
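
Since `strsplit()` is already vectorised, the split-and-reverse idea can also be written for a whole character vector in base R. This is a minimal sketch; the third input string is an extra test case added here to show what happens when there are too few fields.

```r
x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", "too-short")

# strsplit() returns one vector of pieces per input string
pieces <- strsplit(x, "-", fixed = TRUE)

# reverse each vector and take element 4; indexing past the end gives NA
outcome <- vapply(pieces, function(p) rev(p)[4], character(1))
outcome  # "some" "f" NA
```

One caveat: `strsplit()` drops a trailing empty piece, so a string that ends in - is counted differently here than in the position-based approach.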
