I've seen many examples of extracting with gsub, but they mostly deal with extracting from left to right or after a single occurrence. I want to match from right to left, counting four occurrences of -, and extract everything between the 3rd and 4th occurrence.
For example:
string                       outcome
here-are-some-words-to-try   some
a-b-c-d-e-f-g-h-i            f
Here are a few references I've tried using:
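x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")
sapply(x, function(strings) {
  # positions of every "-" in the string
  ind <- unlist(gregexpr(pattern = "-", text = strings))
  if (length(ind) < 4) {
    NA  # fewer than four dashes: nothing to extract
  } else {
    # text between the 4th and 3rd dash, counting from the right
    substr(strings, ind[length(ind) - 3] + 1, ind[length(ind) - 2] - 1)
  }
})
# here-are-some-words-to-try          a-b-c-d-e-f-g-h-i
#                     "some"                        "f"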
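The index arithmetic above generalizes to any position. Below is a minimal sketch; `field_from_right()` is a hypothetical helper (not part of the answer), and `n` counts fields from the right, so `n = 4` reproduces the question's case. It assumes a literal separator.

```r
# Hypothetical helper: return the field between the nth and (n-1)th occurrence
# of `sep`, counting from the right (n = 1 means "after the last separator").
field_from_right <- function(strings, n = 4, sep = "-") {
  stopifnot(n >= 1)
  vapply(strings, function(s) {
    # positions of every separator; gregexpr() returns -1 when there is none
    ind <- gregexpr(sep, s, fixed = TRUE)[[1]]
    k <- length(ind)
    if (ind[1] == -1 || k < n) return(NA_character_)
    start <- ind[k - n + 1] + 1                      # just after nth sep from right
    end <- if (n == 1) nchar(s) else ind[k - n + 2] - 1  # just before (n-1)th
    substr(s, start, end)
  }, character(1), USE.NAMES = FALSE)
}

field_from_right(c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", "no dash"))
```

Returning `NA_character_` (rather than plain `NA`) keeps the result a character vector, which `vapply()` requires.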
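You could use
([^-]+)(?:-[^-]+){3}$
which captures a run of non-dash characters that is followed by exactly three "-word" groups at the end of the string. See a demo on regex101.com.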
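Since the question mentions gsub, here is a sketch of the same idea with base R's sub(). The pattern is a variant of the one above (an assumption, not the answerer's exact regex): an optional greedy prefix `(?:.*-)?` is added so the whole string is matched and replaced by the captured field.

```r
x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", " no dash in here")

# ^(?:.*-)? swallows everything up to the field; the rest is the same pattern
pattern <- "^(?:.*-)?([^-]+)(?:-[^-]+){3}$"

# sub() returns non-matching strings unchanged, so use NA for those instead
outcome <- ifelse(grepl(pattern, x, perl = TRUE),
                  sub(pattern, "\\1", x, perl = TRUE),
                  NA_character_)
outcome
```

One behavioral difference to be aware of: this regex treats the start of the string as a boundary, so "a-b-c-d" (only three dashes) yields "a", whereas the gregexpr() approach returns NA because it requires four dashes.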
In R, this could be:
library(dplyr)
library(stringr)

df <- data.frame(string = c('here-are-some-words-to-try', 'a-b-c-d-e-f-g-h-i', ' no dash in here'),
                 stringsAsFactors = FALSE)
df <- df %>%
  mutate(outcome = str_match(string, '([^-]+)(?:-[^-]+){3}$')[, 2])
df
And yields:
                      string outcome
1 here-are-some-words-to-try    some
2          a-b-c-d-e-f-g-h-i       f
3            no dash in here    <NA>
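The `{3}` quantifier is what fixes the position: the field n places from the right is followed by exactly n - 1 "-word" groups. A minimal sketch that builds the pattern from n; `nth_from_right()` is a hypothetical helper name.

```r
library(stringr)

# Hypothetical helper: build the quantifier from n instead of hard-coding {3}
nth_from_right <- function(string, n = 4) {
  stopifnot(n >= 1)
  pattern <- sprintf("([^-]+)(?:-[^-]+){%d}$", n - 1)
  # column 2 of str_match() holds the first capture group (NA if no match)
  str_match(string, pattern)[, 2]
}

x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")
nth_from_right(x)         # 4th field from the right, as in the question
nth_from_right(x, n = 2)  # 2nd field from the right
```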
How about splitting your string? Something like:
string <- "here-are-some-words-to-try"
# separate all words
val <- strsplit(string, "-")[[1]]
# reverse the order
val <- rev(val)
# take the 4th element
val[4]
# And using a data frame
library(tidyverse)
tibble(string = c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i")) %>%
mutate(outcome = map_chr(string, function(s) rev(strsplit(s, "-")[[1]])[4]))
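Without the tidyverse, the same split-and-reverse idea can be vectorized in base R with vapply(). A minimal sketch; the "no-dash" input is only illustrative.

```r
x <- c("here-are-some-words-to-try", "a-b-c-d-e-f-g-h-i", "no-dash")

# strsplit() returns a list with one character vector of fields per string
parts <- strsplit(x, "-", fixed = TRUE)

# rev(p)[4] is the 4th field from the right; indexing past the end gives NA,
# so strings with fewer than four fields come back as NA
outcome <- vapply(parts, function(p) rev(p)[4], character(1))
outcome
```

Using `fixed = TRUE` treats "-" as a literal separator rather than a regular expression, which is all this task needs.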