How do I extract text between two characters in R

Question

I'd like to extract text between two strings for all occurrences of a pattern. For example, I have this string:

x <- "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n"

I'd like to extract the words ATLANTA and LAS VEGAS as such:

[1] "ATLANTA"   "LAS VEGAS"

I tried using gsub(".*CITY:\\s|\\n","",x). The output this yields is:

[1] "  LAS VEGAS"

I would like to output both cities (some patterns in the data include more than 2 cities) and to output them without the leading space.
I also tried the qdapRegex package but could not get close. I am not that good with regular expressions so help would be much appreciated.

Answer 1

Another option:

library(stringr)
str_extract_all(x, "(?<=CITY:\\s{3}).+(?=\\n)")
[[1]]
[1] "ATLANTA"   "LAS VEGAS"

This reads as: extract anything preceded by "CITY:" plus exactly three whitespace characters, and followed by a newline ("\\n").
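The same lookaround pattern can also be tried in base R, which makes one limitation easier to see. This is only a sketch: it assumes the pattern is passed to gregexpr() with perl = TRUE, and the second input y is a made-up string (not from the question) with a single space after the colon.

```r
# Input string copied from the question.
x <- "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n"

# Same lookaround pattern as in the answer, run through base R's PCRE engine.
pattern <- "(?<=CITY:\\s{3}).+(?=\\n)"
cities <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(cities)

# Hypothetical input with only one space after "CITY:". The lookbehind
# demands exactly three whitespace characters, so nothing matches here.
y <- "CITY: BOSTON\n"
no_match <- regmatches(y, gregexpr(pattern, y, perl = TRUE))[[1]]
print(length(no_match))
```

If the amount of whitespace after the label varies in the real data, a pattern that does not hard-code the count (or a trimming step afterwards) is probably the safer choice.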

Answer 2

You may use

> unlist(regmatches(x, gregexpr("CITY:\\s*\\K.*", x, perl=TRUE)))
[1] "ATLANTA"   "LAS VEGAS"

Here, the regex CITY:\\s*\\K.* matches:

CITY: - the literal substring CITY:
\\s* - zero or more whitespace characters
\\K - the match reset operator, which discards the text matched so far from the returned match
.* - zero or more characters other than line break characters, as many as possible.


Note that since this is a PCRE regex, perl=TRUE is indispensable.
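If several labels need to be pulled out the same way, the \\K approach could be wrapped in a small helper. The sketch below is one possible way to do it: extract_after() is a hypothetical name (not a base R function), and quoting the label with \\Q...\\E is an added assumption so that characters in the label are taken literally.

```r
# extract_after() is a hypothetical helper, not part of base R.
extract_after <- function(text, label) {
  # \\Q...\\E quotes the label so it is matched literally; \\s*\\K then
  # drops the label and any following whitespace from the reported match.
  pattern <- paste0("\\Q", label, "\\E\\s*\\K.*")
  unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE)))
}

# Input string copied from the question.
x <- "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n"

city_values <- extract_after(x, "CITY:")
type_values <- extract_after(x, "TYPE:")
print(city_values)
print(type_values)
```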

Answer 3

Another option, which keeps the whitespace that follows CITY: in each match:

regmatches(x,gregexpr("(?<=CITY:).*(?=\n\n)",x,perl = TRUE))

# [[1]]
# [1] "   ATLANTA"   "   LAS VEGAS"
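Since the question asks for the cities without leading spaces, the result above could be passed through trimws(). A minimal sketch, assuming the same x as in the question:

```r
# Input string copied from the question.
x <- "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n"

# Lookaround match from the answer; each element still carries the
# spaces that follow "CITY:".
raw <- regmatches(x, gregexpr("(?<=CITY:).*(?=\n\n)", x, perl = TRUE))[[1]]

# trimws() strips leading and trailing whitespace from each element.
cities <- trimws(raw)
print(cities)
```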

3 answers

solution1
3 2018-07-24 20:42:08

solution2
2 ACCEPTED 2018-07-24 20:30:21

solution3
0 2018-07-24 20:43:59
