How to extract text between two characters in R

Question

I'd like to extract text between two strings for all occurrences of a pattern. For example, I have this string:

x <- "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n"

I'd like to extract the words ATLANTA and LAS VEGAS, like so:

[1] "ATLANTA"   "LAS VEGAS"

I tried using gsub(".*CITY:\\s|\n","",x). The output this yields is:

[1] "  LAS VEGAS"

I would like to output both cities (some patterns in the data include more than 2 cities) and to output them without the leading spaces.
I also tried the qdapRegex package but could not get close. I am not that good with regular expressions, so help would be much appreciated.

Answer 1

Another option:

library(stringr)
str_extract_all(x, "(?<=CITY:\\s{3}).+(?=\\n)")
[[1]]
[1] "ATLANTA"   "LAS VEGAS"

This reads as: extract anything preceded by "CITY:" plus exactly three whitespace characters, and followed by a newline ("\n").
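The lookbehind above hard-codes three spaces after the label. As far as I know, the ICU engine behind stringr needs lookbehinds of bounded length, so \\s* cannot simply be dropped in there. A minimal base-R sketch that tolerates any amount of whitespace, assuming each city sits on the same line as its CITY: label, could look like this (the names hits and cities are only illustrative):

```r
x <- "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n"

# Step 1: grab every "CITY: ..." fragment up to (not including) the newline.
hits <- regmatches(x, gregexpr("CITY:[^\n]*", x))[[1]]

# Step 2: strip the label and whatever whitespace follows it.
cities <- sub("^CITY:\\s*", "", hits)
cities
# [1] "ATLANTA"   "LAS VEGAS"
```

Splitting the work into "find the line" and "strip the label" avoids lookarounds entirely, at the cost of a second pass over the matches.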

Answer 2

You may use

> unlist(regmatches(x, gregexpr("CITY:\\s*\\K.*", x, perl=TRUE)))
[1] "ATLANTA"   "LAS VEGAS"

Here, the CITY:\s*\K.* regex (each backslash is doubled inside the R string literal) matches:

CITY: - a literal substring CITY:
\s* - 0+ whitespaces
\K - match reset operator that discards the text matched so far, so the label and the whitespace are left out of the returned match
.* - any 0+ chars other than line break chars, as many as possible.

See the regex demo online.

Note that since it is a PCRE regex, perl=TRUE is indispensable.
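The question mentions that the real data contains many such strings. Since regmatches and gregexpr are vectorised, the same pattern can be applied to a whole character vector, giving one result per input element. A small sketch, where extract_cities is a hypothetical helper name and the second record is made-up sample data:

```r
# Hypothetical helper: returns a list with one character vector per input string.
extract_cities <- function(x) {
  regmatches(x, gregexpr("CITY:\\s*\\K.*", x, perl = TRUE))
}

records <- c(
  "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n",
  "\nTYPE:    Library\nCITY:   BOSTON\n\n"   # made-up second record
)

res <- extract_cities(records)
res[[1]]
# [1] "ATLANTA"   "LAS VEGAS"
res[[2]]
# [1] "BOSTON"
```

Keeping the list structure (rather than calling unlist) preserves which cities came from which record.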

Answer 3

Another option is:

regmatches(x,gregexpr("(?<=CITY:).*(?=\n\n)",x,perl = TRUE))

# [[1]]
# [1] "   ATLANTA"   "   LAS VEGAS"
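The result above still carries the leading spaces that the question wanted removed, and it is wrapped in a list. One way to finish the job, sketched here with base R's unlist and trimws (the variable names are only illustrative), is:

```r
x <- "\nTYPE:    School\nCITY:   ATLANTA\n\n\nCITY:   LAS VEGAS\n\n"

# Same lookaround pattern as above; the matches still include the padding.
raw <- unlist(regmatches(x, gregexpr("(?<=CITY:).*(?=\n\n)", x, perl = TRUE)))

# trimws() drops leading and trailing whitespace from each element.
cities <- trimws(raw)
cities
# [1] "ATLANTA"   "LAS VEGAS"
```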


3 answers

Answer 1: score 3, 2018-07-24 20:42:08
Answer 2: score 2, accepted, 2018-07-24 20:30:21
Answer 3: score 0, 2018-07-24 20:43:59
