R - Extract String between two strings

I have a string variable that contains a lot of text. I want to search it until I find a match for an upper boundary ("UpperBoundary"), keep going from there until I find the next match for a lower boundary ("LowerBoundary"), and get back the text that sits between those two boundaries.

For example, the upper boundary would be the literal text "Country":" and the lower boundary would be the literal text ", (quotation marks included in both).

This is a snippet of what the text I'm dealing with looks like:

> }],"Country":"United States",
> }],"Country":"China",

So I want the results to come back as:

> United States
> China

What code or function can people share with me to do this? I've been looking forever and have tried numerous things (stri, grep, find, etc.), but I can't get anything to do what I'm looking for. Thank you for your help.

Here's a regex method, though as I mentioned in the comments, I'd strongly recommend using, e.g., the jsonlite package instead.

# input:
x = c('> }],"Country":"United States",',
      '> }],"Country":"China",')

library(stringr)
# look-behind for Country":" , then one or more non-quote characters,
# then look-ahead for ",
result = str_extract(x, pattern = '(?<=Country":")[^"]+(?=",)')
result
# [1] "United States" "China"

Explanation:

  • (?<=...) is the look-behind pattern. It requires that Country":" appear immediately behind (before) the match, without making it part of the result.
  • [^"]+ is our main pattern. ^ in brackets means "not", so we're looking for any character that is not a " . And + is the quantifier, so this matches one or more non-" characters.
  • (?=...) is the look-ahead pattern. It requires that ", appear immediately ahead of (after) the match, again without making it part of the result.
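
If you'd rather not hard-code the boundaries, the same look-around idea can be wrapped in a small function. This is only a sketch in base R: extract_between is a name made up for illustration (it is not part of stringr or base R), and it assumes you want just the first match in each string. It wraps each boundary in \Q...\E so that PCRE treats it as literal text; that will not work if a boundary itself contains \E.

```r
# Hypothetical helper (not from the original answer): return the first piece
# of text found between `upper` and `lower` in each element of `x`.
extract_between <- function(x, upper, lower) {
  # \Q...\E tells PCRE to read the boundary literally, so characters such as
  # '.', '[' or '(' need no manual escaping.
  pattern <- paste0("(?<=\\Q", upper, "\\E).*?(?=\\Q", lower, "\\E)")
  m <- regexpr(pattern, x, perl = TRUE)  # -1 where there is no match
  out <- rep(NA_character_, length(x))   # NA for strings without a match
  out[m != -1] <- regmatches(x, m)       # regmatches() returns matches only
  out
}

x <- c('> }],"Country":"United States",',
       '> }],"Country":"China",')
extract_between(x, '"Country":"', '",')
# [1] "United States" "China"
```

The lazy .*? stops at the first occurrence of the lower boundary, which is what "search until it finds another match" asks for. Strings with no match come back as NA rather than being dropped, so the result stays aligned with the input.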

