Regex - Matching text AFTER certain characters

I want to scrape data from some text and dump it into an array. Consider the following text as example data:

| Example Data
| Title: This is a sample title
| Content: This is sample content
| Date: 12/21/2012

I am currently using the following regex to scrape the data that follows the colon character:

/((?=:).+)/

Unfortunately, this regex also grabs the colon and the space after it. How do I grab only the data?

Also, I'm not sure I'm doing this right, but the outer parentheses seem to cause the match to return an array. Is that what the parentheses do?

EDIT: I'm using Rubular to test my regular expressions.

You could change it to:

/: (.+)/

and grab the contents of group 1. A lookbehind works too, though, and does just what you're asking:

/(?<=: ).+/
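
As a rough sketch, here is how both patterns behave in Ruby against the example data from the question. The `text` variable and the use of `scan` are assumptions made for illustration; the question does not say how the regex is applied.

```ruby
# Sample input, copied from the question.
text = <<~DATA
  Title: This is a sample title
  Content: This is sample content
  Date: 12/21/2012
DATA

# Capture group: scan returns one inner array per match, holding group 1,
# so flatten turns the result into a plain array of strings.
with_group = text.scan(/: (.+)/).flatten

# Lookbehind: the pattern has no groups, so scan returns the matches directly.
with_lookbehind = text.scan(/(?<=: ).+/)

p with_group
p with_lookbehind
```

Both calls should give the same three values, without the colon or the space.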

In addition to @minitech's answer, you can also make a third variation:

/(?<=: ?)(.+)/

The difference here is that you capture the group that follows a look-behind.

If you still prefer the look-ahead concept over look-behind:

/(?=: ?(.+))/

This wraps your existing regex in a look-ahead and captures the data in a group inside it.
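
A minimal sketch of the look-ahead variant in Ruby, assuming a single line held in a hypothetical `line` variable. Because a look-ahead consumes no characters, the overall match is empty and the data is only available through the group.

```ruby
line = "Title: This is a sample title"

m = line.match(/(?=: ?(.+))/)

overall  = m[0]  # the look-ahead consumes nothing, so the overall match is empty
captured = m[1]  # the group inside the look-ahead still captures the data

p overall
p captured
```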

And yes, the outer parentheses in your code create a capture group. Compare that to the last example I gave, where the group sits inside the look-ahead, rather than needlessly wrapping the whole pattern in /( ... )/ without the /(?= ... )/. The wrapper adds nothing, because the first result in most regular expression engines is already the entire matched string.
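
To illustrate what the parentheses change, here is a small Ruby sketch (the `line` variable is hypothetical). With `match`, index 0 is the entire match and index 1 is the first group; with `scan`, a pattern that contains groups yields an array of arrays, which is probably the array effect described in the question.

```ruby
line = "Title: This is a sample title"

m = line.match(/: (.+)/)
whole = m[0]  # entire match, including the colon and the space
group = m[1]  # only what the parentheses captured

# With a group in the pattern, scan returns one inner array per match.
grouped = line.scan(/: (.+)/)
# Without a group, scan returns a flat array of matched strings.
flat = line.scan(/(?<=: ).+/)

p whole, group, grouped, flat
```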

I know you are asking for a regex, but I just saw the regex solutions and found them rather hard to read for anyone unfamiliar with regex.

I'm also using Ruby, and I decided to do it with:

line_as_string.split(": ")[-1]

This does what you require and, IMHO, it's far more readable. For a very long string it might be inefficient, but not for this purpose.
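
A short sketch of the `split` approach, with one caveat worth checking: if the value itself contains ": ", taking `[-1]` keeps only the last piece. Passing a limit of 2 to `split` is one possible way around that. The sample strings below are made up for illustration.

```ruby
line = "Title: This is a sample title"
value = line.split(": ")[-1]

# Hypothetical line whose value contains another ": ".
tricky = "Content: Note: handle with care"
last_piece = tricky.split(": ")[-1]    # only the text after the last ": "
full_value = tricky.split(": ", 2)[1]  # everything after the first ": "

p value, last_piece, full_value
```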

In Ruby, as in PCRE and Boost, you may make use of the \K match reset operator:

\K keeps the text matched so far out of the overall regex match. h\Kd matches only the second d in adhd.

So, you may use

/:[[:blank:]]*\K.+/     # To only match horizontal whitespaces with `[[:blank:]]`
/:\s*\K.+/              # To match any whitespace with `\s`

See the Rubular demo #1 and the Rubular demo #2.

Details

  • : - a colon
  • [[:blank:]]* - 0 or more horizontal whitespace chars
  • \K - match reset operator that discards the text matched so far from the overall match memory buffer
  • .+ - matches and consumes any 1 or more chars other than line break chars (use the /m modifier to match any chars including line break chars).
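
The pattern breakdown above can be tried out with a sketch like the following. The sample strings are assumptions chosen to show uneven spacing after the colon, plus one case where `\s` and `[[:blank:]]` differ because `\s` can also cross a line break.

```ruby
# Sample data with varying whitespace after the colon.
text = <<~DATA
  Title: This is a sample title
  Content:    This is sample content
  Date:12/21/2012
DATA

# \K drops the colon and blanks from the reported match.
values = text.scan(/:[[:blank:]]*\K.+/)

# The example from the quoted description: only the d after h is returned.
reset_demo = "adhd"[/h\Kd/]

# A field with nothing after the colon: \s* runs across the newline,
# while [[:blank:]]* stays on the same line.
gap = "Empty:\nDate: x"
with_s     = gap.scan(/:\s*\K.+/)
with_blank = gap.scan(/:[[:blank:]]*\K.+/)

p values, reset_demo, with_s, with_blank
```

If empty fields can occur in the input, the `[[:blank:]]` form is the safer of the two, since it never pulls the next line into the match.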
