Regex - match text after certain characters

Question

I want to scrape data from some text and dump it into an array. Consider the following text as example data:

| Example Data
| Title: This is a sample title
| Content: This is sample content
| Date: 12/21/2012

I am currently using the following regex to scrape the data that follows the colon character:

/((?=:).+)/

Unfortunately, this regex also grabs the colon and the space after it. How do I grab only the data?

Also, I'm not sure if I'm doing this right, but it appears as though the outer parentheses cause a match to return an array. Is that what the parentheses do?

EDIT: I'm using Rubular to test my regular expressions.

Answer 1

You could change it to:

/: (.+)/

and grab the contents of group 1. A lookbehind works too, though, and does just what you're asking:

/(?<=: ).+/
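A minimal sketch of both patterns in Ruby, assuming the sample data from the question is held in a single string. The variable names `text`, `captured`, and `looked_behind` are illustrative and not from the question.

```ruby
# Sample data from the question; `text` is a name chosen for illustration.
text = <<~DATA
  Title: This is a sample title
  Content: This is sample content
  Date: 12/21/2012
DATA

# Capture-group version: scan returns one array per match holding group 1,
# so flatten turns the result into a plain array of strings.
captured = text.scan(/: (.+)/).flatten

# Lookbehind version: the pattern has no groups, so scan returns the
# matched strings directly.
looked_behind = text.scan(/(?<=: ).+/)

p captured
p looked_behind
```

Both calls should yield the same three values; the only difference is whether the result has to be pulled out of a group.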

Answer 2

In addition to @minitech's answer, you can also make a third variation:

/(?<=: ?)(.+)/

The difference here is that you capture the group after a look-behind.

If you still prefer the look-ahead to the look-behind concept:

/(?=: ?(.+))/

This places a group inside the look-ahead, so the text after the colon is caught within that group.

And yes, the outer parentheses in your code create a capture group. Compare that with the latter example I gave, where the group sits inside the look-ahead, rather than needlessly wrapping the whole pattern in /( ... )/ without the /(?= ... )/, since the first result in most regular expression engines is already the entire matched string.
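To make the grouping behaviour concrete, here is a small Ruby sketch that compares the whole match (index 0) with group 1 for the question's pattern and for the look-ahead variant. The variable names `line`, `original`, and `lookahead` are assumptions made for this example.

```ruby
line = "Title: This is a sample title"

# Question's pattern: the outer parentheses form group 1, which simply
# duplicates the whole match (colon and space included).
original = line.match(/((?=:).+)/)
p original[0]  # whole match
p original[1]  # group 1

# Look-ahead variant: a look-ahead consumes nothing, so the whole match
# is empty, but group 1 still holds the text after the colon and space.
lookahead = line.match(/(?=: ?(.+))/)
p lookahead[0]
p lookahead[1]
```

This is why a tester such as Rubular lists a group result alongside the match: index 0 is always the full match, and each pair of capturing parentheses adds one more entry.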

Answer 3

I know you are asking for regex, but I just saw the regex solutions and found them rather hard to read for those unfamiliar with regex.

I'm also using Ruby, and I decided to do it with:

line_as_string.split(": ")[-1]

This does what you require, and IMHO it's far more readable. For a very long string it might be inefficient, but not for this purpose.
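A short sketch of the split approach, with one caveat worth checking. The second input line is hypothetical (it is not in the question's data) and shows what happens when the value itself contains ": ". Passing a limit of 2 to `split` is one possible variation, not part of the original answer.

```ruby
line = "Title: This is a sample title"
value = line.split(": ")[-1]
p value

# Hypothetical line whose value itself contains ": ".
tricky = "Title: Regex: a primer"
last_piece  = tricky.split(": ")[-1]    # keeps only the text after the last ": "
whole_value = tricky.split(": ", 2)[1]  # limit 2 splits at the first ": " only
p last_piece
p whole_value
```

For the sample data in the question the two forms give the same result; they differ only when the separator appears again inside the value.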

Answer 4

In Ruby, as in PCRE and Boost, you may make use of the \K match reset operator:

\K keeps the text matched so far out of the overall regex match. h\Kd matches only the second d in adhd.

So, you may use

/:[[:blank:]]*\K.+/     # To only match horizontal whitespaces with `[[:blank:]]`
/:\s*\K.+/              # To match any whitespace with `\s`

See the Rubular demo #1 and the Rubular demo #2.

Details

: - a colon
[[:blank:]]* - 0 or more horizontal whitespace chars
\K - match reset operator that discards the text matched so far from the overall match
.+ - matches and consumes any 1 or more chars other than line break chars (use the /m modifier to match any chars including line break chars).
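The pattern described above can be tried in Ruby roughly as follows. This is a sketch assuming the question's sample data sits in one string; `text`, `values`, and `m` are names chosen for illustration.

```ruby
text = <<~DATA
  Title: This is a sample title
  Content: This is sample content
  Date: 12/21/2012
DATA

# \K drops the colon and any blanks from the reported match, so scan
# returns only the text that follows them on each line.
values = text.scan(/:[[:blank:]]*\K.+/)
p values

# The example from the answer: only the second "d" is reported as the match.
m = "adhd".match(/h\Kd/)
p m[0]        # the reported match
p m.begin(0)  # index where the reported match starts
```

Because the pattern has no capturing groups, `scan` returns a flat array of strings and no group lookup is needed.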

Answer 1
18 (accepted), 2012-12-17 23:58:23

Answer 2
4, 2012-12-18 01:45:17

Answer 3
1, 2014-08-28 12:46:16

Answer 4
0, 2020-07-10 16:08:06