Regular expression help in Perl

Question

I have the following text pattern:

(2222) First Last (ab-cd/ABC1), <first.last@site.domain.com> 1224: efadsfadsfdsf

(3333) First Last (abcd/ABC12), <first.last@site.domain.com> 1234, 4657: efadsfadsfdsf

I want the numbers 1224 or 1234, 4657 from the above text, i.e. the part that comes after the > .

I have this: \((\d+)\)\s\w*\s\w*\s\(\w*\/\w+\d*\),\s<\w*\.\w*\@\w*\.domain.com>\s\d+: which takes the text before the : but I want the part after the email address, up to the :

Is there an easy regular expression to do this, or should I use split instead?

Thanks

Edit: The whole text is returned by a command line tool.

(3333) First Last (abcd/ABC12), <first.last@site.domain.com> 1234, 4657: efadsfadsfdsf

(3333) - Unique ID

First Last - First and last names

<first.last@site.domain.com> - Email address in the format FirstName.LastName@sub.domain.com

1234, 4657 - Database primary keys

: xxxx - Headline

What I have to do is process the above, get the database IDs (in the example, 1234 and 4657 are two separate IDs), and query the tables.

The above is the output of the tool that I am calling from my Perl script; I will get many entries like this.

My idea was to use a regular expression to get the database IDs. I guess a regular expression would work for this.

Answer 1

You can fudge the parts you don't care about to make the expression easier: just 'glob' the parts between the parentheses (and the email delimiters) using non-greedy quantifiers:

/(\d+)\).*?\(.*?\),\s*<.*?>\s*(\d+(?:,\s*\d+)*):/   (not tested!)

There are only two capture groups: the ID in the leading parentheses, e.g. (3333), and the number list, e.g. 1234, 4657. From your pattern I can only assume the second one means "a digit string, followed by zero or more comma-separated digit strings".
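As a rough illustration of how the two captures could be used, here is a minimal sketch that runs the pattern above over the two sample lines from the question. The hash name %keys_for and the split on commas are assumptions added for the example, not part of the original answer.

```perl
use strict;
use warnings;

# Sample lines copied from the question.
my @lines = (
    '(2222) First Last (ab-cd/ABC1), <first.last@site.domain.com> 1224: efadsfadsfdsf',
    '(3333) First Last (abcd/ABC12), <first.last@site.domain.com> 1234, 4657: efadsfadsfdsf',
);

my %keys_for;   # hypothetical name: unique ID => array ref of database keys
for my $line (@lines) {
    # $1 is the ID in the leading parentheses, $2 the comma-separated number list
    if ($line =~ /(\d+)\).*?\(.*?\),\s*<.*?>\s*(\d+(?:,\s*\d+)*):/) {
        my ($id, $list) = ($1, $2);
        $keys_for{$id} = [ split /,\s*/, $list ];
        print "$id => @{ $keys_for{$id} }\n";
    }
}
```

Splitting the second capture afterwards keeps the regex itself short; each key can then be passed to a query on its own.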

Answer 2

Well, a simple fix is to allow all the possible characters in a character class. That is, change \d to [\d, ] to allow digits, commas and spaces.

Your regex as it is, though, does not match the first sample line, because that line has a dash in it (ab-cd/ABC1 does not match \w*\/\w+\d*). Also, it is not a good idea to rely too heavily on the * quantifier, because it also matches the empty string (it matches zero or more times) and should only be used for things that are truly optional. Otherwise use +, which matches one or more times.
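Both points can be checked with a short sketch. The $dashed pattern is a hypothetical variant written only for this illustration; it is not something the question or this answer proposes as the final regex.

```perl
use strict;
use warnings;

my $first = '(2222) First Last (ab-cd/ABC1), <first.last@site.domain.com> 1224: efadsfadsfdsf';

# The parenthesised part of the original pattern: \w does not match "-".
my $strict = qr/\(\w*\/\w+\d*\)/;
# Hypothetical variant that also allows a dash before the slash.
my $dashed = qr/\([\w-]+\/\w+\)/;

my $strict_ok = $first =~ $strict ? 1 : 0;
my $dashed_ok = $first =~ $dashed ? 1 : 0;
print "strict matches: $strict_ok, dashed matches: $dashed_ok\n";

# "*" is satisfied by nothing at all, "+" needs at least one character.
my $star_ok = '' =~ /^\w*$/ ? 1 : 0;
my $plus_ok = '' =~ /^\w+$/ ? 1 : 0;
print "empty string vs \\w*: $star_ok, vs \\w+: $plus_ok\n";
```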

You have a rather strict regex, and with slight variations in your data like this, it will fail. Only you know what your data looks like, and whether you actually need a strict regex. However, if your data is somewhat consistent, you can use a loose regex based simply on the email part:

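sub extract_nums {
    my $string = shift;
    # capture the digits, commas and spaces between the <...> part and the colon
    if ($string =~ /<[^>]*> *([\d, ]+):/) {
        return $1 =~ /\d+/g;   # return the extracted digits in a list
        # return $1;           # just return the string as-is
    } else { return }          # empty list (undef in scalar context) when nothing matches
}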

This assumes, of course, that you cannot have <> tags in front of the email part of the line. It will capture any digits, commas and spaces found between a <> tag and a colon, and then return a list of any digits found in the match. You can also just return the string, as shown in the commented line.
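Since the tool returns many entries, a usage sketch may help. The heredoc stands in for the tool's real output and @all_ids is a hypothetical name; the sub is repeated so the snippet runs on its own, with the capture copied into a variable before the second match.

```perl
use strict;
use warnings;

sub extract_nums {
    my $string = shift;
    if ($string =~ /<[^>]*> *([\d, ]+):/) {
        my $list = $1;            # copy the capture before matching again
        return $list =~ /\d+/g;   # list of digit strings
    }
    return;                       # nothing found
}

# Stand-in for the tool's output; a real script would read it from the command.
my $output = <<'END';
(2222) First Last (ab-cd/ABC1), <first.last@site.domain.com> 1224: efadsfadsfdsf
(3333) First Last (abcd/ABC12), <first.last@site.domain.com> 1234, 4657: efadsfadsfdsf
END

my @all_ids;
for my $line (split /\n/, $output) {
    my @ids = extract_nums($line);   # call in list context
    push @all_ids, @ids;
    print "ids: @ids\n";
}
```

Calling the sub in list context matters here: in scalar context the /g match would only report success or failure instead of returning the numbers.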

Answer 3

There would appear to be something missing from your examples. Is this what they're supposed to look like, with the email address?

(1234) First Last (ab-cd/ABC1), <foo.bar@domain.com> 1224: efadsfadsfdsf

(1234) First Last (abcd/ABC12), <foo.bar@domain.com> 1234, 4657: efadsfadsfdsf

If so, this should work:

\((\d+)\)\s\w*\s\w*\s\(\w*\/\w+\d*\),\s<\w*\.\w*\@\w*\.domain\.com>\s\d+(?:,\s(\d+))?:

Answer 4

$string =~ /.*>\s*(.+):.+/;
$numbers = $1;

That's it. Tested.

With number catching:

$string =~ /.*>\s*((?:[0-9]|,|\s)+):.+/;
$numbers = $1;

Not tested, but you get the idea.
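To connect this with the split idea from the question, here is a small sketch that uses the first pattern above and then splits the captured string into separate IDs. The $tricky line and the [^:]+ variant are hypothetical additions, included only to show how the greedy (.+) behaves when the headline itself contains a colon.

```perl
use strict;
use warnings;

my $string = '(3333) First Last (abcd/ABC12), <first.last@site.domain.com> 1234, 4657: efadsfadsfdsf';

my @ids;
if ($string =~ /.*>\s*(.+):.+/) {
    my $numbers = $1;                    # the text between ">" and ":"
    @ids = split /\s*,\s*/, $numbers;    # one element per key
}
print "ids: @ids\n";

# Hypothetical line whose headline contains another colon.
my $tricky = '(3333) First Last (abcd/ABC12), <first.last@site.domain.com> 1234: re: hello';
my ($greedy)  = $tricky =~ /.*>\s*(.+):.+/;   # greedy .+ runs up to the last colon
my ($bounded) = $tricky =~ />\s*([^:]+):/;    # [^:]+ stops at the first colon
print "greedy: '$greedy', bounded: '$bounded'\n";
```

If headlines can contain colons, a bounded class such as [^:]+ or [\d, ]+ is probably the safer choice.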

4 answers

Answer 1
1, accepted, 2012-02-13 16:57:31

Answer 2
1, 2012-02-13 17:58:19

Answer 3
0, 2012-02-13 16:50:15

Answer 4
0, 2012-02-13 17:18:17
