How do I express "repeat this part" in a regular expression?

Question

Suppose I want to match a string like this:

123(432)123(342)2348(34)

I can match digits like 123 with [\d]* and (432) with \([\d]+\).

How can I match the whole string by repeating either of the two patterns?

I tried [[\d]* | \([\d]+\)]+, but this is incorrect.

I am using the Python re module.

Answer 1

I think you need this regex:

"^(\d+|\(\d+\))+$"

To avoid catastrophic backtracking (the nested quantifiers in (\d+)+ give the engine many ways to split a run of digits), change it to a regex like this:

"^(\d|\(\d+\))+$"

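As a quick check of the second pattern, here is a minimal sketch using Python's re module. The helper name is_valid and the sample strings are my own, for illustration only.

```python
import re

# Pattern from the answer above: one or more units, each either a single
# digit or a parenthesized run of digits.
PATTERN = re.compile(r"^(\d|\(\d+\))+$")

def is_valid(text):
    """Return True when the whole string is made of digits and (digits) units."""
    return PATTERN.match(text) is not None

print(is_valid("123(432)123(342)2348(34)"))  # True
print(is_valid("123(432"))                   # False: unclosed parenthesis
print(is_valid("12()34"))                    # False: empty parentheses
```
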
Answer 2

You can use a character class to match the whole string:

[\d()]+
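
One caveat worth seeing in action: a character class only restricts which characters may appear, not how they are arranged. The sketch below (sample strings are mine) shows that it accepts the target string but also accepts unbalanced input.

```python
import re

# Character class from the answer: any mix of digits and parentheses.
loose = re.compile(r"[\d()]+")

target = "123(432)123(342)2348(34)"
malformed = "))12(("

print(loose.fullmatch(target) is not None)     # True
print(loose.fullmatch(malformed) is not None)  # True: structure is not checked
```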

But if you want to capture the separate parts in separate groups, you can use re.findall with a special regex based on your needs, for example:

>>> import re
>>> s="123(432)123(342)2348(34)"
>>> re.findall(r'\d+\(\d+\)',s)
['123(432)', '123(342)', '2348(34)']

Or:

>>> re.findall(r'(\d+)\((\d+)\)',s)
[('123', '432'), ('123', '342'), ('2348', '34')]

Or you can just use \d+ to get all the numbers:

>>> re.findall(r'\d+',s)
['123', '432', '123', '342', '2348', '34']

If you want to match the pattern \d+\(\d+\) repeatedly, you can use the following regex:

(?:\d+\(\d+\))+
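
A small sketch of how this last pattern behaves with re.fullmatch. The second sample string is my own, chosen to show that every number must be followed by a parenthesized number.

```python
import re

# Repeated "digits followed by (digits)" units, as in the regex above.
pairs = re.compile(r"(?:\d+\(\d+\))+")

s = "123(432)123(342)2348(34)"
m = pairs.fullmatch(s)
print(m.group(0))  # 123(432)123(342)2348(34)

# Trailing digits without a parenthesized part prevent a full match.
print(pairs.fullmatch("123(432)99"))  # None
```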

Answer 3

You can achieve it with this pattern:

^(?=.)\d*(?:\(\d+\)\d*)*$

demo

(?=.) ensures there is at least one character (if you want to allow empty strings, remove it).

\d*(?:\(\d+\)\d*)* is an unrolled sub-pattern. Explanation: with a backtracking regex engine, when you have a sub-pattern like (A|B)* where A and B are mutually exclusive (or at least when the end of A or B cannot match the beginning of B or A, respectively), you can rewrite the sub-pattern as A*(BA*)* or B*(AB*)*. For your example, it replaces (?:\d+|\(\d+\))*. This new form is more efficient: it reduces the number of steps needed to obtain a match and avoids a large part of the potential backtracking.
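
To illustrate the rewrite, the sketch below compares the alternation form with the unrolled form on a few strings. The sample list is my own; the point is only that both forms accept and reject the same inputs here, not a measurement of speed.

```python
import re

# (A|B)* form, with A = \d+ and B = \(\d+\)
alternation = re.compile(r"^(?=.)(?:\d+|\(\d+\))*$")
# Unrolled A*(BA*)* form from the answer
unrolled = re.compile(r"^(?=.)\d*(?:\(\d+\)\d*)*$")

samples = ["123(432)123(342)2348(34)", "(1)(2)", "42", "", "12(", "1()2"]

results = []
for text in samples:
    a = alternation.match(text) is not None
    b = unrolled.match(text) is not None
    results.append((text, a, b))
    print(repr(text), a, b)
```

The first three samples match under both patterns and the last three are rejected by both.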

Note that you can improve it further if you emulate an atomic group (?>....) with the trick (?=(....))\1, which relies on the fact that a lookahead is naturally atomic:

^(?=.)(?=(\d*(?:\(\d+\)\d*)*))\1$

demo (compare the number of steps needed with the previous version and check the debugger to see what happens)
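
For the Python re module, a minimal sketch of the lookahead-plus-backreference form might look like this. The variable names and the failing sample are mine. As far as I know, Python 3.11 and later also accept (?>...) directly, but the emulation works on older versions too.

```python
import re

# The lookahead captures the longest unrolled match into group 1, then \1
# consumes exactly that text; the engine does not go back inside the lookahead.
atomic_like = re.compile(r"^(?=.)(?=(\d*(?:\(\d+\)\d*)*))\1$")

ok = atomic_like.match("123(432)123(342)2348(34)")
print(ok.group(1))  # 123(432)123(342)2348(34)

bad = atomic_like.match("123(432")
print(bad)  # None
```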

Note: if you don't want to allow two consecutive numbers enclosed in parentheses, you only need to change the quantifier * to + inside the non-capturing group and add (?:\(\d+\))? at the end of the pattern, before the anchor $:

^(?=.)\d*(?:\(\d+\)\d+)*(?:\(\d+\))?$

or

^(?=.)(?=(\d*(?:\(\d+\)\d+)*(?:\(\d+\))?))\1$
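
A short sketch of the stricter variant, using the first of the two patterns above. The sample strings other than the one from the question are my own.

```python
import re

# Each parenthesized number must be followed by plain digits, except
# possibly the last one, so "(1)(2)" is rejected.
strict = re.compile(r"^(?=.)\d*(?:\(\d+\)\d+)*(?:\(\d+\))?$")

print(strict.match("123(432)123(342)2348(34)") is not None)  # True
print(strict.match("(1)2(3)") is not None)                   # True
print(strict.match("(1)(2)") is not None)                    # False
```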

3 answers

Answer 1
3 | accepted | 2015-09-01 10:03:23

Answer 2
2 | 2015-09-01 10:02:50

Answer 3
1 | 2015-09-01 11:05:45