How to match the following with a regex in Python?

Question

How do I match the following with a regex?

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

I am trying the following:

import re

groupsofmatches = re.match(r'(?P<booknumber>.*)\)([ \t]+)?(?P<item>.*)(\(.*\))?\(.*?((\d+)?(\.\d+)?).*([ \t]+)?Dollars(\))?', string1)

The issue is that when I apply it to string2 it works fine, but when I apply the expression to string1 I cannot get the right values from m.group(name) because of the "(TUD)" part. I want a single expression that works for both strings.

For string1 I expect:

booknumber = 1.0
item = The Ugly Duckling (TUD)
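A quick way to see why string1 fails is to print what the pattern above actually captures. This sketch (Python 3, pattern copied from the question, only given an r'' prefix) suggests that the greedy `(?P<booknumber>.*)` runs on to the `)` after `TUD`, leaving `item` empty:

```python
import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

# The pattern from the question, unchanged except for the r'' prefix.
pattern = (r'(?P<booknumber>.*)\)([ \t]+)?(?P<item>.*)(\(.*\))?'
           r'\(.*?((\d+)?(\.\d+)?).*([ \t]+)?Dollars(\))?')

for s in (string1, string2):
    m = re.match(pattern, s)
    # Greedy .* makes booknumber as long as possible: it stops at the
    # rightmost ')' after which the rest of the pattern can still match.
    print(repr(m.group('booknumber')), repr(m.group('item')))
```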

Answer 1

You could impose some heavier restrictions on your repeated characters:

groupsofmatches = re.match(r'(?P<booknumber>[^)]*)\)[ \t]*(?P<item>.*)\([^)]*?(?P<dollaramount>(?:\d+)?(?:\.\d+)?)[^)]*\)$', string1)

This ensures that the number is taken from the last set of parentheses.
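A minimal Python 3 check of this pattern against both strings. Naming the first group `booknumber` is an addition here so it lines up with the question, and `parse` is a hypothetical helper. Note that `item` keeps the space before the final `(`, so the sketch strips it:

```python
import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

# Answer 1's pattern; naming the first group "booknumber" is an addition.
PATTERN = re.compile(
    r'(?P<booknumber>[^)]*)\)[ \t]*(?P<item>.*)'
    r'\([^)]*?(?P<dollaramount>(?:\d+)?(?:\.\d+)?)[^)]*\)$'
)

def parse(s):
    """Hypothetical helper: return (booknumber, item, dollaramount) or None."""
    m = PATTERN.match(s)
    if m is None:
        return None
    # item ends with the space before the last '(', so strip it.
    return m.group('booknumber'), m.group('item').strip(), m.group('dollaramount')

for s in (string1, string2):
    print(parse(s))
```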

Answer 2

I would write it as:

num, name, value = re.match(r'(.+?)\) (.*?) \(([\d.]+) Dollars\)', string2).groups()
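Applied to both strings, this one-liner returns the three pieces directly. A small sketch, assuming every line ends in a `(<number> Dollars)` group separated from the title by single spaces, as in the examples:

```python
import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

# Lazy (.*?) grows only until the rest, " (<digits/dots> Dollars)", matches;
# " (TUD)" fails there because 'T' is not in [\d.], so it stays in the name.
results = [re.match(r'(.+?)\) (.*?) \(([\d.]+) Dollars\)', s).groups()
           for s in (string1, string2)]

for num, name, value in results:
    print(num, '|', name, '|', value)
```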

Answer 3

Your problem is that .* matches greedily, so it may be consuming too much of the string. Printing all of the match groups makes this more obvious:

import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

result = re.match(r'(.*?)\)([ \t]+)?(?P<item>.*)\(.*?(?P<dollaramount>(\d+)?(\.\d+)?).*([ \t]+)?Dollars(\))?', string1)

print(repr(result.groups()))
print(result.group('item'))
print(result.group('dollaramount'))

Changing them to *? makes the match as short as possible (non-greedy).
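The difference is easy to see on the book-number part alone. In this small sketch, the greedy version grabs everything up to the last `)`, while the lazy version stops at the first one:

```python
import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'

greedy = re.match(r'(.*)\)', string1).group(1)   # longest prefix before a ')'
lazy = re.match(r'(.*?)\)', string1).group(1)    # shortest prefix before a ')'

print(greedy)
print(lazy)
```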

Lazy matching can be expensive in some RE engines, so you can also write e.g. \([^)]*\) to match a whole parenthesized group: [^)]* cannot run past a closing parenthesis. If you're not processing a lot of text, it probably doesn't matter.
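For example, with `re.findall` on string1, a greedy `\(.*\)` swallows both parenthesized groups as a single match, whereas the negated class `[^)]*` cannot cross a `)` and so finds each group separately without relying on lazy matching:

```python
import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'

greedy_parens = re.findall(r'\(.*\)', string1)    # runs to the last ')'
class_parens = re.findall(r'\([^)]*\)', string1)  # stops at the first ')'

print(greedy_parens)
print(class_parens)
```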

By the way, you should really use raw strings (i.e. r'something') for regexes, to avoid surprising backslash behaviour and to give the reader a clue.
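A concrete case of the surprising behaviour: in a normal string literal `\b` is the backspace character, not the regex word boundary, so the pattern silently never matches. The raw string keeps the backslash for the regex engine:

```python
import re

text = 'a cat sat'

plain = re.search('\bcat\b', text)   # '\b' here is backspace (\x08)
raw = re.search(r'\bcat\b', text)    # r'\b' is backslash + 'b': a word boundary

print(plain)
print(raw.group())
```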

I see you had this group (\(.*?\))?, which presumably was meant to cut out the (TUD). If you actually want that in the title, just remove it.

Answer 4

Here is how I would do it:

(?P<booknumber>\d+(?:\.\d+)?)\)\s+(?P<item>.*?)\s+\(\d+(?:\.\d+)?\s+Dollars\)
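Running this pattern (as a raw Python string) over both inputs gives the expected split. A minimal check; the variable names are only for illustration:

```python
import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

PATTERN = re.compile(
    r'(?P<booknumber>\d+(?:\.\d+)?)\)\s+(?P<item>.*?)'
    r'\s+\(\d+(?:\.\d+)?\s+Dollars\)'
)

items = []
for s in (string1, string2):
    m = PATTERN.match(s)
    # Lazy item grows until " (<number> Dollars)" follows, so "(TUD)" is kept.
    items.append((m.group('booknumber'), m.group('item')))
    print(items[-1])
```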

Answer 5

I suggest you use the regex pattern:

(?P<booknumber>[^)]*)\)\s+(?P<item>.*\S)\s+\((?!.*\()(?P<amount>\S+)\s+Dollars?\)
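This pattern also captures the amount. The negative lookahead `(?!.*\()` only lets the last `(` on the line start the price group, so `(TUD)` ends up in `item`. A quick check on both strings (Python 3):

```python
import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

PATTERN = re.compile(
    r'(?P<booknumber>[^)]*)\)\s+(?P<item>.*\S)\s+'
    # (?!.*\() rejects any '(' that has another '(' later on the line.
    r'\((?!.*\()(?P<amount>\S+)\s+Dollars?\)'
)

# group() with several names returns a tuple of those groups.
rows = [PATTERN.match(s).group('booknumber', 'item', 'amount')
        for s in (string1, string2)]

for row in rows:
    print(row)
```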

5 answers

Answer 1: score 0, 2012-10-29 23:26:42
Answer 2: score 0, 2012-10-29 23:28:00
Answer 3: score 0, accepted, 2012-10-29 23:28:00
Answer 4: score 0, 2012-10-29 23:29:31
Answer 5: score 0, 2012-10-29 23:49:24
