How to match the following with a regex in Python?
How to match the following with regex?
string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'
I am trying the following:
groupsofmatches = re.match('(?P<booknumber>.*)\)([ \t]+)?(?P<item>.*)(\(.*\))?\(.*?((\d+)?(\.\d+)?).*([ \t]+)?Dollars(\))?', string1)
The issue is that when I apply it to string2 it works fine, but when I apply the expression to string1, I am unable to get "m.group(name)" because of the "(TUD)" part. I want to use a single expression that works for both strings.
I expect:
booknumber = 1.0
item = The Ugly Duckling (TUD)
You could impose some heavier restrictions on your repeated characters:
groupsofmatches = re.match(r'([^)]*)\)[ \t]*(?P<item>.*)\([^)]*?(?P<dollaramount>(?:\d+)?(?:\.\d+)?)[^)]*\)$', string1)
This will make sure that the numbers are taken from the last set of parentheses.
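As a quick check, here is a minimal sketch that runs the pattern above against both sample strings. The names PATTERN, samples and results are only illustrative, and the .strip() call is an addition here, because the item group as written keeps the space before the final parenthesis.

```python
import re

# Pattern from the answer above, split over two raw-string literals for readability.
PATTERN = re.compile(
    r'([^)]*)\)[ \t]*(?P<item>.*)'
    r'\([^)]*?(?P<dollaramount>(?:\d+)?(?:\.\d+)?)[^)]*\)$'
)

samples = [
    '1.0) The Ugly Duckling (TUD) (10 Dollars)',
    '1.0) Little 1 Red Riding Hood (9.50 Dollars)',
]

results = []
for s in samples:
    m = PATTERN.match(s)
    # group(1) is the unnamed book-number group; strip the trailing space from item.
    results.append((m.group(1), m.group('item').strip(), m.group('dollaramount')))

for row in results:
    print(row)
```

Note that the dollaramount group can match an empty string, so a price that does not start right after the opening parenthesis would come back empty rather than fail the match.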
I would write it as:
num, name, value = re.match(r'(.+?)\) (.*?) \(([\d.]+) Dollars\)', string2).groups()
Your problem is that .* matches greedily, and it may be consuming too much of the string. Printing all of the match groups will make this more obvious:
import re
string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'
result = re.match(r'(.*?)\)([ \t]+)?(?P<item>.*)\(.*?(?P<dollaramount>(\d+)?(\.\d+)?).*([ \t]+)?Dollars(\))?', string1)
print(repr(result.groups()))
print(result.group('item'))
print(result.group('dollaramount'))
Changing them to *? makes the match as short as possible. This can be expensive in some RE engines, so you can also write e.g. \([^)]*\) to match a whole parenthesized group. If you're not processing a lot of text it probably doesn't matter.
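To make the difference concrete, here is a small sketch comparing the three forms on a fragment of string1. The variable names are just for illustration.

```python
import re

text = '(TUD) (10 Dollars)'

# Greedy: .* runs to the last closing parenthesis, so both groups merge into one match.
greedy = re.findall(r'\(.*\)', text)

# Lazy: .*? stops at the first closing parenthesis it can.
lazy = re.findall(r'\(.*?\)', text)

# Negated class: [^)]* can never cross a closing parenthesis.
negated = re.findall(r'\([^)]*\)', text)

print(greedy)   # ['(TUD) (10 Dollars)']
print(lazy)     # ['(TUD)', '(10 Dollars)']
print(negated)  # ['(TUD)', '(10 Dollars)']
```

The lazy and negated-class versions give the same result here; the negated class simply states the intent directly instead of relying on backtracking.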
By the way, you should really use raw strings (i.e. r'something') for regexps, to avoid surprising backslash behaviour, and to give the reader a clue.
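A short sketch of the kind of surprise raw strings avoid, using \b as the example (the variable names are hypothetical): in a plain string literal '\b' is a backspace character, while in a raw string it reaches the regex engine as a word boundary.

```python
import re

# Without the r prefix, every regex backslash has to be doubled.
plain = '\\d+\\.\\d+'
raw = r'\d+\.\d+'
same_pattern = (plain == raw)

# '\b' is one character (backspace); r'\b' is two (backslash + b).
lengths = (len('\b'), len(r'\b'))

with_raw = re.findall(r'\bDollars\b', '(10 Dollars)')
without_raw = re.findall('\bDollars\b', '(10 Dollars)')

print(same_pattern)  # True
print(lengths)       # (1, 2)
print(with_raw)      # ['Dollars']
print(without_raw)   # []
```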
I see you had this group (\(.*?\))? which presumably was cutting out the (TUD), but if you actually want that in the title, just remove it.
This is how I would do it:
(?P<booknumber>\d+(?:\.\d+)?)\)\s+(?P<item>.*?)\s+\(\d+(?:\.\d+)?\s+Dollars\)
I suggest using this regex pattern:
(?P<booknumber>[^)]*)\)\s+(?P<item>.*\S)\s+\((?!.*\()(?P<amount>\S+)\s+Dollars?\)
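A minimal sketch of how this pattern could be used from Python. The helper name parse_line is hypothetical; the (?!.*\() lookahead is what pins the price to the last parenthesized group, since it rejects any opening parenthesis that is followed by another one.

```python
import re

# Pattern from the answer above, split over two raw-string literals.
PATTERN = re.compile(
    r'(?P<booknumber>[^)]*)\)\s+(?P<item>.*\S)\s+'
    r'\((?!.*\()(?P<amount>\S+)\s+Dollars?\)'
)


def parse_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None


string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

print(parse_line(string1))
print(parse_line(string2))
```

Because item ends in \S, the captured title has no trailing whitespace, so no extra strip() is needed.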