How to match the following with regex?
string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'
I am trying the following:
groupsofmatches = re.match('(?P<booknumber>.*)\)([ \t]+)?(?P<item>.*)(\(.*\))?\(.*?((\d+)?(\.\d+)?).*([ \t]+)?Dollars(\))?', string1)
The issue is when I apply it to string2 it works fine, but when I apply the expression to string1, I am unable to get the "m.group(name)" because of the "(TUD)" part. I want to use a single expression that works for both strings.
I expect:
booknumber = 1.0
item = The Ugly Duckling (TUD)
You could impose some heavier restrictions on your repeated characters:
groupsofmatches = re.match(r'([^)]*)\)[ \t]*(?P<item>.*)\([^)]*?(?P<dollaramount>(?:\d+)?(?:\.\d+)?)[^)]*\)$', string1)
This will make sure that the numbers are taken from the last set of parentheses.
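As a quick check, here is a minimal sketch that runs the pattern above against both sample strings. The `.strip()` call is my addition, not part of the answer: the greedy `item` group keeps the space before the final parenthesis, and I am assuming you would want it trimmed.

```python
import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

# Same pattern as above, split over two lines for readability.
pattern = re.compile(
    r'([^)]*)\)[ \t]*(?P<item>.*)'
    r'\([^)]*?(?P<dollaramount>(?:\d+)?(?:\.\d+)?)[^)]*\)$'
)

results = []
for s in (string1, string2):
    m = pattern.match(s)
    # group(1) is the unnamed book number; item is stripped (assumption).
    results.append((m.group(1), m.group('item').strip(), m.group('dollaramount')))
    print(results[-1])
```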
I would write it as:
num, name, value = re.match(r'(.+?)\) (.*?) \(([\d.]+) Dollars\)', string2).groups()
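A small sketch of how that one-liner might be used on both strings. The `None` guard and the `float()` conversion are my assumptions about what you would do with the result, not something the answer specifies.

```python
import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

pattern = re.compile(r'(.+?)\) (.*?) \(([\d.]+) Dollars\)')

parsed = []
for s in (string1, string2):
    m = pattern.match(s)
    if m is None:  # re.match returns None when the string does not match
        continue
    num, name, value = m.groups()
    parsed.append((num, name, float(value)))  # price as a number (assumption)

print(parsed)
```

The lazy `(.*?)` has to keep growing until it is followed by a parenthesis that starts with digits, which is why `(TUD)` ends up inside the name.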
Your problem is that .* matches greedily, and it may be consuming too much of the string. Printing all of the match groups will make this more obvious:
import re
string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'
result = re.match(r'(.*?)\)([ \t]+)?(?P<item>.*)\(.*?(?P<dollaramount>(\d+)?(\.\d+)?).*([ \t]+)?Dollars(\))?', string1)
print(repr(result.groups()))
print(result.group('item'))
print(result.group('dollaramount'))
Changing them to *? makes each one match as little as possible. This can be expensive in some RE engines, so you can also write e.g. \([^)]*\) to match a whole parenthesized group. If you're not processing a lot of text it probably doesn't matter.
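To make the difference concrete, here is a short sketch comparing the three forms on a cut-down piece of the input. The `text` value is just an illustrative fragment of string1.

```python
import re

text = '(TUD) (10 Dollars)'

greedy = re.match(r'\(.*\)', text).group()      # runs on to the last ')'
lazy = re.match(r'\(.*?\)', text).group()       # stops at the first ')'
negated = re.match(r'\([^)]*\)', text).group()  # cannot cross a ')' at all

print(greedy)
print(lazy)
print(negated)
```

The lazy and negated-class versions give the same result here; the negated class simply cannot run past a closing parenthesis, so the engine has less backtracking to do.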
Btw, you should really use raw strings (i.e. r'something') for regexps, to avoid surprising backslash behaviour, and to give the reader a clue.
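A tiny illustration of the kind of surprise meant here, using `\b` as the example (my choice, it does not appear in the question): in a normal string literal it is the backspace character, while in a raw string it reaches the regex engine as a word boundary.

```python
import re

plain = '\bDollars\b'  # '\b' becomes a backspace character (0x08)
raw = r'\bDollars\b'   # '\b' stays as backslash + b, i.e. a word boundary

text = '(10 Dollars)'

print(len(plain), len(raw))          # the raw version is two characters longer
print(re.search(plain, text))        # no backspace in text, so no match
print(re.search(raw, text).group())
```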
I see you had this group (\(.*?\))? which presumably was cutting out the (TUD), but if you actually want that in the title, just remove it.
This is how I would do it, with a demo:
(?P<booknumber>\d+(?:\.\d+)?)\)\s+(?P<item>.*?)\s+\(\d+(?:\.\d+)?\s+Dollars\)
I suggest you use the regex pattern
(?P<booknumber>[^)]*)\)\s+(?P<item>.*\S)\s+\((?!.*\()(?P<amount>\S+)\s+Dollars?\)
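Here is a minimal sketch of that pattern in use. The names `BOOK_LINE` and `parse_book_line` are hypothetical helpers I made up for the example. The `(?!.*\()` lookahead only lets the price parenthesis match if no other opening parenthesis follows it, so `(TUD)` stays in the item.

```python
import re

# Pattern from the answer above, split over two lines for readability.
BOOK_LINE = re.compile(
    r'(?P<booknumber>[^)]*)\)\s+(?P<item>.*\S)\s+'
    r'\((?!.*\()(?P<amount>\S+)\s+Dollars?\)'
)

def parse_book_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = BOOK_LINE.match(line)
    return m.groupdict() if m else None

first = parse_book_line('1.0) The Ugly Duckling (TUD) (10 Dollars)')
second = parse_book_line('1.0) Little 1 Red Riding Hood (9.50 Dollars)')
print(first)
print(second)
```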