How to match the following with a regex in Python?

How to match the following with regex?

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

I am trying the following:

groupsofmatches = re.match('(?P<booknumber>.*)\)([ \t]+)?(?P<item>.*)(\(.*\))?\(.*?((\d+)?(\.\d+)?).*([ \t]+)?Dollars(\))?', string1)

The issue is when I apply it to string2 it works fine, but when I apply the expression to string1, I am unable to get the "m.group(name)" because of the "(TUD)" part. I want to use a single expression that works for both strings.

I expect:

booknumber = 1.0
item = The Ugly Duckling (TUD)

You could impose some heavier restrictions on your repeated characters:

groupsofmatches = re.match(r'([^)]*)\)[ \t]*(?P<item>.*)\([^)]*?(?P<dollaramount>(?:\d+)?(?:\.\d+)?)[^)]*\)$', string1)

This will make sure that the numbers are taken from the last set of parentheses.
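As a quick check, the pattern above can be run against both sample strings. This is only a sketch: the `booknumber` name on the first group, the `parse` helper, and the `.strip()` call (which drops the trailing space the greedy `item` group picks up) are assumptions added for illustration, not part of the original answer.

```python
import re

# Pattern from the answer above, as a raw string. Naming the first group
# "booknumber" is an assumption added here for readability.
PATTERN = re.compile(
    r'(?P<booknumber>[^)]*)\)[ \t]*(?P<item>.*)'
    r'\([^)]*?(?P<dollaramount>(?:\d+)?(?:\.\d+)?)[^)]*\)$'
)

def parse(line):
    """Hypothetical helper: return (booknumber, item, dollaramount) or None."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # The greedy item group keeps the space before the last "(", so strip it.
    return m.group('booknumber'), m.group('item').strip(), m.group('dollaramount')

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

print(parse(string1))  # ('1.0', 'The Ugly Duckling (TUD)', '10')
print(parse(string2))  # ('1.0', 'Little 1 Red Riding Hood', '9.50')
```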

I would write it as:

num, name, value = re.match(r'(.+?)\) (.*?) \(([\d.]+) Dollars\)', string2).groups()

Your problem is that .* matches greedily, and it may be consuming too much of the string. Printing all of the match groups will make this more obvious:

import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

result = re.match(r'(.*?)\)([ \t]+)?(?P<item>.*)\(.*?(?P<dollaramount>(\d+)?(\.\d+)?).*([ \t]+)?Dollars(\))?', string1)

# Python 3 print calls; the groups show exactly what each part consumed
print(repr(result.groups()))
print(result.group('item'))
print(result.group('dollaramount'))

Changing them to .*? makes each one match as little as possible.

Lazy matching can be expensive in some RE engines, so you can also write e.g. \([^)]*\) to match one whole parenthesized group. If you're not processing a lot of text it probably doesn't matter.
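To make the difference concrete, here is a small sketch comparing the three forms on a fragment of string1. The variable names are made up for the example.

```python
import re

text = 'The Ugly Duckling (TUD) (10 Dollars)'

# Greedy: runs from the first "(" to the last ")"
greedy = re.search(r'\(.*\)', text).group()
# Lazy: stops at the first ")" it can
lazy = re.search(r'\(.*?\)', text).group()
# Negated class: can never cross a ")", so no backtracking over it is needed
negated = re.findall(r'\([^)]*\)', text)

print(greedy)   # (TUD) (10 Dollars)
print(lazy)     # (TUD)
print(negated)  # ['(TUD)', '(10 Dollars)']
```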

btw, you should really use raw strings (ie r'something' ) for regexps, to avoid surprising backslash behaviour, and to give the reader a clue.
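A short sketch of the kind of surprise meant here, using \b as the example escape (my choice for illustration, it does not appear in the patterns above): in a normal string literal it is a backspace character, while in a raw string it reaches the re module as a word boundary.

```python
import re

# '\b' in a normal literal is one backspace character (0x08);
# r'\b' is a backslash followed by 'b', which re treats as a word boundary.
plain = '\bDollars\b'
raw = r'\bDollars\b'

print(len(plain), len(raw))                      # 9 11
print(re.search(plain, '(10 Dollars)'))          # None
print(re.search(raw, '(10 Dollars)').group())    # Dollars
```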

I see you had this group (\(.*?\))? which presumably was cutting out the (TUD), but if you actually want that in the title, just remove it.

This is how I would do it:

(?P<booknumber>\d+(?:\.\d+)?)\)\s+(?P<item>.*?)\s+\(\d+(?:\.\d+)?\s+Dollars\)

I suggest you use the regex pattern

(?P<booknumber>[^)]*)\)\s+(?P<item>.*\S)\s+\((?!.*\()(?P<amount>\S+)\s+Dollars?\)
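A minimal sketch of that pattern in use, assuming the two sample strings from the question. The negative lookahead (?!.*\() only lets the amount start inside the last opening parenthesis, and .*\S keeps trailing whitespace out of item.

```python
import re

PATTERN = re.compile(
    r'(?P<booknumber>[^)]*)\)\s+(?P<item>.*\S)\s+'
    r'\((?!.*\()(?P<amount>\S+)\s+Dollars?\)'
)

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

# Collect the named groups for each line
results = [PATTERN.match(line).groupdict() for line in (string1, string2)]

for fields in results:
    print(fields)
# {'booknumber': '1.0', 'item': 'The Ugly Duckling (TUD)', 'amount': '10'}
# {'booknumber': '1.0', 'item': 'Little 1 Red Riding Hood', 'amount': '9.50'}
```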

