
How to match the following regex in Python?

How to match the following with regex?

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

I am trying the following:

groupsofmatches = re.match('(?P<booknumber>.*)\)([ \t]+)?(?P<item>.*)(\(.*\))?\(.*?((\d+)?(\.\d+)?).*([ \t]+)?Dollars(\))?', string1)

The issue is that when I apply it to string2 it works fine, but when I apply the expression to string1, I am unable to get the "m.group(name)" because of the "(TUD)" part. I want to use a single expression that works for both strings.

I expect:

booknumber = 1.0
item = The Ugly Duckling (TUD)

You could impose some heavier restrictions on your repeated characters:

groupsofmatches = re.match(r'([^)]*)\)[ \t]*(?P<item>.*)\([^)]*?(?P<dollaramount>(?:\d+)?(?:\.\d+)?)[^)]*\)$', string1)

This will make sure that the numbers are taken from the last set of parentheses.
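As a quick check, the pattern above can be run against both sample strings. This is only a sketch: the helper name parse_line is made up for illustration, and stripping the trailing space from the item is an added assumption, since the greedy item group keeps the space before the last opening parenthesis.

```python
import re

# The pattern from the answer above, as a raw string split over two lines.
PATTERN = re.compile(
    r'([^)]*)\)[ \t]*(?P<item>.*)'
    r'\([^)]*?(?P<dollaramount>(?:\d+)?(?:\.\d+)?)[^)]*\)$'
)

def parse_line(line):
    """Hypothetical helper: return (booknumber, item, dollaramount) or None."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # The item group ends with the space before the last "(", so strip it.
    return m.group(1), m.group('item').strip(), m.group('dollaramount')

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

for s in (string1, string2):
    print(parse_line(s))
```

Because the item group is greedy, it runs up to the last opening parenthesis, so "(TUD)" stays in the title and only the final parenthesised part is read as the price.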

I would write it as:

num, name, value = re.match(r'(.+?)\) (.*?) \(([\d.]+) Dollars\)', string2).groups()

Your problem is that .* matches greedily, and it may be consuming too much of the string. Printing all of the match groups will make this more obvious:

import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

result = re.match(r'(.*?)\)([ \t]+)?(?P<item>.*)\(.*?(?P<dollaramount>(\d+)?(\.\d+)?).*([ \t]+)?Dollars(\))?', string1)

print(repr(result.groups()))
print(result.group('item'))
print(result.group('dollaramount'))

Changing them to *? makes each one match as little as possible.

Non-greedy matching can be expensive in some regex engines, so you can also write e.g. \([^)]*\) to match one whole parenthesised group. If you're not processing a lot of text, it probably doesn't matter.
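To make the difference concrete, here is a small sketch comparing the three forms on a shortened sample string. The variable names are only illustrative.

```python
import re

text = '(TUD) (10 Dollars)'

# Greedy: .* runs to the end and backtracks to the LAST ")".
greedy = re.match(r'\((.*)\)', text).group(1)
# Non-greedy: .*? stops at the FIRST ")".
lazy = re.match(r'\((.*?)\)', text).group(1)
# Negated class: [^)]* cannot cross a ")" at all.
negated = re.match(r'\(([^)]*)\)', text).group(1)

print(greedy)   # TUD) (10 Dollars
print(lazy)     # TUD
print(negated)  # TUD

# The negated class also finds every parenthesised group in the full string.
string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
groups = re.findall(r'\([^)]*\)', string1)
print(groups)   # ['(TUD)', '(10 Dollars)']
```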

By the way, you should really use raw strings (i.e. r'something') for regexps, to avoid surprising backslash behaviour and to give the reader a clue.
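One case where the backslash behaviour is surprising is \b: in an ordinary string literal it is a backspace character, while in a raw string it reaches the regex engine as a word boundary. A minimal sketch, with illustrative variable names:

```python
import re

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'

plain = '\bDollars\b'   # "\b" is turned into a backspace character by Python
raw = r'\bDollars\b'    # "\b" is passed through to re as a word boundary

print(len(plain), len(raw))      # 9 11

plain_match = re.search(plain, string1)
raw_match = re.search(raw, string1)

print(plain_match)               # None: the string has no backspace characters
print(raw_match.group())         # Dollars
```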

I see you had the group (\(.*?\))? which presumably was meant to cut out the (TUD), but if you actually want that in the title, just remove the group.

This is how I would do it:

(?P<booknumber>\d+(?:\.\d+)?)\)\s+(?P<item>.*?)\s+\(\d+(?:\.\d+)?\s+Dollars\)
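A short sketch of how this pattern behaves on the two strings from the question, written as a raw string with single backslashes. Note that the pattern as given does not capture the price, only the book number and the item.

```python
import re

pattern = re.compile(
    r'(?P<booknumber>\d+(?:\.\d+)?)\)\s+(?P<item>.*?)\s+'
    r'\(\d+(?:\.\d+)?\s+Dollars\)'
)

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

# groupdict() returns only the named groups: booknumber and item.
result1 = pattern.match(string1).groupdict()
result2 = pattern.match(string2).groupdict()

print(result1)
print(result2)
```

The non-greedy item group keeps growing until what follows it is a parenthesised number and "Dollars", which is why "(TUD)" ends up inside the item.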

I suggest using this regex pattern:

(?P<booknumber>[^)]*)\)\s+(?P<item>.*\S)\s+\((?!.*\()(?P<amount>\S+)\s+Dollars?\)
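For illustration, this pattern can be tried on both strings as follows. The negative lookahead (?!.*\() only allows the opening parenthesis to match when no other "(" follows it, so the amount is always taken from the last parenthesised group.

```python
import re

pattern = re.compile(
    r'(?P<booknumber>[^)]*)\)\s+(?P<item>.*\S)\s+'
    r'\((?!.*\()(?P<amount>\S+)\s+Dollars?\)'
)

string1 = '1.0) The Ugly Duckling (TUD) (10 Dollars)'
string2 = '1.0) Little 1 Red Riding Hood (9.50 Dollars)'

parsed = []
for s in (string1, string2):
    m = pattern.match(s)
    parsed.append((m.group('booknumber'), m.group('item'), m.group('amount')))
    print(parsed[-1])
```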
