Python re capture one match per word

I need to find prices in a text document. My code looks like this:

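import re
sentence = "This is test text $25,000 $25,000$20,000 $30"
# Non-capturing group (?:...) so that findall returns whole matches, not just the group
pattern = re.compile(ur'[$€£]?\d+(?:[.,]\d+)?', re.UNICODE | re.MULTILINE | re.DOTALL)
print pattern.findall(sentence)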

The desired result is:

['$25,000', '$30']

I don't want $25,000$20,000 in the result because it is not a valid price for my task. I need only full-word matches.

But I get this result:

['$25,000', '$25,000', '$20,000', '$30']

How can I rewrite my regex to match only prices delimited by whitespace or punctuation?

This is as close as I can get it (although there are many people with more regex skills than I have):

pattern = re.compile(ur'(?:^|\s)[$€£]?\d+(?:[.,]\d+)?(?=\s|$)', re.UNICODE | re.MULTILINE | re.DOTALL)
print pattern.findall(sentence) # [' $25,000', ' $30']
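
The leading space in each result comes from (?:^|\s), which consumes the whitespace in front of the price. A minimal sketch of one way to drop it is to wrap the price itself in a capturing group, so that findall returns only that group. This is written in Python 3 syntax (an assumption; the snippets above are Python 2), and the name PRICE is hypothetical.

```python
import re

# Hypothetical name. Same pattern as above, but the price is wrapped in a
# capturing group, so findall() returns the group rather than the whole match.
PRICE = re.compile(r'(?:^|\s)([$€£]?\d+(?:[.,]\d+)?)(?=\s|$)', re.MULTILINE)

sentence = "This is test text $25,000 $25,000$20,000 $30"
prices = PRICE.findall(sentence)
print(prices)  # ['$25,000', '$30']
```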

Try the following:

ur'(?<!\S)[€£$]?\d+(?:[.,]\d+)?(?!\S)'

I added the negative lookbehind (?<!\S) and the negative lookahead (?!\S), which mean "fail to match if preceded by a non-whitespace character" and "fail to match if followed by a non-whitespace character" respectively.

Tested:

>>> sentence = "$1234 $56$78.90 This is test text $25,000 $25,000$20,000 $30"
>>> pattern = re.compile(ur'(?<!\S)[€£$]?\d+(?:[.,]\d+)?(?!\S)', re.UNICODE | re.MULTILINE | re.DOTALL)
>>> print pattern.findall(sentence)
['$1234', '$25,000', '$30']
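
A negative lookaround also succeeds when there is no character at all on that side, so prices at the very start or end of the string need no separate ^ or $ alternative. The following is a small sketch of that behaviour in Python 3 syntax (an assumption; the session above is Python 2). The names PRICE and find_prices are hypothetical.

```python
import re

# Hypothetical names; the pattern is the one tested above.
PRICE = re.compile(r'(?<!\S)[€£$]?\d+(?:[.,]\d+)?(?!\S)')

def find_prices(text):
    """Return prices delimited by whitespace or by the edges of the string."""
    return PRICE.findall(text)

print(find_prices("$5"))            # ['$5']  (string edges count as delimiters)
print(find_prices("€7,50 x£3 $9"))  # ['€7,50', '$9']  ('£3' is glued to 'x')
print(find_prices("$1$2"))          # []  (neither price stands alone)
```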

If you want to allow certain non-whitespace characters before or after the match, replace \S with [^\s<chars>], where <chars> are the characters you want to allow. Example:

ur'(?<![^\s:])[€£$]?\d+(?:[.,]\d+)?(?![^\s,.])'

allows the pattern to be preceded by a : and followed by , or . :

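>>> sentence = "$1234 $56$78.90 This is test text:$25,000. $45. $25,000$20,000 $30"
>>> pattern = re.compile(ur'(?<![^\s:])[€£$]?\d+(?:[.,]\d+)?(?![^\s,.])', re.UNICODE | re.MULTILINE | re.DOTALL)
>>> print pattern.findall(sentence)
['$1234', '$25,000', '$45', '$25', '$30']

The extra '$25' is the start of $25,000$20,000; it matches because a comma is now allowed after the price.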
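
One caveat when , or . is allowed after the match: the optional (?:[.,]\d+)? group can be abandoned during backtracking. In a run such as $25,000$20,000 the engine can therefore settle on the shorter match $25, since the comma that follows it is now permitted. A possible refinement, offered as a sketch rather than as part of the original answer, is to also reject a separator that is immediately followed by a digit. The code uses Python 3 syntax, and the names LOOSE and STRICT are hypothetical.

```python
import re

sentence = "$1234 $56$78.90 This is test text:$25,000. $45. $25,000$20,000 $30"

# Pattern from the answer: ':' may precede the price, ',' or '.' may follow it.
LOOSE = re.compile(r'(?<![^\s:])[€£$]?\d+(?:[.,]\d+)?(?![^\s,.])')

# Assumed refinement: also fail when the next two characters are a separator
# plus a digit, because that means the number actually continues.
STRICT = re.compile(r'(?<![^\s:])[€£$]?\d+(?:[.,]\d+)?(?![^\s,.]|[.,]\d)')

loose_result = LOOSE.findall(sentence)
strict_result = STRICT.findall(sentence)
print(loose_result)   # ['$1234', '$25,000', '$45', '$25', '$30']
print(strict_result)  # ['$1234', '$25,000', '$45', '$30']
```

The trailing punctuation in "$25,000." and "$45." is still accepted, because the period there is followed by a space rather than a digit.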
