python re: r'\b \$ \d+ \b' won't match 'aug 12, 2010 abc $123'

So I'm just making a script to collect $ values from a transaction-log type of file:

import re
import sys

for line in sys.stdin:
    match = re.match(r'\b \$ (\d+) \b', line)
    if match is not None:
        for value in match.groups():
            print(value)

Right now I'm just trying to print those values. It would match a line containing $12323, but not when there are other things in the line. From what I read it should work, but it looks like I could be missing something.

From the documentation for re.match:

If zero or more characters at the beginning of string match this regular expression, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
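To make the quoted behavior concrete, here is a minimal sketch with made-up sample strings. It only illustrates that re.match succeeds when the pattern matches starting at the first character of the string, and returns None otherwise.

```python
import re

pattern = r'\$(\d+)'

# The amount sits at the very start of the string, so re.match succeeds.
at_start = re.match(pattern, '$123 abc')
print(at_start.group(1))

# The amount is preceded by other text, so re.match returns None.
not_at_start = re.match(pattern, 'aug 12, 2010 abc $123')
print(not_at_start)
```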

What you are looking for is either re.search or re.findall:

#!/usr/bin/env python

import re
s = 'aug 12, 2010 abc $123'

print(re.findall(r'\$(\d+)', s))
# => ['123']

print(re.search(r'\$(\d+)', s).group())
# => $123

print(re.search(r'\$(\d+)', s).group(1))
# => 123
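Applied to the loop from the question, this could look roughly like the sketch below. The `lines` list is a hypothetical stand-in for sys.stdin and the sample lines are made up, so treat it as an illustration rather than a drop-in replacement.

```python
import re

# Hypothetical sample input; in the real script this would be sys.stdin.
lines = [
    'aug 12, 2010 abc $123',
    'no amount here',
    'two amounts: $5 and $70',
]

amount_re = re.compile(r'\$(\d+)')

values = []
for line in lines:
    # findall returns every captured group found on the line (possibly none).
    values.extend(amount_re.findall(line))

for value in values:
    print(value)
```

Using findall rather than search means a line with several amounts contributes all of them, which may or may not be what the log format calls for.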

By having a space between \$ and (\d+), the regex expects a space between them in your string. Is there such a space?
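A quick way to see the effect, sketched with made-up strings: a pattern containing a literal space only matches input that has that space. If the spaces were only meant for readability, the re.VERBOSE flag tells the engine to ignore unescaped whitespace in the pattern.

```python
import re

spaced = r'\$ (\d+)'

# The pattern demands a space after '$', which this input lacks.
no_space = re.search(spaced, 'abc $123')
print(no_space)

# Here the input really does contain '$ 123', so the pattern matches.
with_space = re.search(spaced, 'abc $ 123')
print(with_space.group(1))

# With re.VERBOSE the space in the pattern is ignored, so '$123' matches.
verbose = re.search(spaced, 'abc $123', re.VERBOSE)
print(verbose.group(1))
```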

I am not sure what counts as an acceptable match for you, but from the statement

a line containing $12323 but not when there are other things in the line

I would take it that

'aug 12, 2010 abc $123'

is not supposed to match, as it has other text before the amount.

If you want to match an amount at the end of the line, here is the customary anti-regexp answer (even though I am not against using regular expressions in easy cases):

loglines = ['aug 12, 2010 abc $123', " $1 ", "a $1 amount", "exactly $1 - no less"]

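# match $amount at end of line without other text after
for line in loglines:
    if '$' in line:
        # everything after the last '$' is the candidate amount
        _, _, amount = line.rpartition('$')
        try:
            amount = float(amount)
        except ValueError:
            # trailing text that is not a number: skip this line
            pass
        else:
            print("$%.2f" % amount)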

Others have already pointed out some shortcomings of your regex (especially the mandatory spaces and re.match vs. re.search).

There is another thing, though: \b word anchors match between alphanumeric and non-alphanumeric characters. In other words, \b \$ will fail (even when doing a search instead of a match operation) unless the string has some alphanumeric characters before the space.

Example (admittedly contrived) to work with your regex:

>>> import re
>>> test = [" $1 ", "a $1 amount", "exactly $1 - no less"]
>>> for string in test:
...     print(re.search(r"\b \$\d+ \b", string))
...
None
<_sre.SRE_Match object at 0x0000000001DD4370>
None
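If the intent was "an amount delimited by whitespace", one possible alternative is to replace \b with lookarounds that check for whitespace or the edge of the string. That reading of the requirement is my assumption, not something stated in the question, and the extra 'x$5' sample is made up to show a rejected case.

```python
import re

# (?<!\S): not preceded by a non-space character (start of string or whitespace)
# (?!\S):  not followed by a non-space character (end of string or whitespace)
amount_re = re.compile(r'(?<!\S)\$(\d+)(?!\S)')

test = ['aug 12, 2010 abc $123', ' $1 ', 'a $1 amount',
        'exactly $1 - no less', 'x$5']

results = [amount_re.findall(s) for s in test]
for s, found in zip(test, results):
    print("%r -> %r" % (s, found))
```

Unlike \b, these lookarounds do not care whether the neighboring characters are alphanumeric, only whether they are whitespace.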
