python re: r'\b \$ \d+ \b' won't match 'aug 12, 2010 abc $123'
So I'm just making a script to collect $ values from a transaction-log type of file:
import re
import sys

for line in sys.stdin:
    match = re.match(r'\b \$ (\d+) \b', line)
    if match is not None:
        for value in match.groups():
            print(value)
Right now I'm just trying to print those values. It matches a line containing $12323, but not when there are other things in the line. From what I read it should work, but it looks like I'm missing something.
From the documentation for re.match:
If zero or more characters at the beginning of string match this regular expression, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
What you are looking for is either re.search or re.findall:
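#!/usr/bin/env python
import re

s = 'aug 12, 2010 abc $123'

# findall returns the captured group of every match in the string
print(re.findall(r'\$(\d+)', s))
# => ['123']

# group() is the whole match, group(1) only the captured digits
print(re.search(r'\$(\d+)', s).group())
# => $123
print(re.search(r'\$(\d+)', s).group(1))
# => 123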
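Applied to the loop from the question, a minimal sketch could look like the following. It assumes the spaces in the original pattern are unintended and drops them; the function name collect_amounts and the sample lines are made up for illustration.

```python
import re

# Compiled once; \$ is a literal dollar sign, (\d+) captures the digits.
AMOUNT = re.compile(r'\$(\d+)')

def collect_amounts(lines):
    """Return every $ amount found anywhere in the given lines, as strings."""
    amounts = []
    for line in lines:
        # findall scans the whole line, unlike re.match, which only
        # tries to match at the start of the string.
        amounts.extend(AMOUNT.findall(line))
    return amounts

sample = ['aug 12, 2010 abc $123', 'no amount here', '$5 and $7']
print(collect_amounts(sample))
```

To read from standard input as in the question, pass sys.stdin instead of sample.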
By having a space between \$ and (\d+), the regex expects a space between them in your string. Is there such a space?
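A short sketch of that effect, using the sample line from the question title. The last call is an aside rather than part of the answer above: with the re.VERBOSE flag, unescaped whitespace in the pattern is ignored, so the same pattern text matches.

```python
import re

s = 'aug 12, 2010 abc $123'

# The space after \$ is a literal character, so the text must contain "$ 123".
no_space = re.search(r'\$ (\d+)', s)
print(no_space)                      # no match for "$123"

with_space = re.search(r'\$ (\d+)', 'abc $ 123')
print(with_space.group(1))           # matches "$ 123"

# With re.VERBOSE, whitespace in the pattern is ignored unless escaped.
verbose = re.search(r'\$ (\d+)', s, re.VERBOSE)
print(verbose.group(1))
```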
I'm not sure exactly what you want to accept, but from the statement
"a line containing $12323 but not when there are other things in the line"
I would take it that
'aug 12, 2010 abc $123'
'aug 12,2010 abc $ 123'
are not supposed to match, as they have other text before the amount.
If you want to match an amount at the end of the line, here is the customary anti-regexp answer (even though I am not against using regexps in easy cases):
loglines = ['aug 12, 2010 abc $123', " $1 ", "a $1 amount", "exactly $1 - no less"]

# match $amount at end of line without other text after
for line in loglines:
    if '$' in line:
        # everything after the last '$' must parse as a number
        _, _, amount = line.rpartition('$')
        try:
            amount = float(amount)
        except ValueError:
            # trailing text after the amount: skip this line
            pass
        else:
            print("$%.2f" % amount)
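For comparison, roughly the same end-of-line check can be sketched with a regex anchored at the end of the line. The pattern and the helper name amount_at_end are illustrative assumptions, not part of the answer above, and the pattern is stricter than float(): it accepts only plain digits with an optional decimal part directly after the $.

```python
import re

# \$ literal dollar, digits with an optional decimal part,
# optional trailing whitespace, then end of string.
END_AMOUNT = re.compile(r'\$(\d+(?:\.\d+)?)\s*$')

def amount_at_end(line):
    """Return the trailing $ amount as a float, or None if there is none."""
    m = END_AMOUNT.search(line)
    return float(m.group(1)) if m else None

for line in ['aug 12, 2010 abc $123', ' $1 ', 'a $1 amount']:
    print(amount_at_end(line))
```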
Others have already pointed out some shortcomings of your regex (especially the mandatory spaces and re.match vs. re.search).
There is another thing, though: \b word anchors match between alphanumeric and non-alphanumeric characters. In other words, \b \$ will fail (even when doing a search instead of a match operation) unless the string has an alphanumeric character immediately before the space.
An (admittedly contrived) example using your regex:
>>> import re
>>> test = [" $1 ", "a $1 amount", "exactly $1 - no less"]
>>> for string in test:
...     print(re.search(r"\b \$\d+ \b", string))
...
None
<_sre.SRE_Match object at 0x0000000001DD4370>
None
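One possible way around the \b problem is to use lookarounds instead of word boundaries. This is only a sketch, and it assumes the goal is a $ amount delimited by whitespace or by the ends of the line, which the question does not state explicitly.

```python
import re

# (?<!\S): not preceded by a non-space character (start of string or whitespace)
# (?!\S):  not followed by a non-space character (end of string or whitespace)
pattern = re.compile(r'(?<!\S)\$(\d+)(?!\S)')

test = [" $1 ", "a $1 amount", "exactly $1 - no less", "aug 12, 2010 abc $123"]
for string in test:
    print(pattern.findall(string))
```

Unlike the \b version, this finds the amount in all four sample strings, while still rejecting amounts glued to other text such as "price$5" or "$12abc".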