简体   繁体   中英

Python regular expression gives unexpected result

I'm trying to create an svn pre-commit hook, but can't get my regular expression to work as expected. It should print False for messages that do not look like "DEV-5 | some message". Why do I get True here?

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> p = re.compile("^\[[A-Z]+-[0-9]+\] | .+$", re.DOTALL)
>>> message = "test message"
>>> match = p.search(message)
>>> bool(match)
True
>>> p = re.compile("^[A-Z]+-[0-9]+ \| .+$", re.DOTALL)
>>> print p.search("test message")
None
>>> print p.search("DEV-5 | some message")
<_sre.SRE_Match object at 0x800eb78b8>
  • you don't need \\[ and \\]
  • you need to escape |

The culprit is the trailing " | .+$" which is matching ' message' as an alternative to the first regex. As Roman pointed out you meant to match literal '|' so you have to escape it as '\\|'.

To see what was being matched, you can do:

print match.group()
' message'

(By the way, a faster non-regex way to only handle lines containing vertical bar would use line.split('|'):

for line in ...:
   parts = line.split('|',1)
   if len(parts)==1: continue
   (code,mesg) = parts

我没有运行代码,但是我怀疑您的正则表达式中替代( | )后的部分与任何以空格开头的非空字符串匹配,在本例中为" message"

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM