
What is wrong with this simple regex?

I am stuck on the following:

import re
print(re.search(r'return[^$]+',
                'return to the Treasury of $40 million\nnow!').group(0))

The regex above prints only "return to the Treasury of ", but I expected it to include "$40 million" as well. My understanding of the regex is that I am asking it to take everything until the end of the line.

I do not want to use .* ; I want the end-of-line anchor to make the match run from some point to the end of the line. If I remove the $ from the string being searched, it prints the full string. Why is the end-of-line anchor matching the dollar sign?

return[^$]+

will match the string "return" followed by one or more characters that are not '$'.

This is because [ ] denotes a character class, and inside [ ] special characters such as $ are treated as literal characters.

Thus it matches only up to the dollar sign.
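A minimal sketch of that behaviour, using the string from the question (the variable names are only illustrative):

```python
import re

text = 'return to the Treasury of $40 million\nnow!'

# Inside [...] the dollar sign is a literal character, so [^$] means
# "any character except '$'" and the match stops right before '$40'.
inside_class = re.search(r'return[^$]+', text).group(0)
print(repr(inside_class))  # 'return to the Treasury of '

# Escaping makes no difference inside the class: [^\$] behaves the same way.
escaped = re.search(r'return[^\$]+', text).group(0)
print(inside_class == escaped)  # True
```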

Why not use:

return.+$

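This matches from "return" to the end of the line, which is what you want. If the input contains several lines, as in your example, you also need to pass re.MULTILINE so that $ matches before each newline and not only at the end of the string.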

Why don't you want to use .* ?

The regex you have will match any string that starts with "return", then one or more characters that are not the "$" character. Note that this will NOT look for the end-of-line marker.
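To illustrate the point about the end-of-line marker, here is a small sketch. The first sample string is hypothetical (the question's text with the dollar amount removed), and the [^\n]+ pattern is just one possible way to stop at the line end without using .* , not something proposed in the question:

```python
import re

# Hypothetical sample text with no dollar sign in it.
no_dollar = 'return to the Treasury\nnow!'

# [^$] also matches '\n', so the match runs straight past the line break.
crosses_lines = re.search(r'return[^$]+', no_dollar).group(0)
print(repr(crosses_lines))  # 'return to the Treasury\nnow!'

# One way to stop at the line end without .* is to exclude the newline itself.
text = 'return to the Treasury of $40 million\nnow!'
one_line = re.search(r'return[^\n]+', text).group(0)
print(repr(one_line))  # 'return to the Treasury of $40 million'
```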

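return.*$ will match everything up to, but not including, the end-of-line marker, because $ is a zero-width anchor. With multi-line input you need the multiline flag so that $ matches at each line break. Making the .* lazy is probably not needed, since . does not match a newline by default.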

import re
text = 'we will return to the Treasury of $40 million\nunits of money.'
# re.MULTILINE lets $ match just before the newline, not only at the end of the string
print(re.search(r'return.*$', text, re.MULTILINE).group(0))

# prints: return to the Treasury of $40 million

You need to include the re.MULTILINE flag; with it, $ also matches just before each newline.
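A short comparison, as a sketch, of the same pattern with and without the flag (the names without_flag and with_flag are only illustrative):

```python
import re

text = 'we will return to the Treasury of $40 million\nunits of money.'

# Without the flag, $ only matches at the very end of the string (or just
# before a trailing newline), and '.' cannot cross '\n', so nothing matches.
without_flag = re.search(r'return.*$', text)
print(without_flag)  # None

# With re.MULTILINE, $ also matches just before each newline.
with_flag = re.search(r'return.*$', text, re.MULTILINE)
print(repr(with_flag.group(0)))  # 'return to the Treasury of $40 million'

# $ is zero-width: the newline itself is not part of the match.
print(text[with_flag.end()] == '\n')  # True
```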

