
Beginning (python) regex question

I'm having trouble figuring out why the following regex doesn't seem to work.

I know that I can form other regular expressions to make this work, but I thought this one should work.

re.search(r"(\d*)", "prefix 1234 suffix").groups()
('',)

Interestingly, findall seems to work:

re.findall(r"(\d*)", "prefix 1234 suffix")
['', '', '', '', '', '', '', '1234', '', '', '', '', '', '', '', '']

I understand why that works, but I'm still confused as to why search doesn't work. My understanding is that match should force it to match the whole string, but search should find the digits anywhere within the string.

Because .search runs the search once and matches at the first place it can. Since \d* can match no characters at all, the first place it can match is the very beginning of the string, capturing no characters, so the first capture group is ''. It's doing exactly what you asked it to.
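To see where that first match lands, you can inspect the match object's span. This is only a small sketch; the variable names text and m are mine, not from the question.

```python
import re

text = "prefix 1234 suffix"

# \d* is allowed to match zero digits, so the leftmost possible match
# is the empty string at index 0, before the "p" of "prefix".
m = re.search(r"(\d*)", text)

print(m.span())    # (0, 0)
print(m.groups())  # ('',)
```

A span of (0, 0) means the match starts and ends at offset 0, i.e. it is empty.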

If you had made the regex (\d+) instead, which has to match at least one digit, then the first place it could match would be at the 1, and it would capture 1234.
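A quick way to check this, again as a sketch with my own variable names:

```python
import re

text = "prefix 1234 suffix"

# \d+ needs at least one digit, so the empty match at index 0 is no
# longer possible and the search moves on until it reaches the "1".
m = re.search(r"(\d+)", text)

print(m.span())    # (7, 11)
print(m.group(1))  # 1234
```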

It works. The return value of your first example corresponds to the first element of the list returned by findall. Just use r'(\d+)' as your regex.
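That correspondence can be checked directly. In this sketch the names pattern, first_from_search and all_from_findall are just illustrative:

```python
import re

text = "prefix 1234 suffix"
pattern = r"(\d*)"

# search stops at the first match; findall collects every match.
first_from_search = re.search(pattern, text).group(1)
all_from_findall = re.findall(pattern, text)

print(repr(first_from_search))                   # ''
print(first_from_search == all_from_findall[0])  # True
```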

search does find digits anywhere within the string; it's just that your regex tells it to find zero or more digits. So it finds zero digits at each character boundary where no digit follows, starting with the very first one.
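If you want to see those boundaries for yourself, re.finditer exposes the position of each match. A rough sketch (the spans list is my own naming):

```python
import re

text = "prefix 1234 suffix"

# Collect the (start, end) offsets of every match of \d* in the string.
spans = [m.span() for m in re.finditer(r"\d*", text)]

# The first few matches are empty: start == end.
print(spans[:3])  # [(0, 0), (1, 1), (2, 2)]

# Only one match actually consumes characters: the run of digits.
print([s for s in spans if s[0] != s[1]])  # [(7, 11)]
```

search simply returns the first entry of that sequence, which is the empty match at offset 0.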

Use \d+, not \d*. \d* means zero or more digits, and a match of zero digits is what you get at offset 0 in the string.

Try this:

re.findall(r"(\d+)", "prefix 1234 suffix")

By changing the * to a + you are indicating that the pattern \d must match one or more times. The * you were using at first matches zero or more times, so it matched the empty string at every position where no digit follows.
