Python regex capture group issue

Trying to specify my capture group, but it keeps capturing way too much.

Line:

"This is something of [Interest: stuff]. blah blah blah"

Regex:

patt = re.compile(r'\[Interest:(.){1,100}\]')

What is output:

[Interest: stuff]

What I want output:

stuff

How can I output just what I want to capture, and not the entire pattern?

I've also tried this:

re.compile(r'\[Interest:(?P<interest>.+)\]')

That outputs:

stuff]. blah blah blah

I feel like I'm pretty close. I just need to figure out how to stop the output once the regex hits the ].

The . character matches any character except a newline, and that includes ]. So (.){1,100} tells Python to grab as many characters as it can, up to 100, which means it can run past the first ] whenever a later ] exists in the string.

Instead, I would use this pattern:

\[Interest:\s([^\]]*)\]

Demo:

>>> import re
>>> string = "This is something of [Interest: stuff]. blah blah blah"
>>> re.search(r"\[Interest:\s([^\]]*)\]", string).group(1)
'stuff'
>>>

Below is an explanation of what it matches:

\[         # [
Interest:  # Interest:
\s         # A whitespace character (here, the space)
(          # The start of a capture group
[^\]]*     # Zero or more characters that are not ]
)          # The close of the capture group
\]         # ]

For more information, see Regular Expression Syntax in the Python re module documentation.
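The annotated breakdown above can also be kept inside the pattern itself. This is a minimal sketch using re.VERBOSE, which ignores whitespace in the pattern and treats # as a comment marker; the names INTEREST_RE, line and captured are only illustrative and not from the original answer.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so that
# the explanation lives next to each piece of the regex.
INTEREST_RE = re.compile(r"""
    \[          # literal [
    Interest:   # literal text Interest:
    \s          # one whitespace character
    ([^\]]*)    # group 1: zero or more characters that are not ]
    \]          # literal ]
""", re.VERBOSE)

line = "This is something of [Interest: stuff]. blah blah blah"
match = INTEREST_RE.search(line)

# search() returns None when nothing matches, so guard before calling group()
captured = match.group(1) if match else None
print(captured)  # stuff
```

The None check matters in practice: calling .group(1) directly on a line without an [Interest: ...] tag raises AttributeError.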

Use a lazy quantifier and read the result from group 1:

\[Interest: (.*?)\]

sample code:

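import re
# In Python 3 a plain r'' string is already Unicode; the ur'' prefix only exists in Python 2
p = re.compile(r'\[Interest: (.*?)\]', re.IGNORECASE)
test_str = "This is something of [Interest: stuff]. blah blah blah"

# search() scans the whole string; match() only tries position 0 and would return None here
print(p.search(test_str).group(1))  # stuff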
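To see what the lazy .*? buys you, here is a small sketch comparing it with the greedy .* on a made-up line that contains a second ] further along. The sample line and the variable names are assumptions for illustration, not from the question.

```python
import re

# Hypothetical input with a second ] later in the line,
# so the greedy and lazy versions give different results.
line = "of [Interest: stuff]. blah [other] blah"

# Greedy: .* grabs as much as possible, then backs up to the LAST ]
greedy = re.search(r"\[Interest: (.*)\]", line).group(1)

# Lazy: .*? grabs as little as possible, stopping at the FIRST ]
lazy = re.search(r"\[Interest: (.*?)\]", line).group(1)

print(greedy)  # stuff]. blah [other
print(lazy)    # stuff
```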

One issue with your regex, \[Interest:(.){1,100}\], is that (.){1,100} allows 1 to 100 repetitions of . but the group captures only one character, the last one matched, because the parentheses enclose just the single . (which stands for one character). The captured group will therefore contain the f of stuff.

Instead, \[Interest: (.{1,100})\] will return stuff.
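A quick way to check this yourself is to run both patterns against the line from the question. This is just a sketch; the variable names are made up.

```python
import re

line = "This is something of [Interest: stuff]. blah blah blah"

# Quantifier OUTSIDE the group: the group is re-captured on every
# repetition, so only the last repetition survives.
last_char = re.search(r"\[Interest:(.){1,100}\]", line).group(1)

# Quantifier INSIDE the group: the whole run is captured at once.
whole = re.search(r"\[Interest: (.{1,100})\]", line).group(1)

print(repr(last_char))  # 'f'
print(repr(whole))      # 'stuff'
```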

As for the output being [Interest: stuff], that is a grouping issue: the whole match is being printed instead of the capture group.
Try iCodez's code:

>>> import re
>>> string = "This is something of [Interest: stuff]. blah blah blah"
>>> re.search(r"\[Interest:\s([^\]]*?)\]", string).group(1)

It prints stuff.

Replace .group(1) with .group(0) and it prints [Interest: stuff].
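Since the question also tried a named group, here is a sketch that puts the three ways of reading the match side by side. It reuses the (?P<interest>...) name from the question but swaps the .+ for the [^\]]* from the first answer; the variable names are illustrative.

```python
import re

line = "This is something of [Interest: stuff]. blah blah blah"

# Named group from the question, with a negated character class
# so the capture cannot run past the closing ].
m = re.search(r"\[Interest:\s(?P<interest>[^\]]*)\]", line)

whole_match = m.group(0)       # everything the pattern matched
by_index = m.group(1)          # first capture group, by number
by_name = m.group("interest")  # same group, by name

print(whole_match)  # [Interest: stuff]
print(by_index)     # stuff
print(by_name)      # stuff
```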
