Trying to specify my capture group, but it keeps capturing way too much.
Line:
"This is something of [Interest: stuff]. blah blah blah"
Regex:
patt = re.compile('\[Interest:(.){1,100}\]')
What is output:
[Interest: stuff]
What I want output:
stuff
How can I output just what I want to capture, and not the entire pattern?
I've also tried this:
re.compile(r'\[Interest:(?P<interest>.+)\]')
That outputs:
stuff]. blah blah blah
I feel like I'm pretty close. I just need to figure out how to stop the match once the regex hits the ]
The . character matches any character except a newline, and that includes ]. So (.){1,100} tells Python to match as many characters as it can, up to 100. Because the quantifier is greedy, it first runs toward the end of the string and only backs off as far as it must to find a closing ].
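To see the greedy behaviour on its own, here is a small sketch. The input line is made up for illustration: it adds a second ] that is not in the question's sample, which is one way a greedy group can end up holding more than intended.

```python
import re

# Hypothetical line with a second "]" further along (not from the question).
line = "This is something of [Interest: stuff]. blah [other] blah"

# .+ is greedy: it grabs as much as possible, then backs off only as far as
# the LAST "]" that still lets the whole pattern match.
greedy = re.search(r"\[Interest:(?P<interest>.+)\]", line)
captured = greedy.group("interest")
print(repr(captured))
```

The capture runs through the first ] and stops only at the last one on the line.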
Instead, I would use this pattern:
\[Interest:\s([^\]]*)\]
Demo:
>>> import re
>>> string = "This is something of [Interest: stuff]. blah blah blah"
>>> re.search(r"\[Interest:\s([^\]]*)\]", string).group(1)
'stuff'
>>>
Below is an explanation of what it matches:
\[          # [
Interest:   # Interest:
\s          # A single whitespace character (space, tab, etc.)
(           # The start of a capture group
[^\]]*      # Zero or more characters that are not ]
)           # The close of the capture group
\]          # ]
For more information, see Regular Expression Syntax in the Python re documentation.
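If you want the pattern to carry its own explanation, one option is re.VERBOSE, which ignores whitespace and # comments inside the pattern. This is only a sketch of the same regex written that way; the name interest_re is my own choice.

```python
import re

line = "This is something of [Interest: stuff]. blah blah blah"

# Same pattern as above, spelled out with inline comments.
interest_re = re.compile(r"""
    \[            # a literal [
    Interest:     # the literal text Interest:
    \s            # one whitespace character
    ([^\]]*)      # capture group: zero or more characters that are not ]
    \]            # a literal ]
""", re.VERBOSE)

match = interest_re.search(line)
result = match.group(1)
print(result)
```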
Get the matched group at index 1 using a lazy (non-greedy) quantifier.
\[Interest: (.*?)\]
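Sample code:
import re
p = re.compile(r'\[Interest: (.*?)\]', re.IGNORECASE)
test_str = "This is something of [Interest: stuff]. blah blah blah"
# Use search(), not match(): match() only matches at the start of the string
# and would return None here.
print(p.search(test_str).group(1))  # stuff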
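The lazy form is also handy when a line holds more than one tag. A rough sketch, assuming an input with two tags (the text below is invented for the example):

```python
import re

# Hypothetical input containing two tags, the second in lower case.
text = "[Interest: stuff] and later [interest: more stuff] then the rest"

# With exactly one capture group, findall() returns the captured text of
# every match. The lazy .*? stops at the first "]" after each tag.
tags = re.findall(r"\[Interest: (.*?)\]", text, re.IGNORECASE)
print(tags)
```

A greedy .* in the same spot would stretch from the first tag to the last ] and return a single long string instead.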
One issue with your regex, \[Interest:(.){1,100}\], is that (.){1,100} allows 1 to 100 repetitions of . but captures only one ., the last one, because the parentheses enclose only the . (which refers to a single character). The captured group will therefore contain the f of stuff.
Instead, \[Interest: (.{1,100})\] will return stuff.
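A quick way to check the difference between a quantifier outside the group and one inside it, using the line from the question:

```python
import re

line = "This is something of [Interest: stuff]. blah blah blah"

# Quantifier outside the group: the group is re-captured on every repetition,
# so only the last single character survives.
outside = re.search(r"\[Interest:(.){1,100}\]", line).group(1)

# Quantifier inside the group: the group holds the whole run of characters.
inside = re.search(r"\[Interest: (.{1,100})\]", line).group(1)

print(repr(outside), repr(inside))
```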
As for the output being [Interest: stuff], that is a grouping issue: .group(0) is the entire match, while .group(1) is only the first capture group.
Try iCodez's code:
>>> import re
>>> string = "This is something of [Interest: stuff]. blah blah blah"
>>> re.search(r"\[Interest:\s([^\]]*?)\]", string).group(1)
It prints stuff.
Replace .group(1) with .group(0) and it prints [Interest: stuff].
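One thing to watch when calling .group() directly on the result: re.search() returns None when nothing matches, so the call raises AttributeError on lines without the tag. A minimal sketch of a guard; extract_interest is a hypothetical helper name, not something from the answers above.

```python
import re

INTEREST_RE = re.compile(r"\[Interest:\s([^\]]*)\]")

def extract_interest(text):
    """Return the text inside [Interest: ...], or None if the tag is absent."""
    m = INTEREST_RE.search(text)
    # group(1) is the capture group only; group(0) would be the whole tag.
    return m.group(1) if m else None

found = extract_interest("This is something of [Interest: stuff]. blah blah blah")
missing = extract_interest("nothing of note on this line")
print(found, missing)
```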