python - Return Text Between Parentheses

I have a file that contains several lines of strings written like this:

[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ

I need only the text inside the parentheses. I tried the following code:

import re

readstream = open("E:\\New folder\\output5.txt", "r").read()

stringExtract = re.findall(r'\[(.*?)\]', readstream, re.DOTALL)
string = re.compile(r'\(.*?\)')
stringExtract2 = string.findall(str(stringExtract))

But some strings (or text) are missing from the output; for example, for the line above, the word (with) is not in the output. Also, the order of the strings differs from the file: of the strings (enlar) and (ged ) above, the second one, (ged ), appears before (enlar), like this: ( ged other strings ..... enlar). How can I fix these problems?

Without regexp:

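s = "[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ"
[p.split(')')[0] for p in s.split('(') if ')' in p]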

Output:

['W', 'indo', 'ws ', 'XP', ', ', 'with ', 'the ', 'fragment ', 'enlar', 'ged ', 'for ', 'clarity ', 'on ', 'Fig. ']
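To keep the fragments in the same order as the file, the same split can be applied one line at a time. This is a minimal sketch, assuming the file is read as text and that no parenthesized string spans two lines; extract_parenthesized is a hypothetical helper name, and the two sample lines are stand-ins for the real file:

```python
def extract_parenthesized(lines):
    """Hypothetical helper: yield the text between each '(' and the next ')'
    in every line, in the order the fragments appear.

    `lines` can be an open text file or any iterable of strings.
    """
    for line in lines:
        for piece in line.split('('):
            if ')' in piece:
                # Everything before the first ')' is the parenthesized text.
                yield piece.split(')')[0]


# Stand-in for the file contents (assumption, not the asker's real file).
sample = [
    "[(W)40(indo)25(ws )20(XP)111(, )] TJ\n",
    "[(with )20(the )20(fragment )20(enlar)18(ged )] TJ\n",
]
print(list(extract_parenthesized(sample)))
# ['W', 'indo', 'ws ', 'XP', ', ', 'with ', 'the ', 'fragment ', 'enlar', 'ged ']
```

With a real file, the open file object can be passed directly, since iterating over it yields one line at a time.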

Try this:

import re

with open("E:\\New folder\\output5.txt", "r") as f:
    readstream = f.read()
stringExtract2 = re.findall(r'\(([^()]+)\)', readstream)

Input:

readstream = r'[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )]'

Output:

['W', 'indo', 'ws ', 'XP', ', ', 'with ', 'the ', 'fragment ', 'enlar', 'ged ', 'for ', 'clarity ', 'on ', 'Fig. ']
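The + quantifier and the negated class [^()] decide how edge cases are handled. A small illustration on made-up input (not from the asker's file) showing what happens with an empty pair and with nested parentheses:

```python
import re

# Made-up line containing an empty pair "()".
line = "[(W)40()25(ws )] TJ"

# '+' needs at least one character, so the empty pair is skipped.
print(re.findall(r'\(([^()]+)\)', line))  # ['W', 'ws ']

# '*' also accepts zero characters, so the empty pair yields ''.
print(re.findall(r'\(([^()]*)\)', line))  # ['W', '', 'ws ']

# [^()] excludes '(' as well, so with nesting only the innermost pair matches.
print(re.findall(r'\(([^()]+)\)', "(a(b)c)"))  # ['b']
```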

findall looks like your friend here. Don't you just want:

re.findall(r'\(.*?\)', readstream)

returns:

['(W)',
 '(indo)',
 '(ws )',
 '(XP)',
 '(, )',
 '(with )',
 '(the )',
 '(fragment )',
 '(enlar)',
 '(ged )',
 '(for )',
 '(clarity )',
 '(on )',
 '(Fig. )']

Edit: as @vikramis showed, to remove the parentheses, use a capturing group: re.findall(r'\((.*?)\)', readstream). Also note that it is common (though not requested here) to trim trailing whitespace with something like:

re.findall(r'\((.*?) *\)', readstream)
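What findall returns depends on the capturing groups in the pattern: with none it returns each whole match, and with exactly one it returns only that group's text. A short comparison of the three patterns above, run on a shortened copy of the sample line:

```python
import re

line = "[(W)40(indo)25(ws )20(XP)111(, )] TJ"

# No capturing group: findall returns each whole match, parentheses included.
print(re.findall(r'\(.*?\)', line))      # ['(W)', '(indo)', '(ws )', '(XP)', '(, )']

# One capturing group: findall returns only the group's text.
print(re.findall(r'\((.*?)\)', line))    # ['W', 'indo', 'ws ', 'XP', ', ']

# ' *' sits outside the group, so trailing spaces before ')' are dropped.
print(re.findall(r'\((.*?) *\)', line))  # ['W', 'indo', 'ws', 'XP', ',']
```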

Your first problem is this line:

stringExtract = re.findall(r'\[(.*?)\]', readstream, re.DOTALL)

I have no idea why you are doing this, and I'm pretty sure you don't want to.

Try this instead:

import re

readstream = "[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ"
stringExtract = re.findall(r'\(([^)]+)\)', readstream, re.DOTALL)

which says: match an opening parenthesis, capture one or more characters that are not a closing parenthesis, then match the closing parenthesis.
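A negated class behaves differently from .*? when a parenthesized string is broken across lines, which may be why re.DOTALL appeared in the question. A sketch with a made-up input (no such line is shown in the original file) where one fragment contains a newline:

```python
import re

# Made-up input: the fragment "enlar" + "ged " is split by a line break.
text = "[(enlar\nged )20(for )] TJ"

# By default '.' does not match '\n', so the split fragment is missed.
print(re.findall(r'\((.*?)\)', text))             # ['for ']

# re.DOTALL lets '.' match '\n' as well.
print(re.findall(r'\((.*?)\)', text, re.DOTALL))  # ['enlar\nged ', 'for ']

# A negated class such as [^)] matches '\n' without any flag.
print(re.findall(r'\(([^)]+)\)', text))           # ['enlar\nged ', 'for ']
```

So with [^)]+ the re.DOTALL flag has no effect, because the pattern contains no '.'.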
