
python - Return Text Between Parentheses

I have a file that contains several lines of strings written as:

[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ

I need only the text inside the parentheses. I tried the following code:

import re

readstream = open ("E:\\New folder\\output5.txt","r").read()

stringExtract = re.findall('\[(.*?)\]', readstream, re.DOTALL)
string = re.compile ('\(.*?\)')
stringExtract2 =  string.findall (str(stringExtract))

But some strings are missing from the output; for example, for the string above the word (with) is not found in the output. The order of the strings also differs from the file: for the strings (enlar) and (ged ) above, the second one, (ged ), appears before (enlar), such as: ( ged other strings ..... enlar). How can I fix these problems?

Without regexp:

[p.split(')')[0] for p in s.split('(') if ')' in p]

Output:

['W', 'indo', 'ws ', 'XP', ', ', 'with ', 'the ', 'fragment ', 'enlar', 'ged ', 'for ', 'clarity ', 'on ', 'Fig. ']
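As a rough sketch of how that one-liner behaves, it can be wrapped in a small function. The name between_parens and the shortened sample line are made up for illustration and are not part of the original answer:

```python
def between_parens(s):
    """Return the text found between each '(' and the next ')'."""
    # Splitting on '(' makes every chunk start right after an opening
    # parenthesis; chunks without ')' (such as the leading '[') are skipped.
    return [chunk.split(')')[0] for chunk in s.split('(') if ')' in chunk]

# Shortened version of the line from the question (an assumption for brevity).
line = "[(W)40(indo)25(ws )20(XP)111(, )20(with )] TJ"
result = between_parens(line)
print(result)
```

Because the pieces come out in the order they are met while scanning left to right, the result should keep the order of the file.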

Try this:

import re

readstream = open ("E:\\New folder\\output5.txt","r").read()
stringExtract2 = re.findall(r'\(([^()]+)\)', readstream)

Input:

readstream = r'[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )]'

Output:

['W', 'indo', 'ws ', 'XP', ', ', 'with ', 'the ', 'fragment ', 'enlar', 'ged ', 'for ', 'clarity ', 'on ', 'Fig. ']
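Since the question mentions a file with several such lines, here is a minimal sketch that applies the same pattern line by line, so each line keeps its own list. The helper name extract_per_line is hypothetical, and io.StringIO with two invented lines stands in for the real file:

```python
import io
import re

# Same pattern as above, compiled once for reuse.
PAREN_TEXT = re.compile(r'\(([^()]+)\)')

def extract_per_line(lines):
    """Yield one list of parenthesised strings per input line, in order."""
    for line in lines:
        yield PAREN_TEXT.findall(line)

# io.StringIO stands in for the real file here; with a file on disk you
# would use `with open(path) as f:` and pass `f` instead.
fake_file = io.StringIO(
    "[(W)40(indo)25(ws )20(XP)] TJ\n"
    "[(enlar)18(ged )20(for )] TJ\n"
)
rows = list(extract_per_line(fake_file))
print(rows)
```

Running findall directly on the text, rather than on the str() of an intermediate list, avoids any surprises from the list's repr.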

findall looks like your friend here. Don't you just want:

re.findall(r'\(.*?\)',readstream)

returns:

['(W)',
 '(indo)',
 '(ws )',
 '(XP)',
 '(, )',
 '(with )',
 '(the )',
 '(fragment )',
 '(enlar)',
 '(ged )',
 '(for )',
 '(clarity )',
 '(on )',
 '(Fig. )']

Edit: as @vikramis showed, to remove the parens, use: re.findall(r'\((.*?)\)', readstream). Also, note that it is common (but not requested here) to trim trailing whitespace with something like:

re.findall(r'\((.*?) *\)', readstream)
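To see what the trimming variant changes, the two patterns can be compared on a shortened copy of the sample line (the shortening is only for brevity). In this particular data the trailing spaces appear to mark word boundaries, so whether to trim depends on what you do with the pieces afterwards:

```python
import re

s = "[(W)40(indo)25(ws )20(XP)111(, )20(with )] TJ"

raw = re.findall(r'\((.*?)\)', s)        # keeps spaces before ')'
trimmed = re.findall(r'\((.*?) *\)', s)  # drops spaces before ')'

print(raw)
print(trimmed)

# Joining the untrimmed pieces gives back readable text, because the
# spaces inside the parentheses separate the words.
joined = "".join(raw)
print(joined)
```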

Your first problem is

stringExtract = re.findall('\[(.*?)\]', readstream, re.DOTALL)

I have no idea why you are doing this, and I'm pretty sure you don't want to do this.

Try this instead:

import re

readstream = "[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ"
stringExtract = re.findall(r'\(([^)]+)\)', readstream, re.DOTALL)

which says: find everything inside parentheses that is not a closing parenthesis.
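Two small consequences of writing the pattern with [^)]+ can be checked directly. The test strings below are invented for illustration: the + requires at least one character, so empty parentheses are skipped, and a negated character class already matches newlines, so re.DOTALL (which only affects the dot) makes no difference to this pattern:

```python
import re

s = "a()b(c)"
with_class = re.findall(r'\(([^)]+)\)', s)  # '+' needs one or more characters
with_lazy = re.findall(r'\((.*?)\)', s)     # '*?' also accepts an empty match
print(with_class)
print(with_lazy)

# [^)] matches a newline without any flag, so text spanning two lines
# is still captured.
multiline = re.findall(r'\(([^)]+)\)', "(a\nb)")
print(multiline)
```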

