Extracting strings using a regular expression
I have the following strings:
How can I extract the strings below from the strings above?
import re

s = '''LOW QUALITY PROTEIN: cysteine proteinase 5-like [Solanum pennellii]
PREDICTED: LOW QUALITY PROTEIN: uncharacterized protein LOC107059219 [Solanum pennellii]
XP_019244624.1 PREDICTED: peroxidase 40-like [Nicotiana attenuata]
RVW92024.1 Retrovirus-related Pol polyprotein from transposon TNT 1-94 [Vitis vinifera]
hypothetical protein VITISV_035070 [Vitis vinifera]'''

# Raw string, so that \s, \w and \[ reach the regex engine unchanged.
# Group 2 is the description that sits just before the "[species]" suffix.
rgx = r'(:?)\s([\w\s-]+)\s(\[.+\])'

list1 = []
for m in re.findall(rgx, s):
    # findall returns one tuple per match; index 1 is group 2.
    list1.append(m[1])
print(list1)
Output:
['cysteine proteinase 5-like',
'uncharacterized protein LOC107059219',
'peroxidase 40-like',
'Retrovirus-related Pol polyprotein from transposon TNT 1-94',
'hypothetical protein VITISV_035070']
See https://regex101.com/r/HATKMa/1 for a detailed explanation.
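For readers who cannot open the link, here is a rough sketch of the same pattern written with `re.VERBOSE`, so that each piece carries its own comment. The names `pattern` and `descriptions` are mine, not the answerer's, and the comments are my reading of the pattern rather than the linked explanation.

```python
import re

s = '''LOW QUALITY PROTEIN: cysteine proteinase 5-like [Solanum pennellii]
PREDICTED: LOW QUALITY PROTEIN: uncharacterized protein LOC107059219 [Solanum pennellii]
XP_019244624.1 PREDICTED: peroxidase 40-like [Nicotiana attenuata]
RVW92024.1 Retrovirus-related Pol polyprotein from transposon TNT 1-94 [Vitis vinifera]
hypothetical protein VITISV_035070 [Vitis vinifera]'''

# Same pattern as above, spread out with comments (re.VERBOSE ignores
# whitespace and "#" comments outside character classes).
pattern = re.compile(r"""
    (:?)          # group 1: an optional colon that ends a label
    \s            # one whitespace character (a space or the preceding newline)
    ([\w\s-]+)    # group 2: the description (word characters, whitespace, hyphens)
    \s            # the space in front of the bracket
    (\[.+\])      # group 3: the bracketed species name
""", re.VERBOSE)

descriptions = [m.group(2) for m in pattern.finditer(s)]
print(descriptions)
```

One thing to keep in mind: the pattern needs a whitespace character in front of the description. If the very first line of the input had no "label:" or accession prefix, its first word would be dropped, because there is no preceding space or newline to match.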
I don't think this problem needs a regex. I would prefer the following solution because it is easy to understand:
st = "PREDICTED: LOW QUALITY PROTEIN: uncharacterized protein LOC107059219 [Solanum pennellii]"
st.split(":")[-1].split("[")[0].strip()
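To apply the same idea to every line, something like the sketch below might work. Note that a line without a colon (the `RVW92024.1 ...` one) keeps its accession with plain splitting, so I added an optional clean-up step. The helper name `extract_description`, the `drop_accession` flag, and the `ACCESSION` pattern are my own hypothetical additions, and the accession pattern is only a guess based on the two IDs in the sample data.

```python
import re

s = '''LOW QUALITY PROTEIN: cysteine proteinase 5-like [Solanum pennellii]
PREDICTED: LOW QUALITY PROTEIN: uncharacterized protein LOC107059219 [Solanum pennellii]
XP_019244624.1 PREDICTED: peroxidase 40-like [Nicotiana attenuata]
RVW92024.1 Retrovirus-related Pol polyprotein from transposon TNT 1-94 [Vitis vinifera]
hypothetical protein VITISV_035070 [Vitis vinifera]'''

# Assumption: an accession looks like "XP_019244624.1" or "RVW92024.1",
# i.e. capital letters, an optional underscore, digits, a dot, a version.
ACCESSION = re.compile(r"^[A-Z]+_?\d+\.\d+\s+")


def extract_description(line: str, drop_accession: bool = True) -> str:
    """Return the text after the last ':' and before the first '['."""
    text = line.split(":")[-1].split("[")[0].strip()
    if drop_accession:
        # Only matters for lines that have no colon before the description.
        text = ACCESSION.sub("", text)
    return text


results = [extract_description(line) for line in s.splitlines()]
print(results)
```

With `drop_accession=False` the function behaves exactly like the one-liner above, so the fourth line comes back as `RVW92024.1 Retrovirus-related Pol polyprotein from transposon TNT 1-94`.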