Python regular expression: extract text between patterns
How can I get all the values between 'uniprotkb:' and '(gene name)' in the 'str' below:
str = 'uniprotkb:HIST1H3D(gene name)|uniprotkb:HIST1H3A(gene name)|uniprotkb:HIST1H3B(gene name)|uniprotkb:HIST1H3C(gene name)|uniprotkb:HIST1H3E(gene name)|uniprotkb:HIST1H3F(gene name)|uniprotkb:HIST1H3G(gene name)|uniprotkb:HIST1H3H(gene name)|uniprotkb:HIST1H3I(gene name)|uniprotkb:HIST1H3J(gene name)'
The result should be:
HIST1H3D
HIST1H3A
HIST1H3B
HIST1H3C
HIST1H3E
HIST1H3F
HIST1H3G
HIST1H3H
HIST1H3I
HIST1H3J
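With re.findall() you can get every part of the string that matches the regular expression. Because the pattern contains one capturing group, only the captured text is returned: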
>>> import re
>>> sstr = 'uniprotkb:HIST1H3D(gene name)|uniprotkb:HIST1H3A(gene name)|uniprotkb:HIST1H3B(gene name)|uniprotkb:HIST1H3C(gene name)|uniprotkb:HIST1H3E(gene name)|uniprotkb:HIST1H3F(gene name)|uniprotkb:HIST1H3G(gene name)|uniprotkb:HIST1H3H(gene name)|uniprotkb:HIST1H3I(gene name)|uniprotkb:HIST1H3J(gene name)'
>>> re.findall(r'uniprotkb:([^(]*)\(gene name\)', sstr)
['HIST1H3D', 'HIST1H3A', 'HIST1H3B', 'HIST1H3C', 'HIST1H3E', 'HIST1H3F', 'HIST1H3G', 'HIST1H3H', 'HIST1H3I', 'HIST1H3J']
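For reference, the same pattern can be compiled once and used with finditer(), which yields match objects instead of plain strings. This is only a sketch: the input is shortened to two entries, and the names `sample` and `pattern` are mine, not from the question.

```python
import re

# Shortened version of the input from the question (assumed sample).
sample = 'uniprotkb:HIST1H3D(gene name)|uniprotkb:HIST1H3A(gene name)'

# uniprotkb:      literal prefix
# ([^(]*)         capture everything up to the first "("
# \(gene name\)   literal suffix; the parentheses must be escaped
pattern = re.compile(r'uniprotkb:([^(]*)\(gene name\)')

# finditer() yields match objects; group(1) is the captured gene name.
gene_names = [m.group(1) for m in pattern.finditer(sample)]
print(gene_names)  # ['HIST1H3D', 'HIST1H3A']
```

Match objects are handy if you also need positions, for example via m.start(1).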
Here is a one-liner:
astr = 'uniprotkb:HIST1H3D(gene name)|uniprotkb:HIST1H3A(gene name)|uniprotkb:HIST1H3B(gene name)|uniprotkb:HIST1H3C(gene name)|uniprotkb:HIST1H3E(gene name)|uniprotkb:HIST1H3F(gene name)|uniprotkb:HIST1H3G(gene name)|uniprotkb:HIST1H3H(gene name)|uniprotkb:HIST1H3I(gene name)|uniprotkb:HIST1H3J(gene name)'
[pt.split('(')[0] for pt in astr.strip().split('uniprotkb:')][1:]
which gives:
['HIST1H3D',
'HIST1H3A',
'HIST1H3B',
'HIST1H3C',
'HIST1H3E',
'HIST1H3F',
'HIST1H3G',
'HIST1H3H',
'HIST1H3I',
'HIST1H3J']
If running time matters, I would not recommend the regexp solution.
I wouldn't bother with regular expressions:
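If you want to check the running-time claim on your own data rather than take it on faith, a rough timeit comparison could look like the sketch below. The function names `with_regex` and `with_split` and the repeat count are my own choices, and the numbers will vary by machine and Python version, so no timings are shown.

```python
import re
import timeit

# Rebuild the input from the question: ten genes joined by "|".
text = '|'.join('uniprotkb:HIST1H3%s(gene name)' % c for c in 'DABCEFGHIJ')

PATTERN = re.compile(r'uniprotkb:([^(]*)\(gene name\)')

def with_regex(s):
    # Regex approach: return the captured group of every match.
    return PATTERN.findall(s)

def with_split(s):
    # Split approach: cut on the prefix, then keep the text before "(".
    return [pt.split('(')[0] for pt in s.strip().split('uniprotkb:')][1:]

# Both approaches should agree before their timings are compared.
assert with_regex(text) == with_split(text)

for fn in (with_regex, with_split):
    elapsed = timeit.timeit(lambda: fn(text), number=10000)
    print(fn.__name__, round(elapsed, 4))
```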
s = 'uniprotkb:HIST1H3D(gene name)|uniprotkb:HIST1H3A(gene name)' # etc
gene_names = []
for substring in s.split('|'):
    removed_first = substring.partition('uniprotkb:')[2]  # drop everything up to and including the prefix
    removed_second = removed_first.partition('(gene name)')[0]  # drop the suffix and anything after it
    gene_names.append(removed_second)  # add the gene name to the list
That should do the trick. You could even make it a one-liner; the above is equivalent to:
gene_names = [substring.partition('uniprotkb:')[2].partition('(gene name)')[0] for substring in s.split('|')]
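One caveat with partition(): when the separator is missing, it returns the whole string followed by two empty strings, so a field without the 'uniprotkb:' prefix would add an empty string to the list, and a field without the '(gene name)' suffix would be added unchanged. If your real data can contain such fields (an assumption, since the sample in the question has none), a more defensive variant might look like this. The function name `extract_gene_names` and the extra test field are made up for illustration.

```python
def extract_gene_names(s, prefix='uniprotkb:', suffix='(gene name)'):
    """Return the text between prefix and suffix for each '|'-separated field."""
    names = []
    for field in s.split('|'):
        # partition() returns (before, separator, after); the separator
        # element is '' when the separator was not found.
        _, found_prefix, rest = field.partition(prefix)
        name, found_suffix, _ = rest.partition(suffix)
        # Keep the field only if both markers were present and the name is non-empty.
        if found_prefix and found_suffix and name:
            names.append(name)
    return names

# The middle field is a hypothetical malformed entry, not from the question.
sample = 'uniprotkb:HIST1H3D(gene name)|other:XYZ(something else)|uniprotkb:HIST1H3A(gene name)'
print(extract_gene_names(sample))  # ['HIST1H3D', 'HIST1H3A']
```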