Using regular expressions to match a portion of a string (Python)
What regular expression can I use to match each gene name in the following gene-list string?
GENE_LIST: F59A7.7 ; T25D3.3 ; F13B12.4 ; cysl-1 ; cysl-2 ; cysl-3 ; cysl-4 ; F01D4.8
I tried GENE_List:((((\w+).(\w+)); )+* but it only captures the last gene.
Given:
>>> s="GENE_LIST: F59A7.7; T25D3.3; F13B12.4; cysl-1; cysl-2; cysl-3; cysl-4; F01D4.8"
You can do this with Python string methods alone:
>>> s.split(': ')[1].split('; ')
['F59A7.7', 'T25D3.3', 'F13B12.4', 'cysl-1', 'cysl-2', 'cysl-3', 'cysl-4', 'F01D4.8']
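The split above assumes the separators are exactly ': ' and '; '. The string in the question has a space on both sides of each semicolon, so a more forgiving variant may be useful. The following is a minimal sketch under that assumption; the helper name split_genes is made up for illustration.

```python
def split_genes(line):
    """Return the gene names that follow the first ':' in line."""
    # partition() splits on the first ':' only; the gene names here contain no ':'.
    _, _, body = line.partition(':')
    # strip() removes whitespace around each name; empty pieces are skipped.
    return [name.strip() for name in body.split(';') if name.strip()]

# Spacing as written in the question (spaces on both sides of ';').
question_form = "GENE_LIST: F59A7.7 ; T25D3.3 ; F13B12.4 ; cysl-1 ; cysl-2 ; cysl-3 ; cysl-4 ; F01D4.8"
print(split_genes(question_form))
```

Because each piece is stripped individually, the same function handles both the '; ' and the ' ; ' spacing.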
With a regular expression:
(?<=[:;]\s)([^\s;]+)
Or, in Python:
>>> import re
>>> re.findall(r'(?<=[:;]\s)([^\s;]+)', s)
['F59A7.7', 'T25D3.3', 'F13B12.4', 'cysl-1', 'cysl-2', 'cysl-3', 'cysl-4', 'F01D4.8']
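As for why the attempt in the question returned only the last gene: in Python's re module, a capturing group placed inside a repetition keeps only the text from its final iteration, whereas re.findall collects every non-overlapping match. Here is a small sketch of the difference. The pattern used for repeated is a simplified stand-in for the original attempt, not the exact expression from the question.

```python
import re

s = "GENE_LIST: F59A7.7; T25D3.3; F13B12.4; cysl-1; cysl-2; cysl-3; cysl-4; F01D4.8"

# One match spanning the whole string: group 1 is overwritten on every
# repetition of the outer (?:...)+, so only the final gene survives.
repeated = re.match(r'GENE_LIST:(?:\s*([^\s;]+);?)+', s)
last_only = repeated.group(1)

# findall() returns one entry per match, so every gene is kept.
all_genes = re.findall(r'(?<=[:;]\s)([^\s;]+)', s)

print(last_only)
print(len(all_genes))
```

This is why the answers here match one gene per match and let findall do the repetition, rather than repeating a group inside a single match.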
You can use the following:
\s([^;\s]+)
The group ([^;\s]+) captures each required substring, which must be preceded by a whitespace character (\s):
>>> import re
>>> s = 'GENE_LIST: F59A7.7; T25D3.3; F13B12.4; cysl-1; cysl-2; cysl-3; cysl-4; F01D4.8'
>>> re.findall(r'\s([^;\s]+)', s)
['F59A7.7', 'T25D3.3', 'F13B12.4', 'cysl-1', 'cysl-2', 'cysl-3', 'cysl-4', 'F01D4.8']
import re

string = "GENE_LIST: F59A7.7; T25D3.3; F13B12.4; cysl-1; cysl-2; cysl-3; cysl-4; F01D4.8"
# Each name must be followed directly by ';' or the end of the string.
re.findall(r"([^;\s]+)(?:;|$)", string)
The output is:
['F59A7.7',
'T25D3.3',
'F13B12.4',
'cysl-1',
'cysl-2',
'cysl-3',
'cysl-4',
'F01D4.8']
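The pattern above expects each name to be followed directly by ';' or by the end of the string. For input spaced like the string in the question (' ; ' between names), allowing optional whitespace before the terminator is one possible adjustment. This is a sketch only; GENE_PATTERN and find_genes are illustrative names, not part of the original answer.

```python
import re

# Same idea as above, plus \s* so whitespace may sit between a name and ';'.
GENE_PATTERN = re.compile(r'([^;\s]+)\s*(?:;|$)')

def find_genes(text):
    """Return every token that is terminated by ';' or the end of text."""
    return GENE_PATTERN.findall(text)

tight = "GENE_LIST: F59A7.7; T25D3.3; cysl-1; F01D4.8"
spaced = "GENE_LIST: F59A7.7 ; T25D3.3 ; cysl-1 ; F01D4.8"
print(find_genes(tight) == find_genes(spaced))
```

The 'GENE_LIST:' prefix is skipped in both cases because it is followed by a gene name rather than by ';' or the end of the string.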