Using regular expressions to match a portion of a string? (Python)

What regular expression can I use to match each gene in the following gene list string?

GENE_LIST: F59A7.7; T25D3.3; F13B12.4; cysl-1; cysl-2; cysl-3; cysl-4; F01D4.8

I tried: GENE_List:((( \\w+).(\\w+)); )+* but it only captures the last gene.

Given:

>>> s="GENE_LIST: F59A7.7; T25D3.3; F13B12.4; cysl-1; cysl-2; cysl-3; cysl-4; F01D4.8"

You can use Python string methods:

>>> s.split(': ')[1].split('; ')
['F59A7.7', 'T25D3.3', 'F13B12.4', 'cysl-1', 'cysl-2', 'cysl-3', 'cysl-4', 'F01D4.8']
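If the spacing around the separators varies in real input (for example, a space before a semicolon), a slightly more tolerant variant of the same idea is to split on the bare characters and strip each piece. This is only a sketch, not part of the original answer; `parse_genes` is a hypothetical helper name.

```python
def parse_genes(line):
    """Return the gene names from a 'GENE_LIST: a; b; c' style line."""
    # Keep only the text after the first colon (empty string if there is none).
    _, _, tail = line.partition(':')
    # Split on ';', trim surrounding whitespace, and drop empty pieces.
    return [piece.strip() for piece in tail.split(';') if piece.strip()]


print(parse_genes("GENE_LIST: F59A7.7 ; T25D3.3; cysl-1 ;F01D4.8"))
```

Because each piece is stripped individually, the result does not depend on exactly one space following each semicolon.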

For a regex:

(?<=[:;]\s)([^\s;]+)

Or, in Python:

>>> import re
>>> re.findall(r'(?<=[:;]\s)([^\s;]+)', s)
['F59A7.7', 'T25D3.3', 'F13B12.4', 'cysl-1', 'cysl-2', 'cysl-3', 'cysl-4', 'F01D4.8']
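As for why the attempt in the question returned only the last gene: in Python's `re` module, a capturing group inside a repeated construct keeps only the text from its final iteration, whereas `re.findall` reports every non-overlapping match. A minimal sketch of the difference follows; the repeated-group pattern is an illustrative stand-in, not the asker's exact expression.

```python
import re

s = "GENE_LIST: F59A7.7; T25D3.3; F13B12.4; cysl-1; cysl-2; cysl-3; cysl-4; F01D4.8"

# One overall match with a repeated group: group(1) is overwritten on each
# iteration, so only the last gene survives.
repeated = re.search(r'GENE_LIST:(?:\s([^\s;]+);?)+', s)
last_only = repeated.group(1)

# findall scans the whole string and returns the group from every match.
all_genes = re.findall(r'(?<=[:;]\s)([^\s;]+)', s)

print(last_only)
print(all_genes)
```

Here `last_only` holds a single gene name while `all_genes` holds the full list, which is why the answers rely on `findall` rather than one large match.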

You can use the following:

\s([^;\s]+)

  • The captured group, ([^;\s]+), contains each desired substring; the leading \s requires the substring to be preceded by whitespace.

>>> import re
>>> s = 'GENE_LIST: F59A7.7; T25D3.3; F13B12.4; cysl-1; cysl-2; cysl-3; cysl-4; F01D4.8'
>>> re.findall(r'\s([^;\s]+)', s)
['F59A7.7', 'T25D3.3', 'F13B12.4', 'cysl-1', 'cysl-2', 'cysl-3', 'cysl-4', 'F01D4.8']

UPDATE

It's in fact much simpler:

[^\s;]+

However, first take a substring containing only the part you need (the genes, without the GENE_LIST: label).
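One way to read that suggestion, sketched under the assumption that the label always ends at the first colon: cut off everything up to the colon, then apply the simple pattern to what remains. The variable names are illustrative.

```python
import re

s = "GENE_LIST: F59A7.7; T25D3.3; F13B12.4; cysl-1; cysl-2; cysl-3; cysl-4; F01D4.8"

# Drop the 'GENE_LIST:' label by keeping only the text after the first colon.
genes_part = s.split(':', 1)[1]

# With the label gone, every run of non-space, non-semicolon characters is a gene.
genes = re.findall(r'[^\s;]+', genes_part)
print(genes)
```

Without the substring step, the same pattern would also return the `GENE_LIST:` label as its first match, since the label contains no whitespace or semicolon.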

import re

string = "GENE_LIST: F59A7.7; T25D3.3; F13B12.4; cysl-1; cysl-2; cysl-3; cysl-4; F01D4.8"
# Capture each token that is immediately followed by ';' or the end of the string.
re.findall(r"([^;\s]+)(?:;|$)", string)

The output is:

['F59A7.7',
'T25D3.3',
'F13B12.4',
'cysl-1',
'cysl-2',
'cysl-3',
'cysl-4',
'F01D4.8']
