Regex error during sequence search

Question

Thanks in advance. I am trying to extract the last few digits of a code, which is a taxa ID from NCBI. I want the trailing numbers (after the underscore) from this tag; these digits vary in length and value:

tag: URS0000D94775_60169

code:

import re
taxID = ()
#strip accession numbers into string
mount = open ('mount.txrt', 'r')
accessions = (re.findall ("URS\S{6}", mount))
     for i in accessions:
     taxID.append (i)
 #parse taxa id's from string        
 taxas = ()
 taxas.append (re.findall ('\_?\d+', taxID)) 
 print ( mount)

Answer 1

Use re.findall with the regex below:

import re
tag = 'URS0000D94775_60169'
tax_id = re.findall(r'\d+$', tag)[0]
print(tax_id)
# 60169
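One caveat with indexing `[0]` directly: if a tag does not end in digits, `re.findall` returns an empty list and the index raises `IndexError`. A minimal sketch of a more defensive variant follows, using `re.search` with the same pattern. The helper name `extract_tax_id` and the second and third tags are made up for illustration.

```python
import re

def extract_tax_id(tag):
    """Return the digits that end the tag, or None if the tag does not end in digits."""
    match = re.search(r'\d+$', tag)
    return match.group() if match else None

# Only the first tag comes from the question; the others are illustrative.
tags = ['URS0000D94775_60169', 'URS0000D94775_9', 'URS0000D94775_']
tax_ids = [extract_tax_id(t) for t in tags]
print(tax_ids)
```

Note that `\d+$` returns whatever run of digits ends the string, so a tag with no underscore such as `URS0000D94775` would yield `94775`.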

Answer 2

As you have an optional _ in your pattern and you want to match the digits after URS, you can use

URS(?:.*?\D)?(\d+)$

Regex demo

import re
s = "URS0000D94775_60169"
print(re.findall(r"URS(?:.*?\D)?(\d+)$", s))

Output

['60169']
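To see what each part of this pattern does, it can be written out with `re.VERBOSE`, which allows whitespace and comments inside the regex. This is only a sketch: the name `TAX_ID_PATTERN` and the second and third sample strings are assumptions added for illustration.

```python
import re

# The same pattern, spelled out with re.VERBOSE so each part can be commented.
TAX_ID_PATTERN = re.compile(
    r"""
    URS          # literal prefix of the tag
    (?:.*?\D)?   # optional: any characters, as few as possible, ending in a non-digit
    (\d+)        # capture group: the final run of digits
    $            # end of the string
    """,
    re.VERBOSE,
)

# Only the first sample comes from the question; the others are illustrative.
samples = ["URS0000D94775_60169", "URS60169", "URS0000D94775_"]
for s in samples:
    print(s, TAX_ID_PATTERN.findall(s))
```

Because the non-capturing group is optional, the pattern also matches when the digits follow `URS` directly, and it finds nothing when the string does not end in digits.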

If there has to be an underscore present:

URS[^_]*_(\d+)$

Regex demo

import re
s = "URS0000D94775_60169"
print(re.findall(r"URS[^_]*_(\d+)$", s))

Output

['60169']
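The code in the question also has two problems unrelated to the pattern: `re.findall` expects a string, so the file contents must be read before searching, and tuples such as `taxID = ()` have no `append` method. A rough sketch of how the underscore pattern might be applied to a whole text is shown below. The names `TAG_PATTERN` and `extract_tags`, the sample text, and the second accession are assumptions for illustration, and the `$` anchor is dropped on the assumption that several tags can appear in one text.

```python
import re

# Accession, underscore, then the taxa ID. No `$` here, since several tags
# may appear in the same text; \s is excluded so a match cannot span whitespace.
TAG_PATTERN = re.compile(r"(URS[^_\s]*)_(\d+)")

def extract_tags(text):
    """Return a list of (accession, tax_id) tuples found in text."""
    return TAG_PATTERN.findall(text)

# Illustrative input; for a real file, pass the string returned by file.read().
sample_text = "URS0000D94775_60169 first record\nURS00000A1B2C_9606 second record\n"
for accession, tax_id in extract_tags(sample_text):
    print(accession, tax_id)
```

With two capture groups, `re.findall` returns one tuple per match, so the accession and its taxa ID stay paired.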


Answer 1
1 2021-02-13 03:05:52

Answer 2
1 2021-02-14 14:42:26
