Extracting a substring between two specified substrings with re.search(pattern, text) in Python

Question

I have a string, for example "ENST00000260682_3_4_5_6_7_8_9_BS_673.6". I need to use a regular expression with re.search() to extract the substring and write it to a Python list like this: [3, 4, 5, 6, 7, 8, 9].

I tried:

text="ENST00000260682_3_4_5_6_7_8_9_BS_673.6"
pattern=re.compile(r"^[[A-Z0-9]*_[.*]_BS]")
a=re.search(pattern, text)
print(a.group())

re.search() returns None, so the last line raises AttributeError: 'NoneType' object has no attribute 'group'.

Please help me solve this.

Answer 1

Find every number that follows an underscore and comes before _BS:

import re
text="ENST00000260682_3_4_5_6_7_8_9_BS_673.6"
pattern=re.compile(r"_(\d+)")
a=re.findall(pattern, text[:text.find('_BS')])
print(a)

Output: ['3', '4', '5', '6', '7', '8', '9']

Or, if needed, convert them to int:

a=[int(x) for x in re.findall(pattern, text[:text.find('_BS')])]
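Since the question asks for re.search specifically, here is a minimal sketch of a single-search variant. The AttributeError in the question comes from calling .group() on None, which re.search returns when nothing matches; the original pattern cannot match because square brackets define character classes rather than groups. The helper name extract_numbers and the pattern below are my own assumptions, not part of the original answer:

```python
import re

# Assumed pattern: an ID of uppercase letters/digits, then underscore-separated
# integers (captured as one group), then the literal marker "_BS".
PATTERN = re.compile(r"^[A-Z0-9]+_(\d+(?:_\d+)*)_BS")

def extract_numbers(text):
    """Return the integers between the leading ID and '_BS', or [] if no match."""
    match = PATTERN.search(text)
    if match is None:
        # re.search returns None on failure; guard before calling .group()
        return []
    return [int(x) for x in match.group(1).split("_")]

print(extract_numbers("ENST00000260682_3_4_5_6_7_8_9_BS_673.6"))
```

Checking for None before calling .group() avoids the AttributeError even when the input does not have the expected shape.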

Answer 2

You can easily do this with a generator instead of a regular expression:

def num_gen(s, delimiter='_', start_index=1, stop_token='BS'):
    # delimiter: the character to split the text on
    # start_index: index of the first field to yield (skips the leading ID)
    # stop_token: stop yielding as soon as this field is encountered

    for x in s.split(delimiter)[start_index:]:
        if x != stop_token:
            yield x
        else:
            return

Usage:

t = "ENST00000260682_3_4_5_6_7_8_9_BS_673.6"
list(num_gen(t))

# ['3', '4', '5', '6', '7', '8', '9']
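The same stop-at-token idea can also be sketched with itertools.takewhile from the standard library, converting each field to int along the way. This is a swapped-in alternative rather than the generator above, and the name nums_before is hypothetical:

```python
from itertools import takewhile

def nums_before(s, delimiter="_", start_index=1, stop_token="BS"):
    """Split s, skip the leading ID, and collect ints until stop_token appears."""
    fields = s.split(delimiter)[start_index:]
    # takewhile stops at the first field equal to stop_token
    return [int(x) for x in takewhile(lambda f: f != stop_token, fields)]

print(nums_before("ENST00000260682_3_4_5_6_7_8_9_BS_673.6"))
```

If the stop token never appears, takewhile runs to the end of the fields and int() raises ValueError on a non-integer field such as '673.6', so malformed input fails loudly rather than silently.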

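I'd suggest avoiding regular expressions unless they are really necessary, especially if you are not familiar with them. Here's a relevant quote:

Some people, when confronted with a problem, think "I know, I'll use regular expressions."
Now they have two problems.

There is a time and a place where regular expressions are useful. Until then, don't add them to your problem unnecessarily.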

Answer 1: 1, accepted, 2020-06-03 17:29:38

Answer 2: 1, 2020-06-03 17:37:19
