Search strings for a wildcard pattern in Python and return the match positions

Question

I am working with a dataset of thousands of strings, each with an ID. I would like to find the positions at which a wildcard motif (an N, followed by any letter except P, followed by an S or a T) occurs within each string using the RegEx module (re), and return a list that pairs each ID with the positions at which the motif occurs.

import re
strings = [['ID#1','NTGSLTKNASMNLTQRSNQT'],['ID#2','NLSHTNWEUWBNTTDKWODNUT'],...]
for x in strings:
    re.search('N[^P][ST]',x[1])

I would like this to return:

[['ID#1',[8,12,18]],['ID#2',[1,12,20]],.....]

If anyone has any ideas, it would be very much appreciated. Thanks!

Answer 1

You are most likely looking for re.finditer instead of re.search, which only returns the first match.

re.finditer(pattern, string[, flags])

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.

This will work:

import re

strings = [['ID#1','NTGSLTKNASMNLTQRSNQT'],['ID#2','NLSHTNWEUWBNTTDKWODNUT']]
pattern = re.compile('N[^P][ST]')

# m.start() is 0-based, so add 1 to get the 1-based positions asked for
print([[f[0], [m.start() + 1 for m in pattern.finditer(f[1])]] for f in strings])
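
To see where the `+ 1` comes from: `m.start()` is a 0-based index, while the positions requested in the question are 1-based. A minimal sketch that inspects each match for the first sequence (the names `seq` and `hits` are only for illustration):

```python
import re

pattern = re.compile(r'N[^P][ST]')
seq = 'NTGSLTKNASMNLTQRSNQT'  # sequence for ID#1 from the question

# Collect (0-based start, 1-based position, matched text) for each match.
hits = [(m.start(), m.start() + 1, m.group()) for m in pattern.finditer(seq)]
for start0, pos1, text in hits:
    print(start0, pos1, text)
# 7 8 NAS
# 11 12 NLT
# 17 18 NQT
```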

Or you could try something like this, which prints one record per line:

import re

strings = [['ID#1','NTGSLTKNASMNLTQRSNQT'],['ID#2','NLSHTNWEUWBNTTDKWODNUT']]
pattern = re.compile('N[^P][ST]')

for x in strings:
    p = pattern.finditer(x[1])
    print([x[0], [m.start() + 1 for m in p]])
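
One caveat, since finditer only reports non-overlapping matches: a motif that begins inside a previous match is skipped. If that matters for your data, a possible workaround is to wrap the pattern in a lookahead, which consumes no characters. This is only a sketch; the sequence `NNSTA` is made up to show the difference and is not from the question.

```python
import re

plain = re.compile(r'N[^P][ST]')
# The lookahead matches zero characters, so the scan advances one position
# at a time and overlapping motifs are all reported.
overlapping = re.compile(r'(?=N[^P][ST])')

seq = 'NNSTA'  # made-up sequence: motifs start at positions 1 (NNS) and 2 (NST)

plain_pos = [m.start() + 1 for m in plain.finditer(seq)]
overlap_pos = [m.start() + 1 for m in overlapping.finditer(seq)]
print(plain_pos)    # [1]
print(overlap_pos)  # [1, 2]
```

For the two sequences in the question no motifs overlap, so both patterns give the same positions there.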

Answer 2

I'm not very experienced in Python, but I think you can do something like this:

import re
strings = [['ID#1','NTGSLTKNASMNLTQRSNQT'],['ID#2','NLSHTNWEUWBNTTDKWODNUT']]
def findpos(s):
    return [s[0], [m.start() + 1 for m in re.finditer('N[^P][ST]',s[1])]]

# list() is needed on Python 3, where map returns an iterator
print(list(map(findpos, strings)))
# [['ID#1', [8, 12, 18]], ['ID#2', [1, 12, 20]]]

Or, even more simply:

[[s[0], [m.start() + 1 for m in re.finditer('N[^P][ST]',s[1])]] for s in strings]
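
If this has to run over thousands of records, the comprehension could be wrapped in a small function so the pattern lives in one place. A rough sketch; the names `MOTIF` and `motif_positions` are hypothetical and not part of the original answers:

```python
import re

MOTIF = re.compile(r'N[^P][ST]')  # N, then any character except P, then S or T

def motif_positions(records, pattern=MOTIF):
    """Return [[id, [1-based start positions]], ...] for [id, sequence] pairs."""
    return [[rec_id, [m.start() + 1 for m in pattern.finditer(seq)]]
            for rec_id, seq in records]

strings = [['ID#1', 'NTGSLTKNASMNLTQRSNQT'], ['ID#2', 'NLSHTNWEUWBNTTDKWODNUT']]
result = motif_positions(strings)
print(result)  # [['ID#1', [8, 12, 18]], ['ID#2', [1, 12, 20]]]
```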


Answer 1
1 2013-08-16 21:27:12

Answer 2
0 2013-08-16 21:21:23
