Search strings for a wildcard in Python and return the position of each match
I am working with a dataset of thousands of strings, each with an ID, and would like to find the positions at which a wildcard motif occurs within each string using the re module. The motif is an N, followed by any letter except P, followed by an S or a T. I want to return a list that pairs each ID with the positions at which the motif occurs.
import re
strings = [['ID#1','NTGSLTKNASMNLTQRSNQT'],['ID#2','NLSHTNWEUWBNTTDKWODNUT'],...]
for x in strings:
    re.search('N[^P][ST]', x[1])
Which I would like to return:
[['ID#1',[8,12,18]],['ID#2',[1,12,20]],.....]
If anyone has any ideas, it would be very much appreciated. Thanks!
You are most likely looking for this instead:
re.finditer(pattern, string[, flags])
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.
This will work:
import re
strings = [['ID#1','NTGSLTKNASMNLTQRSNQT'],['ID#2','NLSHTNWEUWBNTTDKWODNUT']]
pattern = re.compile('N[^P][ST]')
print([[f[0], [m.start() + 1 for m in pattern.finditer(f[1])]] for f in strings])
Or you could try something like this, which prints one result per ID:
import re
strings = [['ID#1','NTGSLTKNASMNLTQRSNQT'],['ID#2','NLSHTNWEUWBNTTDKWODNUT']]
pattern = re.compile('N[^P][ST]')
for x in strings:
    p = pattern.finditer(x[1])
    print([x[0], [m.start() + 1 for m in p]])
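The `+ 1` in the snippets above is there because `m.start()` is a 0-based index, while the positions in the question are counted from 1. A minimal sketch of what each match object exposes, using the first example string (the names `sequence` and `positions` are illustrative only):

```python
import re

pattern = re.compile(r'N[^P][ST]')
sequence = 'NTGSLTKNASMNLTQRSNQT'

for m in pattern.finditer(sequence):
    # m.group() is the matched text, m.start() its 0-based index;
    # adding 1 gives the 1-based position used in the question.
    print(m.group(), m.start(), m.start() + 1)

# Collect only the 1-based positions.
positions = [m.start() + 1 for m in pattern.finditer(sequence)]
print(positions)  # [8, 12, 18]
```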
I'm not very experienced in Python, but I think you can do something like this:
import re
strings = [['ID#1','NTGSLTKNASMNLTQRSNQT'],['ID#2','NLSHTNWEUWBNTTDKWODNUT']]
def findpos(s):
    return [s[0], [m.start() + 1 for m in re.finditer('N[^P][ST]', s[1])]]
print(list(map(findpos, strings)))
# [['ID#1', [8, 12, 18]], ['ID#2', [1, 12, 20]]]
Or, even more simply:
[[s[0], [m.start() + 1 for m in re.finditer('N[^P][ST]',s[1])]] for s in strings]
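One caveat, in case it matters for your data: `finditer` only yields non-overlapping matches, so a motif that starts inside a previous match is skipped. If overlapping occurrences should be counted too, one possible approach is to wrap the pattern in a lookahead so that each match is zero-width. This is only a sketch, and `motif_positions` is a hypothetical helper name, not something from the question:

```python
import re

# The lookahead (?=...) matches at a position without consuming characters,
# so a motif that begins inside an earlier motif is still reported.
overlapping = re.compile(r'(?=N[^P][ST])')

def motif_positions(sequence):
    """Return 1-based start positions of every motif, overlaps included."""
    return [m.start() + 1 for m in overlapping.finditer(sequence)]

# 'NNTS' contains 'NNT' at position 1 and 'NTS' at position 2.
print(motif_positions('NNTS'))                                   # [1, 2]
print([m.start() + 1 for m in re.finditer(r'N[^P][ST]', 'NNTS')])  # [1]
```

For the two example strings in the question the motifs never overlap, so both approaches give the same result there.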