Python3: Find length between two substrings of a string
I have two short sequences that I search for in a "long string". If both sequences are found, the key of the "long string" is appended to a list (the string I search in is a dictionary value).
Now I am looking for a way to calculate the distance between the two substrings (if they were found).
So, for example:
String: ABCDEFGHIJKL
sequence1: ABC
sequence2: JKL
I want to get the length of DEFGHI, which would be 6.
Here is my code for finding the substrings, with a pseudo-code idea of what I want (the variables start and end). This code does not work (of course).
def search (myDict, list1, list2):
    # initialize empty list to store found keys
    a=[]
    # iterating through dictionary
    for key, value in myDict.items():
        # if -35nt motif is found between -40 and -20
        for item in thirtyFive:
            if item in value[60:80]:
                start=myDict[:item]
                # it is checked for the -10nt motif from -40 to end
                for item in ten:
                    if item in value[80:]:
                        end=myDict[:item]
                        # if both conditions are true, the IDs are
                        # appended to the list
                        a.append(key)
                        distance=start-end
    return a, distance
Second idea: So far, I have found some material on how to get the string between two substrings. So the next thing I could imagine is to get the sequence and do something like len(sequence).
So, I would like to know whether my first idea, to somehow do it while I am finding the small sequences, is possible, and whether I am thinking in the right direction with my second idea.
Thanks in advance :)
def search(myDict, list1, list2):
    # initialize empty list to store found keys
    a = []
    # iterating through dictionary
    for key, value in myDict.items():
        # if -35nt motif is found between -40 and -20
        for first in list1:
            # find() with bounds returns the index inside value, or -1
            start = value.find(first, 60, 80)
            if start != -1:
                # it is checked for the -10nt motif from -20 to end
                for second in list2:
                    end = value.find(second, 80)
                    if end != -1:
                        # if both conditions are true, the IDs are
                        # appended to the list
                        a.append(key)
                        # gap between the end of the first motif and the
                        # start of the second; only the last match is kept
                        search.distance = end - start - len(first)
    return a
# calling search function
x=search(d,thirtyFive,ten)
#some other things I need to print
y=len(x)
print(str(x))
print(y)
# desired output
print(search.distance)
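Storing the result on search.distance keeps only the last match. A minimal sketch of one alternative, assuming the same windows as above (positions 60 to 80 for the first motif, 80 onward for the second), returns one distance per key instead. The names first_hit and find_spacer_lengths, and the sample motifs in the usage lines, are hypothetical and not part of the original post.

```python
def first_hit(seq, motifs, lo, hi=None):
    """Return (index, motif) for the first listed motif found entirely
    inside seq[lo:hi], or None if no motif in the list occurs there."""
    for motif in motifs:
        idx = seq.find(motif, lo, hi)
        if idx != -1:
            return idx, motif
    return None


def find_spacer_lengths(sequences, motifs_35, motifs_10):
    """Map each key to the number of characters between the two motifs."""
    spacers = {}
    for key, seq in sequences.items():
        hit_35 = first_hit(seq, motifs_35, 60, 80)  # assumed window
        hit_10 = first_hit(seq, motifs_10, 80)      # assumed window
        if hit_35 is not None and hit_10 is not None:
            start, motif = hit_35
            # Distance from the end of the first motif to the start of the second.
            spacers[key] = hit_10[0] - (start + len(motif))
    return spacers


# Hypothetical sample data: 60 filler characters, a motif, 17 fillers, a motif.
demo = {"gene1": "N" * 60 + "TTGACA" + "N" * 17 + "TATAAT" + "NNNNN",
        "gene2": "N" * 100}
print(find_spacer_lengths(demo, ["TTGACA"], ["TATAAT"]))  # {'gene1': 17}
```

The list of found keys is then simply list(result), and its length is len(result).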
Check this:
In [1]: a='ABCDEFGHIJKL'
In [2]: b='ABC'
In [3]: c='JKL'
In [4]: a.find(b)
Out[4]: 0
In [6]: a.find(c)
Out[6]: 9
In [7]: l=a.find(b) + len(b)
In [8]: l
Out[8]: 3
In [10]: a[l:a.find(c)]
Out[10]: 'DEFGHI'
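The session above can be wrapped into a small function. This is only a sketch: the name length_between is hypothetical, and returning None when either substring is missing is an assumption about the desired behaviour.

```python
def length_between(text, first, second):
    """Return the number of characters between first and second in text,
    or None if either substring cannot be found."""
    start = text.find(first)
    if start == -1:
        return None
    gap_start = start + len(first)
    # Look for the second substring only after the first one has ended.
    end = text.find(second, gap_start)
    if end == -1:
        return None
    return end - gap_start


print(length_between("ABCDEFGHIJKL", "ABC", "JKL"))  # 6
```

Computing end - gap_start avoids building the intermediate slice when only the length is needed.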
You can also do it using regex:
import re
s = "ABCDEFGHIJKL"
seq1 = "ABC"
seq2 = "JKL"
s1 = re.match(seq1 + "(.*)" + seq2, s).group(1)
print(s1)
print(len(s1))
Output
DEFGHI
6
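OR, using str.replace (this removes every occurrence of both sequences, so it only gives the part in between when the sequences sit at the very start and end of the string and occur nowhere else):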
s2 = s.replace(seq1, '').replace(seq2, '')
print(s2)
print(len(s2))
Output
DEFGHI
6
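To see where the str.replace variant and an index-based slice diverge, here is a small comparison on a string with extra characters around the two sequences. The input string is made up for illustration.

```python
s = "XXABCDEFGHIJKLYY"
seq1, seq2 = "ABC", "JKL"

# Deleting both sequences leaves the flanking characters in place.
stripped = s.replace(seq1, "").replace(seq2, "")

# Slicing between the end of seq1 and the start of seq2 keeps only the gap.
gap_start = s.find(seq1) + len(seq1)
between = s[gap_start:s.find(seq2, gap_start)]

print(stripped, len(stripped))  # XXDEFGHIYY 10
print(between, len(between))    # DEFGHI 6
```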
Use str.find() to get the two indices, and adjust for the length of the first substring.
Also, don't forget corner cases, e.g. where the substrings overlap.
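As a rough illustration of the overlap case, the sketch below computes the distance as a signed number. The function name signed_gap is hypothetical, and raising ValueError for a missing substring is just one possible choice.

```python
def signed_gap(text, first, second):
    """Distance from the end of first to the start of second.

    A negative result means second starts before first has ended,
    i.e. the two substrings overlap or appear in the opposite order.
    """
    i = text.find(first)
    j = text.find(second)
    if i == -1 or j == -1:
        raise ValueError("both substrings must occur in text")
    return j - (i + len(first))


print(signed_gap("ABCDEFGHIJKL", "ABC", "JKL"))  # 6
print(signed_gap("ABCDE", "ABC", "CDE"))         # -1
```

In the second call the two substrings share the character C, so the gap is reported as -1 rather than a length.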
Solution using regular expressions:
import re
string = "ABCDEFGHIJKL"
sequence1 = "ABC"
sequence2 = "JKL"
result = re.search(sequence1+'(.*)'+sequence2,string)
print(len(result.group(1)))
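A slightly more defensive variant of the regex approach might look like the sketch below. It assumes the sequences should be treated as literal text (hence re.escape), that the nearest following occurrence of the second sequence is wanted (hence the non-greedy .*?), and that None is an acceptable result when there is no match. The function name regex_gap_length is hypothetical.

```python
import re


def regex_gap_length(text, first, second):
    """Length of the text between first and the nearest following second,
    or None if the pattern does not occur."""
    # re.escape keeps characters such as '.' or '*' in the inputs literal.
    pattern = re.escape(first) + "(.*?)" + re.escape(second)
    match = re.search(pattern, text)
    return None if match is None else len(match.group(1))


print(regex_gap_length("ABCDEFGHIJKL", "ABC", "JKL"))    # 6
print(regex_gap_length("ABCDEFJKLXXJKL", "ABC", "JKL"))  # 3
print(regex_gap_length("ABCDEF", "ABC", "JKL"))          # None
```

With a greedy (.*) the second call would instead capture everything up to the last JKL.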