Python3: Find length between two substrings of a string

I have two short sequences that I search for in a "long string". If both sequences are found, the key of the "long string" is appended to a list (the string I search in is a dictionary value).

Now I am looking for a way to calculate the distance between the two substrings (if they were found).

So, for example:

String: ABCDEFGHIJKL
sequence1: ABC
sequence2: JKL

I want to get the length of DEFGHI, which would be 6.

Here is my code for finding the substrings, with a pseudocode-like idea of what I want (the variables start and end). This code does not work, of course.

def search (myDict, list1, list2):
    # initialize empty list to store found keys
    a=[]
    # iterating through dictionary
    for key, value in myDict.items():
        # if -35nt motif is found between -40 and -20
        for item in thirtyFive:
            if item in value[60:80]:
                start=myDict[:item]
            # it is checked for the -10nt motif from -40 to end
                for item in ten:
                    if item in value[80:]:
                        end=myDict[:item]
                # if both conditions are true, the IDs are
                # appended to the list
                        a.append(key)
    distance=start-end
    return a, distance

Second idea: So far, I have found some material on how to get the string between two substrings. So the next thing I could imagine is to extract that sequence and do something like len(sequence).

I would like to know whether my first idea, computing the distance while I am finding the short sequences, is possible, and whether I am thinking in the right direction with my second idea.

Thanks in advance :)

SOLUTION following @Carlos, using the str.find method

def search(myDict, thirtyFive, ten):
    # initialize empty list to store found keys
    a = []
    # iterating through dictionary
    for key, value in myDict.items():
        # if -35nt motif is found between -40 and -20
        for motif35 in thirtyFive:
            # find() with bounds only matches inside value[60:80]
            start = value.find(motif35, 60, 80)
            if start == -1:
                continue
            # it is checked for the -10nt motif from -20 to end
            for motif10 in ten:
                end = value.find(motif10, 80)
                if end == -1:
                    continue
                # if both conditions are true, the IDs are
                # appended to the list
                a.append(key)
                # gap = start of -10 motif minus end of -35 motif
                search.distance = end - start - len(motif35)

    return a

# calling search function
x = search(d, thirtyFive, ten)
# some other things I need to print
y = len(x)
print(x)
print(y)
# desired output (only the distance of the last match is kept)
print(search.distance)
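
Because search.distance is overwritten on every match, only the last distance survives. A minimal sketch of one way to keep a distance per key is shown below. The function name search_with_distances, the window parameter, the "first pair wins" rule and the toy dictionary are all assumptions made for illustration; they are not part of the original post.

```python
def search_with_distances(my_dict, thirty_five, ten, window=(60, 80)):
    """Return {key: gap} for the first -35/-10 motif pair found in each value.

    Hypothetical helper: the name, the `window` parameter and the
    "first pair wins" rule are assumptions, not from the original post.
    """
    lo, hi = window
    distances = {}
    for key, value in my_dict.items():
        for motif35 in thirty_five:
            # Only accept a -35 motif that lies entirely inside value[lo:hi].
            start = value.find(motif35, lo, hi)
            if start == -1:
                continue
            gap_start = start + len(motif35)
            for motif10 in ten:
                # Look for the -10 motif from the end of the window onwards.
                end = value.find(motif10, hi)
                if end != -1:
                    distances[key] = end - gap_start
                    break
            if key in distances:
                break
    return distances


# Toy data with a small window so the result is easy to check by hand.
toy = {"id1": "xxABCDEFGHIJKL", "id2": "xxABCDEFGH"}
result = search_with_distances(toy, ["ABC"], ["JKL"], window=(2, 6))
print(result)  # {'id1': 6}
```

Returning a dictionary avoids storing state on the function object, and the list of matching keys is still available as list(result).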

Check this:

In [1]: a='ABCDEFGHIJKL'

In [2]: b='ABC'

In [3]: c='JKL'

In [4]: a.find(b)
Out[4]: 0

In [6]: a.find(c)
Out[6]: 9

In [7]: l=a.find(b) + len(b)

In [8]: l
Out[8]: 3

In [10]: a[l:a.find(c)]
Out[10]: 'DEFGHI'
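
The session above can be wrapped in a small function. This is only a sketch: the name gap_between is made up for this example, and it adds a check for the -1 that str.find returns when a substring is missing, which the session does not cover.

```python
def gap_between(text, first, second):
    """Return the text between `first` and the next `second`, or None.

    Hypothetical helper name; not from the original answer.
    """
    i = text.find(first)
    if i == -1:
        return None
    gap_start = i + len(first)
    # Search for the second substring only after the end of the first one.
    j = text.find(second, gap_start)
    if j == -1:
        return None
    return text[gap_start:j]


gap = gap_between("ABCDEFGHIJKL", "ABC", "JKL")
print(gap, len(gap))                               # DEFGHI 6
print(gap_between("ABCDEFGHIJKL", "ABC", "XYZ"))   # None
```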


You can also do it using regex:

import re
s = "ABCDEFGHIJKL"
seq1 = "ABC"
seq2 = "JKL"

# re.match only matches at the start of s
s1 = re.match(seq1 + "(.*)" + seq2, s).group(1)
print(s1)
print(len(s1))

Output

DEFGHI
6

OR

Using str.replace:

# only gives the gap when s starts with seq1 and ends with seq2
s2 = s.replace(seq1, '').replace(seq2, '')
print(s2)
print(len(s2))

Output

DEFGHI
6
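
The str.replace approach removes every occurrence of both sequences and keeps everything else, so it returns the gap only when the string starts with the first sequence and ends with the second. A sketch of an alternative using str.partition, which splits at the first occurrence only, is shown below; the helper name between is an assumption made for this example.

```python
def between(s, seq1, seq2):
    """Return the part of s between the first seq1 and the following seq2.

    Returns None if either sequence is missing. Hypothetical helper name.
    """
    # partition returns (before, separator, after); separator is '' if not found.
    _, sep1, rest = s.partition(seq1)
    if not sep1:
        return None
    middle, sep2, _ = rest.partition(seq2)
    if not sep2:
        return None
    return middle


s = "xxABCDEFGHIJKLyy"
print(s.replace("ABC", "").replace("JKL", ""))  # xxDEFGHIyy
print(between(s, "ABC", "JKL"))                 # DEFGHI
```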


Use str.find() to get the two indices, and adjust for the length of the first substring.

Also don't forget corner cases, e.g. where the substrings overlap.
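
To illustrate the corner cases, here is a small sketch of the plain two-index calculation. The function name naive_distance is hypothetical, and the function deliberately omits the check for a missing substring so that the effect of overlap and ordering is easy to see.

```python
def naive_distance(text, first, second):
    """Distance from the end of `first` to the start of `second`.

    Uses the first occurrence of each substring anywhere in `text` and
    assumes both are present (no -1 check, to keep the sketch short).
    """
    return text.find(second) - (text.find(first) + len(first))


print(naive_distance("ABCDEFGHIJKL", "ABC", "JKL"))  # 6
print(naive_distance("ABCDE", "ABC", "CDE"))         # -1: the substrings overlap by one character
print(naive_distance("JKLABC", "ABC", "JKL"))        # -6: the second substring comes first
```

A negative result therefore signals either an overlap or a reversed order, and the caller has to decide how to treat it.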

Solution using regular expressions:

import re

string = "ABCDEFGHIJKL"
sequence1 = "ABC"
sequence2 = "JKL"

result = re.search(sequence1+'(.*)'+sequence2,string)
print(len(result.group(1)))
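
The regex solutions above build the pattern by plain concatenation, so they raise an AttributeError when there is no match, treat characters such as + or . in the sequences as regex syntax, and with the greedy (.*) extend to the last occurrence of the second sequence. A hedged variant that addresses these points is sketched below; the function name gap_length is made up for this example.

```python
import re


def gap_length(string, sequence1, sequence2):
    """Length of the gap between the leftmost sequence1 and the nearest
    following sequence2, or None if there is no such pair.

    Hypothetical helper name; not from the original answers.
    """
    # re.escape makes both sequences match literally; (.*?) is non-greedy.
    pattern = re.escape(sequence1) + "(.*?)" + re.escape(sequence2)
    # re.DOTALL lets the gap span line breaks as well.
    match = re.search(pattern, string, flags=re.DOTALL)
    return len(match.group(1)) if match else None


print(gap_length("ABCDEFGHIJKL", "ABC", "JKL"))  # 6
print(gap_length("ABCDEFGHIJKL", "ABC", "XYZ"))  # None
print(gap_length("1+1=2", "1+", "=2"))           # 1
```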
