
Comparing one element from a list to ALL elements of another list

I have a list that contains various sequences of letters.

sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']

I want to see if the last 3 letters of each sequence in that list match the first 3 letters of any of the other sequences. If that happens, I want to know the indexes of these two sequences.

I'm basically trying to produce an adjacency list. Below is an example of an input:

>Sample_0
AAGTAAA
>Sample_1
AAATGAT
>Sample_2
AAAGTTT
>Sample_3
TTTTCCC
>Sample_4
AATTCGC
>Sample_5
CGCTCCC

And the output:

>Sample_0 >Sample_1
>Sample_0 >Sample_2
>Sample_2 >Sample_3
>Sample_4 >Sample_5

So far, I have tried to make two different lists that contain all the prefixes and all the suffixes, but I don't know whether this helps or how to use them to solve my problem.

gene_names, sequences = [], []
seq = ""

# Header lines start with ">"; all other lines hold sequence data
with open("rosalind_grph2.txt", "r") as file:
    for line in file:
        if line[0] == ">":
            gene_names.append(line.strip())
            if seq == "":
                continue
            sequences.append(seq)
            seq = ""
        if line[0] in "ATCG":
            seq = seq + line.strip()
sequences.append(seq)  # store the last sequence in the file

#So far I put all I needed into a list

prefix = [i[:3] for i in sequences]
suffix = [i[-3:] for i in sequences]

#Now, all suffixes and prefixes are in lists as well
#but what now?  

print(suffix)
print(prefix)
print(sequences)

If I am understanding your problem correctly, this code enumerates over the list twice. For every pair of distinct elements, it compares the last 3 letters of the first with the first 3 letters of the second and prints the indices of both if there is a match. Please give feedback/clarify if this is not what you are looking for. This is O(n^2) and can likely be sped up if you take an initial pass and store indices in a structure like a dictionary.


for index1, sequence1 in enumerate(sequences):
    for index2, sequence2 in enumerate(sequences):
        if index1 != index2:
            if sequence1[-3:] == sequence2[0:3]:
                print(sequence1[-3:], index1, sequence2[0:3], index2)
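The dictionary-based speed-up mentioned above could look roughly like the following. This is only a sketch, not part of the original answer: the function name overlap_pairs and the use of collections.defaultdict are my own choices. One pass records which indices start with each prefix, and a second pass looks up each suffix instead of rescanning the whole list.

```python
from collections import defaultdict


def overlap_pairs(sequences, length=3):
    """Return (i, j) pairs where sequences[i] ends with the start of sequences[j]."""
    # Initial pass: map each prefix to the indices of the sequences starting with it.
    by_prefix = defaultdict(list)
    for j, seq in enumerate(sequences):
        by_prefix[seq[:length]].append(j)

    # Second pass: look each suffix up in the dictionary.
    pairs = []
    for i, seq in enumerate(sequences):
        for j in by_prefix.get(seq[-length:], []):
            if i != j:  # a sequence is not compared with itself
                pairs.append((i, j))
    return pairs


sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']
print(overlap_pairs(sequences))
# [(0, 1), (0, 2), (2, 3), (4, 5)]
```

Building the dictionary takes one pass over the list, so the total work is roughly proportional to the number of sequences plus the number of matching pairs, rather than to the square of the list length.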

If I understood correctly, what you would like to do is connect different elements of sequences, where two elements are connected when the end of one string matches the beginning of the other.

One way of doing this with a dict is the following function match_head_tail():

def match_head_tail(items, length=3):
    result = {}
    for x in items:
        v = [y for y in items if y[:length] == x[-length:]]
        if v:
            result[x] = v
    return result
sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']

print(match_head_tail(sequences))
# {'AAGTAAA': ['AAATGAT', 'AAAGTTT'], 'AAAGTTT': ['TTTTCCC'], 'AATTCGC': ['CGCTCCC']}

If you also want to include sequences for which there is no match, you could use the following function match_head_tail_all():

def match_head_tail_all(items, length=3):
    return {x: [y for y in items if y[:length] == x[-length:]] for x in items}
sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']

print(match_head_tail_all(sequences))
# {'AAGTAAA': ['AAATGAT', 'AAAGTTT'], 'AAATGAT': [], 'AAAGTTT': ['TTTTCCC'], 'TTTTCCC': [], 'AATTCGC': ['CGCTCCC'], 'CGCTCCC': []}

EDIT 1

If you actually want indexes, combine the above with enumerate() to get them, e.g.:

def match_head_tail_all_indexes(items, length=3):
    return {
        i: [j for j, y in enumerate(items) if y[:length] == x[-length:]]
        for i, x in enumerate(items)}


sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']

print(match_head_tail_all_indexes(sequences))
# {0: [1, 2], 1: [], 2: [3], 3: [], 4: [5], 5: []}
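To get from this index mapping to the output format shown in the question, the indexes can be mapped back to the header names. The sketch below assumes a gene_names list in the same order as sequences, as built by the question's parsing code. The helper adjacency_lines is a hypothetical name of mine, and the i != j check is an addition so that a sequence whose ending equals its own beginning is not paired with itself.

```python
def match_head_tail_all_indexes(items, length=3):
    # Same index-based mapping as above, repeated so this sketch runs on its own.
    return {
        i: [j for j, y in enumerate(items) if y[:length] == x[-length:]]
        for i, x in enumerate(items)}


def adjacency_lines(names, adjacency):
    """Turn {index: [indexes]} into 'name name' lines, skipping self-matches."""
    return [
        f"{names[i]} {names[j]}"
        for i, targets in adjacency.items()
        for j in targets
        if i != j]


gene_names = ['>Sample_0', '>Sample_1', '>Sample_2', '>Sample_3', '>Sample_4', '>Sample_5']
sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']

for line in adjacency_lines(gene_names, match_head_tail_all_indexes(sequences)):
    print(line)
# >Sample_0 >Sample_1
# >Sample_0 >Sample_2
# >Sample_2 >Sample_3
# >Sample_4 >Sample_5
```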

EDIT 2

If your input contains many sequences with the same ending, you may want to consider implementing some caching mechanism for improved computational efficiency (at the expense of memory efficiency), e.g.:

def match_head_tail_cached(items, length=3, caching=True):
    result = {}
    cached = {}  # maps an ending to the list of items that start with it
    for x in items:
        tail = x[-length:]
        if caching and tail in cached:
            v = cached[tail]
        else:
            v = [y for y in items if y[:length] == tail]
            if caching:
                cached[tail] = v  # store the result so the same ending is not recomputed
        if v:
            result[x] = v
    return result


sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']

print(match_head_tail_cached(sequences))
# {'AAGTAAA': ['AAATGAT', 'AAAGTTT'], 'AAAGTTT': ['TTTTCCC'], 'AATTCGC': ['CGCTCCC']}

EDIT 3

All this could also be implemented with list only, e.g.:

def match_head_tail_list(items, length=3):
    result = []
    for x in items:
        v = [y for y in items if y[:length] == x[-length:]]
        if v:
            result.append([x, v])
    return result


sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']

print(match_head_tail_list(sequences))
# [['AAGTAAA', ['AAATGAT', 'AAAGTTT']], ['AAAGTTT', ['TTTTCCC']], ['AATTCGC', ['CGCTCCC']]]

and even with less nesting:

def match_head_tail_flat(items, length=3):
    result = []
    for x in items:
        for y in items:
            if y[:length] == x[-length:]:
                result.append([x, y])
    return result


sequences = ['AAGTAAA', 'AAATGAT', 'AAAGTTT', 'TTTTCCC', 'AATTCGC', 'CGCTCCC']

print(match_head_tail_flat(sequences))
# [['AAGTAAA', 'AAATGAT'], ['AAGTAAA', 'AAAGTTT'], ['AAAGTTT', 'TTTTCCC'], ['AATTCGC', 'CGCTCCC']]

