How can I filter a list by selecting the unique character combinations in its elements (Python)?

Question

I have the following pairs stored in a list:

 sample = [["CGCG", "ATAT"], ["CGCG", "CATC"], ["ATAT", "TATA"]]

Each pairwise comparison can have only two unique combinations of characters; otherwise that pairwise comparison is eliminated. For example:

   In sample[1]
    C       C
    G       A
    C       T 
    G       C

Look at the corresponding elements in both sub-lists: CC, GA, CT, GC.

Here there are more than two types of pairs: (CC), (GA), (CT) and (GC). So this pairwise comparison cannot occur.

Every comparison can have only 2 combinations out of (AA, GG, CC, TT, AT, TA, AC, CA, AG, GA, GC, CG, GT, TG, CT, TC), that is, all possible combinations of ACGT where order matters.

In the above example, more than 2 such combinations are found.

However,

   In sample[0]
    C       A
    G       T
    C       A 
    G       T

There are only 2 unique combinations: CA and GT.

Thus, the only pairs that remain are:

output = [["CGCG", "ATAT"], ["ATAT", "TATA"]]

I would prefer the code to use a traditional for-loop format rather than comprehensions.

This is a small part of the question listed here. This portion of the question is being re-asked because the answer provided earlier produced incorrect output.
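
The 16 ordered combinations listed in the question can be generated rather than typed out. The following is only a sketch, assuming the alphabet is exactly A, C, G and T; the names alphabet and all_pairs are illustrative and do not come from the question.

```python
from itertools import product

alphabet = "ACGT"  # assumed alphabet

# Build every ordered two-character combination with a plain for loop.
all_pairs = []
for first, second in product(alphabet, repeat=2):
    all_pairs.append(first + second)

print(len(all_pairs))  # 16
print(all_pairs[:4])   # ['AA', 'AC', 'AG', 'AT']
```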

Answer 1

def filter_sample(sample):
    filtered_sample = []

    for s1, s2 in sample:
        pairs = {pair for pair in zip(s1, s2)}
        if len(pairs) <= 2:
            filtered_sample.append([s1, s2])

    return filtered_sample

Running this:

sample = [["CGCG","ATAT"],["CGCG","CATC"],["ATAT","TATA"]]
filter_sample(sample)

Returns this:

[['CGCG', 'ATAT'], ['ATAT', 'TATA']]

Answer 2

The core of this task is extracting the pairs from your sublists and counting the number of unique pairs. Assuming your samples actually contain strings, you can use zip(*sub_list) to get the pairs. Then you can use set() to remove duplicate entries.
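
To make the zip(*sub_list) step concrete, here is a small illustration of what it yields for a single sub-list. The variable names are chosen just for this example.

```python
sub_list = ["CGCG", "ATAT"]

# zip(*sub_list) is the same as zip("CGCG", "ATAT"):
# it walks both strings in step and yields one tuple per position.
pairs = list(zip(*sub_list))
print(pairs)  # [('C', 'A'), ('G', 'T'), ('C', 'A'), ('G', 'T')]

# A set keeps each distinct pair only once.
unique_pairs = set(pairs)
print(len(unique_pairs))  # 2
```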

sample = [['CGCG','ATAT'],['CGCG','CATC'],['ATAT','CATC']]

def has_n_pairs(sub_list, n_pairs):
    # Named has_n_pairs rather than filter so the built-in filter() is not shadowed.
    pairs = zip(*sub_list)
    return len(set(pairs)) == n_pairs

Then you can use a for loop or a list comprehension to apply this function to your main list.

new_sample = [sub_list for sub_list in sample if has_n_pairs(sub_list, 2)]

...or as a for loop...

new_sample = []
for sub_list in sample:
    if has_n_pairs(sub_list, 2):
        new_sample.append(sub_list)
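
Note that this answer tests for exactly n_pairs unique pairs, while Answer 1 keeps anything with two or fewer. The two agree on the question's sample, but they can differ when every position gives the same pair. A minimal sketch of the difference, using a hypothetical helper count_unique_pairs that is not part of either answer:

```python
def count_unique_pairs(s1, s2):
    # Hypothetical helper: number of distinct column pairs between two strings.
    return len(set(zip(s1, s2)))

pair = ["AAAA", "TTTT"]  # every column is ('A', 'T')
n = count_unique_pairs(*pair)

print(n)       # 1
print(n <= 2)  # True: kept by an "at most two" test
print(n == 2)  # False: dropped by an "exactly two" test
```

Which behaviour is wanted depends on whether a pair with a single unique combination should count as valid; the question does not say.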

Answer 3

sample = [["CGCG", "ATAT"], ["CGCG", "CATC"], ["ATAT", "CATC"]]
result = []
for s in sample:
    first = s[0]
    second = s[1]
    combinations = []  # unique character pairs seen so far
    for i in range(len(first)):
        comb = [first[i], second[i]]
        if comb not in combinations:
            combinations.append(comb)
    # keep the pair only if exactly two unique combinations were found
    if len(combinations) == 2:
        result.append(s)

print(result)

Answer 1
Score 2, accepted, 2016-10-16 20:19:31

Answer 2
Score 1, 2016-10-16 20:19:01

Answer 3
Score 1, 2016-10-16 20:19:27
