简体   繁体   English

使列表唯一(python)

[英]Making a list unique (python)

I have a matrix of around 3000 species classifications eg 我有大约3000个物种分类的矩阵,例如

Arthropoda/Hexapoda/Insecta/Coleoptera/Cerambycidae/Anaglyptus 节肢动物门/六足动物/昆虫/鞘翅目/天牛科/ Anaglyptus

each line is a sequence of taxonomic classifications. 每行是分类学分类的序列。 What I need to do is, sort the 3000 lines so each one is unique so that the file can be fed to a program that creates phylogenetic(evolutionary) trees. 我需要做的是,对3000行进行排序,以便每一行都是唯一的,以便可以将文件输入到创建系统树(进化树)的程序中。

I have tried to use a set but get an error as lists are not hashable objects, however it is important to keep each line together as the values in each column for each line are nested. 我尝试使用一个集合,但是由于列表不是可哈希对象,因此出现错误,但是将每一行保持在一起很重要,因为每一行的每一列中的值都是嵌套的。

Whats the best way to ensure I only have unique values in the last column but keep the integrity of each row? 确保我仅在最后一列中具有唯一值但保持每一行的完整性的最佳方法是什么?

many thanks 非常感谢

As mentioned in the comments, tuples are hashable, even though lists aren't. 如注释中所述,元组是可哈希的,即使列表不是。 So let's convert your rows to tuples! 因此,让我们将您的行转换为元组!

# Create the Dataset
L = []
L.append(["Arthropoda", "Hexapoda", "Insecta", "Coleoptera", "Cerambycidae", "Anaglyptus"])
L.append(["Arthropoda", "Hexapoda", "Insecta", "Coleoptera", "Cerambycidae", "Aromia"])

# Instead of a list of lists, let's have a list of tuples!
L = [tuple(x) for x in L]

# Using a set, we can easily remove duplicates
L = set(L)

The python masters may be offended but this answer is worth a try 可能会冒犯python大师,但这个答案值得一试

l = []
with open('file.txt', 'r') as fp:
    for i in fp.readlines():
        if i not in l:
            l.append(i)

with open('file2.txt', 'w') as fp:
    fp.writelines(l)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM