
Read and list unique words from a txt file using Python

So my task is:

Write a program to list which letters in the file seqs.txt are not A, T, C, or G. It should only list
each letter once. Hint: Start with an empty list for unknown letters. Then use two loops to scan
letters in each sequences.

Right now I'm stuck on how to implement the two loops that scan the letters.

def main():
    with open('seqs.txt','r') as seqs_file:
        unknown = ("A","T","G","C")
        #unknown_list = ("B","D","E","F","H","I","J","K","L","M","N","O","P"
                         # ,"Q","R","S","U","V","X","Y","Z")
        for unknown in seqs_file:
            if True:
                return()
            else:
                print(#the other letters not ATCG#)
main()

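This is the code I have so far. I tried using .read(), but after that I'm not sure how to set up the two loops. Any help getting me on the right track would be much appreciated!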

Edit: The text file contains the following:

as an example.

If you want uniqueness, a set makes more sense than a list...

known = {"A", "T", "G", "C"}
unknown = set()
with open('seqs.txt', 'r') as seqs_file:
    for letter in seqs_file.read():
        # Skip newlines, spaces, and other non-letter characters.
        if letter.isalpha():
            unknown.add(letter)
unknown -= known
for letter in unknown:
    print(letter)

I'd use this and ignore the silly hints, which will only lead you completely off track:

import string

with open('seqs.txt') as fin:
    # Keep only ASCII letters, upper-cased so 'a' and 'A' count as the same letter.
    characters = [i.upper() for i in fin.read() if i in string.ascii_letters]

result = set(characters) - {'A', 'T', 'C', 'G'}

print(sorted(result))

def main():
    # The input file contains this space-separated string:
    # 'A B C D E F G H I J K L M N O P Q R S T U V X Y Z'
    with open('some_file.txt', 'r') as seqs_file:
        data = seqs_file.read().split()
    other = []
    known = ("A", "T", "G", "C")
    for d in data:
        if d in known:
            continue
        if d not in other:
            # First time this letter is seen: print it and remember it.
            print(d)
            other.append(d)
main()

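As far as I can tell, the simplest way to solve your problem is to read the file and turn its contents into a list of characters. To get the unique elements, convert that list to a set; a simple for loop then gives you the solution.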

with open("seqs.txt", 'r') as f:
    unknown_letters = list(f.read())
known_letters = ['A', 'T', 'G', 'C']
unknown_unique_letters_set = set(unknown_letters)

for i in unknown_unique_letters_set:
    # Print only real letters that are not A, T, G, or C (skips newlines and spaces).
    if i.isalpha() and i not in known_letters:
        print(i)

If you don't want to use set() and would rather stick with a list, here is the code:

with open("seqs.txt", 'r') as f:
    unknown_letters = list(f.read())
known_letters = ['A', 'T', 'G', 'C']
visited_letters = []
for i in unknown_letters:
    # Skip non-letters (newlines, spaces), known bases, and letters already recorded.
    if not i.isalpha() or i in known_letters or i in visited_letters:
        continue
    visited_letters.append(i)

print(visited_letters)
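
For comparison, the assignment's own hint (start with an empty list, then use two loops) can be followed fairly literally. The sketch below is only one possible reading of that hint: the function name `find_unknown_letters` and the sample sequences are made up for illustration, and it assumes each line of the file holds one sequence.

```python
def find_unknown_letters(lines, known=("A", "T", "C", "G")):
    """Return the letters not in `known`, each listed once, in first-seen order."""
    unknown = []                     # start with an empty list, as the hint suggests
    for line in lines:               # outer loop: one sequence (line) at a time
        for letter in line.strip():  # inner loop: each character of that sequence
            if letter.isalpha() and letter not in known and letter not in unknown:
                unknown.append(letter)
    return unknown


# Hypothetical sample sequences standing in for the contents of seqs.txt.
sample = ["ATCGN\n", "GGXTA\n", "ANNCT\n"]
print(find_unknown_letters(sample))  # ['N', 'X']
```

Because an open file object yields its lines when iterated, the same function should work on the real file with `with open('seqs.txt') as seqs_file: print(find_unknown_letters(seqs_file))`.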

