Read Text File and Return Words as a Sorted List

For an assignment in Python 3, I need to create a program that will do the following:

  1. Open a text file chosen by the user
  2. Append all words within the text file to a list
  3. Sort the words in the list
  4. Print the sorted list, matching the desired result

The code I have sorts the list but does not remove duplicates, so the output does not match the desired result. The text file is the first four lines of a soliloquy from Romeo and Juliet.

fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    line = line.rstrip()
    words = line.split()
    for word in words:
        lst.append(word)
lst.sort()
print(lst)

The desired result is:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']

But with my code, I get duplicated words:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']

How can I remove the duplicates from the list?

There are a few ways to do this. You can check whether the word is already in the list and append it only if it is not:

for word in words:
    if word not in lst:
        lst.append(word)
lst.sort()

If the word is already in the list, you don't do anything, so I think that is all you need.
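To see the membership check end to end, here is a minimal sketch. It assumes the file contains the four lines below (reconstructed to match the word counts in the question, not taken from the actual file), and the function name `unique_sorted_words` is made up for illustration.

```python
# Assumed sample text: reconstructed to match the word counts in the
# question, not read from the asker's actual file.
SAMPLE_LINES = [
    "But soft what light through yonder window breaks",
    "It is the east and Juliet is the sun",
    "Arise fair sun and kill the envious moon",
    "Who is already sick and pale with grief",
]


def unique_sorted_words(lines):
    """Collect each word once using a membership check, then sort."""
    lst = []
    for line in lines:
        for word in line.split():
            if word not in lst:  # scans the list, so skip words already seen
                lst.append(word)
    lst.sort()
    return lst


print(unique_sorted_words(SAMPLE_LINES))
```

Note that `word not in lst` scans the whole list each time, which is fine for four lines of text but gets slow for large files.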

You can also convert your list to a set (a set holds only a single instance of each unique value). The slightly clunky part is that you then need to convert it back to a list, both to sort it (sets are unordered by nature, although there are other libraries that give you sorted options) and to match the required output format (I assume they require a list output):

for word in words:
    lst.append(word)
lst = sorted(set(lst))  # convert to set and sort in one line. Returns a list.

I'd guess the first option is closer to what you are expected to be learning in this assignment.
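A small sketch of what the set conversion does, using made-up word lists rather than the assignment file. It also shows why the capitalised words come first in the desired result: the default string sort puts uppercase letters before lowercase ones.

```python
# Made-up input, just to show the behaviour of set() and sorted().
words = "the sun and the moon and the sun".split()

unique = sorted(set(words))  # set() drops duplicates, sorted() returns a list
print(unique)                # ['and', 'moon', 'sun', 'the']

# Default string ordering places uppercase letters before lowercase ones.
mixed = sorted({"Who", "already", "Arise"})
print(mixed)                 # ['Arise', 'Who', 'already']
```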

Instead of a list, use a set to collect the words. At the end, convert it to a list and sort it:

fname = input("Enter file name: ")
words = set()
with open(fname) as fh:
    for line in fh:
        line = line.rstrip()
        words.update(line.split())  # update() accepts any iterable of words

words_list = sorted(words)  # sorted() already returns a list
print(words_list)
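If you want to try the set-based version without typing a file name, here is one possible sketch. It writes an assumed copy of the text (reconstructed from the word counts in the question) to a temporary file; the name `romeo.txt` and the helper `sorted_unique_words` are hypothetical.

```python
import os
import tempfile

# Assumed file contents, reconstructed from the question's output.
SAMPLE_TEXT = (
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
    "Arise fair sun and kill the envious moon\n"
    "Who is already sick and pale with grief\n"
)


def sorted_unique_words(path):
    """Read a text file and return its distinct words as a sorted list."""
    with open(path) as fh:
        # The set comprehension collects every word exactly once.
        return sorted({word for line in fh for word in line.split()})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "romeo.txt")  # hypothetical file name
    with open(path, "w") as fh:
        fh.write(SAMPLE_TEXT)
    result = sorted_unique_words(path)

print(result)
```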

One possibility would be to use a set, maybe like this:

filename = input("Enter file name: ")
words = set()

with open(filename) as f:
    for line in f:
        line = line.strip()
        if len(line) > 0:
            for w in line.split():
                w = w.strip()
                if len(w) > 0:
                    words.add(w)

print(words)
sorted_words = sorted(words)  # sorted() already returns a list
print(sorted_words)
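As a side note, the `strip()` calls and length checks in this answer are harmless but probably not needed: `str.split()` with no arguments already discards leading, trailing, and repeated whitespace and never returns empty strings. A quick check with made-up input:

```python
# split() with no arguments splits on any run of whitespace.
line = "  Arise   fair sun \n"
print(line.split())     # ['Arise', 'fair', 'sun']

# A blank line yields an empty list, so an inner loop simply does nothing.
print("   \n".split())  # []
```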
