Read a text file and return the words as a sorted list

Question

For an assignment in Python 3, I need to create a program that will do the following:

Open a text file chosen by the user
Append all words within the text file to a list
Sort the words in the list
Print the sorted list, matching the desired result

The code I have sorts the list but does not deduplicate it, so the output does not match the desired result. The text file is the first four lines of a soliloquy from Romeo and Juliet.

fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    line = line.rstrip()
    words = line.split()
    for word in words:
        lst.append(word)
lst.sort()
print(lst)

The desired result is:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']

But with my code, I get duplicated words:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']

How can I deduplicate the list?

Answer 1

There are a few ways you can do this. You can check whether the word is already in the list, and append it only when it is not:

for word in words:
    if word not in lst:
        lst.append(word)
lst.sort()

If the word is already in the list, nothing happens, so I think that is all you need.
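For context, here is how that check might sit inside the full loop from the question. This is only a minimal sketch: the function name `unique_sorted_words` and the in-memory sample lines are assumptions made for illustration, standing in for the opened file.

```python
def unique_sorted_words(lines):
    """Collect each distinct word once, then sort the list in place."""
    lst = []
    for line in lines:
        for word in line.split():
            # Append only words that have not been seen yet.
            if word not in lst:
                lst.append(word)
    lst.sort()
    return lst


# Hypothetical sample input standing in for the lines of the file.
sample = ["the sun and the moon", "and the east"]
print(unique_sorted_words(sample))
```

Passing the open file handle (`fh` in the question) instead of `sample` should behave the same way, since iterating over a file yields its lines.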

You can also convert your list to a set (a set holds only a single instance of each value it contains). The slightly clunky part is that you then need to convert it back to a list, both to sort it (sets are unordered by nature, although there are other libraries that offer sorted options) and to match the required output format (I assume a list is required):

for word in words:
    lst.append(word)
lst = sorted(set(lst))  # convert to set and sort in one line. Returns a list.

I'd assume the first option is more illustrative of what you are expected to be learning in this assignment.
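The set-based variant can also be tried without a file on disk. The sketch below makes two assumptions: the four lines of text are reconstructed from the word lists in the question (the actual file is not shown), and `io.StringIO` stands in for the opened file. The function name `sorted_unique_words` is likewise just for illustration.

```python
import io

# Assumed file contents, reconstructed from the word lists in the question.
text = (
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
    "Arise fair sun and kill the envious moon\n"
    "Who is already sick and pale with grief\n"
)


def sorted_unique_words(fh):
    """Return the distinct words of a file-like object as a sorted list."""
    lst = []
    for line in fh:
        lst.extend(line.split())
    # set() drops the duplicates; sorted() returns a new sorted list.
    return sorted(set(lst))


result = sorted_unique_words(io.StringIO(text))
print(result)
```

Capitalized words such as 'Arise' come before lowercase ones such as 'already' because Python compares strings by code point, which is the ordering shown in the desired result.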

Answer 2

Instead of a list, use a set to collect the words. At the end, convert it to a list and sort it:

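fname = input("Enter file name: ")
words = set()  # a set keeps only one copy of each word
with open(fname) as fh:
    for line in fh:
        # split() with no arguments already ignores surrounding whitespace
        words.update(line.split())

# sorted() accepts any iterable and returns a new list
words_list = sorted(words)
print(words_list)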

Answer 3

One possibility would be to use a set, maybe like this:

filename = input("Enter file name: ")
words = set()

with open(filename) as f:
    for line in f:
        line = line.strip()
        if len(line) > 0:
            for w in line.split():
                w = w.strip()
                if len(w) > 0:
                    words.add(w)

print(words)
sorted_words = list(sorted(words))
print(sorted_words)

3 answers

Answer 1: 2, accepted, 2018-12-18 22:08:35

Answer 2: 2, 2018-12-18 22:09:20

Answer 3: 0, 2018-12-18 22:11:19
