
Read Text File and Return Words as a Sorted List

For an assignment in Python 3, I need to create a program that will do the following:

  1. Open a text file chosen by the user
  2. Append all words in the text file to a list
  3. Sort the words in the list
  4. Print the sorted list matching the desired results

The code I have sorts the list but does not remove duplicate words, so the output does not match the desired result. The text file is the first four lines of a soliloquy from Romeo and Juliet.

fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    line = line.rstrip()
    words = line.split()
    for word in words:
        lst.append(word)
lst.sort()
print(lst)

The desired result is:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']

But with my code, I get duplicated words:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']

How can I dedupe the list?

There are a few ways you can do this. You can check if the word is already in the list, and only append when the word is not in the list:

for word in words:
    if word not in lst:
        lst.append(word)
lst.sort()

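If the word is already in the list, nothing happens, so duplicates never get added. I think that is the only change you need. Keep in mind that `word not in lst` scans the list on every check, which is fine for a short file but slows down as the list grows.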
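Put together as a small function, the check might look like the sketch below. The function name `unique_sorted_words` and the two sample lines are made up for illustration; the loop body is the same as above.

```python
def unique_sorted_words(lines):
    """Return the distinct words found in `lines`, sorted."""
    lst = []
    for line in lines:
        for word in line.split():
            # Only append words we have not seen yet.
            if word not in lst:
                lst.append(word)
    lst.sort()
    return lst


# Hypothetical sample input; any iterable of strings works, including an open file.
sample = ["But soft what light", "It is the east and Juliet is the sun"]
result = unique_sorted_words(sample)
print(result)
```

Because the function only iterates over `lines`, you can pass it the file handle directly, for example `unique_sorted_words(open(fname))`.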

You can also convert your list to a set (a set holds only one instance of each value). The slightly clunky part is that you then need to convert it back to a list, both to sort it (sets are unordered by nature, although third-party libraries offer sorted containers) and to match the required output format (I assume a list is required):

for word in words:
    lst.append(word)
lst = sorted(set(lst))  # convert to set and sort in one line. Returns a list.

I'd guess the first option is closer to what you are expected to be learning in this assignment.
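As a quick sanity check, here is a sketch that runs both options on the same words and compares the results. The sample text is an assumption: it is reconstructed from the word lists in the question, not taken from the asker's actual file.

```python
# Assumed sample text, reconstructed from the word lists in the question.
text = """But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief"""

all_words = text.split()  # every word, duplicates included

# Option 1: membership check while appending
checked = []
for word in all_words:
    if word not in checked:
        checked.append(word)
checked.sort()

# Option 2: build a set, then sort it into a list
via_set = sorted(set(all_words))

print(checked == via_set)
print(via_set[:6])
# Strings compare by code point, so capitalized words sort before lowercase ones.
print(ord("W"), ord("a"))
```

Both options end up with the same list. The capitalized words come first because the default string sort compares code points, and uppercase letters have lower code points than lowercase ones.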

Instead of a list, use a set to collect the words. At the end, convert it to a list and sort it:

fname = input("Enter file name: ")
words = set()
with open(fname) as fh:
    for line in fh:
        # split() ignores leading/trailing whitespace, including the newline
        words.update(line.split())

# sorted() accepts any iterable and returns a new list
words_list = sorted(words)
print(words_list)
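To try the set-based version without typing a file name, the same logic can be wrapped in a function and pointed at a throwaway file. This is only a sketch: `read_unique_words` and the temporary file contents are hypothetical names and data chosen for the example.

```python
import os
import tempfile


def read_unique_words(path):
    """Collect the distinct words in the file at `path` and return them sorted."""
    words = set()
    with open(path) as fh:
        for line in fh:
            words.update(line.split())
    return sorted(words)


# Write a small temporary file so the example runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("It is the east and Juliet is the sun\nArise fair sun\n")
    tmp_path = tmp.name

try:
    result = read_unique_words(tmp_path)
finally:
    os.remove(tmp_path)  # clean up the temporary file

print(result)
```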

One possibility would be to use a set, maybe like this:

filename = input("Enter file name: ")
words = set()

with open(filename) as f:
    for line in f:
        line = line.strip()
        if len(line) > 0:
            for w in line.split():
                w = w.strip()
                if len(w) > 0:
                    words.add(w)

print(words)  # a set, so the order printed here is arbitrary
sorted_words = sorted(words)  # sorted() already returns a list
print(sorted_words)
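A side note on whitespace handling: `str.split()` called with no arguments splits on runs of whitespace and never yields empty strings, so stripping each line or word and checking its length is harmless but does not change the result. A small check, using a made-up sample line:

```python
line = "  It is   the east \n"  # hypothetical line with messy spacing

# Plain split() with no arguments
plain = line.split()

# The more defensive version: strip the line, strip each word, skip empties
defensive = [w.strip() for w in line.strip().split() if len(w.strip()) > 0]

print(plain)
print(plain == defensive)
print("".split(), "   \n".split())  # blank lines produce no words at all
```

This only holds for `split()` with no arguments. Passing an explicit separator, such as `split(" ")`, can return empty strings.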

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please cite the site URL or the original address.

 