
The nature of lists in Python: why do I get a repeating list?

I was writing a program for an assignment on Coursera. I solved it, but along the way I got some unintended behavior from the following code, run with romeo.txt as the input:

fname = input("Enter file name: ")
fh = open(fname, 'r')
lst = list()
words = ''
fin = list()
for line in fh:
    words += line.strip(' ')

words = words.replace('\n', ' ')

for line in words:
    lst += words.split(' ')
print(lst)

Instead of giving me a list of words only appearing once, it gives me every word, but repeated an unknown number of times.

It gives me a huge list of repeating words: ['But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks', 'It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun', 'Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon', 'Who', 'is', 'already', 'sick', 'and', 'pale', 'with', 'grief', 'But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks', 'It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun', 'Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon', 'Who', 'is', 'already', 'sick', 'and', 'pale', 'with', 'grief', 'But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks', 'It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun', 'Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon', 'Who', 'is', 'already', 'sick', 'and', 'pale', 'with', 'grief', 'But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks', 'It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun', 'Arise', 'fair', 'sun', ...

The words repeat SO much more than that.

Initially you said:

words = ''

Ok. So words is a string. Then, you said:

for line in fh:
    words += line.strip(' ')

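For every line in the file, strip the leading and trailing spaces from the current line and append the result to words. Note that strip(' ') removes only space characters, so the newline at the end of each line stays. Each iteration appends to your words string, so when the loop is done, words is one giant string.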
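To see what that loop leaves in words, here is a minimal sketch. It assumes an in-memory io.StringIO object as a stand-in for the opened file, and the two sample lines are made up for illustration:

```python
import io

# Stand-in for the opened file; the sample text is made up.
fh = io.StringIO("  But soft\nwhat light\n")

words = ''
for line in fh:
    # strip(' ') removes only leading/trailing spaces; the '\n' stays
    words += line.strip(' ')

print(repr(words))  # 'But soft\nwhat light\n'
```

The leading spaces are gone, but both newlines are still there, which is why the later replace('\n', ' ') call is needed before splitting on spaces.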

Then, you said:

words = words.replace('\n', ' ')

Ok. words is still a string. All you've done is replace all newline characters with spaces.

Then, you said:

for line in words:
    lst += words.split(' ')

In this case, line is not a good name for the loop variable, since you are not iterating over lines anymore. Your iterable is words, which is a string. When you iterate over a string, you get individual characters, not lines:

>>> for line in "abcdefg":
...     print(line)
...
a
b
c
d
e
f
g

Just because I'm calling the loop variable line doesn't mean that's what it actually holds. I could have called it anything and I would still have received the same output. A better name for this variable, therefore, would be char, for example.

Back to your snippet: since you are iterating over the characters in your words string, you are extending your list with the full result of words.split(' ') once for every character. I don't need to see your input file to know that that's a gigantic list. The number of strings in lst will be len(words.split(' ')) * len(words), that is, roughly the number of words in the file multiplied by the number of characters in the file.
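The multiplication is easy to check on a small string. The following is a sketch only; the sample text is made up and stands in for the contents of words after the newline replacement:

```python
# Made-up stand-in for the contents of words after replace('\n', ' ').
words = "But soft what light "

lst = []
for char in words:               # one iteration per character
    lst += words.split(' ')      # the whole word list is appended each time

pieces = words.split(' ')
print(len(words), len(pieces), len(lst))  # 20 5 100

# Splitting once, outside any loop, gives each word a single time.
# split() with no argument also discards the empty string that the
# trailing space would otherwise produce.
fixed = words.split()
print(fixed)  # ['But', 'soft', 'what', 'light']
```

Twenty characters times five pieces (four words plus one empty string from the trailing space) gives the 100 entries. Dropping the loop and calling split once removes the repetition.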

Python lists do not enforce uniqueness. They keep every item you add, in the order in which it was inserted. If you want the unique set of words, use a Python set. You can create a set by passing a list to it, as in changing your last line to

print(set(lst))

or you can create an empty set and then add words to it as you come across them, something like this:

s = set()
...
for... :
  s.update(words.split(' '))
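Filled in as a runnable sketch, the second approach might look like the following. The io.StringIO object and its sample text are assumptions standing in for the real file, and the name unique is only illustrative. Since a set has no defined order, the sketch also shows one way to keep first-seen order, in case that matters for the assignment:

```python
import io

# Stand-in for the opened file; the sample text is made up.
fh = io.StringIO("the sun and the moon\nand the east\n")

unique = set()
for line in fh:
    # split() with no argument splits on any whitespace, newline included
    unique.update(line.split())

# A set has no defined order, so sort it for stable output.
print(sorted(unique))  # ['and', 'east', 'moon', 'sun', 'the']

# If first-seen order matters, dict keys keep insertion order (Python 3.7+).
fh.seek(0)
ordered = list(dict.fromkeys(fh.read().split()))
print(ordered)  # ['the', 'sun', 'and', 'moon', 'east']
```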

I'm not sure what the actual question is, but if you want something like a list that doesn't allow duplicates, the datatype you want is a set. Sets don't allow duplicates, so if you try to add a string that is already in the set, it is simply ignored. Try initializing sets instead of lists. Just a heads up as well: you can initialize empty lists like this:

lst = []
