Python: Issue with Writing over Lines?

So, this is the code I'm using in Python to remove lines, hence the name "cleanse." I have a list of a few thousand words and their parts-of-speech:

NN by

PP at

PP at

... This is the issue. For whatever reason (one I can't figure out, and I've been trying for a few hours), the program I'm using to go through the word inputs isn't clearing duplicates, so the next best thing I can do is clean them up afterwards: cycle through the file and delete the duplicates on each run. However, whenever I do, this code instead takes the last line of the list and duplicates it hundreds of thousands of times.

Thoughts, please? :(

EDIT: The idea is that cleanseArchive() goes through a file called words.txt, takes any duplicate lines and deletes them. Since Python isn't able to delete lines, though, and I haven't had luck with any other methods, I've turned to essentially saving the non-duplicate data in a list (saveList) and then writing each object from that list into a new file (deleting the old). However, as of the moment as I said, it just repeats the final object of the original list thousands upon thousands of times.

EDIT2: This is what I have so far, taking suggestions from the replies:

def cleanseArchive():
    f = open("words.txt", "r+")
    given_line = f.readlines()
    f.seek(0)
    saveList = set(given_line)
    f.close()
    os.remove("words.txt")
    f = open("words.txt", "a")
    f.write(saveList)

but ATM it's giving me this error:

Traceback (most recent call last):
  File "C:\Python33\Scripts\AI\prototypal_intelligence.py", line 154, in <module>
    initialize()
  File "C:\Python33\Scripts\AI\prototypal_intelligence.py", line 100, in initialize
    cleanseArchive()
  File "C:\Python33\Scripts\AI\prototypal_intelligence.py", line 29, in cleanseArchive
    f.write(saveList)
TypeError: must be str, not set

for i in saveList:
    f.write(n+"\n")

The loop variable is i, but the body writes n, so every iteration writes whatever value n last held, over and over.

Try this:

for i in saveList:
    f.write(i+"\n")
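To see why the original loop repeats a single line, here is a small sketch. It assumes that n is a leftover variable from an earlier loop, which is what the symptom in the question suggests; the sample data and the names save_list, buggy and fixed are made up for the illustration.

```python
import io

save_list = ["NN by", "PP at"]  # made-up sample data

# Assumption: an earlier loop left `n` bound to its last item.
for n in save_list:
    pass

buggy = io.StringIO()
for i in save_list:
    buggy.write(n + "\n")  # bug: writes the stale `n`, not the loop variable

fixed = io.StringIO()
for i in save_list:
    fixed.write(i + "\n")  # writes the current item

print(repr(buggy.getvalue()))  # 'PP at\nPP at\n'
print(repr(fixed.getvalue()))  # 'NN by\nPP at\n'
```

Note that lines read with readlines() already end in "\n", so adding another one is only right when the strings were stripped first.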

If you just want to delete "duplicated lines", I've modified your reading code:

saveList = []
duplicates = []  # every line seen so far
with open("words.txt", "r") as ins:
    for line in ins:
        # keep a line only the first time it appears
        if line not in duplicates:
            duplicates.append(line)
            saveList.append(line)

Combine this with the corrected write loop above!
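Putting the reading and the writing together might look roughly like the sketch below. The function name dedupe_lines and its path parameter are hypothetical, and a set is swapped in for the duplicates list only because membership tests on a set stay fast for a few thousand lines; the idea is the same.

```python
def dedupe_lines(path):
    """Rewrite the file at `path`, keeping only the first copy of each line."""
    seen = set()          # lines already encountered
    unique_lines = []     # first occurrences, in their original order
    with open(path, "r") as infile:
        for line in infile:
            if line not in seen:
                seen.add(line)
                unique_lines.append(line)
    # "w" truncates the file, so there is no need to delete it first.
    with open(path, "w") as outfile:
        # Each line still carries its own "\n", so nothing is appended here.
        outfile.writelines(unique_lines)
    return unique_lines
```

One caveat: if the last line of the file has no trailing newline, it will not compare equal to an otherwise identical earlier line.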

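import os

def cleanseArchive():
    f = open("words.txt", "r+")
    f.seek(0)
    given_line = f.readlines()
    saveList = set()
    for x, y in enumerate(given_line):
        t = (y)  # the parentheses do nothing; t is just the line string
        saveList.add(t)
    f.close()
    os.remove("words.txt")
    f = open("words.txt", "a")
    for i in saveList:
        f.write(i)
    f.close()  # close the file so the data is flushed to disk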

Finished product! I ended up digging into enumerate and essentially just using that to get the strings. Man, Python has some bumpy roads when you get into sets/lists, holy shit. So much stuff not working for very ambiguous reasons! Whatever the case, fixed it up.

Let's clean up this code you gave us in your update:

def cleanseArchive():
    f = open("words.txt", "r+")
    given_line = f.readlines()
    f.seek(0)
    saveList = set(given_line)
    f.close()
    os.remove("words.txt")
    f = open("words.txt", "a")
    f.write(saveList)

We have names that don't follow the Style Guide for Python Code (PEP 8), superfluous code, and one part that doesn't work, and we don't use the full power of Python.

Let us start with dropping unneeded code while at the same time using meaningful names.

def cleanse_archive():
    infile = open("words.txt", "r")
    given_lines = infile.readlines()
    words = set(given_lines)
    infile.close()
    outfile = open("words.txt", "w")
    outfile.write(words)

The seek was not needed, the mode for opening a file to read is now just r, the mode for writing is now w, and we dropped the removal of the file because opening it in w mode overwrites it anyway. Looking at this clearer code, we see that we forgot to close the file after writing. If we open the file with the with statement, Python takes care of that for us.

def cleanse_archive():
    with open("words.txt", "r") as infile:
        words = set(infile.readlines())
    with open("words.txt", "w") as outfile:
        outfile.write(words)

Now that we have clear code, we'll deal with the error message that occurs when outfile.write is called: TypeError: must be str, not set. The message is clear: you can't write a set directly to the file, because write expects a single string. You have to loop over the contents of the set and write each line.

def cleanse_archive():
    with open("words.txt", "r") as infile:
        words = set(infile.readlines())
    with open("words.txt", "w") as outfile:
        for word in words:
            outfile.write(word)

That's it.
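One thing to keep in mind is that a set does not remember the order of the lines, so the rewritten file may come out shuffled. If the order matters, a possible variant is sketched below. The name cleanse_archive_ordered and the path parameter are hypothetical; OrderedDict is used because the traceback in the question shows Python 3.3, where plain dicts do not guarantee insertion order.

```python
from collections import OrderedDict

def cleanse_archive_ordered(path="words.txt"):
    """Remove duplicate lines from `path`, keeping first occurrences in order."""
    with open(path, "r") as infile:
        # The keys of an OrderedDict are unique and keep insertion order.
        lines = list(OrderedDict.fromkeys(infile))
    with open(path, "w") as outfile:
        # writelines accepts any iterable of strings.
        outfile.writelines(lines)
    return lines
```

On Python 3.7 and later, dict.fromkeys gives the same result, since plain dicts preserve insertion order there.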
