
Python: Issue with Writing over Lines?

So, this is the code I'm using in Python to remove lines, hence the name "cleanse." I have a list of a few thousand words and their parts-of-speech:

NN by

PP at

PP at

... This is the issue. For whatever reason (one I can't figure out, despite trying for a few hours), the program I'm using to go through the word inputs isn't clearing duplicates, so the next best thing I can do is the cleanse: cycle through the file and delete the duplicates on each run. However, whenever I do, this code instead takes the last line of the list and duplicates it hundreds of thousands of times.

Thoughts, please? :(

EDIT: The idea is that cleanseArchive() goes through a file called words.txt, takes any duplicate lines and deletes them. Since Python isn't able to delete lines in place, though, and I haven't had luck with any other methods, I've turned to essentially saving the non-duplicate data in a list (saveList) and then writing each object from that list into a new file (deleting the old). However, as I said, at the moment it just repeats the final object of the original list thousands upon thousands of times.

EDIT2: This is what I have so far, taking suggestions from the replies:

def cleanseArchive():
    f = open("words.txt", "r+")
    given_line = f.readlines()
    f.seek(0)
    saveList = set(given_line)
    f.close()
    os.remove("words.txt")
    f = open("words.txt", "a")
    f.write(saveList)

but at the moment it's giving me this error:

Traceback (most recent call last):
  File "C:\Python33\Scripts\AI\prototypal_intelligence.py", line 154, in <module>
    initialize()
  File "C:\Python33\Scripts\AI\prototypal_intelligence.py", line 100, in initialize
    cleanseArchive()
  File "C:\Python33\Scripts\AI\prototypal_intelligence.py", line 29, in cleanseArchive
    f.write(saveList)
TypeError: must be str, not set

for i in saveList:
    f.write(n+"\n")

The loop variable is i, but the body writes n, so you basically write the same value of n over and over.

Try this:

for i in saveList:
    f.write(i+"\n")
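To make the difference concrete, here is a small sketch. It assumes that n was left bound by an earlier loop, which is a guess, since the original code is not shown in the question. The in-memory buffers and the sample lines are only for illustration.

```python
import io

lines = ["NN by\n", "PP at\n", "VB run\n"]

# Hypothetical earlier loop that leaves `n` bound to the last element.
for n in lines:
    pass

buggy = io.StringIO()
for i in lines:
    buggy.write(n)  # writes the stale `n`, not the loop variable `i`

fixed = io.StringIO()
for i in lines:
    fixed.write(i)  # writes each element exactly once

print(repr(buggy.getvalue()))
print(repr(fixed.getvalue()))
```

The buggy loop produces the last line once per iteration, which matches the symptom described in the question.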

If you just want to delete "duplicated lines", I've modified your reading code:

saveList = []
duplicates = []  # lines that have already been seen
with open("words.txt", "r") as ins:
    for line in ins:
        if line not in duplicates:
            duplicates.append(line)
            saveList.append(line)
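For a few thousand lines the list lookup above is fine, but each "not in" check scans the whole list. A possible variant, sketched here with a hypothetical helper name unique_lines, keeps a set for the membership test and a list for the order:

```python
def unique_lines(lines):
    """Return lines with duplicates removed, keeping first-seen order."""
    seen = set()   # membership checks on a set are O(1) on average
    result = []    # preserves the order in which lines first appear
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


sample = ["NN by\n", "PP at\n", "PP at\n", "NN by\n"]
print(unique_lines(sample))
```

The function accepts any iterable of strings, so an open file object could be passed in directly.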

Additionally, apply the correction above!

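import os

def cleanseArchive():
    f = open("words.txt", "r+")
    f.seek(0)
    given_line = f.readlines()
    saveList = set()
    # Collect every line into a set, which drops duplicates
    for x, y in enumerate(given_line):
        t = (y)
        saveList.add(t)
    f.close()
    os.remove("words.txt")
    f = open("words.txt", "a")
    for i in saveList:
        f.write(i)
    f.close()  # close the file so the data is flushed to disk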

Finished product! I ended up digging into enumerate and essentially just using that to get the strings. Man, Python has some bumpy roads when you get into sets/lists, holy shit. So much stuff not working for very ambiguous reasons! Whatever the case, fixed it up.

Let's clean up this code you gave us in your update:

def cleanseArchive():
    f = open("words.txt", "r+")
    given_line = f.readlines()
    f.seek(0)
    saveList = set(given_line)
    f.close()
    os.remove("words.txt")
    f = open("words.txt", "a")
    f.write(saveList)

We have bad names that don't respect the Style Guide for Python Code, we have superfluous code, we don't use the full power of Python, and part of it is not working.

Let us start by dropping unneeded code while at the same time using meaningful names.

def cleanse_archive():
    infile = open("words.txt", "r")
    given_lines = infile.readlines()
    words = set(given_lines)
    infile.close()
    outfile = open("words.txt", "w")
    outfile.write(words)

The seek was not needed, the mode for opening a file to read is now just r, the mode for writing is now w, and we dropped the removal of the file because it will be overwritten anyway. Looking at this now clearer code, we see that we forgot to close the file after writing. If we open the file with the with statement, Python will take care of that for us.

def cleanse_archive():
    with open("words.txt", "r") as infile:
        words = set(infile.readlines())
    with open("words.txt", "w") as outfile:
        outfile.write(words)
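As a quick check of what the with statement does, the following sketch inspects the file object's closed attribute inside and after the block. It writes to a temporary directory (an assumption made here so that no real words.txt is touched).

```python
import os
import tempfile

# Temporary location so the sketch does not touch a real words.txt.
path = os.path.join(tempfile.mkdtemp(), "words.txt")

with open(path, "w") as outfile:
    outfile.write("NN by\n")
    inside = outfile.closed   # still open while inside the block

after = outfile.closed        # closed automatically when the block exits
print(inside, after)
```

The file is also closed if an exception is raised inside the block, which is why this form is usually preferred over calling close() by hand.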

Now that we have clear code, we'll deal with the error message that occurs when outfile.write is called: TypeError: must be str, not set. This message is clear: you can't write a set directly to the file. You'll have to loop over the content of the set.

def cleanse_archive():
    with open("words.txt", "r") as infile:
        words = set(infile.readlines())
    with open("words.txt", "w") as outfile:
        for word in words:
            outfile.write(word)

That's it.
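One caveat with the set-based version: a set has no defined order, so the lines may come back shuffled, and a final line without a trailing newline is treated as different from the same text with one. If that matters, a possible variant is sketched below. The function name dedupe_file and its path parameter are made up for this example, and it relies on dict keeping insertion order, which is guaranteed from Python 3.7 on (on older versions such as 3.3, collections.OrderedDict.fromkeys could be used the same way).

```python
import os
import tempfile


def dedupe_file(path):
    """Rewrite the file at `path` without duplicate lines, keeping order."""
    with open(path, "r") as infile:
        # Strip the newline so a final line without "\n" still matches.
        lines = [line.rstrip("\n") for line in infile]
    # dict keys are unique and keep insertion order (Python 3.7+).
    unique = list(dict.fromkeys(lines))
    with open(path, "w") as outfile:
        outfile.writelines(line + "\n" for line in unique)
    return unique


# Usage on a throwaway file; note the last line has no trailing newline.
path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "w") as f:
    f.write("NN by\nPP at\nPP at\nNN by")

result = dedupe_file(path)
print(result)
```

Here writelines receives a generator that adds the newline back to each line, since writelines itself does not add line separators.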
