Why does using 'w+' rather than 'a+' produce garbage in the files when the file handles are stored in a dict?

Question

I wrote a function that takes a list of items with several fields and writes each item to one or several files, depending on the content of some of the fields.

The file names are based on the content of those fields. For example, an item with the value AAA in the field rating and Spain in the field country will end up in the files AAA_firms.txt, Spain_firms.txt and Spain_AAA_firms.txt (just an example, not the real case).

When I first coded it I used 'w+' as the mode to open the files. Most of the content of the files seemed to be corrupt: the files were filled with ^@ characters, with only a few correct entries at the end. For example, in a file of more than 3500 entries, fewer than 100 entries at the end were legible; the rest of the file was ^@ characters.

I could not find the cause, so I did it a different way: I stored all the entries in lists in the dict and then wrote each list to a file in one pass, again opening the file with w+. This worked fine, but I was left curious about what had happened.

Among other things, I tried changing 'w+' to 'a+', and that works!

I would like to know the exact difference that makes 'w+' work erratically and 'a+' work fine.

I left the code below with the mode set to 'w+' (this way it writes what seems to be garbage to the file).

The code is not 100% real: I had to modify names, and it is part of a class (the source list itself, actually a dict wrapper, as you can guess from the code here).

def extractLists(self, outputDir, filenameprefix):
    totalEntries = 0
    aKey = "rating"
    bKey = "country"
    nameKey = "name"
    representativeChars = 2
    fileBase = outputDir + "/" + filenameprefix
    filenameAll = fileBase + "_ALL.txt"

    # Maps each output filename to its open file handle
    mailLists = dict()

    for item in self.content.values():
        # Skip the header row, where the field value equals the field name
        if (item[aKey] != aKey):
            totalEntries = totalEntries + 1
            filenameA = fileBase + "_" + item[aKey] + "_ANY.txt"
            filenameB = fileBase + "_ANY_" + item[bKey][0:representativeChars] + ".txt"
            filenameAB = fileBase + "_" + item[aKey] + "_" + item[bKey][0:representativeChars] + ".txt"
            mailLists.setdefault(filenameAll,open(filenameAll,"w+")).write(item[nameKey]+"\n")
            mailLists.setdefault(filenameA,open(filenameA,"w+")).write(item[nameKey]+"\n")
            mailLists.setdefault(filenameB,open(filenameB,"w+")).write(item[nameKey]+"\n")
            mailLists.setdefault(filenameAB,open(filenameAB,"w+")).write(item[nameKey]+"\n")

    for fileHandle in mailLists.values():
        fileHandle.close()

    print(totalEntries)
    return totalEntries

Answer 1

You are reopening the file objects each time through the loop, even if they are already present in the dictionary. The expression:

mailLists.setdefault(filenameA,open(filenameA,"w+"))

opens the file first, as both arguments to setdefault() need to be evaluated before the call is made. Using open(..., 'w+') truncates the file.
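The eager evaluation of the second argument can be seen without touching any files. The following is a minimal sketch; make_default is a hypothetical stand-in for open(name, "w+") that only records how often it is called.

```python
calls = []

def make_default(name):
    # Hypothetical stand-in for open(name, "w+"): records every call.
    calls.append(name)
    return "handle for " + name

handles = {}
for _ in range(3):
    # The second argument is evaluated before setdefault() runs,
    # whether or not the key is already in the dict.
    handles.setdefault("a.txt", make_default("a.txt"))

print(len(calls))    # 3: the default was built on every iteration
print(len(handles))  # 1: only the first result was stored
```

With a real open(..., "w+") in place of make_default, each of those extra calls would truncate the file again.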

This is fine the first time, when the filename is not yet present, but on all subsequent iterations you truncate a file for which there is still an open file handle. That existing open file handle in the dictionary has a file writing position, and continues to write from that position. Since the file has just been truncated, the gap before that position reads back as null bytes (the ^@ characters), which leads to the behaviour you observed: corrupted file contents. You'll still see multiple entries written, as data could still be buffered; only data already flushed to disk is lost.

See this short demo (executed on OS X; different operating systems and filesystems can behave differently):

>>> with open('/tmp/testfile.txt', 'w') as f:
...     f.write('The quick brown fox')
...     f.flush()  # flush the buffer to disk
...     open('/tmp/testfile.txt', 'w')  # second open call, truncates
...     f.write(' jumps over the lazy fox')
...
<open file '/tmp/testfile.txt', mode 'w' at 0x10079b150>
>>> with open('/tmp/testfile.txt', 'r') as f:
...     f.read()
...
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 jumps over the lazy fox'

Opening the files in append mode ('a') doesn't truncate, which is why that change made things work.
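For comparison, here is a rough sketch of the same reopen pattern in append mode. The temporary path is an assumption made for illustration only. Note that the extra handle is still opened and then discarded, so 'a+' hides the repeated open() calls rather than removing them.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "append_demo.txt")

first = open(path, "a")
first.write("one\n")
first.flush()              # make sure the data is on disk

second = open(path, "a")   # reopening in append mode does not truncate
second.close()

first.write("two\n")       # the original handle keeps appending
first.close()

with open(path) as f:
    content = f.read()
print(repr(content))
```

Both lines survive, because nothing was truncated between the two writes.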

Don't keep opening files; only do so when the file is actually missing from the dictionary. You'll have to use an if statement for that:

if filenameA not in mailLists:
    mailLists[filenameA] = open(filenameA, 'w+')

I'm not sure why you are using + in the file mode, however, since you don't appear to be reading from any of the files.

For filenameAll, the variable never changes, so you don't need to open that file in the loop at all. Move that outside of the loop and open it just once.
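Putting these suggestions together, the function could be restructured roughly as follows. This is only a sketch, not the asker's real code: write_lists, handle_for and the sample data are made-up names, it takes a plain list of dicts instead of the class's self.content, it omits the header-row check, and it uses mode 'w' because nothing is read back.

```python
import os
import tempfile

def write_lists(items, output_dir, prefix, chars=2):
    """Write each item's name to the ALL file and to its per-field files."""
    base = os.path.join(output_dir, prefix)
    handles = {}

    def handle_for(filename):
        # Open each file exactly once; later calls reuse the stored handle.
        if filename not in handles:
            handles[filename] = open(filename, "w")
        return handles[filename]

    total = 0
    try:
        all_file = handle_for(base + "_ALL.txt")  # opened once, outside the loop
        for item in items:
            total += 1
            rating = item["rating"]
            country = item["country"][:chars]
            line = item["name"] + "\n"
            all_file.write(line)
            handle_for(base + "_" + rating + "_ANY.txt").write(line)
            handle_for(base + "_ANY_" + country + ".txt").write(line)
            handle_for(base + "_" + rating + "_" + country + ".txt").write(line)
    finally:
        # Close every handle, even if a write failed part-way through.
        for handle in handles.values():
            handle.close()
    return total

# Usage with made-up sample data
out_dir = tempfile.mkdtemp()
sample = [
    {"name": "Acme", "rating": "AAA", "country": "Spain"},
    {"name": "Bolt", "rating": "AAA", "country": "Sweden"},
    {"name": "Core", "rating": "BB", "country": "Spain"},
]
count = write_lists(sample, out_dir, "firms")
```

Because every filename is opened at most once, no handle is left pointing past the end of a freshly truncated file.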
