Python readline() and Counter cause MemoryError on a very long line

I'm running into a MemoryError.

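from collections import Counter

pifile = 'pibillion.txt'
with open(pifile, "r+") as a:
    # readline() pulls the entire single line of digits into one string
    data = str(a.readline())
    c = Counter(data)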

All my code does is read one very, very large line containing the digits of pi. The .txt file is only 953 MB, and I have 8 GB of RAM. I'm guessing the error is that it runs into the string size limit, but I'm not sure. The rest of the code inserts a line break after every two digits. Any help on how to proceed would be greatly appreciated.

The exact error I'm getting is:

data = str(a.readline())
   MemoryError
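To check the string-size guess, the line's length can be compared against `sys.maxsize`, which in CPython is the largest value of `Py_ssize_t` and therefore bounds the length of any `str`. The sketch below assumes "953 MB" means MiB of single-byte ASCII digits (about a billion characters, which matches the file name); `FILE_CHARS` is a hypothetical constant, not something measured from the real file. Even on a 32-bit build `sys.maxsize` is 2**31 - 1, so the length alone should fit; if the 64-bit check prints False, the limited address space of a 32-bit process is a more likely suspect than the length limit.

```python
import sys

# Assumption: "953 MB" means MiB of plain ASCII digits (one byte per digit),
# so the single line holds roughly a billion characters.
FILE_CHARS = 953 * 1024 * 1024

# In CPython, sys.maxsize (the largest Py_ssize_t) bounds the length of a str.
print("sys.maxsize:", sys.maxsize)
print("64-bit interpreter:", sys.maxsize > 2**32)
print("line length within str limit:", FILE_CHARS <= sys.maxsize)
```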

Python is not lazy by default the way Haskell is, so reading the line into a string puts all of it in memory at once. Add some string conversions on top of that and you run out of memory. Instead, process the file iteratively, like the following.

Note that I write to a new file: files are usually stored contiguously, so inserting characters into an existing one is very expensive.

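with open('pibillion.txt', 'r') as old_file, open('pibillion_.txt', 'w') as new_file:
    while True:
        c = old_file.read(2)      # read at most two characters
        if not c:                 # an empty string means end of file
            break
        new_file.write(c + '\n')  # write the pair followed by a line break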
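The loop above makes one `read()` and one `write()` call per pair, which is correct but slow for a billion digits, and it leaves out the `Counter` from the question. A minimal sketch, assuming the goal is both the two-character line breaks and the character counts: it reads a larger block at a time, feeds each block to `Counter.update`, and carries an unpaired character into the next block so pairs never straddle a block boundary. The function name `pair_lines_and_count` and the 1 MiB `chunk_size` default are illustrative choices, not part of the original answer.

```python
from collections import Counter


def pair_lines_and_count(src_path, dst_path, chunk_size=1 << 20):
    """Stream src_path into dst_path with a newline after every two characters,
    counting characters along the way. Returns a Counter of the source text.

    Only about chunk_size characters (plus at most one carried-over
    character) are held in memory at a time.
    """
    counts = Counter()
    leftover = ""  # an unpaired character carried into the next block
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty string means end of file
                break
            counts.update(chunk)  # incremental replacement for Counter(data)
            text = leftover + chunk
            cut = len(text) - len(text) % 2  # longest even-length prefix
            dst.write("".join(text[i:i + 2] + "\n" for i in range(0, cut, 2)))
            leftover = text[cut:]
        if leftover:
            # Same as the two-character loop: a trailing single character
            # still gets its own line.
            dst.write(leftover + "\n")
    return counts
```

For example, `counts = pair_lines_and_count('pibillion.txt', 'pibillion_.txt')` writes the same output file as the loop above and returns the character counts (including any trailing newline in the source). If the result should replace the original, `os.replace('pibillion_.txt', 'pibillion.txt')` can swap it in once writing has finished.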

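Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.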

 
© 2020-2024 STACKOOM.COM