
64-bit system, 8 GB of RAM, a bit more than 800 MB of CSV, and reading it with Python gives a memory error

import csv
import numpy as np
from itertools import islice

f = open("data.csv")
f_reader = csv.reader(f)
raw_data = np.array(list(islice(f_reader, 0, 10000000)), dtype=int)

The above is the code I am using to read a CSV file. The file is only about 800 MB and I am using a 64-bit system with 8 GB of RAM. The file contains 100 million lines. However, never mind reading the entire file: even reading the first 10 million lines gives me a 'MemoryError:' (that really is the entire error message).
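One way to see where the memory goes (a rough, illustrative measurement with the standard library, not exact numbers for this file): `list(...)` materializes every parsed row as a Python list of string objects, each carrying tens of bytes of object overhead, so the in-memory size is several times the 800 MB on disk.

```python
import csv
import io
import sys

# One short CSV line, parsed the same way as the code above
line = ",".join(str(i) for i in range(10)) + "\n"   # 20 bytes on disk
row = next(csv.reader(io.StringIO(line)))           # ['0', '1', ..., '9']

# In-memory cost: the row list plus every string object inside it
parsed = sys.getsizeof(row) + sum(sys.getsizeof(s) for s in row)
print(len(line), parsed)  # the parsed row is many times the on-disk size
```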

Could someone tell me why, please? Also, as a side question, could someone tell me how to read from, say, the 20 millionth row? I know I need to use f.seek(some number), but since my data is a CSV file I don't know which number to pass to f.seek() so that it reads from exactly the 20 millionth row.

Thank you very much.

could someone tell me how to read from, say, the 20 millionth row please? I know I need to use f.seek(some number)

No, you can't (and mustn't) use f.seek() in this situation. Rather, you must read each of the first 20 million rows somehow.

The Python documentation has this recipe:

import collections
from itertools import islice

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
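A quick sanity check of the recipe (my own illustration, using a small integer iterator): after consuming 3 items, the next value yielded is the 4th, and consuming with n=None exhausts the iterator.

```python
import collections
from itertools import islice

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

it = iter(range(10))
consume(it, 3)                # skip 0, 1, 2
print(next(it))               # 3
consume(it, None)             # drain the rest
print(next(it, "exhausted"))  # exhausted
```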

Using that, you would start after 20,000,000 rows like this:

#UNTESTED
import csv
import numpy as np
from itertools import islice

f = open("data.csv")
f_reader = csv.reader(f)
consume(f_reader, 20000000)  # skip the first 20,000,000 parsed rows
raw_data = np.array(list(islice(f_reader, 0, 10000000)), dtype=int)

Or perhaps this might go faster, since consume() then skips raw lines on the file object instead of parsed CSV rows:

#UNTESTED
import csv
import numpy as np
from itertools import islice

f = open("data.csv")
consume(f, 20000000)  # skip 20,000,000 raw lines before parsing
f_reader = csv.reader(f)
raw_data = np.array(list(islice(f_reader, 0, 10000000)), dtype=int)
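A miniature, self-contained version of that second variant (my own sketch: it writes a tiny stand-in for data.csv, and uses plain lists of ints rather than np.array so it runs on the standard library alone):

```python
import collections
import csv
import os
import tempfile
from itertools import islice

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

# A tiny stand-in for data.csv: 10 rows of "i,2*i"
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", newline="") as out:
    writer = csv.writer(out)
    for i in range(10):
        writer.writerow([i, 2 * i])

# Skip the first 4 raw lines on the file object, then parse the rest
with open(path, newline="") as f:
    consume(f, 4)
    f_reader = csv.reader(f)
    raw_data = [[int(x) for x in row] for row in islice(f_reader, 0, 3)]

print(raw_data)  # [[4, 8], [5, 10], [6, 12]]
```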

