
Memory error on large Shapefile in Python

import shapefile

data = shapefile.Reader("data_file.shp")
shapes = data.shapes()  # MemoryError is raised here

My problem is that calling shapes() on the Shapefile reader raises a MemoryError when using Pyshp.

The .shp file is quite large, at 1.2 GB. But I am only using 3% of my machine's 32 GB of RAM, so I don't understand it.

Is there any other approach I can take? Can I process the file in chunks in Python? Or use some tool to split the file into chunks, then process each of them individually?

Quoting from this answer by thomas:

The MemoryError exception that you are seeing is the direct result of running out of available RAM. This could be caused by either the 2 GB per-program limit imposed by Windows (32-bit programs), or a lack of available RAM on your computer. (This link is to a previous question.) You should be able to get past the 2 GB limit by using a 64-bit copy of Python, provided you are using a 64-bit copy of Windows.

So try a 64-bit copy of Python, or provide more detail about your platform and Python versions.
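
One quick way to check which build of Python you are running (a minimal sketch using only the standard library; both calls report the interpreter's own bitness, not the operating system's):

import struct
import platform

# Pointer size in bits: 32 on a 32-bit build, 64 on a 64-bit build.
print(struct.calcsize("P") * 8)

# Cross-check against the architecture string the interpreter reports.
print(platform.architecture()[0])  # e.g. "64bit"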

Although I haven't been able to test it, Pyshp should be able to read the file regardless of its size or your memory limits. Creating the Reader instance doesn't load the entire file, only the header information.
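
For example, you can inspect the header fields without touching any of the shape records (attribute names as in Pyshp 2.x; older versions may differ slightly):

import shapefile

data = shapefile.Reader("data_file.shp")
# All of these come from the file header, so nothing large is loaded.
print(data.shapeType)  # geometry type code, e.g. 5 for POLYGON
print(data.bbox)       # overall bounding box of the file
print(len(data))       # number of shapes (Pyshp 2.x)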

It seems the problem here is that you used the shapes() method, which reads all shape information into memory at once. This usually isn't a problem, but it is with files this big. As a general rule, you should instead use the iterShapes() method, which reads the shapes one at a time.

import shapefile
data = shapefile.Reader("data_file.shp")
for shape in data.iterShapes():
    # do something...
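
If you specifically want the chunked processing asked about in the question, iterShapes() composes naturally with manual batching. Here is a minimal sketch; CHUNK_SIZE and process_chunk are illustrative placeholders, not part of Pyshp:

import shapefile

CHUNK_SIZE = 10_000  # tune to your available RAM

def process_chunk(chunk):
    # Hypothetical stand-in: replace with your own per-chunk logic.
    print(f"processing {len(chunk)} shapes")

data = shapefile.Reader("data_file.shp")
chunk = []
for shape in data.iterShapes():
    chunk.append(shape)
    if len(chunk) == CHUNK_SIZE:
        process_chunk(chunk)
        chunk = []  # drop the batch so memory use stays bounded
if chunk:
    process_chunk(chunk)  # handle the final partial batch

Because only one chunk is held in memory at a time, peak memory use is bounded by the chunk size rather than by the file size.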
