
Memory error on large Shapefile in Python

import shapefile
data = shapefile.Reader("data_file.shp")
shapes = data.shapes()

My problem is that getting the shapes from the Shapefile reader raises a MemoryError exception when using Pyshp.

The .shp file is quite large, at 1.2 GB. But I am using only 3% of my machine's 32 GB of RAM, so I don't understand why this happens.

Is there any other approach that I can take? Can I process the file in chunks in Python? Or use some tool to split the file into chunks, then process each of them individually?

Quoting from this answer by thomas:

The MemoryError exception that you are seeing is the direct result of running out of available RAM. This could be caused by either the 2 GB per-program limit imposed by Windows (32-bit programs), or a lack of available RAM on your computer. (This link is to a previous question.) You should be able to get past the 2 GB limit by using a 64-bit copy of Python, provided you are using a 64-bit copy of Windows.

So try a 64-bit copy of Python, or provide more detail about your platform and Python versions.
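A quick way to see which kind of interpreter is actually running is to ask Python for its pointer size. This is a minimal sketch using only the standard library; the helper name python_bitness is made up for illustration.

```python
import struct
import sys


def python_bitness():
    """Return the pointer size of the running interpreter in bits."""
    # struct.calcsize("P") is the size in bytes of a C pointer (void *).
    return struct.calcsize("P") * 8


# Report the interpreter version together with its bitness.
print(f"Python {sys.version_info.major}.{sys.version_info.minor}, "
      f"{python_bitness()}-bit")
```

If this reports 32-bit on a 64-bit Windows machine, the per-process limit described in the quote is a likely suspect, however much RAM the machine has.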

Although I haven't been able to test it, Pyshp should be able to read the file regardless of its size or memory limits. Creating the Reader instance doesn't load the entire file, only the header information.

It seems the problem here is that you used the shapes() method, which reads all shape information into memory at once. This usually isn't a problem, but it is with files this big. As a general rule, you should instead use the iterShapes() method, which reads the shapes one at a time.

import shapefile
data = shapefile.Reader("data_file.shp")
# iterShapes() yields one shape at a time instead of building a full list
for shape in data.iterShapes():
    pass  # do something with each shape...
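If the work is easier to do in batches, as the question asks, the same iterator can be grouped into fixed-size chunks. The sketch below is one possible way to do that. The helper iter_chunks and the chunk size of 1000 are illustrative assumptions, not part of Pyshp, and the Pyshp usage is left commented out so that the helper runs on its own.

```python
from itertools import islice


def iter_chunks(iterable, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # Pull the next chunk_size items; the final chunk may be shorter.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical usage with Pyshp (file name taken from the question):
# import shapefile
# data = shapefile.Reader("data_file.shp")
# for chunk in iter_chunks(data.iterShapes(), 1000):
#     ...  # process up to 1000 shapes at a time
```

Only one chunk is held in memory at a time, provided the processing code does not keep references to earlier chunks.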


 