How to load a zip file (containing shp) from s3 bucket to Geopandas?

I zipped the name.shp, name.shx, and name.dbf files and uploaded the archive to an AWS S3 bucket. Now I want to load this zip file and convert the shapefile it contains into a GeoPandas GeoDataFrame.

I can do this perfectly if the file is a zipped GeoJSON instead of a zipped shapefile.

import io
import boto3
import geopandas as gpd
import zipfile

cliente = boto3.client("s3", aws_access_key_id=ak, aws_secret_access_key=sk)

bucket_name = 'bucketname'
object_key = 'myfolder/locations.zip'

bytes_buffer = io.BytesIO()
cliente.download_fileobj(Bucket=bucket_name, Key=object_key, Fileobj=bytes_buffer)
geojson = bytes_buffer.getvalue()

with zipfile.ZipFile(bytes_buffer) as zi:
    with zi.open("locations.shp") as file:
        print(gpd.read_file(file.read().decode('ISO-8859-9')))

I got this error:

ç¤íEÀ¡ËÆ3À: No such file or directory

Basically, the geopandas package can read files directly from S3. And as mentioned in the answer above, it can also read zip files. So below you can see code that reads a zip file from S3 without downloading it first: put zip+s3:// at the beginning, then append the path within S3.

geopandas.read_file('zip+s3://bucket-name/file.zip')
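
For what it's worth, a minimal sketch of how that looks applied to the question's bucket and key (both placeholders). Note that with this URL form the credentials are usually not passed to read_file at all; depending on your geopandas/fiona/GDAL versions they are picked up from the standard AWS environment variables or the usual AWS credential chain:

import geopandas as gpd

# Credentials are not passed explicitly here; they are typically resolved
# from the environment (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY) or the
# AWS config/credential files.
gdf = gpd.read_file('zip+s3://bucketname/myfolder/locations.zip')
print(gdf.head())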

You can read the zip directly; there is no need to use zipfile. You need all the parts of the shapefile, not just the .shp itself, which is why your approach works with GeoJSON but not here. You just need to pass the path with a zip:/// prefix. So instead of

gpd.read_file('path/file.shp')

You go with

gpd.read_file('zip:///path/file.zip')

I am not familiar enough with boto3 to know at which point you actually have this path, but I think this will help.
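
To tie this to the boto3 code in the question, one straightforward option is to download the archive to a temporary file and then point read_file at it with the zip:// prefix. A rough sketch (ak and sk are the credential variables from the question; the bucket name and key are the question's placeholders):

import tempfile

import boto3
import geopandas as gpd

client = boto3.client('s3', aws_access_key_id=ak, aws_secret_access_key=sk)

# Download the whole zip archive to a temporary file on disk.
with tempfile.NamedTemporaryFile(suffix='.zip', delete=False) as tmp:
    client.download_fileobj(Bucket='bucketname', Key='myfolder/locations.zip', Fileobj=tmp)
    tmp_path = tmp.name

# Let fiona/GDAL open the shapefile inside the zip directly.
gdf = gpd.read_file(f'zip://{tmp_path}')
print(gdf.head())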

I do not know if it can be of any help, but I faced a similar problem recently, though I only wanted to read the .shp with fiona. Like others, I ended up zipping the relevant .shp, .dbf, .cpg and .shx files on the bucket.

And to read from the bucket, I do it like so:

from io import BytesIO
from pathlib import Path
from typing import List
from typing import Union

import boto3
from fiona.io import ZipMemoryFile
from pydantic import BaseSettings
from shapely.geometry import Point
from shapely.geometry import Polygon
import fiona

class S3Configuration(BaseSettings):
    """
    S3 configuration class
    """
    s3_access_key_id: str = ''
    s3_secret_access_key: str = ''
    s3_region_name: str = ''
    s3_endpoint_url: str = ''
    s3_bucket_name: str = ''
    s3_use: bool = False

S3_CONF = S3Configuration()
S3_STR = 's3'
S3_SESSION = boto3.session.Session()
S3 = S3_SESSION.resource(
    service_name=S3_STR,
    aws_access_key_id=S3_CONF.s3_access_key_id,
    aws_secret_access_key=S3_CONF.s3_secret_access_key,
    endpoint_url=S3_CONF.s3_endpoint_url,
    region_name=S3_CONF.s3_region_name,
    use_ssl=True,
    verify=True,
) 
BUCKET = S3_CONF.s3_bucket_name
CordexShape = Union[Polygon, List[Polygon], List[Point]]
ZIP_EXT = '.zip'


def get_shapefile_data(file_path: Path, s3_use: bool = S3_CONF.s3_use) -> CordexShape:
    """
    Retrieves the shapefile content associated to the passed file_path (either on disk or on S3).
    file_path is a .shp file.
    """
    if s3_use:
        return load_zipped_shp(get_s3_object(file_path.with_suffix(ZIP_EXT)), file_path)
    return load_shp(file_path)


def get_s3_object(file_path: Path) -> bytes:
    """
    Retrieve as bytes the content associated to the passed file_path
    """
    return S3.Object(bucket_name=BUCKET, key=forge_key(file_path)).get()['Body'].read()


def forge_key(file_path: Path) -> str:
    """
    Edit this code at your convenience to forge the bucket key out of the passed file_path
    """
    return str(file_path.relative_to(*file_path.parts[:2]))


def load_shp(file_path: Path) -> CordexShape:
    """
    Retrieve a list of Polygons stored at file_path location
    """
    with fiona.open(file_path) as shape:
        parsed_shape = list(shape)
    return parsed_shape


def load_zipped_shp(zipped_data: bytes, file_path: Path) -> CordexShape:
    """
    Retrieve a list of Polygons stored at file_path location
    """
    with ZipMemoryFile(BytesIO(zipped_data)) as zip_memory_file:
        with zip_memory_file.open(file_path.name) as shape:
            parsed_shape = list(shape)
    return parsed_shape

There is quite a lot of code, but the first part is very helpful for easily using a MinIO proxy for local development (you just have to change the .env).

The key to solving the issue for me was fiona's ZipMemoryFile, which is not so well documented (in my opinion) but was a life saver in my case.
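
Stripped of the configuration layer, the same idea applied directly to the question looks roughly like this: read the zipped shapefile bytes from S3 with boto3, open them with ZipMemoryFile, and build the GeoDataFrame from the features (ak, sk, the bucket, the key, and the inner locations.shp name are the question's placeholders):

import boto3
import geopandas as gpd
from fiona.io import ZipMemoryFile

client = boto3.client('s3', aws_access_key_id=ak, aws_secret_access_key=sk)

# Fetch the raw bytes of the zip archive from S3.
zipped_bytes = client.get_object(Bucket='bucketname', Key='myfolder/locations.zip')['Body'].read()

# Open the shapefile inside the in-memory zip and build a GeoDataFrame.
with ZipMemoryFile(zipped_bytes) as memfile:
    with memfile.open('locations.shp') as collection:
        gdf = gpd.GeoDataFrame.from_features(collection, crs=collection.crs)

print(gdf.head())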
