Read SHP file from SFTP using pysftp

I am trying to use pysftp's getfo() to read a shapefile without downloading it. However, the output I get does not seem workable, and I'm not sure whether it's possible to do this with a shapefile.

Ideally, I would like to read in the file and convert it to a GeoPandas GeoDataFrame.

import pysftp
import io

# myHostname, myUsername, myPassword and cnopts are defined elsewhere
with pysftp.Connection(host=myHostname, username=myUsername, password=myPassword, cnopts=cnopts) as sftp:
    print("Connection established ... ")

    # Check files in directory
    directory_structure = sftp.listdir('sites')

    # Print data
    for attr in directory_structure:
        print(attr)

    # Copy the remote file into an in-memory buffer
    flo = io.BytesIO()
    sites = sftp.getfo('sites/Sites.shp', flo)
    value = flo.getvalue()

From here I can't decode the value and am unsure how to proceed.

Something like this should do:

# Rewind the buffer so reading starts at the beginning
flo.seek(0)
df = geopandas.read_file(flo)

However, using Connection.getfo unnecessarily keeps the whole raw file in memory. It is more efficient to pass the remote file handle directly:

with sftp.open('sites/Sites.shp', bufsize=32768) as f:
    df = geopandas.read_file(f)

(For the purpose of bufsize=32768, see "Reading file opened with Python Paramiko SFTPClient.open method is slow".)


By the way, note that the code downloads the file anyway. You cannot parse a remote file's contents without actually downloading them. The code just avoids storing the downloaded contents in a (temporary) local file.
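To illustrate why the raw bytes cannot simply be "decoded" as text, and how a parser consumes a file-like object without any local file, here is a minimal sketch that reads the fixed 100-byte header of a .shp main file. The function name read_shp_header is hypothetical, and the io.BytesIO buffer with a synthetic header merely stands in for the handle returned by sftp.open(); the header layout follows the published ESRI shapefile format.

```python
import io
import struct


def read_shp_header(f):
    """Parse the 100-byte header of a .shp main file from a binary file-like object."""
    header = f.read(100)
    if len(header) < 100:
        raise ValueError("not enough bytes for a shapefile header")

    # Bytes 0-3: file code, big-endian, always 9994
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile main file")

    # Bytes 24-27: file length in 16-bit words, big-endian
    (length_words,) = struct.unpack(">i", header[24:28])
    # Bytes 28-35: version and shape type, little-endian
    version, shape_type = struct.unpack("<ii", header[28:36])
    # Bytes 36-67: Xmin, Ymin, Xmax, Ymax as little-endian doubles
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])

    return {
        "version": version,
        "shape_type": shape_type,
        "length_bytes": length_words * 2,
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Synthetic header used only for illustration (not real data):
fake_header = (
    struct.pack(">i", 9994)
    + bytes(20)                      # five unused integers
    + struct.pack(">i", 50)          # 50 words = 100 bytes
    + struct.pack("<ii", 1000, 1)    # version 1000, shape type 1 (Point)
    + struct.pack("<8d", 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0)
)
info = read_shp_header(io.BytesIO(fake_header))
print(info)
```

With a real connection, the same function could be called on the object returned by sftp.open('sites/Sites.shp', bufsize=32768); only the first 100 bytes would then be read from the server.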

Using Martin's answer, I was able to get the outcome I needed. However, the data had to be zipped on the SFTP server, which required a change to the data source.

The output here is a pandas DataFrame, but it is easily adapted into a GeoDataFrame.

import zipfile

import pandas as pd
import pysftp
import shapefile  # provided by the pyshp package

with pysftp.Connection(host=myHostname, username=myUsername, password=myPassword, cnopts=cnopts) as sftp:
    print("Connection successfully established ... ")

    # Open the remote zip archive as a file-like object
    zipshape = zipfile.ZipFile(sftp.open('sites/sites_test.zip', bufsize=32768))
    r = shapefile.Reader(
        shp=zipshape.open('sites/Sites.shp'),
        shx=zipshape.open('sites/Sites.SHX'),
        dbf=zipshape.open('sites/Sites.DBF')
    )

    # Check that we have actually got the file
    print(r.bbox)
    print(r.numRecords)

    # Get field names, skipping the leading DeletionFlag field
    fields = [x[0] for x in r.fields][1:]
    records = r.records()

    # Get the coordinates of each shape
    shps = [s.points for s in r.shapes()]

    # Write the records into a DataFrame
    gdf = pd.DataFrame(columns=fields, data=records)

    # Add the coordinate data to a column called "coords"
    gdf = gdf.assign(coords=shps)
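Member names inside a zip archive are case-sensitive, so the mixed-case extensions in the code above have to match the archive exactly. As a rough sketch, the three components could instead be looked up by extension regardless of case. The helper name find_shapefile_members is hypothetical, and the in-memory archive below only stands in for the remote zip opened through sftp.open().

```python
import io
import zipfile


def find_shapefile_members(zf, stem):
    """Map 'shp', 'shx' and 'dbf' to the archive members named <stem>.<ext>, ignoring case."""
    wanted = {"shp", "shx", "dbf"}
    found = {}
    for name in zf.namelist():
        base = name.rsplit("/", 1)[-1]           # drop any directory prefix
        root, dot, ext = base.rpartition(".")
        if dot and root.lower() == stem.lower() and ext.lower() in wanted:
            found[ext.lower()] = name
    missing = wanted - found.keys()
    if missing:
        raise KeyError(f"archive lacks components: {sorted(missing)}")
    return found


# In-memory archive with empty members, used only for illustration:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("sites/Sites.shp", b"")
    zf.writestr("sites/Sites.SHX", b"")
    zf.writestr("sites/Sites.DBF", b"")
buf.seek(0)

with zipfile.ZipFile(buf) as zf:
    members = find_shapefile_members(zf, "Sites")
print(members)
```

The resulting names could then be passed on as shapefile.Reader(shp=zipshape.open(members["shp"]), shx=zipshape.open(members["shx"]), dbf=zipshape.open(members["dbf"])).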
