
Spatial binning from a spatial dataframe using geopandas (Python)

I want to do spatial binning (using the median as the aggregation function) starting from a CSV file containing pollutant values measured at positions long and lat.
The resulting map should look something like this:

[image]

but for data covering a city's extent. I found this tutorial, which is close to what I want to do, but I was not able to get the desired result. I think I'm missing something about how to use dissolve correctly and how to plot the resulting data (preferably with Folium). Any useful example code?

  • You have not provided sample data, so I have used global earthquakes as the set of points and the geometry of California as the scope / extent.
  • It's simple to create a grid using shapely.geometry.box().
  • I have shown the use of median, plus another aggfunc to demonstrate that multiple metrics can be calculated.
  • I have used folium to plot. This feature is new in geopandas 0.10.0: https://geopandas.org/en/stable/docs/user_guide/interactive_mapping.html

import geopandas as gpd
import shapely.geometry
import numpy as np

# equivalent of CSV, all earthquake points globally
gdf_e = gpd.read_file(
    "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.geojson"
)

# get geometry of bounding area.  Have selected a state rather than a city
gdf_CA = gpd.read_file(
    "https://raw.githubusercontent.com/glynnbird/usstatesgeojson/master/california.geojson"
).loc[:, ["geometry"]]

BOXES = 50
a, b, c, d = gdf_CA.total_bounds

# create a grid for California, could be a city
gdf_grid = gpd.GeoDataFrame(
    geometry=[
        shapely.geometry.box(minx, miny, maxx, maxy)
        for minx, maxx in zip(np.linspace(a, c, BOXES), np.linspace(a, c, BOXES)[1:])
        for miny, maxy in zip(np.linspace(b, d, BOXES), np.linspace(b, d, BOXES)[1:])
    ],
    crs="epsg:4326",
)

# remove grid boxes created outside actual geometry
gdf_grid = gdf_grid.sjoin(gdf_CA).drop(columns="index_right")

# get earthquakes that have occurred within one of the grid geometries
gdf_e_CA = gdf_e.loc[:, ["geometry", "mag"]].sjoin(gdf_grid)
# get median magnitude of earthquakes in each grid cell
gdf_grid = gdf_grid.join(
    gdf_e_CA.dissolve(by="index_right", aggfunc="median").drop(columns="geometry")
)
# how many earthquakes in the grid
gdf_grid = gdf_grid.join(
    gdf_e_CA.dissolve(by="index_right", aggfunc=lambda d: len(d))
    .drop(columns="geometry")
    .rename(columns={"mag": "number"})
)

# drop grids geometries that have no measures and create folium map
m = gdf_grid.dropna().explore(column="mag")
# for good measure - boundary on map too
# (.boundary works for both Polygon and MultiPolygon geometries)
gdf_CA.boundary.explore(m=m)

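The two dissolve calls above (one for the median, one for the count) could probably be collapsed into a single pass. The following is only a sketch on made-up data: the 2x2 grid, the four points and the name gdf_binned are assumptions for illustration. It uses a plain pandas groupby on the index_right column that sjoin adds, instead of dissolve.

```python
import geopandas as gpd
import shapely.geometry

# Hypothetical 2x2 grid over the unit square (stands in for gdf_grid)
gdf_grid = gpd.GeoDataFrame(
    geometry=[
        shapely.geometry.box(x, y, x + 0.5, y + 0.5)
        for x in (0.0, 0.5)
        for y in (0.0, 0.5)
    ],
    crs="EPSG:4326",
)

# Hypothetical points with a "mag" value (stands in for the earthquakes)
gdf_pts = gpd.GeoDataFrame(
    {"mag": [1.0, 3.0, 5.0, 2.0]},
    geometry=gpd.points_from_xy([0.1, 0.2, 0.3, 0.7], [0.1, 0.2, 0.3, 0.8]),
    crs="EPSG:4326",
)

# sjoin adds an "index_right" column holding the index of the matching grid cell
joined = gdf_pts.sjoin(gdf_grid)

# median and count per cell in one aggregation (named aggregation on a Series groupby)
stats = joined.groupby("index_right")["mag"].agg(mag="median", number="count")

# attach the statistics to the grid; cells without points get NaN
gdf_binned = gdf_grid.join(stats)
print(gdf_binned)
```

Since only the attribute columns are aggregated, no geometries are merged here, which may also be a little cheaper than dissolve on large point sets.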

I want to convert a pandas DataFrame to a spatially enabled geopandas one:

import pandas as pd
import geopandas

df = pd.read_csv('../Desktop/test_esri.csv')
df.head()


Then I converted it using:

gdf = geopandas.GeoDataFrame(
    df, geometry=geopandas.points_from_xy(df.long, df.lat))
gdf = gdf.set_crs('epsg:4326')
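As a side note, the CRS can also be passed straight to the GeoDataFrame constructor, so the separate set_crs call is not strictly needed. A minimal sketch, assuming a small made-up DataFrame in place of the CSV (the coordinates and CO values below are invented):

```python
import pandas as pd
import geopandas as gpd

# Hypothetical stand-in for the CSV: columns long, lat and a pollutant value CO
df = pd.DataFrame(
    {
        "long": [14.33, 14.34, 14.35],
        "lat": [40.81, 40.82, 40.82],
        "CO": [0.4, 0.6, 0.5],
    }
)

# points_from_xy takes x (longitude) first, then y (latitude)
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["long"], df["lat"]),
    crs="EPSG:4326",
)
print(gdf.crs)
```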

Then I want to overlay a spatial grid:

import numpy as np
import shapely.geometry
from pyproj import crs
# total area for the grid
xmin, ymin, xmax, ymax= gdf.total_bounds
# how many cells across and down
n_cells=30
cell_size = (xmax-xmin)/n_cells
# projection of the grid
# crs = "+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs"
# create the cells in a loop
grid_cells = []
for x0 in np.arange(xmin, xmax+cell_size, cell_size ):
    for y0 in np.arange(ymin, ymax+cell_size, cell_size):
        # bounds
        x1 = x0-cell_size
        y1 = y0+cell_size
        grid_cells.append( shapely.geometry.box(x0, y0, x1, y1)  )
cell = geopandas.GeoDataFrame(grid_cells, columns=['geometry'], 
                                 crs=crs.CRS('epsg:4326'))

Then merge the grid with the geopandas dataframe:

merged = geopandas.sjoin(gdf, cell, how='left', predicate='within')

And finally compute the desired metric with dissolve:

# Compute stats per grid cell -- aggregate points to grid cells with dissolve
dissolve = merged.dissolve(by="index_right", aggfunc="median")

But I think I did something wrong with the "cell" grid and I can't figure it out! An extract of the CSV file used can be found here.
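Without the data this is only a guess, but two things in the attempt above stand out. First, x1 = x0-cell_size builds each box to the left of x0, so the whole grid is shifted one cell to the west. Second, dissolve merges the point geometries of each group; it does not return the grid cells, so the medians still have to be attached back to cell. The sketch below shows one way this might look. The random points, the column name CO and the variable names are assumptions for illustration only.

```python
import geopandas as gpd
import numpy as np
import pandas as pd
import shapely.geometry

# Hypothetical measurements: random points with a pollutant column "CO"
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "long": rng.uniform(14.30, 14.40, 200),
        "lat": rng.uniform(40.80, 40.85, 200),
        "CO": rng.uniform(0.1, 1.0, 200),
    }
)
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["long"], df["lat"]), crs="EPSG:4326"
)

# Square cells sized from the x extent, as in the attempt above
xmin, ymin, xmax, ymax = gdf.total_bounds
n_cells = 10
cell_size = (xmax - xmin) / n_cells

# Each box runs from (x0, y0) to (x0 + cell_size, y0 + cell_size)
grid_cells = [
    shapely.geometry.box(x0, y0, x0 + cell_size, y0 + cell_size)
    for x0 in np.arange(xmin, xmax + cell_size, cell_size)
    for y0 in np.arange(ymin, ymax + cell_size, cell_size)
]
cell = gpd.GeoDataFrame(geometry=grid_cells, crs="EPSG:4326")

# Tag every point with the index of the cell it falls in (column "index_right")
merged = gdf[["CO", "geometry"]].sjoin(cell, predicate="intersects")

# Median per cell, written back onto the grid (assignment aligns on the cell index)
cell["CO"] = merged.groupby("index_right")["CO"].median()

# Keep only cells that actually contain measurements
binned = cell.dropna(subset=["CO"])
print(binned.head())
```

A point lying exactly on a shared cell edge would match both neighbouring cells with the intersects predicate; for measured coordinates that should be rare. If folium is installed, binned.explore(column="CO") should then draw the cells coloured by median.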

Finally solved with the following code:

import pandas as pd
import geopandas as gpd
import numpy as np
import shapely.geometry

df = pd.read_csv('../Desktop/test_esri.csv')
gdf_monica = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df.long, df.lat))
gdf_monica = gdf_monica.set_crs('epsg:4326')

# the measurement points read from the CSV (these replace the earthquake points)
gdf_e = gdf_monica

# get geometry of the bounding area: the municipality of Portici
gdf_CA = gpd.read_file('https://raw.githubusercontent.com/openpolis/geojson-italy/master/geojson/limits_IT_municipalities.geojson')

gdf_CA = gdf_CA[gdf_CA['name'] == 'Portici'].loc[:, ['geometry']]

BOXES = 50
a, b, c, d = gdf_CA.total_bounds

# create a grid for the municipality
gdf_grid = gpd.GeoDataFrame(
    geometry=[
        shapely.geometry.box(minx, miny, maxx, maxy)
        for minx, maxx in zip(np.linspace(a, c, BOXES), np.linspace(a, c, BOXES)[1:])
        for miny, maxy in zip(np.linspace(b, d, BOXES), np.linspace(b, d, BOXES)[1:])
    ],
    crs="epsg:4326",
)

# remove grid boxes created outside actual geometry
gdf_grid = gdf_grid.sjoin(gdf_CA).drop(columns="index_right")

# get measurements that fall within one of the grid geometries
gdf_e_CA = gdf_e.loc[:, ["geometry", "CO"]].sjoin(gdf_grid)
# get median CO value in each grid cell
gdf_grid = gdf_grid.join(
    gdf_e_CA.dissolve(by="index_right", aggfunc="median").drop(columns="geometry")
)
# how many measurements in each grid cell
gdf_grid = gdf_grid.join(
    gdf_e_CA.dissolve(by="index_right", aggfunc=lambda d: len(d))
    .drop(columns="geometry")
    .rename(columns={"CO": "number"})
)

# drop grids geometries that have no measures and create folium map
m = gdf_grid.dropna().explore(column="CO")
# for good measure - boundary on map too
# (.boundary works for both Polygon and MultiPolygon geometries)
gdf_CA.boundary.explore(m=m)

which produces:

[image]

As you can tell, I have little or no knowledge of spatial analysis. I was not able to get correct results without using GeoJSON data describing a geometry within which the points of interest fall. If anyone can add more insight, thanks!
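One possible insight: on a regular grid, the boundary polygon only serves to clip the grid to the area of interest. The binning itself needs nothing but the point coordinates, because the cell a point belongs to follows from integer division of its coordinates by the cell size. A minimal sketch with plain pandas and NumPy, assuming made-up values, a grid origin at (0, 0) and a cell size of 0.1 degrees (all hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical measurements
df = pd.DataFrame(
    {
        "long": [0.02, 0.07, 0.13, 0.18, 0.95],
        "lat": [0.03, 0.04, 0.01, 0.02, 0.91],
        "CO": [1.0, 3.0, 2.0, 4.0, 8.0],
    }
)

cell_size = 0.1  # degrees, assumed
x_origin, y_origin = 0.0, 0.0  # lower-left corner of the grid, assumed

# Column and row index of the cell containing each point
df["ix"] = np.floor((df["long"] - x_origin) / cell_size).astype(int)
df["iy"] = np.floor((df["lat"] - y_origin) / cell_size).astype(int)

# Median and count per occupied cell
binned = (
    df.groupby(["ix", "iy"])["CO"]
    .agg(median="median", number="count")
    .reset_index()
)
print(binned)
```

Each (ix, iy) pair can be turned back into a polygon with shapely.geometry.box(x_origin + ix * cell_size, y_origin + iy * cell_size, ...) if a map is needed; the geopandas route above does the same job with more bookkeeping done for you.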

