
How to merge a shapefile with a dataframe with latitude/longitude data

I am struggling with the following issue.

I have downloaded the PLUTO NYC Manhattan shapefile for the NYC tax lots from here: https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page

I am able to read it in sf with a simple st_read:

> mydf
Simple feature collection with 42638 features and 90 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 971045.3 ymin: 188447.4 xmax: 1010027 ymax: 259571.5
epsg (SRID):    NA
proj4string:    +proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
First 10 features:
   Borough Block  Lot  CD CT2010 CB2010 SchoolDist Council ZipCode FireComp PolicePrct HealthCent HealthArea
1       MN  1545   52 108    138   4000         02       5   10028     E022         19         13       3700

My problem is the following: I have a dataframe as follows:

> data_frame('lat' = c(40.785091,40.785091), 'lon' = c(-73.968285, -73.968285))
# A tibble: 2 x 2
        lat        lon
      <dbl>      <dbl>
1 40.785091 -73.968285
2 40.785091 -73.968285

I would like to merge this data with the mydf dataframe above, so that I can count how many latitude/longitude observations fall within each tax lot (remember, mydf is at the tax lot granularity), and plot the corresponding map. I need to do so using sf.

In essence, something similar to

pol <- mydf %>% select(SchoolDist)
plot(pol)


but where the count for each tax lot comes from counting how many points in my latitude/longitude dataframe fall into it.

Of course, in my small example I just have 2 points in the same tax lot, so that would highlight a single tax lot in the whole area. My real data contains a lot more points.

I think there is an easy way to do it, but I was not able to find it. Thanks!

This is how I would do it with arbitrary polygon and point data. I wouldn't merge the two and instead just use a geometry predicate to get the counts that you want. Here we:

  1. Use the built-in nc dataset and transform it to CRS 3857, which is projected rather than lat-long (this avoids a warning in st_contains).
  2. Create 1000 random points within the bounding box of nc, using st_bbox and runif. Note that st_as_sf can turn a data.frame with lat long columns into sf points.
  3. Use lengths(st_contains(polygons, points)) to get the counts of points per polygon. sgbp objects created by a geometry predicate are basically "for each geometry in sf x, what indices of geometries in sf y satisfy the predicate". So lengths effectively gives the number of points that satisfy the predicate for each geometry, in this case the number of points contained within each polygon.
  4. Once the counts are in the sf object as a column, we can just select and plot them with the plot.sf method.

For your data, simply replace nc with mydf and leave out the call to tibble; instead use your data.frame with the right lat long pairs.

library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
nc <- system.file("shape/nc.shp", package="sf") %>%
  read_sf() %>%
  st_transform(3857)
set.seed(1000)
points <- tibble(
  x = runif(1000, min = st_bbox(nc)[1], max = st_bbox(nc)[3]),
  y = runif(1000, min = st_bbox(nc)[2], max = st_bbox(nc)[4])
) %>%
  st_as_sf(coords = c("x", "y"), crs = 3857)

plot(nc$geometry)
plot(points$geometry, add = TRUE)

nc %>%
  mutate(pt_count = lengths(st_contains(nc, points))) %>%
  select(pt_count) %>%
  plot()

Created on 2018-05-02 by the reprex package (v0.2.0).
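To make step 3 more concrete, here is a minimal sketch on toy data. The two unit squares, the four points, and the names sq, polys, pts and hits are made up for illustration and are not part of the original example.

```r
library(sf)

# Hypothetical helper: a unit square whose lower-left corner is at (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
polys <- st_sf(id = c("a", "b"), geometry = st_sfc(sq(0), sq(1)))

# Four points: two inside "a", one inside "b", one outside both squares
pts <- st_as_sf(
  data.frame(x = c(0.2, 0.8, 1.5, 3), y = c(0.2, 0.5, 0.5, 3)),
  coords = c("x", "y")
)

# sgbp object: for each polygon, the indices of the points it contains
hits <- st_contains(polys, pts)

# The length of each index vector is the number of points in that polygon
polys$pt_count <- lengths(hits)
polys$pt_count
```

Here hits[[1]] holds the indices 1 and 2 and hits[[2]] holds index 3, so pt_count is 2 for the first square and 1 for the second; the fourth point lies in neither and is simply not counted.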

I tried this on your data, but the intersection is empty for both sets of points you provided. However, the code should work.

EDIT: Simplified group_by + mutate with add_count:

mydf = st_read("MN_Dcp_Mappinglot.shp")
xydf = data.frame(lat=c(40.758896,40.758896), lon=c(-73.985130, -73.985130))
xysf = st_as_sf(xydf, coords=c('lon', 'lat'), crs=st_crs(mydf))
## NB: make sure to st_transform both to common CRS, as Calum You suggests
xysf %>% 
    sf::st_intersection(mydf) %>% 
    dplyr::add_count(LOT)

Reproducible example:

library(sf)
library(dplyr)  # provides %>% and add_count

nc = sf::st_read(system.file("shape/nc.shp", package="sf"))
## keep the raw points so that both approaches start from the same input
pts = sf::st_as_sf(data.frame(lon=c(-80, -80.1, -82), lat=c(35.5, 35.5, 35.5)), 
           coords=c('lon', 'lat'), crs=st_crs(nc))
ncxy = pts %>% 
           sf::st_intersection(nc) %>%
           dplyr::add_count(FIPS)

## a better approach
ncxy = pts %>%
           sf::st_join(nc, join=st_intersects) %>%
           dplyr::add_count(FIPS)

The new column n includes the total number of points per FIPS code.

ncxy %>% dplyr::group_by(FIPS) %>% dplyr::distinct(n)
> although coordinates are longitude/latitude, st_intersects assumes 
  that they are planar
  # A tibble: 2 x 2
  # Groups:   FIPS [2]
    FIPS     n
   <fctr> <int>
  1  37123     2
  2  37161     1
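The counts above sit on the point rows. If the goal is a map of the polygons coloured by count, one possible follow-up is to summarise the join and attach the result back to the polygons. This is only a sketch: the names pts, counts, nc_counts and pt_count are invented here, and it assumes FIPS uniquely identifies each polygon.

```r
library(sf)
library(dplyr)

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))
pts <- st_as_sf(
  data.frame(lon = c(-80, -80.1, -82), lat = c(35.5, 35.5, 35.5)),
  coords = c("lon", "lat"), crs = st_crs(nc)
)

# One row per FIPS code that received at least one point
counts <- pts %>%
  st_join(nc, join = st_intersects) %>%
  st_drop_geometry() %>%
  count(FIPS, name = "pt_count")

# Attach the counts to the polygons; polygons without points get 0
nc_counts <- nc %>%
  left_join(counts, by = "FIPS") %>%
  mutate(pt_count = coalesce(pt_count, 0L))

plot(nc_counts["pt_count"])
```

For mydf the same pattern would need a column that identifies a single tax lot; LOT on its own may repeat across blocks, so that choice is left open here.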

I'm not sure why your data results in an empty intersection, but since the code works on the example above there must be a separate issue.
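One likely cause is a CRS mismatch. The printed proj4string of mydf is a Lambert conformal conic projection in US feet, while the points are longitude/latitude in degrees. Passing crs = st_crs(mydf) to st_as_sf only labels the degree values as projected coordinates; it does not reproject them. The sketch below illustrates the effect with nc transformed to EPSG:3857 as a stand-in for mydf. The names polys, wrong and right are illustrative, and EPSG:4326 is an assumption about how the lat/lon values were recorded.

```r
library(sf)

# Stand-in for mydf: polygons in a projected CRS
polys <- st_transform(read_sf(system.file("shape/nc.shp", package = "sf")), 3857)

xydf <- data.frame(lon = c(-80, -80.1, -82), lat = c(35.5, 35.5, 35.5))

# Mislabelled: degree values are declared to be projected coordinates
wrong <- st_as_sf(xydf, coords = c("lon", "lat"), crs = st_crs(polys))

# Declared as lon/lat (EPSG:4326 assumed), then transformed to the polygon CRS
right <- st_as_sf(xydf, coords = c("lon", "lat"), crs = 4326)
right <- st_transform(right, st_crs(polys))

sum(lengths(st_intersects(polys, wrong)))  # expected to find no matches
sum(lengths(st_intersects(polys, right)))  # expected to find matches
```

Applied to the question's data, this would mean building the points with crs = 4326 and then calling st_transform(xysf, st_crs(mydf)) before counting.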

HT: st_join approach from this answer.
