How to merge a shapefile with a dataframe with latitude/longitude data
I am struggling with the following issue.
I have downloaded the PLUTO NYC Manhattan shapefile for the NYC tax lots from here: https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page
I am able to read it in sf with a simple st_read:
> mydf
Simple feature collection with 42638 features and 90 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 971045.3 ymin: 188447.4 xmax: 1010027 ymax: 259571.5
epsg (SRID): NA
proj4string: +proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
First 10 features:
Borough Block Lot CD CT2010 CB2010 SchoolDist Council ZipCode FireComp PolicePrct HealthCent HealthArea
1 MN 1545 52 108 138 4000 02 5 10028 E022 19 13 3700
My problem is the following. I have a dataframe as follows:
> data_frame('lat' = c(40.785091,40.785091), 'lon' = c(-73.968285, -73.968285))
# A tibble: 2 x 2
lat lon
<dbl> <dbl>
1 40.785091 -73.968285
2 40.785091 -73.968285
I would like to merge this data with the mydf dataframe above, so that I can count how many latitude/longitude observations fall within each tax lot (remember, mydf is at the tax lot granularity), and plot the corresponding map. I need to do so using sf.
In essence, something similar to:
pol <- mydf %>% select(SchoolDist)
plot(pol)
but where the counts for each tax lot come from counting how many points in my latitude/longitude dataframe fall into them.
Of course, in my small example I just have 2 points in the same tax lot, so that would just highlight one single tax lot in the whole area. My real data contains a lot more points.
I think there is an easy way to do it, but I was not able to find it. Thanks!
This is how I would do it with arbitrary polygon and point data. I wouldn't merge the two and instead just use a geometry predicate to get the counts that you want. Here we:
1. Use the built-in nc dataset and transform it to the 3857 crs, which is projected rather than lat-long (this avoids a warning in st_contains).
2. Create 1000 random points within the bounding box of nc, using st_bbox and runif. Note that st_as_sf can turn a data.frame with lat long columns into sf points.
3. Use lengths(st_contains(polygons, points)) to get the counts of points per polygon. sgbp objects created by a geometry predicate are basically "for each geometry in sf x, what indices of geometries in sf y satisfy the predicate". So lengths effectively gives the number of points that satisfy the predicate for each geometry, in this case the number of points contained within each polygon.
4. Once the counts are in the sf object as a column, we can just select them and plot them with the plot.sf method.
For your data, simply replace nc with mydf and leave out the call to tibble; instead, use your data.frame with the right lat long pairs.
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
# Example polygons shipped with sf, projected to EPSG:3857
nc <- system.file("shape/nc.shp", package = "sf") %>%
  read_sf() %>%
  st_transform(3857)

# 1000 random points inside the bounding box of nc
set.seed(1000)
points <- tibble(
  x = runif(1000, min = st_bbox(nc)[1], max = st_bbox(nc)[3]),
  y = runif(1000, min = st_bbox(nc)[2], max = st_bbox(nc)[4])
) %>%
  st_as_sf(coords = c("x", "y"), crs = 3857)

plot(nc$geometry)
plot(points$geometry, add = TRUE)

# Count the points contained in each polygon and map the counts
nc %>%
  mutate(pt_count = lengths(st_contains(nc, points))) %>%
  select(pt_count) %>%
  plot()
Created on 2018-05-02 by the reprex package (v0.2.0).
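To make the sgbp and lengths step concrete, here is a minimal sketch on hand-made data. The two unit squares, the three points, and the names sq, polys, pts and hits are hypothetical and only serve the illustration; no CRS is set, so everything is treated as planar.

```r
library(sf)

# Helper (hypothetical): a unit square whose lower-left corner is at (x0, 0)
sq <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Two adjacent squares standing in for polygons such as tax lots
polys <- st_sf(id = c("a", "b"), geometry = st_sfc(sq(0), sq(1)))

# Three points: two inside square "a", one outside both squares
pts <- st_as_sf(
  data.frame(x = c(0.2, 0.7, 5), y = c(0.5, 0.5, 5)),
  coords = c("x", "y")
)

# For each polygon, the indices of the points it contains (an sgbp object)
hits <- st_contains(polys, pts)

# One count per polygon, in the same order as the rows of polys
polys$pt_count <- lengths(hits)
```

Because the counts come back in row order of the polygons, they can be assigned straight to a column without any merge.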
I tried this on your data, but the intersection is empty for both sets of points you provided. However, the code should work.
EDIT: Simplified group_by + mutate with add_count:
library(sf)
library(dplyr)
mydf = st_read("MN_Dcp_Mappinglot.shp")
xydf = data.frame(lat=c(40.758896,40.758896), lon=c(-73.985130, -73.985130))
xysf = st_as_sf(xydf, coords=c('lon', 'lat'), crs=st_crs(mydf))
## NB: crs= only labels the coordinates, it does not reproject them.
## Make sure to st_transform both to a common CRS, as Calum You suggests
xysf %>%
  sf::st_intersection(mydf) %>%
  dplyr::add_count(LOT)
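One possible reason for an empty result is the CRS of the points. The sketch below uses the nc polygons projected to EPSG:3857 as a stand-in for the tax lots and assumes the longitude/latitude values are WGS84 (EPSG:4326); the names polys, xy, mislabelled and reprojected are hypothetical. It compares labelling the degree values with the polygons' projected CRS against declaring them as lon/lat and then reprojecting with st_transform.

```r
library(sf)

# Polygons in a projected CRS, standing in for the tax lots
polys <- st_transform(
  st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE),
  3857
)

xy <- data.frame(lon = c(-80, -80.1, -82), lat = c(35.5, 35.5, 35.5))

# Mislabelled: the degree values are taken to be projected coordinates,
# so the points end up far away from the polygons
mislabelled <- st_as_sf(xy, coords = c("lon", "lat"), crs = st_crs(polys))

# Declared as lon/lat (assumed WGS84) and then actually reprojected
reprojected <- st_transform(
  st_as_sf(xy, coords = c("lon", "lat"), crs = 4326),
  st_crs(polys)
)

# Number of point/polygon matches in each case
n_mislabelled <- sum(lengths(st_intersects(mislabelled, polys)))
n_reprojected <- sum(lengths(st_intersects(reprojected, polys)))
```

With the mislabelled points no polygon is hit, whereas the reprojected points land inside the polygons. Whether this explains the empty intersection for the NYC data is a guess, but the proj4string printed in the question is a projected CRS in US feet, so it seems worth checking.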
Reproducible example:
library(sf)
library(dplyr)
nc = sf::st_read(system.file("shape/nc.shp", package="sf"))
## keep the raw points in their own object (pts) so both approaches start from them
pts = sf::st_as_sf(data.frame(lon=c(-80, -80.1, -82), lat=c(35.5, 35.5, 35.5)),
                   coords=c('lon', 'lat'), crs=st_crs(nc))
ncxy = pts %>%
  sf::st_intersection(nc) %>%
  dplyr::add_count(FIPS)
## a better approach
ncxy = pts %>%
  sf::st_join(nc, join=st_intersects) %>%
  dplyr::add_count(FIPS)
The new column n contains the total number of points per FIPS code.
ncxy %>% dplyr::group_by(FIPS) %>% dplyr::distinct(n)
> although coordinates are longitude/latitude, st_intersects assumes
that they are planar
# A tibble: 2 x 2
# Groups: FIPS [2]
FIPS n
<fctr> <int>
1 37123 2
2 37161 1
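The question asks for a map of counts per polygon, while the result above has one row per point. A minimal sketch of one way to get from the joined points back to the polygons, assuming the same nc data and points as above (the names pts, per_fips, n_points and nc_counts are introduced here for illustration):

```r
library(sf)
library(dplyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
pts <- st_as_sf(
  data.frame(lon = c(-80, -80.1, -82), lat = c(35.5, 35.5, 35.5)),
  coords = c("lon", "lat"), crs = st_crs(nc)
)

# Attach polygon attributes to each point, then count points per FIPS code
per_fips <- pts %>%
  st_join(nc, join = st_intersects) %>%
  st_drop_geometry() %>%
  count(FIPS, name = "n_points")

# Bring the counts back onto the polygons; polygons without points get 0
nc_counts <- nc %>%
  left_join(per_fips, by = "FIPS") %>%
  mutate(n_points = coalesce(n_points, 0L))

# Map of the number of points per polygon
plot(nc_counts["n_points"])
```

For the tax lot data, the same pattern would need a column that uniquely identifies a lot in place of FIPS; which column that is depends on the shapefile.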
I'm not sure why your data results in an empty intersection, but since the code works on the example above there must be a separate issue.
HT: st_join approach from this answer.