Find the maximum point within each polygon for a set of polygons in R
I'm sure this question has been answered elsewhere, but I have not been able to find it by searching.
I have points representing cities within a country, along with the population of each city. I also have a polygon file of counties. I want to find the location of the largest city within each county.
How can this be done?
Here is some data:
structure(list(
  Country = c("us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us",
    "us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us", "us"),
  City = c("cabarrus", "cox store", "cal-vel", "briarwood townhouses", "barker heights",
    "davie crossroads", "crab point village", "azalea", "chesterfield", "charlesmont",
    "connor", "clover garden", "corriher heights", "callisons", "crestview acres", "clegg",
    "canaan park", "chantilly", "belgrade", "brices crossroads", "bluff", "butner", "bottom",
    "bandy", "bostian heights"),
  AccentCity = c("Cabarrus", "Cox Store", "Cal-Vel", "Briarwood Townhouses", "Barker Heights",
    "Davie Crossroads", "Crab Point Village", "Azalea", "Chesterfield", "Charlesmont",
    "Connor", "Clover Garden", "Corriher Heights", "Callisons", "Crestview Acres", "Clegg",
    "Canaan Park", "Chantilly", "Belgrade", "Brices Crossroads", "Bluff", "Butner", "Bottom",
    "Bandy", "Bostian Heights"),
  Region = c("NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC",
    "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC", "NC"),
  Population = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
    NA_integer_, NA_integer_),
  Latitude = c(35.2369444, 35.275, 36.4291667, 35.295, 35.3111111, 35.8319444, 34.7602778,
    35.58, 35.81, 5.9341667, 35.7419444, 36.1883333, 35.5605556, 35.0841667, 35.0213889,
    35.8655556, 36.2761111, 36.3016667, 34.88, 34.8186111, 35.8377778, 36.1319444,
    36.4747222, 35.6419444, 35.7544444),
  Longitude = c(-80.5419444, -82.0352778, -78.9694444, -81.5238889, -82.4441667, -80.535,
    -76.7305556, -82.4713889, -81.6611111, -81.5127778, -78.1486111, -79.4630556, -80.635,
    -76.7255556, -80.5427778, -78.8497222, -79.7852778, -76.1711111, -77.2352778,
    -78.1016667, -82.8580556, -78.7569444, -80.7741667, -81.09, -80.9294444)),
  .Names = c("Country", "City", "AccentCity", "Region", "Population", "Latitude", "Longitude"),
  row.names = c(544L, 889L, 551L, 434L, 190L, 975L, 894L, 147L, 717L, 700L, 831L, 773L, 862L,
    559L, 915L, 753L, 584L, 695L, 262L, 437L, 372L, 537L, 406L, 178L, 02L),
  class = "data.frame")
And some code to read in North Carolina:
library(maptools) # for readShapePoly(...)
xx <- readShapePoly(system.file("shapes/sids.shp", package="maptools")[1],
IDvar="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
plot(xx)
I want to find the city with the maximum population within each county. I'm sorry I don't have a reproducible example. If I did, I would have the answer!
The short answer is that you should use gContains(...) in package rgeos.
Here is the long answer.
In the code below, we grab a high-resolution shapefile of North Carolina counties from the GADM database, and a geocoded dataset of North Carolina cities from the US Geological Survey database. The latter already has county information, but we ignore that. Then we map cities to their appropriate county using gContains(...), add that information to the cities data frame, and identify the largest city in each county using the data.table package. Most of the work is in 4 lines of code near the end.
library(raster) # for getData(...); you may not need this
library(foreign) # for read.dbf(...); you may not need this
library(rgeos) # for gContains(...); loads package sp as well
setwd("< directory for downloaded data >")
# get North Carolina Counties shapefile from GADM database
USA <- getData("GADM",country="USA",level=2) # level 2 is counties
NC.counties <- USA[USA$NAME_1=="North Carolina",] # North Carolina Counties
# get North Carolina Cities data from USGS database
url <- "http://dds.cr.usgs.gov/pub/data/nationalatlas/citiesx010g_shp_nt00962.tar.gz"
download.file(url,"cities.tar.gz")
untar("cities.tar.gz")
data <- read.dbf("citiesx010g.dbf",as.is=TRUE)
NC.data <- data[data$STATE=="NC",c("NAME","COUNTY","LATITUDE","LONGITUDE","POP_2010")]
## --- everything up to here is just to set up the example
# convert cities data.frame to SpatialPointsDataFrame
NC.cities <- SpatialPointsDataFrame(NC.data[,c("LONGITUDE","LATITUDE")],
data=NC.data,
proj4string=CRS(proj4string(NC.counties)))
# map cities to counties
city.cnty <- gContains(NC.counties,NC.cities,byid=TRUE)
# add county information to cities data
NC.data$county <- apply(city.cnty,1,function(cnty)ifelse(any(cnty),NC.counties@data[cnty,]$NAME_2,NA))
# identify largest city in each county
library(data.table)
result <- setDT(NC.data)[,.SD[which.max(POP_2010)],by="county"]
head(result)
#      county             NAME   COUNTY LATITUDE LONGITUDE POP_2010
# 1:  Jackson        Cullowhee  Jackson 35.31371 -83.17653     6228
# 2:   Graham     Robbinsville   Graham 35.32287 -83.80740      620
# 3:   Wilkes North Wilkesboro   Wilkes 36.15847 -81.14758     4245
# 4:    Rowan        Salisbury    Rowan 35.67097 -80.47423    33662
# 5: Buncombe        Asheville Buncombe 35.60095 -82.55402    83393
# 6:    Wayne        Goldsboro    Wayne 35.38488 -77.99277    36437
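If the data.table syntax in the last step is unfamiliar, the same "largest row per group" idea can be sketched in base R. This is only an illustration on a small made-up data frame: `toy` and its values are hypothetical and do not come from the USGS file.

```r
# Hypothetical toy data: five cities already tagged with a county.
toy <- data.frame(
  NAME     = c("A", "B", "C", "D", "E"),
  county   = c("X", "X", "Y", "Y", NA),
  POP_2010 = c(100L, 250L, 75L, 30L, 500L),
  stringsAsFactors = FALSE
)

# Drop cities that did not map to any county, then split by county
# and keep the row with the largest population in each group.
matched <- toy[!is.na(toy$county), ]
largest <- do.call(rbind, lapply(split(matched, matched$county),
                                 function(d) d[which.max(d$POP_2010), ]))
rownames(largest) <- NULL
print(largest)
```

In this sketch, cities with an NA county are dropped before grouping. Also note that which.max() skips NA populations, so a county whose cities all have NA population contributes no row here.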
The workhorse here is the line:
city.cnty <- gContains(NC.counties,NC.cities,byid=TRUE)
This compares every point in the SpatialPointsDataFrame NC.cities to every polygon in the SpatialPolygonsDataFrame NC.counties and returns a logical matrix in which the rows represent cities and the columns represent counties. The [i,j] element is TRUE if city i is in county j, and FALSE otherwise. We process the matrix row-wise in the next statement:
NC.data$county <- apply(city.cnty,1,function(cnty)ifelse(any(cnty),NC.counties@data[cnty,]$NAME_2,NA))
using each row in succession to index the attributes table of NC.counties and extract the county name.
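To see what the row-wise step does without any spatial packages, here is a minimal sketch on a hand-built matrix. The matrix and the `county.attr` table are hypothetical stand-ins for the gContains() result and for NC.counties@data, not real output.

```r
# Hypothetical stand-in for the gContains() result:
# rows are cities, columns are counties.
city.cnty <- matrix(
  c(TRUE,  FALSE,
    FALSE, TRUE,
    FALSE, FALSE),   # the third city falls in no county
  nrow = 3, byrow = TRUE,
  dimnames = list(c("city1", "city2", "city3"), c("cnty1", "cnty2"))
)

# Hypothetical stand-in for the attribute table NC.counties@data
county.attr <- data.frame(NAME_2 = c("Alpha", "Beta"), stringsAsFactors = FALSE)

# Each row is a logical vector over counties; use it to pick the county name,
# or NA when the city is not inside any polygon.
county <- apply(city.cnty, 1, function(cnty)
  if (any(cnty)) county.attr$NAME_2[which(cnty)[1]] else NA_character_)
print(county)
```

The sketch takes the first matching county with `which(cnty)[1]`, which makes explicit what happens if a point happens to match more than one polygon (for example, on a shared boundary).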
The data you provided in your question has some problems, which are nevertheless instructive. First, the NC shapefile in the maptools package is relatively low resolution. In particular, this means that some of the coastal islands are completely missing, so any city on one of those islands will not map to a county. You might have the same problem with your real data, so watch out for it.
Second, comparing the COUNTY column in the original USGS dataset with the county column which we added, there are 3 cities (out of 865) for which the two do not agree. It turns out that, in those cases, the USGS database was wrong (or out of date). You might have the same problem, so watch out for that too.
Third, an additional three cities did not map to any county. These were all coastal cities and probably reflect small inaccuracies in the North Carolina shapefile. You might have this problem as well.
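Both kinds of problem are easy to check for once the derived county column exists. A minimal sketch, again on a hypothetical toy data frame rather than the real USGS data (the column names mirror those used above; the values are made up):

```r
# Hypothetical toy data: COUNTY is the value shipped with the source file,
# county is the value derived from the point-in-polygon test.
toy <- data.frame(
  NAME   = c("A", "B", "C", "D"),
  COUNTY = c("X", "X", "Y", "Z"),
  county = c("X", "Y", "Y", NA),
  stringsAsFactors = FALSE
)

# Cities that did not fall inside any county polygon
unmapped <- toy[is.na(toy$county), ]

# Cities whose derived county disagrees with the one in the source file
mismatch <- toy[!is.na(toy$county) & toy$COUNTY != toy$county, ]

print(unmapped)
print(mismatch)
```

On real data, the same two filters applied to NC.data would list the cities worth inspecting by hand.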