
Find the Number of Points Within a Certain Radius of a Data Point in R

I have 2 datasets, one for hospitals and another for procedures. Each dataset has latitude and longitude coordinates. Procedures are given either in or out of the hospital, though the coordinates are not necessarily precise when they are given in the hospital. I'm trying to form a radius of a certain size around each of the hospitals and determine how many procedure points fall within that radius on average. So if, for example, I have 100 hospitals and 3000 procedures, I want to form a radius around every hospital and see on average how many procedures fall into that specified radius. My initial code, written in R, is below, but I know this can be done faster. Thanks!

for(i in 1:NROW(hospitals)){
  hospital <- hospitals[i,]
  radius <- .016

  # find all the procedures that lie in the .016 sized radius from this hospital

  hospital$latitude_low <- hospital$lat - radius
  hospital$longitude_low <- hospital$long - radius
  hospital$latitude_high <- hospital$lat + radius
  hospital$longitude_high <- hospital$long + radius

  in_rad <- procedures[(procedures$long >= hospital$longitude_low & procedures$long <= 
  hospital$longitude_high & procedures$lat <= hospital$latitude_high & procedures$lat >= 
  hospital$latitude_low),]

  num <- NROW(in_rad)
  hospitals[i,]$number_of_procedures <- num
}

When you ask a question, you should always include some example data, like this:

lat <- c(-23.8, -25.8)
lon <- c(-49.6, -44.6)
hosp <- cbind(lon, lat)


lat <- c(-22.8, -24.8, -29.1, -28, -20)
lon <- c(-46.4, -46.3, -45.3, -40, -30)
procedures <- cbind(lon, lat)

Are your data in longitude/latitude? If so, you need to use a proper method to compute distances. For example:

 library(geosphere)
 dm <- distm(procedures, hosp)

Or:

 library(raster)
 d <- pointDistance(procedures, hosp, lonlat=TRUE)

Both compute the distance from every procedure to every hospital, giving a matrix with one row per procedure and one column per hospital. This will fail with very large datasets, because the full matrix has to fit in memory, but from what you describe it should work fine. Now you can use a threshold (here 400,000 m) to find out which procedures are within that distance of each hospital:

apply(d < 400000, 2, which)
#[[1]]
#[1] 1 2

#[[2]]
#[1] 1 2 3

So procedures 1, 2 and 3 are within that distance of hospital 2.
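Since the question asks for the number of procedures per hospital and the average of those counts, the same distance matrix can be summed column by column. The sketch below is only an illustration: to stay free of package dependencies it builds the matrix with a hypothetical base-R helper, `haversine_m`, which assumes a spherical Earth of radius 6,371 km, so its distances may differ slightly from what `distm` or `pointDistance` return. With either of those functions, only the last two lines would be needed.

```r
# Example data from the answer above (columns: lon, lat)
hosp <- cbind(lon = c(-49.6, -44.6), lat = c(-23.8, -25.8))
procedures <- cbind(lon = c(-46.4, -46.3, -45.3, -40, -30),
                    lat = c(-22.8, -24.8, -29.1, -28, -20))

# Hypothetical helper (not from geosphere or raster): haversine distance in
# metres, assuming a spherical Earth of radius r
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Distance matrix: rows are procedures, columns are hospitals
d <- outer(seq_len(nrow(procedures)), seq_len(nrow(hosp)),
           function(i, j) haversine_m(procedures[i, "lon"], procedures[i, "lat"],
                                      hosp[j, "lon"], hosp[j, "lat"]))

# Number of procedures within 400,000 m of each hospital, and their average
counts <- colSums(d < 400000)
counts        # 2 3
mean(counts)  # 2.5
```

The counts agree with the `which` output above: two procedures near hospital 1 and three near hospital 2.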

If your data are not longitude/latitude, you can use:

 d <- pointDistance(procedures, hosp, lonlat=FALSE)

There are a couple of things that could be improved here. Firstly, you are not actually counting the procedures within a radius of 0.016 units of the hospital, but the procedures within a 0.032 × 0.032 unit square with the hospital at its center. That is probably not a huge deal for this specific problem, but it is actually quicker to work out the points within a particular distance, as you originally intended.
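To make the difference between the square and the circle concrete, here is a minimal sketch using the `.016` radius from the question's code. The coordinates are made up for illustration. The procedure sits near a corner of the bounding box, so the box test accepts it while the distance test rejects it.

```r
radius <- 0.016

# Hypothetical coordinates: the procedure is offset by 0.015 on both axes
hospital  <- c(long = -46.400, lat = -22.800)
procedure <- c(long = -46.385, lat = -22.785)

dx <- procedure[["long"]] - hospital[["long"]]
dy <- procedure[["lat"]]  - hospital[["lat"]]

# Bounding-box test, as in the question's loop
in_square <- abs(dx) <= radius && abs(dy) <= radius

# Distance test: compare squared distance with squared radius (no sqrt needed)
in_circle <- dx^2 + dy^2 <= radius^2

in_square  # TRUE
in_circle  # FALSE
```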

Secondly, you have a tendency to store every variable you calculate, even if you are only going to use it once. This can help with understanding the code, but it is sometimes less efficient and certainly makes your code longer, particularly if you like using long_descriptive_variable_names.

Thirdly, at the end, you subset procedures and then count the rows of the result, rather than just using the length of the subset itself.

Lastly (but less importantly), you write the result into a new column one value at a time. You can do this all in one go using sapply instead.

So your code could be replaced with something much simpler, like:

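hospitals$number_of_procedures <- sapply(1:NROW(hospitals), function(i)
  {
    # squared distance (in coordinate units) from hospital i to every procedure
    d <- (procedures$long - hospitals[i,]$long)^2 + (procedures$lat - hospitals[i,]$lat)^2
    # .016 is the radius used in the question; compare against its square
    length(which(d < 0.016^2))
  })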
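As a quick check of this approach, the sketch below runs the same idea on a small made-up dataset and then takes the average the question asks for. The data frames and their values are hypothetical, and the radius is the `.016` from the question. Distances here are in raw coordinate units, so for real longitude/latitude data this is only an approximation.

```r
# Hypothetical toy data in the same column layout as the question
hospitals <- data.frame(long = c(0, 1), lat = c(0, 1))
procedures <- data.frame(
  long = c(0.010, 0.000, 0.015, 1.005, 0.500),
  lat  = c(0.000, 0.012, 0.015, 1.005, 0.500)
)
radius <- 0.016

hospitals$number_of_procedures <- sapply(seq_len(nrow(hospitals)), function(i) {
  # squared distance from hospital i to every procedure
  d2 <- (procedures$long - hospitals$long[i])^2 +
        (procedures$lat  - hospitals$lat[i])^2
  # summing a logical vector counts its TRUE values
  sum(d2 < radius^2)
})

hospitals$number_of_procedures        # 2 1
mean(hospitals$number_of_procedures)  # 1.5
```

The third procedure lies inside the bounding box of the first hospital but outside the circle, so it is not counted.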


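Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.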
