Find the number of points within a certain radius of a data point in R

Question

I have two datasets, one for hospitals and another for procedures. Each dataset has latitude and longitude coordinates. Procedures are performed either inside or outside a hospital, though the coordinates are not necessarily precise when a procedure is performed in a hospital. I'm trying to draw a radius of a certain size around each hospital and determine how many procedure points fall within that radius on average. For example, if I have 100 hospitals and 3000 procedures, I want to draw a radius around every hospital and see how many procedures, on average, fall within it. My initial code is below, but I know this can be done faster. It is written in R. Thanks!

for(i in 1:NROW(hospitals)){
  hospital <- hospitals[i,]
  radius <- .016

  # find all the procedures that lie in the .016 sized radius from this hospital

  hospital$latitude_low <- hospital$lat - radius
  hospital$longitude_low <- hospital$long - radius
  hospital$latitude_high <- hospital$lat + radius
  hospital$longitude_high <- hospital$long + radius

  in_rad <- procedures[(procedures$long >= hospital$longitude_low & procedures$long <= 
  hospital$longitude_high & procedures$lat <= hospital$latitude_high & procedures$lat >= 
  hospital$latitude_low),]

  num <- NROW(in_rad)
  hospitals[i,]$number_of_procedures <- num
}

Answer 1

When you ask a question, you should always include some example data, like this:

lat <- c(-23.8, -25.8)
lon <- c(-49.6, -44.6)
hosp <- cbind(lon, lat)


lat <- c(-22.8, -24.8, -29.1, -28, -20)
lon <- c(-46.4, -46.3, -45.3, -40, -30)
procedures <- cbind(lon, lat)

Are your data in longitude/latitude? If so, you need to use a proper method to compute distances. For example:

 library(geosphere)
 dm <- distm(procedures, hosp)

Or

 library(raster)
 d <- pointDistance(procedures, hosp, lonlat=TRUE)

Both compute the distance from every procedure to every hospital, as a matrix with one row per procedure and one column per hospital. This will fail with very large datasets, but from what you describe it should work fine. Now you can use a threshold (here 400,000 m) to find out which procedures are within that distance of each hospital. The call below uses d; the matrix dm from distm can be used the same way.

apply(d < 400000, 2, which)
#[[1]]
#[1] 1 2

#[[2]]
#[1] 1 2 3

So procedures 1, 2 and 3 are within that distance of hospital 2.
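The question asks for the number of procedures per hospital and its average, rather than the indices. A minimal sketch of that last step follows. To stay free of package dependencies it builds the distance matrix with a hand-written haversine function (the name haversine_m and the spherical Earth radius of 6,371,000 m are assumptions made for this illustration), so the distances will differ slightly from those returned by distm or pointDistance. With a matrix from either package, only the colSums and mean lines are needed.

```r
# Haversine great-circle distance in metres, base R only.
# Assumption: spherical Earth with mean radius 6,371,000 m.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Same example data as above
hosp <- cbind(lon = c(-49.6, -44.6), lat = c(-23.8, -25.8))
procedures <- cbind(lon = c(-46.4, -46.3, -45.3, -40, -30),
                    lat = c(-22.8, -24.8, -29.1, -28, -20))

# Rows are procedures, columns are hospitals
d <- outer(seq_len(nrow(procedures)), seq_len(nrow(hosp)),
           function(p, h) haversine_m(procedures[p, "lon"], procedures[p, "lat"],
                                      hosp[h, "lon"], hosp[h, "lat"]))

counts <- colSums(d < 400000)  # number of procedures within 400 km of each hospital
avg <- mean(counts)            # average number of procedures per hospital
```

Here counts holds one value per hospital, and avg is the average the question asks for.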

If your data are not longitude/latitude, you can use:

 d <- pointDistance(procedures, hosp, lonlat=FALSE)

Answer 2

There are a couple of things that could be improved here. First, you are not actually counting procedures within a radius of 0.016 units of the hospital, but procedures within a 0.032 by 0.032 unit square with the hospital at its center. That is probably not a huge deal for this specific problem, but it is actually quicker to find the points within a particular distance, as you intended.
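To make the difference between the two tests concrete, here is a small sketch using the question's radius of .016. The hospital position and the four procedure points are made up for illustration. Both tests treat the coordinates as planar units, as the question's code does.

```r
radius <- 0.016
hospital <- c(long = 0, lat = 0)   # hypothetical hospital at the origin
procedures <- data.frame(          # hypothetical procedure points
  long = c(0.010, 0.015, 0.000, 0.020),
  lat  = c(0.010, 0.015, 0.012, 0.000)
)

dx <- procedures$long - hospital[["long"]]
dy <- procedures$lat  - hospital[["lat"]]

# Bounding-box test, as in the question's loop
in_square <- abs(dx) <= radius & abs(dy) <= radius
# Distance test: compare squared distance with squared radius
in_circle <- dx^2 + dy^2 <= radius^2

sum(in_square)
sum(in_circle)
```

The second point sits in a corner of the square but lies farther than the radius from the hospital, so the square test counts it and the distance test does not.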

Second, you tend to store every value you calculate in a variable, even if you are only going to use it once. This can help with understanding the code, but it is sometimes less efficient and certainly makes your code longer, particularly if you like using long_descriptive_variable_names.

Third, at the end you subset procedures and then count the rows of the result, rather than just taking the length of the matching subset directly.

Lastly (but less importantly), you write the result into a new column one value at a time. You can do this all in one go using sapply instead.

So your code could be replaced with something much simpler, like:

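hospitals$number_of_procedures <- sapply(1:NROW(hospitals), function(i)
  {
    # squared distance (in coordinate units) from hospital i to every procedure
    d <- (procedures$long - hospitals[i,]$long)^2 + (procedures$lat - hospitals[i,]$lat)^2
    # compare with the squared radius; .016 is the radius used in the question
    length(which(d < 0.016^2))
  })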
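As a rough usage sketch, the same approach can be run end to end on toy data. The two data frames below are made up for illustration, the radius is the question's .016, and sum() on the logical vector is used as a shorter equivalent of length(which(...)).

```r
# Hypothetical toy data: two hospitals and six procedures
hospitals <- data.frame(long = c(0, 1), lat = c(0, 1))
procedures <- data.frame(
  long = c(0.010, 0.000, 0.020, 1.005, 1.000, 1.010),
  lat  = c(0.010, 0.012, 0.000, 1.000, 1.015, 1.010)
)
radius <- 0.016

hospitals$number_of_procedures <- sapply(seq_len(nrow(hospitals)), function(i) {
  # squared distance (in coordinate units) from hospital i to every procedure
  d2 <- (procedures$long - hospitals$long[i])^2 +
        (procedures$lat  - hospitals$lat[i])^2
  sum(d2 < radius^2)  # count procedures inside the radius
})

# Average number of procedures per hospital
avg_procedures <- mean(hospitals$number_of_procedures)
```

Because the distances are computed in degrees, this is only a planar approximation. For real longitude/latitude data, a geodesic distance such as the one in Answer 1 is the safer choice.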
