How to transform a nested for-loop operation to a more efficient code in R

I am a dilettante when it comes to R coding. I am trying to run the following code for one of my tasks. My goal is to count the number of attractions within 2 km of a specific location; both the attractions and the locations are specified by longitude and latitude. The main data set has around 29K records, and there are 28 attractions. How can I convert the following code into better-performing R code? (The current version is really crude and not at all good practice.)

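library(geosphere)  # provides distVincentySphere()

# Preallocate the result: one count per record in mainData
attr_count <- integer(nrow(mainData))
for (i in seq_len(nrow(mainData))) {
  loc_coord <- c(mainData$longitude[i], mainData$latitude[i])
  for (j in seq_len(nrow(ny_attractions))) {
    attr_coord <- c(ny_attractions$lon[j], ny_attractions$lat[j])
    # Great-circle distance in metres between attraction and listing
    dist <- distVincentySphere(attr_coord, loc_coord)
    if (dist <= 2000) {
      attr_count[i] <- attr_count[i] + 1L
    }
  }
}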

[EDIT]: My apologies for not putting it clearly earlier. Here's an example of what I am trying to achieve. I have two data sets:

Dataset - 1 (NYC_attractions) (27 records)

Dataset-2 (master data for house listings) (29K records)

Now I need to add one more column (num_of_attractions) to Dataset-2, giving the number of attractions within 2 km of each listing (i.e. one value per record in Dataset-2).

I hope this explains the problem.

Thanks

Hello, your question is partly answered here: https://stackoverflow.com/a/49860968/3042154. Since you use geodetic coordinates (lat/lon) rather than projected coordinates (meters), it can be done in two steps. First, roughly select potential neighbours using Euclidean distance, as in the linked answer; then refine the selection using your own distance function.
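To make the two steps concrete, here is a minimal base-R sketch. It is not the code from the linked answer. It rests on a few assumptions: a simple bounding box in degrees stands in for the rough Euclidean pre-selection, a hand-written haversine on a sphere of radius 6378137 m stands in for distVincentySphere (both are great-circle distances on a sphere, so the results should agree closely), and the data are well away from the poles and the antimeridian. The helper names haversine_m and count_nearby and the three toy listings are made up for illustration; the column names follow the question.

```r
# Great-circle (haversine) distance in metres; vectorised over its inputs.
# Assumption: spherical Earth with radius r = 6378137 m.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  deg <- pi / 180
  dlat <- (lat2 - lat1) * deg
  dlon <- (lon2 - lon1) * deg
  a <- sin(dlat / 2)^2 +
    cos(lat1 * deg) * cos(lat2 * deg) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Count, for every listing, the attractions within radius_m metres.
# Loops over the small table (attractions) and vectorises over the big one.
count_nearby <- function(listings, attractions, radius_m = 2000) {
  deg <- pi / 180
  # One degree of latitude is a little over 111 km; dividing by 110 km
  # makes the box slightly generous so the coarse step drops no true hit.
  lat_tol <- radius_m / 110000
  counts <- integer(nrow(listings))
  for (j in seq_len(nrow(attractions))) {
    a_lon <- attractions$lon[j]
    a_lat <- attractions$lat[j]
    # A degree of longitude shrinks with cos(latitude), so widen the box.
    lon_tol <- lat_tol / cos((abs(a_lat) + lat_tol) * deg)
    # Step 1: rough selection with a cheap bounding box in degrees.
    cand <- which(abs(listings$latitude - a_lat) <= lat_tol &
                  abs(listings$longitude - a_lon) <= lon_tol)
    # Step 2: refine the candidates with the real distance.
    d <- haversine_m(listings$longitude[cand], listings$latitude[cand],
                     a_lon, a_lat)
    hit <- cand[d <= radius_m]
    counts[hit] <- counts[hit] + 1L
  }
  counts
}

# Toy data (made up): two attractions and three listings.
ny_attractions <- data.frame(lon = c(-73.9855, -74.0445),
                             lat = c(40.7580, 40.6892))
mainData <- data.frame(longitude = c(-73.9857, -73.9654, -74.0400),
                       latitude  = c(40.7484, 40.7829, 40.6900))
mainData$num_of_attractions <- count_nearby(mainData, ny_attractions)
print(mainData)
```

With only 28 attractions, the outer loop runs 28 times and each pass is a vectorised operation over all listings, so the pre-selection mostly matters when both tables are large. If geosphere is installed, the refine step could call distVincentySphere on the candidate rows instead of the hand-written haversine.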
