
How to calculate the distance and return the value of a specific variable with the shortest distance?

I have two separate datasets. One contains the locations of the participants; the other contains the locations of the measurement stations and their corresponding values at different time points. Below I generate sample datasets.

# dataset of value
yearmon <- c("Jan 1996","Jan 1996","Jan 1996","Jan 1996","Jan 1996","Jan 1996",
         "Feb 1996","Feb 1996","Feb 1996","Feb 1996","Feb 1996","Feb 1996",
         "Mar 1996","Mar 1996","Mar 1996","Mar 1996","Mar 1996","Mar 1996",
         "Apr 1996","Apr 1996","Apr 1996","Apr 1996","Apr 1996","Apr 1996",
         "May 1996","May 1996","May 1996","May 1996","May 1996","May 1996",
         "Jun 1996","Jun 1996","Jun 1996","Jun 1996","Jun 1996","Jun 1996")

lon <- c(114.1592, 114.1294, 114.1144, 114.0228, 113.9763, 113.9431)

lat <- c(22.35694, 22.31306, 22.33000, 22.37167, 22.37639, 22.45111)

STN <- c("A","B","C","D","E","F")

value <- runif(n=36, min=10, max=20)

df<- data.frame(STN,lon,lat)
df<- rbind(df,df,df,df,df,df)
df <- cbind(df,yearmon,value)
df$value[df$value < 12] <- NA


# dataset of participant location
id <- c(1,2,3,4)
lon.p <- c(114.3608, 114.1850, 114.1581, 114.1683)
lat.p <- c(22.44500, 22.33000, 22.28528, 22.37167)
participant <- data.frame(id,lon.p,lat.p)
#

Using the sample datasets above, I want to calculate the distance between each station (A-F) and each participant (1-4) at each time point (yearmon), and assign the value of the nearest station at that time point to the participant. I cannot assign the participants to a station up front, because the location of the stations may change between time points (although it does not change in the sample dataset).

I.e. if participant 1 lives closest to Station A in Jan 1996, then he/she should be assigned the value 17.03357.

I prefer the great circle distance, maybe calculated with a call like this: rdist.earth(location1, location2, miles=FALSE, R=6371)

In the end, I think this is what I would like to return (but with the values filled in):

  id    lon.p    lat.p Apr 1996 Feb 1996 Jan 1996 Jun 1996 Mar 1996 May 1996
1  1 114.3608 22.44500
2  2 114.1850 22.33000
3  3 114.1581 22.28528
4  4 114.1683 22.37167

Thank you.

Here's a way to do it in a couple of steps. Note that I created a naive_dist function just as a placeholder for the distance metric. The function comes from here.

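naive_dist <- function(long1, lat1, long2, lat2) {
  R <- 6371 # Earth mean radius [km]
  rad <- pi / 180 # sin() and cos() expect radians; the inputs are in degrees
  long1 <- long1 * rad; lat1 <- lat1 * rad
  long2 <- long2 * rad; lat2 <- lat2 * rad
  # pmin() guards against rounding pushing the argument slightly above 1
  d <- acos(pmin(1, sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1))) * R
  return(d) # Distance in km
}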

dist_by_id <- by(participant, participant$id, FUN = function(x) 
  #you would use your distance metric here
  naive_dist(long1 = x$lon.p, long2 = df$lon, lat1 = x$lat.p, lat2 = df$lat)
  )

#function to find the min for each yearmon, by id
find_min <- function(id, data, by_data){
  data$dist_column = by_data[[id]]
  by(data, data$yearmon, FUN = function(x) x[which.min(x$dist_column),]$value)
}
#initialize
participant[,4:9] = 0
#by() returns its groups in sorted (alphabetical) yearmon order,
#so the columns have to be labelled in that same order
names(participant)[4:9] = levels(factor(df$yearmon))
#use a for loop to fill in the values
for(i in 1:4){
 participant[i,4:9] = stack(find_min(id = i, data = df, by_data = dist_by_id))[,1] 
}

participant

  id    lon.p    lat.p Apr 1996 Feb 1996 Jan 1996 Jun 1996 Mar 1996 May 1996
1  1 114.3608 22.44500 17.36620 18.88409 19.53951 19.35646 13.00518 18.45556
2  2 114.1850 22.33000 17.36620 18.88409 19.53951 19.35646 13.00518 18.45556
3  3 114.1581 22.28528 18.57447 13.85192 17.52038       NA 16.14562 18.06435
4  4 114.1683 22.37167 17.36620 18.88409 19.53951 19.35646 13.00518 18.45556

Obviously, once you change the distance metric these results may change.
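Since naive_dist is only a placeholder, here is a rough sketch of one metric you could swap in: the haversine form of the great-circle distance, in base R. The name haversine_km is my own (hypothetical), and it assumes coordinates in decimal degrees and a spherical Earth of radius 6371 km. As far as I know, rdist.earth from the fields package (mentioned in the question) would be another option, but it takes two-column lon/lat matrices rather than separate vectors, so the calls above would need adapting.

```r
# Haversine great-circle distance (sketch).
# Inputs are in decimal degrees, output is in km.
# `haversine_km` is a hypothetical helper name, not part of any package.
haversine_km <- function(long1, lat1, long2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() keeps rounding error from pushing sqrt(a) above 1
  2 * R * asin(pmin(1, sqrt(a)))
}

# Participant 1 against stations A and B from the sample data
haversine_km(114.3608, 22.44500,
             c(114.1592, 114.1294), c(22.35694, 22.31306))
```

It takes the same four arguments as naive_dist, so it can be dropped into either solution without other changes.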

Alternatively, here's an option that uses dplyr. I would tend to prefer this solution since it might be more performant.

library(dplyr)
df2 <- merge(df, participant, by = NULL) #cross join: every station row paired with every participant
#calculate distance
df2$distance <- naive_dist(long1 = df2$lon, lat1 = df2$lat,
                           long2 = df2$lon.p, lat2 = df2$lat.p)


df3 <- df2 %>%
  group_by(yearmon, id) %>%
  filter(distance == min(distance)) %>%
  select(id, yearmon, value)

participant2 <- participant
participant2[,4:9] <- 0
names(participant2)[4:9] <- as.character(unique(df$yearmon))

for(i in 1:4){
  participant2[i,4:9] = c(subset(df3, id == i)$value)
}

participant2

  id    lon.p    lat.p Jan 1996 Feb 1996 Mar 1996 Apr 1996 May 1996 Jun 1996
1  1 114.3608 22.44500 19.53951 18.88409 13.00518 17.36620 18.45556 19.35646
2  2 114.1850 22.33000 19.53951 18.88409 13.00518 17.36620 18.45556 19.35646
3  3 114.1581 22.28528 17.52038 13.85192 16.14562 18.57447 18.06435       NA
4  4 114.1683 22.37167 19.53951 18.88409 13.00518 17.36620 18.45556 19.35646
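Both loops above fill the month columns by position, which only works while the row order and the column order happen to agree. As a possible alternative, here is a small sketch that reshapes a long result to wide format by name using base R's tapply. The data frames nearest and people are made-up toy data standing in for df3 and participant, and the values are illustrative only.

```r
# Toy long-format result: one row per participant and month (made-up values)
nearest <- data.frame(
  id      = c(1, 1, 2, 2),
  yearmon = c("Jan 1996", "Feb 1996", "Jan 1996", "Feb 1996"),
  value   = c(17.0, 18.5, NA, 13.2)
)
# Toy participant table (coordinates taken from the sample data)
people <- data.frame(id    = c(1, 2),
                     lon.p = c(114.3608, 114.1850),
                     lat.p = c(22.44500, 22.33000))

# One row per id, one column per month; cells are matched by name, not position
wide <- tapply(nearest$value, list(nearest$id, nearest$yearmon),
               function(v) v[1])
wide <- data.frame(id = as.numeric(rownames(wide)), wide, check.names = FALSE)

# Attach the month columns to the participant table
result <- merge(people, wide, by = "id")
result
```

The month columns come out in sorted order, which is the same column order the question asks for.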
