
Computing minimum distance between a row and all previous rows in R

I want to compute the minimum distance between the current row and every row before it, within each group. My data frame has several groups, and each group has multiple dates, each with a longitude and a latitude. I use a Haversine function to compute the distance, and I need to apply it as described above. The data frame looks like this:

  grp    date    long lat rowid
1   1 1995-07-01   11  12     1
2   1 1995-07-05    3   0     2
3   1 1995-07-09   13   4     3
4   1 1995-07-13    4  25     4
5   2 1995-03-07   12   6     1
6   2 1995-03-10    3  27     2
7   2 1995-03-13   34   8     3
8   2 1995-03-16   25   9     4

My current attempt uses purrrlyr::by_row, but it is too slow: in practice, each group has thousands of dates and geographic positions. Here is part of my current attempt:

calc_min_distance <- function(df, grp.name, row){
  df %>% 
    filter(
      grp_name == grp.name
    ) %>% 
    filter(
      row_number() <= row
    ) %>% 
    mutate(
      last.lat = last(lat),
      last.long = last(long),
      rowid = 1:n()
    ) %>% 
    group_by(rowid) %>% 
    purrrlyr::by_row(
      ~haversinedistance.fnct(.$last.long, .$last.lat, .$long, .$lat),
      .collate='rows',
      .to = 'min.distance'
    ) %>% 
    filter(
      row_number() < n()
    ) %>% 
    summarise(
      min = min(min.distance)
    ) %>% 
    .$min
}

df_dist <-
  df %>% 
  group_by(grp_name) %>% 
  mutate(rowid = 1:n()) %>% 
  group_by(grp_name, rowid) %>% 
  purrrlyr::by_row(
    ~calc_min_distance(df, .$grp_name,.$rowid),
    .collate='rows',
    .to = 'min.distance'
  ) %>% 
  ungroup %>% 
  select(-rowid)

For reference, suppose the distance is defined as (lat + long) of the reference row minus (lat + long) of an earlier row, computed against every row before the reference row, with the minimum taken over those rows. The first row has no earlier rows, so its value is 0. My expected output for grp 1 is then:

  grp       date long lat rowid min.distance
1   1 1995-07-01   11  12     1            0
2   1 1995-07-05    3   0     2          -20
3   1 1995-07-09   13   4     3           -6
4   1 1995-07-13    4  25     4            6

How can I quickly compute the minimum distance between the current rowid and all rowids before it?

Here's how I would go about it. You need to calculate all the within-group pairwise distances anyway, so we'll use geosphere::distm, which is designed to do just that. I'd suggest stepping through my function line by line and looking at what each step does; I think it will make sense.

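library(geosphere)
find_min_dist_above = function(long, lat, fun = distHaversine) {
  # full pairwise distance matrix for the rows of one group
  d = distm(x = cbind(long, lat), fun = fun)
  # keep only the upper triangle: column j then holds distances to rows i < j
  d[lower.tri(d, diag = TRUE)] = NA
  # the first row has no earlier rows, so define its distance as 0
  d[1, 1] = 0
  # column-wise minimum = distance to the nearest earlier row
  return(apply(d, MARGIN = 2, min, na.rm = TRUE))
}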

library(dplyr)
df %>% group_by(grp) %>%
  mutate(min.distance = find_min_dist_above(long, lat))
# # A tibble: 8 x 6
# # Groups:   grp [2]
#     grp date        long   lat rowid min.distance
#   <int> <fct>      <int> <int> <int>        <dbl>
# 1     1 1995-07-01    11    12     1           0 
# 2     1 1995-07-05     3     0     2     1601842.
# 3     1 1995-07-09    13     4     3      917395.
# 4     1 1995-07-13     4    25     4     1623922.
# 5     2 1995-03-07    12     6     1           0 
# 6     2 1995-03-10     3    27     2     2524759.
# 7     2 1995-03-13    34     8     3     2440596.
# 8     2 1995-03-16    25     9     4      997069.
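
To make the line-by-line walk-through concrete, here is a minimal base-R sketch of the same steps on group 1. It avoids geosphere by using a hand-written Haversine helper; `haversine_m` is a hypothetical name introduced only for this illustration, and its Earth radius of 6378137 m is an assumption chosen to match the default of `distHaversine`.

```r
# Hypothetical base-R Haversine (metres), so the sketch runs without geosphere.
# The radius r = 6378137 is an assumption matching distHaversine's default.
haversine_m <- function(long1, lat1, long2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

long <- c(11, 3, 13, 4)   # group 1 from the question
lat  <- c(12, 0, 4, 25)
idx  <- seq_along(long)

# Step 1: full pairwise matrix, d[i, j] = distance between rows i and j
d <- outer(idx, idx, function(i, j) haversine_m(long[i], lat[i], long[j], lat[j]))

# Step 2: blank the diagonal and lower triangle, so column j keeps only rows i < j
d[lower.tri(d, diag = TRUE)] <- NA

# Step 3: the first row has no earlier rows, so give column 1 a finite value
d[1, 1] <- 0

# Step 4: column-wise minimum = distance to the nearest earlier row
min_dist <- apply(d, MARGIN = 2, min, na.rm = TRUE)
round(min_dist)
```

If the radius assumption holds, the rounded values should be close to the min.distance column shown for grp 1 above. Printing `d` after each step is an easy way to see what the masking does.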

Using this data:

df = read.table(text = '  grp    date    long lat rowid
1   1 1995-07-01   11  12     1
2   1 1995-07-05    3   0     2
3   1 1995-07-09   13   4     3
4   1 1995-07-13    4  25     4
5   2 1995-03-07   12   6     1
6   2 1995-03-10    3  27     2
7   2 1995-03-13   34   8     3
8   2 1995-03-16   25   9     4', header = TRUE)
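
The toy distance from the question (the current row's lat + long minus an earlier row's lat + long) can be handled with the same upper-triangle masking. What follows is a minimal base-R sketch under that definition; `toy_min_dist_above` is a hypothetical name used only here, and the first row is set to 0 because the question's expected output does so.

```r
# Toy distance from the question: (lat + long) of the current row minus
# (lat + long) of an earlier row, minimised over all earlier rows.
toy_min_dist_above <- function(long, lat) {
  s <- long + lat
  # d[i, j] = s[j] - s[i]: row i is the earlier row, column j the current row
  d <- outer(s, s, function(earlier, current) current - earlier)
  # keep only pairs where row i really precedes row j
  d[lower.tri(d, diag = TRUE)] <- NA
  # the first row has no earlier rows; the question expects 0 there
  d[1, 1] <- 0
  apply(d, MARGIN = 2, min, na.rm = TRUE)
}

grp1 <- data.frame(long = c(11, 3, 13, 4), lat = c(12, 0, 4, 25))
grp1$min.distance <- toy_min_dist_above(grp1$long, grp1$lat)
grp1$min.distance
# expected values: 0, -20, -6, 6 (the question's table for grp 1)
```

Because the helper takes plain vectors, it could be dropped into the same `group_by(grp) %>% mutate(...)` call in place of `find_min_dist_above`.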
