
Subtract geographic distances from previous row by group using dplyr and geosphere

I have a dataframe like this.

df <- data.frame(
  id = c(rep("A", 5), rep("B", 5)),
  date = as.Date(as.Date("2022-6-1"):as.Date("2022-6-10"), origin="1970-01-01"),
  lon = 101:110,
  lat = 1:10
)
> df
   id       date    lon   lat
1   A 2022-06-01 101.01  1.01
2   A 2022-06-02 102.01  2.01
3   A 2022-06-03 103.01  3.01
4   A 2022-06-04 104.01  4.01
5   A 2022-06-05 105.01  5.01
6   B 2022-06-06 106.01  6.01
7   B 2022-06-07 107.01  7.01
8   B 2022-06-08 108.01  8.01
9   B 2022-06-09 109.01  9.01
10  B 2022-06-10 110.01 10.01

What I want to do is calculate the daily distance traveled by each group (A and B) and store it in a new column called dist.

I figured that dplyr::lag and geosphere::distGeo would help, so I tried the following code.

df %>%
    group_by(id) %>%
    arrange(date, .by_group = TRUE) %>%
    mutate(dist = distGeo(.[, c(lon, lat)],
                          lag(.[, c(lon, lat)], default = first(.[, c(lon, lat)]))))

But this did not work:

Error in `mutate()`:
! Problem while computing `dist = distGeo(...)`.
ℹ The error occurred in group 1: id = "A".
Caused by error in `vectbl_as_col_location()`:
! Must subset columns with a valid subscript vector.
✖ Can't convert from `j` <double> to <integer> due to loss of precision.

I guess there is a syntax error in mutate(), but how can I solve this?
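To make the two building blocks concrete, this is how they behave on plain vectors. It is a minimal sketch assuming dplyr and geosphere are installed; the three coordinates are illustrative only. The key points are that lag() pads the first slot (NA by default), and that distGeo() also accepts two-column (lon, lat) matrices, returning one distance in metres per row.

```r
library(dplyr)      # lag(), first()
library(geosphere)  # distGeo()

# illustrative coordinates, not taken from the question's data
lon <- c(101, 102, 103)
lat <- c(1, 2, 3)

# lag() shifts the vector down by one; the first slot becomes NA
lag(lon)
#> [1]  NA 101 102

# ...or a chosen value, here the first element itself
lag(lon, default = first(lon))
#> [1] 101 101 102

# distGeo() with two-column (lon, lat) matrices: one distance (metres)
# per row, NA where the lagged point is NA
distGeo(cbind(lon, lat), cbind(lag(lon), lag(lat)))
```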

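The error comes from .[, c(lon, lat)]: inside mutate(), lon and lat refer to the column values rather than the column names, so that expression ends up using the coordinates themselves as column positions (and . is the whole data frame, not the current group). It is probably best to copy the previous day's lon/lat values into separate columns and then do the calculation row-wise: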

library(tidyverse)
library(geosphere)

df <- data.frame(
  id = c(rep("A", 5), rep("B", 5)),
  date = as.Date(as.Date("2022-6-1"):as.Date("2022-6-10"), origin="1970-01-01"),
  lon = 101:110,
  lat = 1:10
)

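df %>%
  group_by(id) %>%
  # copy the previous day's coordinates (within each id) into prev_lon / prev_lat
  mutate(across(c(lon, lat), ~ lag(.x, order_by = date), .names = "prev_{.col}")) %>%
  # distGeo() gets one pair of (lon, lat) points per row; dist is in metres
  rowwise() %>%
  mutate(dist = distGeo(c(lon, lat), c(prev_lon, prev_lat))) %>%
  ungroup()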
#> # A tibble: 10 × 7
#>    id    date         lon   lat prev_lon prev_lat    dist
#>    <chr> <date>     <int> <int>    <int>    <int>   <dbl>
#>  1 A     2022-06-01   101     1       NA       NA     NA 
#>  2 A     2022-06-02   102     2      101        1 156876.
#>  3 A     2022-06-03   103     3      102        2 156829.
#>  4 A     2022-06-04   104     4      103        3 156759.
#>  5 A     2022-06-05   105     5      104        4 156666.
#>  6 B     2022-06-06   106     6       NA       NA     NA 
#>  7 B     2022-06-07   107     7      106        6 156409.
#>  8 B     2022-06-08   108     8      107        7 156246.
#>  9 B     2022-06-09   109     9      108        8 156060.
#> 10 B     2022-06-10   110    10      109        9 155851.

Created on 2022-06-15 by the reprex package (v2.0.1)
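If a 0 rather than NA is wanted on each group's first day (which is what default = first(...) in the question aims at), the original attempt can also be repaired directly: refer to the columns by name and build the coordinate matrices with cbind(), since distGeo() accepts two-column (lon, lat) matrices and then needs no rowwise() step. This is a sketch assuming dplyr and geosphere are installed; the non-first dist values come from the same distGeo() call on the same points, so they should match the row-wise version above.

```r
library(dplyr)
library(geosphere)

df <- data.frame(
  id = c(rep("A", 5), rep("B", 5)),
  date = as.Date(as.Date("2022-6-1"):as.Date("2022-6-10"), origin = "1970-01-01"),
  lon = 101:110,
  lat = 1:10
)

out <- df %>%
  group_by(id) %>%
  arrange(date, .by_group = TRUE) %>%
  # lon and lat are the current group's columns, so cbind() builds one
  # (lon, lat) matrix per group; the lagged matrix starts with the first
  # point itself, which gives a distance of 0 on each group's first day
  mutate(dist = distGeo(cbind(lon, lat),
                        cbind(lag(lon, default = first(lon)),
                              lag(lat, default = first(lat))))) %>%
  ungroup()

out
```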
