
Distance between simultaneous point locations in R

I am calculating the distance (in meters) between "simultaneously" recorded UTM locations, but I'm having a problem. As written, my script only calculates the distance to the one individual that is "closest in time". I want it to calculate the distance to ALL individuals that are "close" in time.

In my example I have 3 moose and 3 wolves. I want to take moose 1 and calculate the distance to the simultaneously recorded locations of wolf 1, then wolf 2, then wolf 3. Right now the script finds only the wolf location with the smallest absolute time difference and calculates the distance to that one wolf, instead of to every wolf.

Here's my testing data:

Moose location data:

structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("F07001", 
"F07010", "M07012"), class = "factor"), x = c(1482445L, 1481274L, 
1481279L, 1481271L, 1480849L, 1480881L, 1480883L, 1480880L, 1482448L, 
1482494L, 1482534L, 1482534L, 1482553L, 1482555L, 1482414L, 1482852L, 
1476120L, 1476104L, 1476101L), y = c(6621768L, 6619628L, 6619630L, 
6619700L, 6620321L, 6620427L, 6620438L, 6620423L, 6616403L, 6616408L, 
6616395L, 6616408L, 6616406L, 6616418L, 6616755L, 6616312L, 6623655L, 
6623646L, 6623652L), date = structure(c(1173088800, 1173096000, 
1173103260, 1173110400, 1173117600, 1173211200, 1173218400, 1173139200, 
1173088800, 1173096000, 1173103260, 1173110400, 1173117600, 1173211200, 
1173218400, 1173139200, 1173270600, 1173277800, 1173282960), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), .Names = c("id", "x", "y", "date"
), row.names = c(NA, -19L), class = "data.frame")

Wolf location data:

structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L), .Label = c("HF7572", 
"Htest", "UM1347"), class = "factor"), x = c(1480610L, 1480640L, 
1480613L, 1480613L, 1480555L, 1480567L, 1480627L, 1480532L, 1480593L, 
1484394L, 1484394L, 1483940L, 1483933L, 1483935L, 1483930L, 1483855L, 
1483793L, 1483802L, 1484392L, 1483855L), y = c(6619853L, 6619739L, 
6619759L, 6619862L, 6619838L, 6619772L, 6619902L, 6619899L, 6619887L, 
6619589L, 6619602L, 6619899L, 6619907L, 6619905L, 6619896L, 6619834L, 
6619702L, 6619672L, 6619558L, 6619834L), date = structure(c(1173088800, 
1173096060, 1173103440, 1173111600, 1173117780, 1173213600, 1173218400, 
1173141120, 1173266100, 1173095940, 1173099600, 1173103200, 1173106920, 
1173110400, 1173208800, 1173211200, 1173222000, 1173266100, 1173362100, 
1173211200), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("id", 
"x", "y", "date"), row.names = c(NA, -20L), class = "data.frame")

Here's my script so far:

mloc=read.csv("moose.csv", head = T)
wloc=read.csv("wolf.csv", head=T)
mloc$date<-as.POSIXct(strptime(mloc$date,"%Y-%m-%d %H:%M"),tz="UTC")
wloc$date<-as.POSIXct(strptime(wloc$date,"%Y-%m-%d %H:%M"),tz="UTC")

#sort the data sequentially by date time then convert to number
Sortmoose = mloc[order(mloc$date),]
Sortwolf = wloc[order(wloc$date),]
m <- as.numeric(Sortmoose$date)
w <- as.numeric(Sortwolf$date)

#Creates index of the time intervals
id <- findInterval(m, w, all.inside=TRUE)
id_min <- ifelse(abs(m-w[id])<abs(m-w[id+1]), id, id+1)
Sortmoose$wolfID = Sortwolf$id[id_min]
Sortmoose$wolfdate =Sortwolf$date[id_min]
Sortmoose$wolfx = Sortwolf$x[id_min]
Sortmoose$wolfy = Sortwolf$y[id_min]
Sortmoose$dist= sqrt((Sortmoose$wolfx-Sortmoose$x)^2+(Sortmoose$wolfy-Sortmoose$y)^2)

I would like to calculate the distance between every moose/wolf pair as long as the locations were recorded at the "same" time. I would like the output to have the moose information, the associated wolf information and the distance (in meters) between those two points. I would also like the time difference so I can filter out pairs that are more than 45 minutes apart or something like that, but I think I can do that later. Basically, the output columns would be something like: mooseID, mooseDate, mooseX, mooseY, wolfID, wolfDate, wolfX, wolfY, Distance (m), TimeDiff (min).

New solution. Here is code that does what you want (approximate matching). The key idea is to create a new data table with an extra column date1, such that each row with, say, date = 05:17:13 in the original data appears three times, with date1 = 04:00:00, 05:00:00 and 06:00:00 (all other columns duplicated), and then to merge on this new column. This guarantees that any two events within one hour of each other in the original data will be merged.

After that we just calculate the distance and time difference.
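To make the boundary idea concrete, here is a minimal base R sketch for a single timestamp (the same one used in the code comments of this answer). It swaps in trunc() and plain second arithmetic for the lubridate calls, and the variable names are illustrative only.

```r
# One example timestamp.
t <- as.POSIXct("2013-01-15 05:17:23", tz = "UTC")

# Floor to the hour (base R stand-in for lubridate::floor_date(t, "hour")).
t_hour <- as.POSIXct(trunc(t, units = "hours"))

# One boundary per offset in -1:1: the previous, current and next full hour.
date1 <- t_hour + 3600 * (-1:1)
format(date1, "%H:%M:%S")
```

Two events less than an hour apart always share at least one of these boundaries, which is what lets the merge on date1 pair them up.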

Please note that using data.table is critical for speed since your data frames are so large - using the regular data.frame will be way too slow.

library(data.table)
library(lubridate)

mloc <- data.table(mloc)
wloc <- data.table(wloc)

# Returns a new data table with one new column (date1) and length(range)
# rows for each row in the initial data table, duplicating all other fields.
# Example: for row with date = '2013-01-15 05:17:23' and for the default range
# argument it will add rows with date1 = '2013-01-15 04:00:00', '2013-01-15 05:00:00'
# and '2013-01-15 06:00:00'
AddTimeBoundaries <- function(dt, range = -1:1) {
  dt1 <- rbindlist(lapply(range, 
             function(x) data.table(id = dt$id, date = dt$date, 
                        date1 = floor_date(dt$date, 'hour') +
                        hours(x))))
  setkey(dt1, id, date)
  setkey(dt, id, date)
  result <- dt[dt1]
  setkey(result, date1)
  result
}

mloc.1 <- AddTimeBoundaries(mloc)
wloc.1 <- AddTimeBoundaries(wloc)

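# Join on date1. Depending on your data.table version, the duplicated columns
# coming from wloc.1 may be named i.id, i.date, i.x, i.y instead of
# id.1, date.1, x.1, y.1; adjust the names below if so.
x <- mloc.1[wloc.1, allow.cartesian = TRUE][!is.na(id)]
result <- unique(x[, list(id, date, x, y, id.1, date.1, x.1, y.1, 
              distance = sqrt((x-x.1)^2 + (y-y.1)^2),
              time.diff = date - date.1)])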

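Result has all the event pairs within 1 hour of each other. It also contains some pairs that are further apart (the output below includes pairs just over 2 hours apart), but you can easily filter those out using time.diff.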

> head(result, 10)
        id                date       x       y   id.1              date.1     x.1     y.1  distance  time.diff
1: F07001 2007-03-05 10:00:00 1482445 6621768 HF7572 2007-03-05 10:00:00 1480610 6619853 2652.2538     0 secs
2: M07012 2007-03-05 10:00:00 1482448 6616403 HF7572 2007-03-05 10:00:00 1480610 6619853 3909.0592     0 secs
3: F07001 2007-03-05 10:00:00 1482445 6621768 UM1347 2007-03-05 11:59:00 1484394 6619589 2923.4640 -7140 secs
4: M07012 2007-03-05 10:00:00 1482448 6616403 UM1347 2007-03-05 11:59:00 1484394 6619589 3733.2977 -7140 secs
5: F07001 2007-03-05 12:00:00 1481274 6619628 HF7572 2007-03-05 10:00:00 1480610 6619853  701.0856  7200 secs
6: M07012 2007-03-05 12:00:00 1482494 6616408 HF7572 2007-03-05 10:00:00 1480610 6619853 3926.5100  7200 secs
7: F07001 2007-03-05 10:00:00 1482445 6621768 HF7572 2007-03-05 12:01:00 1480640 6619739 2715.6705 -7260 secs
8: F07001 2007-03-05 12:00:00 1481274 6619628 HF7572 2007-03-05 12:01:00 1480640 6619739  643.6435   -60 secs
9: M07012 2007-03-05 10:00:00 1482448 6616403 HF7572 2007-03-05 12:01:00 1480640 6619739 3794.4380 -7260 secs
10: M07012 2007-03-05 12:00:00 1482494 6616408 HF7572 2007-03-05 12:01:00 1480640 6619739 3812.2011   -60 secs
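Filtering out the wider matches could look roughly like this. It is a minimal base R sketch on a hand-built data frame that copies the ids and time.diff values of rows 1, 2, 3 and 8 above. The 45-minute cutoff comes from the question, and the names pairs, max_gap_mins and close_pairs are made up for illustration.

```r
# Hand-copied subset of the output above (ids and time differences only).
pairs <- data.frame(
  id   = c("F07001", "M07012", "F07001", "F07001"),
  id.1 = c("HF7572", "HF7572", "UM1347", "HF7572"),
  time.diff = as.difftime(c(0, 0, -7140, -60), units = "secs")
)

# Keep only pairs recorded at most 45 minutes apart, in either direction.
max_gap_mins <- 45
gap_mins <- abs(as.numeric(pairs$time.diff, units = "mins"))
close_pairs <- pairs[gap_mins <= max_gap_mins, ]
close_pairs
```

On the real data.table result, the same condition should work as a row filter, for example result[abs(as.numeric(time.diff, units = "mins")) <= 45].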

Old solution. This doesn't work, because the OP requires approximate matching of the dates (within 1 hour), not exact matching.

Assuming I interpreted your question correctly, here is a solution using the data.table package. I called the first structure in your testing data mloc and the second one wloc.

Step 1. Convert both data frames to data.table and set the key on date:

library(data.table)
mloc <- data.table(mloc)
wloc <- data.table(wloc)
setkey(mloc, date)
setkey(wloc, date)

Step 2. Merge the two tables on the date key, creating a "cartesian product", and calculate the distance:

x <- mloc[wloc, allow.cartesian = TRUE][!is.na(id)]
x[, distance := sqrt((x-x.1)^2 + (y-y.1)^2)]

> x
                   date     id       x       y   id.1     x.1     y.1  distance
 1: 2007-03-05 10:00:00 F07001 1482445 6621768 HF7572 1480610 6619853 2652.2538
 2: 2007-03-05 10:00:00 M07012 1482448 6616403 HF7572 1480610 6619853 3909.0592
 3: 2007-03-05 16:00:00 F07001 1481271 6619700 UM1347 1483935 6619905 2671.8759
 4: 2007-03-05 16:00:00 M07012 1482534 6616408 UM1347 1483935 6619905 3767.2019
 5: 2007-03-06 20:00:00 F07001 1480881 6620427 UM1347 1483855 6619834 3032.5443
 6: 2007-03-06 20:00:00 M07012 1482555 6616418 UM1347 1483855 6619834 3655.0042
 7: 2007-03-06 20:00:00 F07001 1480881 6620427  Htest 1483855 6619834 3032.5443
 8: 2007-03-06 20:00:00 M07012 1482555 6616418  Htest 1483855 6619834 3655.0042
 9: 2007-03-06 22:00:00 F07001 1480883 6620438 HF7572 1480627 6619902  593.9966
10: 2007-03-06 22:00:00 M07012 1482414 6616755 HF7572 1480627 6619902 3618.9747

I think I have a partial solution, and it will let you modify the "closeness" window as desired.

library(data.table)

# Convert to data.table:
mloc <- as.data.table(mloc)
wloc <- as.data.table(wloc)

# Rename columns to make them less ambiguous:
setnames(mloc,paste0("m",names(mloc)))
setnames(wloc,paste0("w",names(wloc)))

# Adjustable rounding factor:
r <- 45 /60/24 # Need to convert minutes to days

# Add the rounded date column to the two tables:
mloc[,rdate:=round(as.numeric(mdate-as.POSIXct("1970-01-01", tz="GMT"))/r)*r*60*60*24+as.POSIXct("1970-01-01", tz="GMT")]
wloc[,rdate:=round(as.numeric(wdate-as.POSIXct("1970-01-01", tz="GMT"))/r)*r*60*60*24+as.POSIXct("1970-01-01", tz="GMT")]

# Set the keys:
setkey(mloc,rdate)
setkey(wloc,rdate)

# Join the wolf and moose tables on the rounded date:
wloc[mloc, allow.cartesian=T,nomatch=0]

##                  rdate    wid      wx      wy               wdate    mid      mx      my               mdate
## 1: 2007-03-05 09:45:00 HF7572 1480610 6619853 2007-03-05 10:00:00 F07001 1482445 6621768 2007-03-05 10:00:00
## 2: 2007-03-05 09:45:00 HF7572 1480610 6619853 2007-03-05 10:00:00 M07012 1482448 6616403 2007-03-05 10:00:00
## 3: 2007-03-05 12:00:00 UM1347 1484394 6619589 2007-03-05 11:59:00 F07001 1481274 6619628 2007-03-05 12:00:00
## 4: 2007-03-05 12:00:00 HF7572 1480640 6619739 2007-03-05 12:01:00 F07001 1481274 6619628 2007-03-05 12:00:00
## 5: 2007-03-05 12:00:00 UM1347 1484394 6619589 2007-03-05 11:59:00 M07012 1482494 6616408 2007-03-05 12:00:00
## 6: 2007-03-05 12:00:00 HF7572 1480640 6619739 2007-03-05 12:01:00 M07012 1482494 6616408 2007-03-05 12:00:00
## 7: 2007-03-05 14:15:00 UM1347 1483940 6619899 2007-03-05 14:00:00 F07001 1481279 6619630 2007-03-05 14:01:00
## 8: 2007-03-05 14:15:00 HF7572 1480613 6619759 2007-03-05 14:04:00 F07001 1481279 6619630 2007-03-05 14:01:00
## 9: 2007-03-05 14:15:00 UM1347 1483940 6619899 2007-03-05 14:00:00 M07012 1482534 6616395 2007-03-05 14:01:00
##10: 2007-03-05 14:15:00 HF7572 1480613 6619759 2007-03-05 14:04:00 M07012 1482534 6616395 2007-03-05 14:01:00
##11: 2007-03-05 15:45:00 UM1347 1483935 6619905 2007-03-05 16:00:00 F07001 1481271 6619700 2007-03-05 16:00:00
##12: 2007-03-05 15:45:00 UM1347 1483935 6619905 2007-03-05 16:00:00 M07012 1482534 6616408 2007-03-05 16:00:00
##13: 2007-03-05 18:00:00 HF7572 1480555 6619838 2007-03-05 18:03:00 F07001 1480849 6620321 2007-03-05 18:00:00
##14: 2007-03-05 18:00:00 HF7572 1480555 6619838 2007-03-05 18:03:00 M07012 1482553 6616406 2007-03-05 18:00:00
##15: 2007-03-06 20:15:00 UM1347 1483855 6619834 2007-03-06 20:00:00 F07001 1480881 6620427 2007-03-06 20:00:00
##16: 2007-03-06 20:15:00  Htest 1483855 6619834 2007-03-06 20:00:00 F07001 1480881 6620427 2007-03-06 20:00:00
##17: 2007-03-06 20:15:00 UM1347 1483855 6619834 2007-03-06 20:00:00 M07012 1482555 6616418 2007-03-06 20:00:00
##18: 2007-03-06 20:15:00  Htest 1483855 6619834 2007-03-06 20:00:00 M07012 1482555 6616418 2007-03-06 20:00:00
##19: 2007-03-06 21:45:00 HF7572 1480627 6619902 2007-03-06 22:00:00 F07001 1480883 6620438 2007-03-06 22:00:00
##20: 2007-03-06 21:45:00 HF7572 1480627 6619902 2007-03-06 22:00:00 M07012 1482414 6616755 2007-03-06 22:00:00

I said this was partial because it will miss close matches when one value is rounded up and the other down. For instance, a wdate of 2007-03-05 16:20:00 is rounded up to 2007-03-05 16:30:00 and an mdate of 2007-03-05 16:00:00 is rounded down to 2007-03-05 15:45:00, so there is no match in the join even though these two events are only 20 min apart and the window is 45 min.
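The missed match can be reproduced with a few lines of base R. This sketch rounds in seconds rather than days, which should land on the same 45-minute grid as the code above; round_time is a hypothetical helper written for this illustration, not part of any package.

```r
window_secs <- 45 * 60

# Round a POSIXct time to the nearest multiple of `secs` since the epoch.
round_time <- function(t, secs) {
  as.POSIXct(round(as.numeric(t) / secs) * secs,
             origin = "1970-01-01", tz = "UTC")
}

wdate <- as.POSIXct("2007-03-05 16:20:00", tz = "UTC")
mdate <- as.POSIXct("2007-03-05 16:00:00", tz = "UTC")

w_bin <- round_time(wdate, window_secs)  # rounded up to 16:30
m_bin <- round_time(mdate, window_secs)  # rounded down to 15:45

# The two fixes are 20 minutes apart but fall into different bins,
# so a join on the rounded time does not pair them.
gap_mins <- as.numeric(difftime(wdate, mdate, units = "mins"))
same_bin <- w_bin == m_bin
```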

I have another partial data.table solution that does not round but instead uses roll=-45*60 and roll=45*60 (two results that are then combined with rbindlist). It picks up this example record, but looks to have some other issues that I need to investigate...
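For small data sets, one way to sidestep rounding altogether is a brute-force cross join followed by a filter on the time difference. The sketch below is not the roll-based approach mentioned above. It uses base R merge() with by = NULL on a few rows copied from the test data, and the column names (mid, wid, dist_m, tdiff_min) are chosen here for illustration. It builds every moose/wolf pair, so memory grows with the product of the two table sizes.

```r
# Two moose fixes and three wolf fixes taken from the question's test data.
moose <- data.frame(
  mid = c("F07001", "F07001"),
  mx = c(1482445, 1481274),
  my = c(6621768, 6619628),
  mdate = as.POSIXct(c(1173088800, 1173096000),
                     origin = "1970-01-01", tz = "UTC")
)
wolf <- data.frame(
  wid = c("HF7572", "HF7572", "UM1347"),
  wx = c(1480610, 1480640, 1484394),
  wy = c(6619853, 6619739, 6619589),
  wdate = as.POSIXct(c(1173088800, 1173096060, 1173095940),
                     origin = "1970-01-01", tz = "UTC")
)

# Cartesian product: every moose fix paired with every wolf fix.
pairs <- merge(moose, wolf, by = NULL)

# Euclidean distance in meters (UTM) and signed time difference in minutes.
pairs$dist_m <- sqrt((pairs$mx - pairs$wx)^2 + (pairs$my - pairs$wy)^2)
pairs$tdiff_min <- as.numeric(difftime(pairs$mdate, pairs$wdate, units = "mins"))

# Keep only pairs recorded within 45 minutes of each other.
close_pairs <- pairs[abs(pairs$tdiff_min) <= 45, ]
close_pairs
```

Because every pair is compared directly, a 20-minute gap like the one in the example above cannot be lost to a bin boundary.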
