Iterative Euclidean distance calculation between consecutive points (x, y tuples) that belong to a list of lines

Question

I have a dataframe which contains Lines, PointID, X and Y coordinates; each line contains a group of points with X,Y coordinates:

LINE    Point ID    X coordinate    Y Coordinate
 A         1             1               2
 A         2             2               2
 A         3             3               2
 B         1             11              3
 B         2             12              3
 B         3             13              3

I am trying to calculate the Euclidean distance between consecutive points within each line, to obtain the following result:

LINE    Point ID    X coordinate    Y Coordinate    Euclidean Dist.
  A         1             1              2    
  A         2             2              2                1 (dist between Point ID's 1 and 2 for line A)
  A         3             3              2                1 (dist between Point ID's 2 and 3 for line A)
  B         1            11              3  
  B         2            12              3                1 (dist between Point ID's 1 and 2 for line B)
  B         3            13              3                1 (dist between Point ID's 2 and 3 for line B)

My attempt was to create a DataFrame, group it by 'LINE' with groupby, and then calculate the Euclidean distance between consecutive points within each line using scipy:

predist = df.groupby(['LINE']).apply(lambda x: x)

dist = pdist(predist[['X', 'Y']], 'euclidean')

I'm definitely doing something wrong, as the results I'm obtaining are cumulative distances between the first point of a line with each consecutive point within a line, instead of receiving the distances between each individual segment created by consecutive points (tuple of coordinates).

Answer 1

You could use shift() within each LINE group to get the X and Y coordinates of the previous point for every point, then calculate the distance between each point and its predecessor:

from io import StringIO

import pandas as pd
import numpy as np

data = """
LINE    PointID          X               Y
 A         1             1               2
 A         2             2               2
 A         3             3               2
 B         1             11              3
 B         2             12              3
 B         3             13              3"""
df = pd.read_csv(StringIO(data), sep=r"\s+")

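# shift() within each LINE group gives the previous point's coordinate
# (NaN for the first point of each line)
dx = df['X'] - df.groupby('LINE')['X'].shift()
dy = df['Y'] - df.groupby('LINE')['Y'].shift()
df['dist'] = np.sqrt(dx**2 + dy**2)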

This produces the expected distances:

  LINE  PointID   X  Y  dist
0    A        1   1  2   NaN
1    A        2   2  2   1.0
2    A        3   3  2   1.0
3    B        1  11  3   NaN
4    B        2  12  3   1.0
5    B        3  13  3   1.0
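
As a possible variation, the same per-line differences can be obtained with groupby(...).diff() instead of subtracting a shifted column. The sketch below is not part of the original answer; it rebuilds the example data inline and assumes the rows are already ordered by PointID within each line (sort first if they are not).

```python
import numpy as np
import pandas as pd

# Same sample data as above, built inline so the sketch is self-contained
df = pd.DataFrame({
    "LINE": ["A", "A", "A", "B", "B", "B"],
    "PointID": [1, 2, 3, 1, 2, 3],
    "X": [1, 2, 3, 11, 12, 13],
    "Y": [2, 2, 2, 3, 3, 3],
})

# Assumption: consecutive points are defined by PointID order within a line
df = df.sort_values(["LINE", "PointID"]).reset_index(drop=True)

# diff() per group: current row minus previous row; NaN for each line's first point
deltas = df.groupby("LINE")[["X", "Y"]].diff()

# hypot computes sqrt(dx**2 + dy**2) element-wise
df["dist"] = np.hypot(deltas["X"], deltas["Y"])
```

This should give the same dist column as the shift() version, with NaN on the first point of each line.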

The NaN values mark the first point of each line, which has no predecessor; they can be filled in whatever way fits your use case.
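
For instance, one option (an assumption about what might suit your data, not the only choice) is to treat the first point of each line as having distance 0, which also makes it easy to sum segment lengths per line. The line_length name below is just illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as above, built inline so the sketch is self-contained
df = pd.DataFrame({
    "LINE": ["A", "A", "A", "B", "B", "B"],
    "PointID": [1, 2, 3, 1, 2, 3],
    "X": [1, 2, 3, 11, 12, 13],
    "Y": [2, 2, 2, 3, 3, 3],
})

dx = df["X"] - df.groupby("LINE")["X"].shift()
dy = df["Y"] - df.groupby("LINE")["Y"].shift()

# Replace the NaN at the start of each line with 0
df["dist"] = np.sqrt(dx**2 + dy**2).fillna(0)

# Total length of each line = sum of its segment lengths
line_length = df.groupby("LINE")["dist"].sum()
```

If a missing distance should stay distinguishable from a true zero-length segment, leaving the NaN in place may be the better choice; sum() skips NaN by default anyway.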

solution1
2 ACCEPTED 2017-04-09 11:05:08
