How to apply a defined function to a pandas DataFrame

Question

I have the following function, which works on 2D arrays. The angle function calculates the angles between consecutive vectors.

In the code below, the function is called with "directions" as its parameter, which is a 2D array with two columns, one holding the x values and the other the y values.

directions itself is obtained by applying np.diff() to a 2D array of points.

import matplotlib.pyplot as plt
import numpy as np
import os
import rdp

def angle(dir):
    """
    Returns the angles between vectors.

    Parameters:
    dir is a 2D-array of shape (N,M) representing N vectors in M-dimensional space.

    The return value is a 1D-array of values of shape (N-1,), with each value between 0 and pi.

    0 implies the vectors point in the same direction
    pi/2 implies the vectors are orthogonal
    pi implies the vectors point in opposite directions
    """
    dir2 = dir[1:]
    dir1 = dir[:-1]
    return np.arccos((dir1*dir2).sum(axis=1)/(np.sqrt((dir1**2).sum(axis=1)*(dir2**2).sum(axis=1))))

tolerance = 70
min_angle = np.pi*0.22

filename = os.path.expanduser('~/tmp/bla.data')
points = np.genfromtxt(filename).T
print(len(points))
x, y = points.T

# Use the Ramer-Douglas-Peucker algorithm to simplify the path
# http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm
# Python implementation: https://github.com/sebleier/RDP/
simplified = np.array(rdp.rdp(points.tolist(), tolerance))

print(len(simplified))
sx, sy = simplified.T

# compute the direction vectors on the simplified curve
directions = np.diff(simplified, axis=0)
theta = angle(directions)

# Select the index of the points with the greatest theta
# Large theta is associated with greatest change in direction.
idx = np.where(theta>min_angle)[0]+1

I want to apply the above code to a pandas.DataFrame that holds trajectory data.

Below is a sample df. The sx, sy values that share the same id and subid form one trajectory. For example, rows 0-3 have id 11 and subid 2, so they are the points of one trajectory. Rows 4-6 form another trajectory, and so on. In other words, a new trajectory starts whenever the id or subid changes.

  id      subid     simplified_points     sx       sy
0 11      2         (3,4)                 3        4
1 11      2         (5,6)                 5        6
2 11      2         (7,8)                 7        8
3 11      2         (9,9)                 9        9
4 11      3         (10,12)               10       12
5 11      3         (12,14)               12       14
6 11      3         (13,15)               13       15
7 12      9         (18,20)               18       20
8 12      9         (22,24)               22       24
9 12      9         (25,27)               25       27

The above data frame was obtained after the rdp algorithm had already been applied. The simplified_points column, unzipped into the two columns sx and sy, is the result of rdp.

The problem lies in getting the directions for each of these trajectories and then getting theta and idx from them. The code above handles only a single trajectory stored as a 2D array, so I am unable to apply it to this pandas data frame.

Please suggest a way to run the above code on each trajectory in a df.

Answer 1

You can use pandas.DataFrame.groupby.apply() to work on each (id, subid) group, with something like:

Code:

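import numpy as np
import pandas as pd

def theta(group):
    # Differences between consecutive points within one trajectory
    dx = pd.Series(group.sx.diff(), name='dx')
    dy = pd.Series(group.sy.diff(), name='dy')
    # Heading of each segment relative to the x-axis, in radians
    theta = pd.Series(np.arctan2(dy, dx), name='theta')
    return pd.concat([dx, dy, theta], axis=1)

df2 = df.groupby(['id', 'subid']).apply(theta)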

Test Code:

from io import StringIO
import pandas as pd

df = pd.read_fwf(StringIO(u"""
    id      subid     simplified_points     sx       sy
    11      2         (3,4)                 3        4
    11      2         (5,6)                 5        6
    11      2         (7,8)                 7        8
    11      2         (9,9)                 9        9
    11      3         (10,12)               10       12
    11      3         (12,14)               12       14
    11      3         (13,15)               13       15
    12      9         (18,20)               18       20
    12      9         (22,24)               22       24
    12      9         (25,27)               25       27"""),
                 header=1)

df2 = df.groupby(['id', 'subid']).apply(theta)
df = pd.concat([df, pd.DataFrame(df2.values, columns=df2.columns)], axis=1)
print(df)

Results:

   id  subid simplified_points  sx  sy   dx   dy     theta
0  11      2             (3,4)   3   4  NaN  NaN       NaN
1  11      2             (5,6)   5   6  2.0  2.0  0.785398
2  11      2             (7,8)   7   8  2.0  2.0  0.785398
3  11      2             (9,9)   9   9  2.0  1.0  0.463648
4  11      3           (10,12)  10  12  NaN  NaN       NaN
5  11      3           (12,14)  12  14  2.0  2.0  0.785398
6  11      3           (13,15)  13  15  1.0  1.0  0.785398
7  12      9           (18,20)  18  20  NaN  NaN       NaN
8  12      9           (22,24)  22  24  4.0  4.0  0.785398
9  12      9           (25,27)  25  27  3.0  3.0  0.785398
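
Note that theta above is the heading of each segment (np.arctan2), not the angle between consecutive direction vectors that the question's angle function returns. If the turning angle and idx are needed per trajectory, one possible sketch is to loop over the groups and reuse the question's logic on each group's sx, sy values. The helper name turning_points, the is_turn column, the np.clip guard and the threshold of 0.3 are my own assumptions for illustration. The question uses min_angle = np.pi*0.22, which flags nothing in this small sample.

```python
import numpy as np
import pandas as pd

def angle(directions):
    """Angles between consecutive row vectors, each between 0 and pi."""
    d1, d2 = directions[:-1], directions[1:]
    cos = (d1 * d2).sum(axis=1) / np.sqrt((d1 ** 2).sum(axis=1) * (d2 ** 2).sum(axis=1))
    # Assumption: clip guards against rounding slightly outside [-1, 1]
    return np.arccos(np.clip(cos, -1.0, 1.0))

def turning_points(group, min_angle):
    """Hypothetical helper: row labels where one trajectory turns by more than min_angle."""
    pts = group[['sx', 'sy']].to_numpy(dtype=float)
    directions = np.diff(pts, axis=0)                  # one vector per segment
    theta = angle(directions)                          # one angle per interior point
    idx = np.where(theta > min_angle)[0] + 1           # +1: position of the interior point
    return group.index[idx].tolist()

df = pd.DataFrame({
    'id':    [11, 11, 11, 11, 11, 11, 11, 12, 12, 12],
    'subid': [2, 2, 2, 2, 3, 3, 3, 9, 9, 9],
    'sx':    [3, 5, 7, 9, 10, 12, 13, 18, 22, 25],
    'sy':    [4, 6, 8, 9, 12, 14, 15, 20, 24, 27],
})

min_angle = 0.3  # illustrative threshold in radians, not the question's value

# Process each (id, subid) trajectory separately
turn_labels = []
for _, group in df.groupby(['id', 'subid'], sort=False):
    turn_labels.extend(turning_points(group, min_angle))

df['is_turn'] = df.index.isin(turn_labels)
print(df)
```

With this threshold only row 2, the point (7,8), is flagged, because the path bends from direction (2,2) to (2,1) there. Trajectories with fewer than three points simply yield no turning points. Repeated points inside a trajectory would give a zero-length direction vector and a NaN angle, so they may need to be dropped first.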

1 answer

solution1
2 ACCEPTED 2017-05-07 18:34:13
