Pandas dataframe: applying a function to a row value and the value from the previous row

I am trying to apply the following function to a Pandas dataframe:

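import numpy as np
from geopy import distance

def eukarney(lat1, lon1, alt1, lat2, lon2, alt2):
    # Geodesic (Karney) ground distance in metres between the two points
    p1 = (lat1, lon1)
    p2 = (lat2, lon2)
    karney = distance.distance(p1, p2).m
    # Combine ground distance and altitude difference (Euclidean norm)
    return np.sqrt(karney**2 + (alt2 - alt1)**2)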

This works when I pass scalar values, for instance:

d = eukarney(49.907611, 5.890404, 339.15734, 49.907683, 5.890373, 339.18224)

However, when I try to apply the function to the columns of a Pandas dataframe, pairing each row with the previous one:

df['distances'] = eukarney(df['latitude'], df['longitude'], df['altitude'], df['latitude'].shift(), df['longitude'].shift(), df['altitude'].shift())

I receive the following error message:

Traceback (most recent call last):
  File "/home/mirix/Desktop/plage/GPX_invert_sense_change_starting_point_va.py", line 78, in <module>
    df['distances'] = eukarney(df.loc[:,'latitude':], df.loc[:,'longitude':], df.loc[:,'altitude':], df.loc[:,'latitude':].shift(), df.loc[:,'longitude':].shift(), df.loc[:,'altitude':].shift())
  File "/home/mirix/Desktop/plage/GPX_invert_sense_change_starting_point_va.py", line 75, in eukarney
    karney = distance.distance(p1, p2).m
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/distance.py", line 522, in __init__
    super().__init__(*args, **kwargs)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/distance.py", line 276, in __init__
    kilometers += self.measure(a, b)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/distance.py", line 538, in measure
    a, b = Point(a), Point(b)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 175, in __new__
    return cls.from_sequence(seq)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 472, in from_sequence
    return cls(*args)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 188, in __new__
    _normalize_coordinates(latitude, longitude, altitude)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 57, in _normalize_coordinates
    latitude = float(latitude or 0.0)
  File "/home/mirix/.local/lib/python3.9/site-packages/pandas/core/generic.py", line 1534, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Intriguingly, the same syntax works for other functions not using the geopy library.

Any ideas?

SOLUTION

GeoPy's distance function appears to accept only scalar coordinates, so it cannot be called directly on Pandas Series or DataFrames.
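The traceback hints at why: the last geopy frame evaluates `float(latitude or 0.0)`, and the `or` asks for the truth value of its operand, which Pandas refuses to define for a whole Series or DataFrame. Functions that only do arithmetic never ask that question, which is probably why the same syntax works elsewhere. The snippet below is a minimal sketch of that difference; `normalize` and `scale` are hypothetical names that only imitate the pattern and are not part of geopy.

```python
import pandas as pd

def normalize(latitude):
    # Imitates the line shown in the traceback: `latitude or 0.0` needs the
    # truth value of `latitude`, which is only defined for scalars.
    return float(latitude or 0.0)

def scale(values):
    # Pure arithmetic is applied element-wise, so a Series is fine here.
    return values * 2

lat = pd.Series([49.907611, 49.907683])

print(normalize(49.907611))   # a scalar works
print(scale(lat).tolist())    # arithmetic on a Series works

try:
    normalize(lat)            # a Series has no single truth value
except ValueError as exc:
    print(type(exc).__name__, exc)
```

The last call raises the same kind of ValueError as in the question, with "Series" in place of "DataFrame" because a single column is passed here.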

The following workaround is based on @SeaBen's answer below:

df['lat_shift'] = df['latitude'].shift().fillna(df['latitude'])
df['lon_shift'] = df['longitude'].shift().fillna(df['longitude'])
df['alt_shift'] = df['altitude'].shift().fillna(df['altitude'])

df['distances'] = df.apply(lambda x: eukarney(x['latitude'], x['longitude'], x['altitude'], x['lat_shift'], x['lon_shift'], x['alt_shift']), axis=1).fillna(0)

You can use .apply() with axis=1 to call the function once per row.

.apply() passes the scalar values of each row to the custom function, so you can reuse the function as it is, even though it was designed for scalars. The alternative would be to rewrite the function so that it supports vectorized Pandas values.

To handle the .shift() values, one workaround is to store them in new columns first, so that each row passed to .apply() also carries the values of the previous row.

# Take the previous entry with shift() and fill the first row with its own value
# (in case the custom function cannot handle the NaN that shift() leaves in the first row)
df['lat_shift'] = df['latitude'].shift().fillna(df['latitude'])
df['lon_shift'] = df['longitude'].shift().fillna(df['longitude'])
df['alt_shift'] = df['altitude'].shift().fillna(df['altitude'])

df['distances'] = df.apply(lambda x: eukarney(x['latitude'], x['longitude'], x['altitude'], x['lat_shift'], x['lon_shift'], x['alt_shift']), axis=1).fillna(0)
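If the three helper columns are not wanted in the final dataframe, the same row-by-row call can also be written by zipping the rows with their shifted copies. What follows is only a sketch under stated assumptions: `scalar_distance` is a hypothetical stand-in for eukarney that uses a haversine formula (not geopy's Karney geodesic) so that the snippet runs without geopy, and the sample coordinates after the first two rows are made up. In real code, eukarney would be called in its place.

```python
import math
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, used only by the stand-in

def scalar_distance(lat1, lon1, alt1, lat2, lon2, alt2):
    """Hypothetical scalar-only stand-in for eukarney (haversine, not Karney)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    ground = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    # Combine ground distance and altitude difference, as eukarney does
    return math.sqrt(ground ** 2 + (alt2 - alt1) ** 2)

df = pd.DataFrame({
    'latitude': [49.907611, 49.907683, 49.907750],
    'longitude': [5.890404, 5.890373, 5.890340],
    'altitude': [339.15734, 339.18224, 339.20000],
})

cols = ['latitude', 'longitude', 'altitude']
# Previous row for every row; the first row is paired with itself
prev = df[cols].shift().fillna(df[cols])

# Each iteration passes six scalars, which is what a scalar-only function needs
df['distances'] = [
    scalar_distance(lat, lon, alt, plat, plon, palt)
    for (lat, lon, alt), (plat, plon, palt) in zip(df[cols].to_numpy(), prev.to_numpy())
]
print(df)
```

Because the first row is compared with itself, its distance is 0, which matches the effect of the trailing .fillna(0) in the .apply() version.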
