Efficiently iterating over a pandas.DataFrame while accessing multiple indexed rows at once

Question

I have already read answers and blog posts on how to iterate efficiently over a pandas.DataFrame ( https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6 ), but I still have one question.

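Currently, my DataFrame represents a GPS trajectory with columns for time, longitude and latitude. I now want to compute a feature called "distance to next point". So I not only have to iterate over the rows and operate on a single row, but also access the following row within the same iteration.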

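# row is a copy, so the result is written back with df.loc; df.iloc[i + 1] fetches the
# next row by position, which also works when the index is not 0..n-1
for i, (index, row) in enumerate(df.iterrows()):
    if i < len(df) - 1:
        next_row = df.iloc[i + 1]
        distance = calculate_distance([row['latitude'], row['longitude']],
                                      [next_row['latitude'], next_row['longitude']])
        df.loc[index, 'distance'] = distance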
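If calculate_distance only accepts scalar coordinates, a cheaper pattern than iterrows plus a second lookup is to pull the columns out as NumPy arrays and zip them against themselves offset by one position. This is a minimal sketch, assuming "next point" means the next row by position; the calculate_distance below is a hypothetical stand-in (planar distance in degree units, for illustration only) and distance_to_next is a hypothetical helper name:

```python
import math

import numpy as np
import pandas as pd


def calculate_distance(p1, p2):
    # Hypothetical placeholder: planar distance in degree units.
    # Substitute your own (e.g. great-circle) distance function here.
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def distance_to_next(df):
    lat = df['latitude'].to_numpy()
    lon = df['longitude'].to_numpy()
    # Pair each point with the following one by zipping the arrays against
    # copies shifted by one position; no per-row Series objects are built.
    dists = [calculate_distance([la1, lo1], [la2, lo2])
             for la1, lo1, la2, lo2 in zip(lat[:-1], lon[:-1], lat[1:], lon[1:])]
    if len(dists) < len(df):
        dists.append(np.nan)  # the last point has no successor
    return pd.Series(dists, index=df.index, dtype=float)


df = pd.DataFrame({'latitude': [0.0, 3.0, 3.0], 'longitude': [0.0, 4.0, 4.0]})
df['distance'] = distance_to_next(df)
```

The loop still runs in Python, but it only touches plain floats, which is usually much faster than building a Series per row with iterrows.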

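Apart from this problem, I run into the same kind of problem when calculating speed, applying smoothing, or using other similar methods.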
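Speed and smoothing follow the same column-wise pattern: a time difference to the next row via shift(-1), then a rolling window for smoothing. This is a minimal sketch, assuming a datetime 'time' column and a 'distance' column holding metres to the next point (the sample values are made up), with a centred 3-point moving average as one possible smoothing choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'time': pd.date_range('2020-01-01', periods=4, freq='10s'),  # made-up timestamps
    'distance': [50.0, 100.0, 150.0, np.nan],  # made-up metres to the next point
})

# Seconds elapsed until the next point, computed for all rows at once
dt = (df['time'].shift(-1) - df['time']).dt.total_seconds()
df['speed'] = df['distance'] / dt  # m/s for the segment starting at each row

# Smoothing as a window operation: centred 3-point moving average;
# min_periods=1 keeps the edges instead of turning them into NaN
df['speed_smooth'] = df['speed'].rolling(window=3, center=True, min_periods=1).mean()
```

Other smoothers (e.g. exponentially weighted means via ewm) plug in at the same place without any explicit loop.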

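Another example: I want to find the data points where speed == 0 m/s and, starting from each of these points, add all subsequent data points to an array until the speed reaches 10 m/s (to find the segments that accelerate from 0 m/s to 10 m/s).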
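The 0 to 10 m/s search can also be expressed without an explicit loop: forward-fill the position of the most recent standstill sample, then keep the first row per standstill at which the target speed is reached. This is a minimal sketch, assuming speed is a numeric sequence in m/s, that "speed == 0" is tested exactly, and that a segment runs from the last zero-speed sample before the speed first reaches 10 m/s; acceleration_segments is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def acceleration_segments(speed, v_start=0.0, v_end=10.0):
    """Return (start, end) positional index pairs of v_start -> v_end segments."""
    s = pd.Series(np.asarray(speed, dtype=float))
    pos = pd.Series(np.arange(len(s)), dtype=float)
    # Position of the most recent standstill sample at or before each row
    last_start = pos.where(s == v_start).ffill()
    # Rows at or above the target speed, tagged with that standstill position;
    # rows with no earlier standstill have NaN and are dropped
    hits = pd.DataFrame({'start': last_start, 'end': pos})[s >= v_end].dropna()
    # Keep only the first time the target speed is reached after each standstill
    first_hits = hits.groupby('start', sort=True)['end'].first()
    return [(int(a), int(b)) for a, b in first_hits.items()]


segments = acceleration_segments([0, 2, 5, 11, 12, 3, 0, 0, 4, 10, 0, 1])
```

Each (start, end) pair can then be turned into the corresponding rows with df.iloc[start:end + 1]. With noisy GPS speeds, an exact == 0 test may need to become a small threshold.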

Do you have any suggestions on how to write something like this as efficiently as possible?

Answer 1

You can use pd.DataFrame.shift to add the shifted series to your DataFrame, and then feed the rows into your function via apply:

import pandas as pd

# sample data matching the printed output below
df = pd.DataFrame({'latitude': [1, 2, 3, 4], 'longitude': [5, 6, 7, 8]})

def calculate_distance(row):
    # your function goes here, trivial function used for demonstration
    return sum(row[c] for c in df.columns)

df[['next_latitude', 'next_longitude']] = df[['latitude', 'longitude']].shift(-1)
df.loc[df.index[:-1], 'distance'] = df.iloc[:-1].apply(calculate_distance, axis=1)

print(df)

   latitude  longitude  next_latitude  next_longitude  distance
0         1          5            2.0             6.0      14.0
1         2          6            3.0             7.0      18.0
2         3          7            4.0             8.0      22.0
3         4          8            NaN             NaN       NaN

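This works for an arbitrary function calculate_distance, but your algorithm is very likely vectorizable, in which case you should use column-wise Pandas / NumPy methods instead.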
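As an example of the column-wise approach, here is a minimal sketch assuming calculate_distance is a great-circle distance; the question does not say which formula it uses, so the haversine formula, the mean Earth radius constant and the helper name haversine_m are all assumptions:

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; works element-wise on arrays/Series."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


df = pd.DataFrame({'latitude': [0.0, 0.0, 1.0], 'longitude': [0.0, 1.0, 1.0]})
# One call over whole columns; shift(-1) supplies the "next point",
# so the last row gets NaN automatically
df['distance'] = haversine_m(df['latitude'], df['longitude'],
                             df['latitude'].shift(-1), df['longitude'].shift(-1))
```

Here both steps are one degree of arc (along the equator, then along a meridian), so each comes out at roughly 111.2 km; no Python-level loop or apply is involved.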

Answer 1 (score: 2, accepted, 2018-11-26 15:20:31)