How to split timestamped CSV data into multiple CSVs by value and continuous time period using Pandas

Question

I am trying to analyse a ship's AIS data. I have a CSV with ~20,000 rows, with columns for lat / long / speed / timestamp.

I have loaded the data into a pandas DataFrame in a Jupyter notebook.

What I want to do is split the CSV into smaller CSVs based on the timestamp and the speed: I want an individual CSV for each period of time in which the vessel's speed was less than, say, 2 knots. For example, if the vessel transited at 10 knots for 6 hrs, slowed down to 1 knot for 3 hrs, sped back up to 10 knots, then slowed down again to 1 knot for 4 hrs, I would want the output to be two CSVs, one for the 3 hr period and one for the 4 hr period. This is so I can review these periods individually in my mapping software.

I can easily filter the data to show all the periods where the speed is <1 knot, but I can't break that down to output each continuous period as a separate CSV / data frame.

EDIT

Here is an example of the data:

I've tried to show more clearly what I want to achieve here:

Answer 1

Here is something that may get you started.

First, keep only the rows that meet the criterion (for example, a speed of 2 knots or less):

import pandas as pd

df = pd.DataFrame({'speed':[2,1,4,5,4,1,1,1,3,4,5,6], 'time':[4,5,6,7,8,9,10,11,12,13,14,15]})
# Keep only the slow rows and renumber the index from 0
df_below2 = df[df['speed']<=2].reset_index(drop=True)
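A quick, self-contained check of what the filter keeps on the toy frame. The `speed` and `time` columns are only the answer's sample data, not real AIS field names.

```python
import pandas as pd

# Toy data from the answer: integer "time" steps and speeds in knots
df = pd.DataFrame({'speed': [2, 1, 4, 5, 4, 1, 1, 1, 3, 4, 5, 6],
                   'time': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})

# Keep only slow rows (speed <= 2) and renumber the index from 0
df_below2 = df[df['speed'] <= 2].reset_index(drop=True)

print(df_below2['time'].tolist())
```

Only the rows at times 4, 5, 9, 10 and 11 survive. The jump from 5 to 9 is exactly the kind of gap the next step has to detect.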

Next, split the frame wherever the gap between consecutive time values is too long. For example:

threshold = 2  # largest allowed gap between consecutive times within one period
df_below2['not_continuous'] = df_below2['time'].diff() > threshold  # True where a new period starts
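With real AIS data the time column holds timestamps rather than integers, so the threshold needs to be a time span instead of a number. A sketch, assuming the column is named `time` and parses with `pd.to_datetime`; the sample rows are invented and the 10-minute threshold is an arbitrary example value, not something from the question.

```python
import pandas as pd

# Invented slow-speed rows standing in for the filtered AIS frame
df_below2 = pd.DataFrame({
    'speed': [1.2, 0.8, 1.0, 0.5, 0.9],
    'time': ['2020-04-01 10:00', '2020-04-01 10:05', '2020-04-01 13:00',
             '2020-04-01 13:04', '2020-04-01 13:10'],
})

# Parse the strings so that diff() returns Timedelta values
df_below2['time'] = pd.to_datetime(df_below2['time'])

# diff() only makes sense if the rows are in time order
df_below2 = df_below2.sort_values('time').reset_index(drop=True)

# Assumed maximum gap between consecutive position reports in one period
threshold = pd.Timedelta(minutes=10)

# True where the gap to the previous row exceeds the threshold;
# the first row's diff is NaT, which compares as False
df_below2['not_continuous'] = df_below2['time'].diff() > threshold
```

The right threshold depends on how often the vessel reports its position; it should be comfortably larger than the normal reporting interval.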

Then number the groups with a cumulative sum, so that every gap starts a new group:

df_below2['group_id'] = df_below2['not_continuous'].cumsum()
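A different technique, if you would rather break the track at every fast stretch than at time gaps: label runs of consecutive slow rows on the unfiltered frame by comparing each row's slow/fast flag with the previous row's. This is an alternative to the time-gap approach above, sketched on the same toy data; `is_slow` and `run_id` are illustrative column names and the rows are assumed to be in time order.

```python
import pandas as pd

df = pd.DataFrame({'speed': [2, 1, 4, 5, 4, 1, 1, 1, 3, 4, 5, 6],
                   'time': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})

# Flag slow rows (rows are assumed to be sorted by time)
df['is_slow'] = df['speed'] <= 2

# A new run starts whenever the flag differs from the previous row's flag
df['run_id'] = (df['is_slow'] != df['is_slow'].shift()).cumsum()

# Keep only the slow runs; each distinct run_id is one continuous slow period
slow_periods = [g for _, g in df[df['is_slow']].groupby('run_id')]
```

Unlike the threshold approach, this never merges two slow periods separated by a short burst of speed, but it also never splits a slow period on a gap in the reports.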

From here it should be easy to split the frame by `group_id`.
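One way to do that final split, sketched with `groupby` and `to_csv`. The output directory `slow_periods` and the `slow_period_<n>.csv` file names are a hypothetical naming scheme, and the frame is rebuilt from the toy data so the snippet runs on its own.

```python
from pathlib import Path

import pandas as pd

df = pd.DataFrame({'speed': [2, 1, 4, 5, 4, 1, 1, 1, 3, 4, 5, 6],
                   'time': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})

# Same steps as the answer: filter, flag gaps, number the groups
df_below2 = df[df['speed'] <= 2].reset_index(drop=True)
threshold = 2
df_below2['not_continuous'] = df_below2['time'].diff() > threshold
df_below2['group_id'] = df_below2['not_continuous'].cumsum()

# Hypothetical output folder for the per-period files
out_dir = Path('slow_periods')
out_dir.mkdir(exist_ok=True)

# Write one CSV per continuous slow period
written = []
for group_id, period in df_below2.groupby('group_id'):
    path = out_dir / f'slow_period_{group_id}.csv'
    # Drop the helper columns so each file keeps only the original fields
    period.drop(columns=['not_continuous', 'group_id']).to_csv(path, index=False)
    written.append(path)
```

If a data frame per period is enough, `[g for _, g in df_below2.groupby('group_id')]` gives a list of them without writing anything to disk.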

Answer 1 (score 0, posted 2020-04-30 10:34:16)
