How to use Pandas to split timestamped CSV data into multiple CSVs based on values and continuous time periods

I am trying to analyse a ship's AIS data. I have a CSV with ~20,000 rows, with columns for lat / long / speed / time stamp.

I have loaded the data into a pandas data frame in a Jupyter notebook.

What I want to do is split the CSV into smaller CSVs based on the time stamp and the speed: one CSV for each continuous period during which the vessel's speed was below, say, 2 knots. For example, if the vessel transited at 10 knots for 6 hrs, slowed down to 1 knot for 3 hrs, sped back up to 10 knots, then slowed down again to 1 knot for 4 hrs, I would want the output to be two CSVs, one for the 3 hr period and one for the 4 hr period. This is so I can review these periods individually in my mapping software.

I can easily filter the data to show all the rows where the speed is <1 knot, but I can't work out how to output each continuous period as a separate CSV / data frame.

EDIT

Here is an example of the data

I've tried to show more clearly what I want to achieve here

Here is something to maybe get you started.

First keep only the rows that meet the criterion (for example, a speed of 2 or below):

import pandas as pd
df = pd.DataFrame({'speed':[2,1,4,5,4,1,1,1,3,4,5,6], 'time':[4,5,6,7,8,9,10,11,12,13,14,15]})
# Keep only the slow rows and renumber the index from 0
df_below2 = df[df['speed']<=2].reset_index(drop=True)

Now we need to split the frame wherever the gap in time between consecutive rows is too long. For example:

# Flag rows whose time jump from the previous slow row exceeds the threshold
threshold = 2
df_below2['not_continuous'] = df_below2['time'].diff() > threshold
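
With real AIS data the time column will usually hold timestamps rather than plain integers. The same idea can be sketched as follows, assuming a column named `timestamp`, a speed limit of 2 knots and a 30-minute gap threshold; the column name, sample values and threshold are all illustrative assumptions, not taken from the question's data:

```python
import pandas as pd

# Hypothetical sample: column names and values are made up for illustration.
df = pd.DataFrame({
    "speed": [10.0, 1.0, 0.8, 9.5, 10.2, 1.1, 0.9, 1.0],
    "timestamp": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 00:10", "2021-01-01 00:20",
        "2021-01-01 00:30", "2021-01-01 03:00", "2021-01-01 03:10",
        "2021-01-01 03:20", "2021-01-01 03:30",
    ]),
})

# Keep only the slow rows (below the assumed 2 knot limit).
slow = df[df["speed"] < 2].reset_index(drop=True)

# diff() on a datetime column yields Timedelta values, so the threshold
# has to be a Timedelta as well (30 minutes is an assumed value).
gap_threshold = pd.Timedelta(minutes=30)
slow["not_continuous"] = slow["timestamp"].diff() > gap_threshold
```

The threshold should be larger than the normal interval between position reports, otherwise every row would start its own group.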

Distinguish between the groups using cumsum; each True marks the start of a new group, so the running total serves as a group id:

df_below2['group_id'] = df_below2['not_continuous'].cumsum()
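
To see what the steps so far produce on the sample frame, they can be run together as one self-contained snippet:

```python
import pandas as pd

df = pd.DataFrame({'speed': [2, 1, 4, 5, 4, 1, 1, 1, 3, 4, 5, 6],
                   'time': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})
df_below2 = df[df['speed'] <= 2].reset_index(drop=True)

threshold = 2
# The first diff is NaN, which compares as False, so the first row never starts a new group.
df_below2['not_continuous'] = df_below2['time'].diff() > threshold
# Summing the booleans gives a counter that increases by 1 at every gap.
df_below2['group_id'] = df_below2['not_continuous'].cumsum()

print(df_below2[['time', 'group_id']])
# times 4 and 5 get group_id 0; times 9, 10 and 11 get group_id 1
```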

From here it should be easy to split the frame based on the group id.
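
One possible way to do that last step is `groupby` plus `to_csv`, writing one file per group. This is only a sketch: the output folder `slow_periods` and the file name pattern are hypothetical choices, and the helper columns are dropped so each CSV has the same columns as the input.

```python
from pathlib import Path

import pandas as pd

df = pd.DataFrame({'speed': [2, 1, 4, 5, 4, 1, 1, 1, 3, 4, 5, 6],
                   'time': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]})
df_below2 = df[df['speed'] <= 2].reset_index(drop=True)
df_below2['not_continuous'] = df_below2['time'].diff() > 2
df_below2['group_id'] = df_below2['not_continuous'].cumsum()

out_dir = Path('slow_periods')  # hypothetical output folder
out_dir.mkdir(exist_ok=True)

written = []
for group_id, group in df_below2.groupby('group_id'):
    path = out_dir / f'slow_period_{group_id}.csv'  # hypothetical naming scheme
    # Drop the helper columns before writing the period to its own file.
    group.drop(columns=['not_continuous', 'group_id']).to_csv(path, index=False)
    written.append(path)
```

If separate data frames are wanted instead of files, the same loop can collect each `group` into a list or dict rather than calling `to_csv`.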
