Pandas: truncate a DataFrame after a column condition is met

Question

So I have the following DataFrame df:

(image of the DataFrame not available)

The frame contains two groups of data, each sorted within its group.

Group 1 is from index 359 to 365 inclusive

Group 2 is from index 366 to 371 inclusive

I want to separate them into the two groups. There may be more than two groups. The logic I am applying is that whenever the next STEPS_ID is less than the current STEPS_ID, this marks the end of the group.

I am easily able to get this pointer by df.STEPS_ID <= df.STEPS_ID.shift(-1)

Is there an elegant pandas way to achieve this easily, possibly using vectorized operations rather than a for loop?

This seems to be a common enough problem that I am sure there must be a well-defined algorithm to solve these kinds of problems. I would also appreciate it if you guys could guide me in reading up on the theoretical basis for such algorithms.

Answer 1

There is more than one way to "separate things into groups". One way would be to make a list of groups. But that is not the ideal way when dealing with a Pandas DataFrame. Once you have a list, you are forced to iterate over it in a Python loop, and Python loops are slow compared to native Pandas operations.

Assuming you have enough memory, a better way would be to add a column or index to the DataFrame:

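import pandas as pd
# list(...) is needed on Python 3, where a range object cannot be repeated with *
df = pd.DataFrame({'STEPS_ID': list(range(1107, 1113)) * 2})
# A row starts a new group when its STEPS_ID is lower than the previous row's;
# the cumulative sum of those flags gives each row its group number.
df['GROUP'] = (df['STEPS_ID'] < df['STEPS_ID'].shift(1)).astype('int').cumsum()
# df.set_index('GROUP', inplace=True, append=True)
print(df)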

yields

    STEPS_ID  GROUP
0       1107      0
1       1108      0
2       1109      0
3       1110      0
4       1111      0
5       1112      0
6       1107      1
7       1108      1
8       1109      1
9       1110      1
10      1111      1
11      1112      1
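To see why the one-liner works, it can help to look at the intermediate values. The following is only a sketch that breaks the expression into named steps; the variable names prev, is_start and steps are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'STEPS_ID': list(range(1107, 1113)) * 2})

# Previous row's value; the first row has no predecessor, so it gets NaN.
prev = df['STEPS_ID'].shift(1)

# True only where the value drops compared to the previous row,
# i.e. where a new group begins. Comparing against NaN yields False.
is_start = df['STEPS_ID'] < prev

# Running count of the drops seen so far = group label for each row.
group = is_start.astype('int').cumsum()

# Side-by-side view of every step (illustrative only).
steps = pd.DataFrame({
    'STEPS_ID': df['STEPS_ID'],
    'prev': prev,
    'is_start': is_start,
    'GROUP': group,
})
print(steps)
```

Only the row where STEPS_ID falls back to 1107 is flagged, so the cumulative sum stays at 0 before it and becomes 1 from that row onward.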

Now you can do aggregation/transformation operations on each group by calling

df.groupby('GROUP')....
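As a rough illustration of what could follow the groupby call, here is a minimal sketch. The particular aggregations, the POS_IN_GROUP column and the parts dictionary are assumptions chosen for the example, not something the question asked for.

```python
import pandas as pd

df = pd.DataFrame({'STEPS_ID': list(range(1107, 1113)) * 2})
df['GROUP'] = (df['STEPS_ID'] < df['STEPS_ID'].shift(1)).astype('int').cumsum()

# Aggregation: one summary row per group.
summary = df.groupby('GROUP')['STEPS_ID'].agg(['min', 'max', 'count'])
print(summary)

# Transformation: position of each row within its own group,
# returned with the same index as df.
df['POS_IN_GROUP'] = df.groupby('GROUP').cumcount()

# If separate frames are really needed, split once into a dict
# keyed by group label (hypothetical helper structure).
parts = {label: part for label, part in df.groupby('GROUP')}
first_group = parts[0]
print(first_group)
```

Keeping the label as a column means the split into separate frames is optional; most per-group work can stay inside groupby without a Python loop.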

Answer 1 (4, accepted, 2013-09-05 12:36:15)
