Split a DataFrame into groups that contain only a given constant value

Question

I have a DataFrame that I want to split into multiple groups. Each group is a run of consecutive rows in which the difference column equals 1. Rows where it does not equal 1 are skipped, and the next row with difference equal to 1 starts a new group.

For example, this:

    id  difference
0   001 1
1   001 1
2   001 1
3   001 1
4   001 1
5   001 1
6   001 2
7   001 2
8   001 1
9   001 1
10  001 1
11  001 1
12  001 4
13  001 1
14  001 1
15  001 1
16  001 1
17  001 1
18  001 1
19  001 1

should produce 3 DataFrames: the first from row 0 to 5 (inclusive), the second from 8 to 11, and the third from 13 to 19.

This is how I do it right now; I am new to pandas. Is there a more efficient way of doing it?

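# g is the DataFrame shown above.
# Start a new group every time the value of difference changes.
grouped = g.groupby((g['difference'] != g['difference'].shift()).cumsum())
for group_id, group in grouped:
    first = group['difference'].iloc[0]
    # Keep only runs whose value is approximately 1 and that span more than one row.
    if 0.9 < first < 1.1 and len(group.index) > 1:
        pass  # do stuff...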

Answer 1

Given your splitting condition, use cumsum to create pseudo-groups for groupby. Then use loc to drop the rows that violate the condition and extract the groups in a dict comprehension:

condition = df.difference != 1
dfs = {key: data for key, data in df.loc[~condition].groupby(condition.cumsum())}
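
To make the mechanics easier to follow, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the sample in the question, and the intermediate name labels is only an illustrative helper, not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question (id is kept as a string).
df = pd.DataFrame({
    "id": ["001"] * 20,
    "difference": [1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1],
})

# True on every row that breaks a run of 1s.
condition = df["difference"] != 1

# cumsum() increases by one on each violating row, so all rows of a run
# of 1s that sit between two violations share the same label.
labels = condition.cumsum()

# Drop the violating rows, then group the remaining rows by their label.
dfs = {key: data for key, data in df.loc[~condition].groupby(labels)}

for key, data in dfs.items():
    print(key, data.index.tolist())
```

The labels are not consecutive: two violating rows in a row (6 and 7 here) each bump the counter, which is why there is no group 1.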

Note that if you want to include id as a splitting condition, just add it to the groupby and unpack accordingly:

dfs = {key: data for (_, key), data in df.loc[~condition].groupby(['id', condition.cumsum()])}
#                    ^^^^^^^^                                      ^^^^
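
A small sketch of the multi-id case, using made-up data with two ids (the values are assumptions for illustration only). Unlike the line above, it keeps the full (id, label) tuple as the dictionary key, because with more than one id the cumsum label alone can repeat and a later group would overwrite an earlier one.

```python
import pandas as pd

# Hypothetical data: a run of 1s continues across the boundary between two ids.
df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "002", "002"],
    "difference": [1, 1, 1, 1, 1, 2, 1],
})

condition = df["difference"] != 1
labels = condition.cumsum()

# Grouping by both id and label splits the run at the id boundary.
# Keys are (id, label) tuples, so groups from different ids cannot collide.
dfs = {key: data for key, data in df.loc[~condition].groupby(["id", labels])}

for key, data in dfs.items():
    print(key, data.index.tolist())
```

Here rows 0 to 4 all carry label 0, yet they end up in two separate groups because the id differs.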

Output:

{0:
      id  difference
 0   001           1
 1   001           1
 2   001           1
 3   001           1
 4   001           1
 5   001           1,

 2:
      id  difference
 8   001           1
 9   001           1
 10  001           1
 11  001           1,

 3:
      id  difference
 13  001           1
 14  001           1
 15  001           1
 16  001           1
 17  001           1
 18  001           1
 19  001           1}
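
The code in the question also accepts values only approximately equal to 1 and ignores runs of a single row. A possible way to fold both checks into the same approach is sketched below; the float data is invented for illustration, and the bounds 0.9 and 1.1 are taken from the question.

```python
import pandas as pd

# Hypothetical float data, not from the original post.
df = pd.DataFrame({"difference": [1.0, 1.05, 2.0, 0.95, 3.0, 1.0, 1.0, 1.0]})

s = df["difference"]
# Rows outside the open interval (0.9, 1.1) break a run, mirroring the question's check.
condition = ~((s > 0.9) & (s < 1.1))

# Keep only runs with more than one row, as in the question's len(...) > 1 test.
dfs = {
    key: data
    for key, data in df.loc[~condition].groupby(condition.cumsum())
    if len(data) > 1
}

for key, data in dfs.items():
    print(key, data.index.tolist())
```

Row 3 forms a run of length one and is therefore dropped by the length filter.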

Answer 1 (4, accepted, 2021-11-28 12:39:00)