Split DataFrame into groups that contain only a given constant value
I have a DataFrame that I want to split into multiple groups. Each group is a sequence of consecutive rows where the column difference is equal to 1. A row with any other value is skipped, and the next row with difference equal to 1 starts a new group.
For example, this:
id difference
0 001 1
1 001 1
2 001 1
3 001 1
4 001 1
5 001 1
6 001 2
7 001 2
8 001 1
9 001 1
10 001 1
11 001 1
12 001 4
13 001 1
14 001 1
15 001 1
16 001 1
17 001 1
18 001 1
19 001 1
should become 3 DataFrames: the first from row 0 to 5 (inclusive), the second from 8 to 11, and the third from 13 to 19.
Right now I do it this way, and I am new to pandas. Is there a more efficient way of doing it?
grouped = g.groupby((g['difference'] != g['difference'].shift()).cumsum())
for group_id, group in grouped:
    # keep runs whose value is roughly 1 and that span more than one row
    if 0.9 < group['difference'].iloc[0] < 1.1 and len(group.index) > 1:
        pass  # do stuff...
Given your splitting condition, use cumsum to create pseudo-groups for groupby: every row that violates the condition increments the running sum, so all rows of one uninterrupted run share the same label. Then use loc to ignore the rows that violate the condition and extract the groups in a dict comprehension:
condition = df.difference != 1
dfs = {key: data for key, data in df.loc[~condition].groupby(condition.cumsum())}
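The check in the question (a value between 0.9 and 1.1, and more than one row) suggests that difference may hold floats and that single-row groups should be dropped. A minimal sketch of how the same cumsum idea could be adapted under those assumptions; the helper name split_runs and its parameters are made up for illustration and are not part of pandas:

```python
import pandas as pd


def split_runs(df, column="difference", low=0.9, high=1.1, min_len=2):
    """Split df into runs of consecutive rows whose value lies in (low, high).

    Hypothetical helper; the bounds and min_len mirror the question's check.
    """
    in_range = (df[column] > low) & (df[column] < high)
    # Every out-of-range row bumps the counter, so the rows of one
    # uninterrupted run all share the same label.
    labels = (~in_range).cumsum()
    groups = df.loc[in_range].groupby(labels)
    # Drop runs that are shorter than min_len rows.
    return {key: data for key, data in groups if len(data) >= min_len}


# Small made-up frame: rows 0-1 and 5-7 form runs, row 3 is a run of length 1.
df = pd.DataFrame({"difference": [1.0, 1.02, 2.0, 0.98, 3.0, 1.0, 1.0, 1.0]})
runs = split_runs(df)
for key, data in runs.items():
    print(key, data.index.tolist())
```

With min_len=1 the single-row run at index 3 would be kept as well.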
Note that if you want to include id as a splitting condition, just add it to the groupby and unpack accordingly:
dfs = {key: data for (_, key), data in df.loc[~condition].groupby(['id', condition.cumsum()])}
#                    ^^^^^^^^                                      ^^^^
Output:
{0:
id difference
0 001 1
1 001 1
2 001 1
3 001 1
4 001 1
5 001 1,
2:
id difference
8 001 1
9 001 1
10 001 1
11 001 1,
3:
id difference
13 001 1
14 001 1
15 001 1
16 001 1
17 001 1
18 001 1
19 001 1}
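To try this end to end, the following sketch rebuilds the sample data from the question and checks which rows land in which group. The second id ("002") and the names both, cond, and by_id are assumptions added only to illustrate the id variant; they do not appear in the original data.

```python
import pandas as pd

# Sample data from the question; id is stored as a string to keep "001".
difference = [1] * 6 + [2, 2] + [1] * 4 + [4] + [1] * 7
df = pd.DataFrame({"id": ["001"] * len(difference), "difference": difference})

# Rows that break a run; their cumulative sum labels the runs in between.
condition = df["difference"] != 1
dfs = {key: data for key, data in df.loc[~condition].groupby(condition.cumsum())}

for key, data in dfs.items():
    print(key, data.index.min(), data.index.max())

# Hypothetical second id appended below the first one.
both = pd.concat([df, df.assign(id="002")], ignore_index=True)
cond = both["difference"] != 1
labels = cond.cumsum().rename("run")
# Keep the full (id, run) tuple as the key so that equal run labels
# under different ids cannot overwrite each other in the dict.
by_id = {keys: data for keys, data in both.loc[~cond].groupby(["id", labels])}
print(list(by_id))
```

In the two-id frame, the last run of "001" and the first run of "002" are adjacent with no violating row between them, so they receive the same run label. Grouping on id as well is what separates them.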