Group one column of a dataframe by a variable index
I have a dataframe that consists of partial routes (which together make up full routes) and a treatment variable. I am trying to reduce the dataframe to the full routes by grouping the partial routes together while keeping the treatment variable.
To make this clearer, the df looks like
PartialRoute Treatment
0 1
1 0
0 0
0 0
1 0
2 0
3 0
0 0
1 1
2 0
where every 0 in 'PartialRoute' starts a new group, which means I always want to group all rows up to the next 0 in that column (the start of a new route). So in this example there are 4 groups:
PartialRoute Treatment
0 1
1 0
-----------------
0 0
-----------------
0 0
1 0
2 0
3 0
-----------------
0 0
1 1
2 0
-----------------
and the result should look like
Route Treatment
0 1
1 0
2 0
3 1
Is there an elegant way to do this?
Create the groups by comparing with Series.eq and taking the cumulative sum with Series.cumsum, then aggregate per group, e.g. by sum or max:
df1 = df.groupby(df['PartialRoute'].eq(0).cumsum())['Treatment'].sum().reset_index()
print (df1)
PartialRoute Treatment
0 1 1
1 2 0
2 3 0
3 4 1
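Whether sum or max is the better aggregate depends on what Treatment should mean per route. As a rough sketch (the data and the column names n_treated and any_treated are made up for illustration, they are not from the question), sum counts the treated partial routes while max gives a 0/1 flag:

```python
import pandas as pd

# Hypothetical data: the second route contains two treated partial routes.
df = pd.DataFrame({
    "PartialRoute": [0, 1, 0, 1, 2],
    "Treatment":    [0, 0, 1, 1, 0],
})

# Every 0 starts a new route; the cumulative sum labels the routes 1, 2, ...
groups = df["PartialRoute"].eq(0).cumsum().rename("Route")

# Named aggregation: sum counts treated parts, max flags "any part treated".
summary = (
    df.groupby(groups)["Treatment"]
      .agg(n_treated="sum", any_treated="max")
      .reset_index()
)
print(summary)
```

If the result should stay a 0/1 indicator, as in the expected output of the question, max is probably the safer choice.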
Detail:
print (df['PartialRoute'].eq(0).cumsum())
0 1
1 1
2 2
3 3
4 3
5 3
6 3
7 4
8 4
9 4
Name: PartialRoute, dtype: int32
If the first value of the DataFrame is not 0, the groups come out differently: the rows before the first 0 form group 0, so the numbering starts at 0:
print (df)
PartialRoute Treatment
0 1 1
1 1 0
2 0 0
3 0 0
4 1 0
5 2 0
6 3 0
7 0 0
8 1 1
9 2 0
print (df['PartialRoute'].eq(0).cumsum())
0 0
1 0
2 1
3 2
4 2
5 2
6 2
7 3
8 3
9 3
Name: PartialRoute, dtype: int32
df1 = df.groupby(df['PartialRoute'].eq(0).cumsum())['Treatment'].sum().reset_index()
print (df1)
PartialRoute Treatment
0 0 1
1 1 0
2 2 0
3 3 1
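For the data from the question, everything can be put together in one self-contained sketch. Subtracting 1 and renaming the group key to Route are my own additions to match the 0-based Route column in the expected output. This assumes the first row is a 0, otherwise the first group would get the label -1:

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "PartialRoute": [0, 1, 0, 0, 1, 2, 3, 0, 1, 2],
    "Treatment":    [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
})

# eq(0) marks the start of each route, cumsum labels the routes 1, 2, 3, ...
# Subtracting 1 (assumption: the first row is 0) makes the labels start at 0.
route = (df["PartialRoute"].eq(0).cumsum() - 1).rename("Route")

# max keeps Treatment as a 0/1 flag per full route.
result = df.groupby(route)["Treatment"].max().reset_index()
print(result)
```

Here sum and max give the same result, since no route has more than one treated partial route.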