Group one column of dataframe by variable index

I have a dataframe that consists of PartialRoutes (which together make up full routes) and a treatment variable. I am trying to reduce the dataframe to the full routes by grouping the partial routes together while keeping the treatment variable.

To make this clearer, the df looks like this:

PartialRoute  Treatment
0             1
1             0
0             0
0             0
1             0
2             0
3             0
0             0
1             1
2             0

where every 0 in 'PartialRoute' starts a new group, so all rows should be grouped together until the next 0 in 'PartialRoute' marks the start of a new route. In this example there are 4 groups:

PartialRoute  Treatment
0             1
1             0
-----------------
0             0
-----------------
0             0
1             0
2             0
3             0
-----------------
0             0
1             1
2             0
-----------------

and the result should look like this:

Route Treatment
0     1
1     0
2     0
3     1

Is there an elegant way to do this?

Create the groups by comparing the column with 0 using Series.eq and taking the cumulative sum with Series.cumsum, then aggregate per group, e.g. by sum or max:

df1 = df.groupby(df['PartialRoute'].eq(0).cumsum())['Treatment'].sum().reset_index()
print (df1)
   PartialRoute  Treatment
0             1          1
1             2          0
2             3          0
3             4          1
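
To match the layout requested in the question (a column named Route, numbered from 0), the same idea can be sketched as follows. This is only a minimal sketch: it assumes Treatment is a 0/1 flag, so max is used instead of sum to mark a route as treated when any of its partial routes was treated, and the names route and result are chosen here purely for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "PartialRoute": [0, 1, 0, 0, 1, 2, 3, 0, 1, 2],
    "Treatment":    [1, 0, 0, 0, 0, 0, 0, 0, 1, 0],
})

# Each 0 starts a new route, so the running count of zeros labels the route.
# Subtracting 1 makes the labels start at 0, as in the desired output.
# (Assumption: the first row is a 0, as in the question's data.)
route = df["PartialRoute"].eq(0).cumsum().sub(1).rename("Route")

# Assumption: Treatment is a 0/1 flag, so max() reports whether any
# partial route in the group was treated.
result = df.groupby(route)["Treatment"].max().reset_index()
print(result)
```

With sum instead of max, a route containing two treated partial routes would get the value 2 rather than 1, so the choice depends on what the aggregated column should mean.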

Detail:

print (df['PartialRoute'].eq(0).cumsum())
0    1
1    1
2    2
3    3
4    3
5    3
6    3
7    4
8    4
9    4
Name: PartialRoute, dtype: int32
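
To see why this labelling works, the two steps can be reproduced without pandas. The following is a plain-Python illustration of the same logic, not how pandas implements it; the variable names are made up for this sketch.

```python
from itertools import accumulate

# PartialRoute values copied from the question.
partial_route = [0, 1, 0, 0, 1, 2, 3, 0, 1, 2]

# Counterpart of .eq(0): True wherever a new route starts.
is_start = [value == 0 for value in partial_route]

# Counterpart of .cumsum(): each True counts as 1, so the running total
# goes up by one at every route start and stays flat in between.
group_ids = list(accumulate(int(flag) for flag in is_start))
print(group_ids)  # [1, 1, 2, 3, 3, 3, 3, 4, 4, 4]
```

All rows that share a running total belong to the same route, which is why the cumulative sum can be passed directly to groupby.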

If the first value of PartialRoute is not 0, the group labels start at 0 instead of 1:

print (df)
   PartialRoute  Treatment
0             1          1
1             1          0
2             0          0
3             0          0
4             1          0
5             2          0
6             3          0
7             0          0
8             1          1
9             2          0

print (df['PartialRoute'].eq(0).cumsum())
0    0
1    0
2    1
3    2
4    2
5    2
6    2
7    3
8    3
9    3
Name: PartialRoute, dtype: int32

df1 = df.groupby(df['PartialRoute'].eq(0).cumsum())['Treatment'].sum().reset_index()
print (df1)
   PartialRoute  Treatment
0             0          1
1             1          0
2             2          0
3             3          1
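
If the labels should start at 0 in both situations, one option is to subtract the first label from all labels. This is a hedged sketch rather than part of the answer above: route_labels is a hypothetical helper name, and it assumes the Series is not empty.

```python
import pandas as pd

def route_labels(partial_route: pd.Series) -> pd.Series:
    """Label routes from 0, whether or not the data starts with a 0.

    Hypothetical helper for illustration; assumes a non-empty Series.
    """
    raw = partial_route.eq(0).cumsum()
    # The first raw label is 1 when the data starts with 0 and 0 otherwise;
    # subtracting it shifts both cases to start at 0.
    return (raw - raw.iloc[0]).rename("Route")

starts_with_zero = pd.Series([0, 1, 0, 0, 1, 2, 3, 0, 1, 2])
starts_with_one = pd.Series([1, 1, 0, 0, 1, 2, 3, 0, 1, 2])

print(route_labels(starts_with_zero).tolist())
print(route_labels(starts_with_one).tolist())
```

Note that rows before the first 0 still form their own group; whether that leading group counts as a real route depends on the data.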

