简体   繁体   中英

Advanced Cumulative Grouping in Pandas and Python

I have a dataframe df with the following time series data:

        a        b      sign
1    -0.066   19.514    -1
2     0.334   19.848     1
3     0.991   20.839     1
4      0.42   21.259     1
5       0.0   21.367     0
6     -0.57   20.797    -1
7    -0.294   20.503    -1
8     0.579   21.082     1
9     0.234   21.316     1
0    -0.954   20.362    -1

I'm currently using the following cumulative grouping magic:

df2 = df.groupby((df["sign"] != df['sign'].shift()).cumsum(), as_index=False).agg({'sign': 'first', 'a': ['sum','count']})

Which produces the following output (I've tweaked column headers):

    sign     asum   cnt   
0   -1.0    -0.209   1  
1    1.0     1.970   3  
2    0.0     0.000   1  
3   -1.0    -0.813   2  
4    1.0     0.755   2  
5   -1.0    -1.015   1  

So far so good, this shows runs of positive and negative values nicely.
However, I do not want the 0 sign as a separate group, I want it to be associated with either the 1 group or the -1 group (depending upon my mood).

So what is the best way to accomplish this? I ideally need the output to be like this (assuming the 0 group is now going become part of the 1 group):

    sign      a    cnt   
0   -1.0    -0.209  1   
1    1.0     1.970  4           
2   -1.0    -0.813  2   
3    1.0     0.755  2   
4   -1.0    -1.015  1   

I'm not exactly sure what you're going for, but this seems to achieve what you're trying to do:

# change mood to 'negative' when required...
mood = 'positive'

if mood=='positive':
    df['sign'].replace(0,1,inplace=True)
else:
    df['sign'].replace(0,-1,inplace=True)

Then, proceeding with your exact groupby you had above, you end up with your target output (didn't change the column names as you did):

   sign      a      
  first    sum count
0    -1 -0.066     1
1     1  1.853     4
2    -1 -0.864     2
3     1  0.813     2
4    -1 -0.954     1

Why not just replace all the 0's in the original sign column to either a -1 or a 1? Then the rows that previously were associated with the 0 group will be associated with the -1 or 1 group.

To change them to a one:

df.sign = df.sign.where(df.sign!=0, 1)

Then,

df.groupby((df["sign"] != df['sign'].shift()).cumsum(), as_index=False).agg({'sign': 'first', 'a': ['sum','count']})

will yield:

       a        sign
     sum count first
0 -0.066     1    -1
1  1.853     4     1
2 -0.864     2    -1
3  0.813     2     1
4 -0.954     1    -1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM