
Pandas sum of subset rows and re-merge in DF

I have a DF that looks like this:

      id_var1   id_var2  num_var1   num_var2
      1         1        1          1
      1         2        1          0
      1         3        2          0
      1         4        2          3
      1         5        3          3
      1         6        3          3
      1         7        3          0 
      1         8        4          0
      2         1        1          0
      2         2        2          1
      2         3        5          0
      2         4        2          0
      2         5        1          2  
      2         6        1          2
      2         7        2          0

I want a DF with the following appearance:

      id_var1   id_var2  num_var1   num_var2   row_sum
      1         1        1          1          2
      1         2        1          0          NaN
      1         3        2          0          NaN
      1         4        2          3          11
      1         5        3          3          NaN
      1         6        3          3          NaN
      1         7        3          0          NaN
      1         8        4          0          NaN
      2         1        1          0          NaN
      2         2        2          1          7
      2         3        5          0          NaN
      2         4        2          0          NaN
      2         5        1          2          4
      2         6        1          2          NaN
      2         7        2          0          NaN

At the first row of each run of non-zero num_var2 values, I want sum(num_var1) over that row plus as many rows below it as num_var2 states.

Example 1: The row with id_var1 = 1 and id_var2 = 4 has num_var2 = 3 --> sum(num_var1) for that row + 3 rows down = 2 + 3 + 3 + 3 = 11.

Example 2: The row with id_var1 = 2 and id_var2 = 5 has num_var2 = 2 --> sum(num_var1) for that row + 2 rows down = 1 + 1 + 2 = 4.
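
The two examples can be checked with plain list slicing. This is only a minimal sketch: positions are 0-based, and the names ex1 and ex2 are illustrative, not part of the DF.

```python
# num_var1 values in DF order (same data as the DF in the question)
num_var1 = [1, 1, 2, 2, 3, 3, 3, 4] + [1, 2, 5, 2, 1, 1, 2]

# Example 1: id_var1 = 1, id_var2 = 4 is position 3; num_var2 = 3
ex1 = sum(num_var1[3:3 + 3 + 1])      # this row + 3 rows down

# Example 2: id_var1 = 2, id_var2 = 5 is position 12; num_var2 = 2
ex2 = sum(num_var1[12:12 + 2 + 1])    # this row + 2 rows down

print(ex1, ex2)
```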

Can someone please help me with this one? Can it be done without slow row iteration?

Code for DF below:

import pandas as pd

df = pd.DataFrame({
    'id_var1': [1] * 8 + [2] * 7,
    'id_var2': list(range(1, 9)) + list(range(1, 8)),
    'num_var1': [1, 1, 2, 2, 3, 3, 3, 4] + [1, 2, 5, 2, 1, 1, 2],
    'num_var2': [1, 0, 0, 3, 3, 3, 0, 0] + [0, 1, 0, 0, 2, 2, 0],
})

Let me know if this works for you.

First, create a list of the values in the num_var1 column. Then, for each qualifying row, sum the sublist of num_var1 that runs from the current index through the next num_var2 items (the count is taken from the num_var2 column).

The sublst() function is called only for rows whose num_var2 differs from the previous row's num_var2.

import pandas as pd

df = pd.DataFrame({
    'id_var1': [1] * 8 + [2] * 7,
    'id_var2': list(range(1, 9)) + list(range(1, 8)),
    'num_var1': [1, 1, 2, 2, 3, 3, 3, 4] + [1, 2, 5, 2, 1, 1, 2],
    'num_var2': [1, 0, 0, 3, 3, 3, 0, 0] + [0, 1, 0, 0, 2, 2, 0],
})

num_var1 = df['num_var1'].tolist()  # values used for the calculation
df['index1'] = df.index             # row position, read inside sublst()

def sublst(row):
    # Sum num_var1 from this row through num_var2 rows below it.
    if row['num_var2'] > 0:
        start = row['index1']
        return sum(num_var1[start:start + row['num_var2'] + 1])
    # Rows with num_var2 == 0 return None, which shows up as NaN.

# Apply only to rows where num_var2 differs from the previous row.
df['sum'] = df[df.num_var2 != df.num_var2.shift()].apply(sublst, axis=1)

print(df)


       id_var1  id_var2  num_var1  num_var2  index1   sum
0         1        1         1         1       0   2.0
1         1        2         1         0       1   NaN
2         1        3         2         0       2   NaN
3         1        4         2         3       3  11.0
4         1        5         3         3       4   NaN
5         1        6         3         3       5   NaN
6         1        7         3         0       6   NaN
7         1        8         4         0       7   NaN
8         2        1         1         0       8   NaN
9         2        2         2         1       9   7.0
10        2        3         5         0      10   NaN
11        2        4         2         0      11   NaN
12        2        5         1         2      12   4.0
13        2        6         1         2      13   NaN
14        2        7         2         0      14   NaN
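
Since the question asks about avoiding row iteration, here is one possible vectorized variant. It is a sketch, not a tested drop-in: it swaps apply() for a cumulative-sum (prefix sum) lookup in NumPy, and the names prefix, start, end, window_sum and is_first are my own. Like the code above, it assumes the default 0..n-1 index and does not stop a window at an id_var1 boundary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id_var1': [1] * 8 + [2] * 7,
    'id_var2': list(range(1, 9)) + list(range(1, 8)),
    'num_var1': [1, 1, 2, 2, 3, 3, 3, 4] + [1, 2, 5, 2, 1, 1, 2],
    'num_var2': [1, 0, 0, 3, 3, 3, 0, 0] + [0, 1, 0, 0, 2, 2, 0],
})

v1 = df['num_var1'].to_numpy()
v2 = df['num_var2'].to_numpy()

# prefix[i] holds sum(v1[:i]), so sum(v1[a:b]) == prefix[b] - prefix[a]
prefix = np.concatenate(([0], np.cumsum(v1)))

start = np.arange(len(df))
# Exclusive end of each window, clipped so it never runs past the last row
end = np.minimum(start + v2 + 1, len(df))
window_sum = prefix[end] - prefix[start]

# Keep only the first row of each run of non-zero num_var2 values
is_first = (df['num_var2'] != 0) & (df['num_var2'] != df['num_var2'].shift())
df['row_sum'] = pd.Series(window_sum, index=df.index).where(is_first)

print(df)
```

The window sums are computed for every row at once and then masked, so no Python-level function runs per row. If windows must not cross id_var1 groups, the clipping step would need the last position of each group instead of len(df).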
