简体   繁体   中英

Calculate aggregate value of column row by row

My apologies for the vague title, it's complicated to translate what I want in writing terms.

I'm trying to build a filled line chart with the date on x axis and total transaction over time on the y axis

My data

The object is a pandas dataframe.

date         | symbol | type | qty |   total
----------------------------------------------
2020-09-10    ABC       Buy     5      10     
2020-10-18    ABC       Buy     2      20      
2020-09-19    ABC       Sell    3      15    
2020-11-05    XYZ       Buy     10     8       
2020-12-03    XYZ       Buy     10     9      
2020-12-05    ABC       Buy     2      5     

What I whant

date         | symbol | type | qty |   total  | aggregate_total
------------------------------------------------------------
2020-09-10    ABC       Buy     5      10          10
2020-10-18    ABC       Buy     2      20         10+20 = 30
2020-09-19    ABC       Sell    3      15        10+20-15 = 15
2020-11-05    XYZ       Buy     10     8         8
2020-12-03    XYZ       Buy     10     9         8+9 = 17
2020-12-05    ABC       Buy     2      5          10+20-15+5 = 20

Where I am now

I'm working with 2 nested for loops: one for iterating over the symbols, one for iterating each row. I store the temporary results in lists. I'm still unsure how I will add the results to the final dataframe. I could reorder the dataframe by symbol and date, then append each temp lists together and finally assign that temp list to a new column.

The code below is just the inner loop over the rows.

af = df.loc[df['symbol'] == 'ABC']


 for i in (range(0,af.shape[0])):
       # print(af.iloc[0:i,[2,4]])

        # if type is a buy, we add the last operation to the aggregate
        if af.iloc[i,2] == "BUY":
            temp_agg_total.append(temp_agg_total[i] + af.iloc[i,4])
            temp_agg_qty.append(temp_agg_qty[i] + af.iloc[i, 3])

        else:
            temp_agg_total.append(temp_agg_total[i] - af.iloc[i,4])
            temp_agg_qty.append(temp_agg_qty[i] - af.iloc[i, 3])

        # Remove first element of list (0)
        temp_agg_total.pop(0)
        temp_agg_qty.pop(0)

    af = af.assign(agg_total = temp_agg_total,
              agg_qty = temp_agg_qty)

My question

Is there a better way to do this in pandas or numpy ? It feels really heavy for something relatively simple.

The presence of the Buy/Sell type of operation complicates things.

Regards

# negate qty of Sells
df.loc[df['type']=='Sell', 'total'] *=-1
# cumulative sum of the qty based on symbol
df['aggregate_total'] = df.groupby('symbol')['total'].cumsum()

Is this which you're looking for?

df['Agg'] = 1
df.loc[df['type'] == 'Sell', 'Agg'] = -1
df['Agg'] = df['Agg']*df['total']
df['Agg'].cumsum()
df["Type_num"] = df["type"].map({"Buy":1,"Sell":-1})
df["Num"] = df.Type_num*df.total
df.groupby(["symbol"],as_index=False)["Num"].cumsum()
pd.concat([df,df.groupby(["symbol"],as_index=False)["Num"].cumsum()],axis=1)

        date    symbol  type    qty   total Type_num    Num CumNum
0   2020-09-10   ABC     Buy     5     10      1        10   10
1   2020-10-18   ABC     Buy     2     20      1        20   30
2   2020-09-19   ABC     Sell    3     15     -1       -15   15
3   2020-11-05   XYZ     Buy     10    8       1        8    8
4   2020-12-03   XYZ     Buy     10    9       1        9    17
5   2020-12-05   ABC     Buy     2     5       1        5    20

The most important thing here is the cumulative sum. The regrouping is used to make sure that the cumulative sum is just performed on each kind of different symbol . The renaming and dropping of columns should be easy for you.

Trick is that I made {sell; buy} {sell; buy} into {1,-1}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM