How to filter dataframe on two columns and output cumulative sum

I am a beginner.

I have the following dataframe (df1), indexed by transaction date, with columns for the account number, the transaction quantity, and the ticker.

             Account  Quantity Symbol/CUSIP
Trade Date                                 
2020-03-31         1       NaN    990156937
2020-03-31         2     0.020        IIAXX
2020-03-24         1       NaN    990156937
2020-03-20         1   650.000          DOC
2020-03-23         1       NaN    990156937
...              ...       ...          ...
2017-11-24         2    55.000          QQQ
2018-01-01         1    10.000         AMZN
2018-01-01         1   250.000          HOS
2017-09-13         1   229.051        VFINX
2017-09-21         1     1.118        VFINX
[266 rows x 3 columns]

I would like to populate a second dataframe (df2) that shows the total quantity held on every day between the min and max of df1's index, grouped by account and by ticker. Below is an empty dataframe showing the layout I am after:

df2 = Total Quantity by ticker and account #, on every single day between min and max of df1

              990156937 IIAXX  DOC  AER  NaN ATVI    H VCSH GOOGL  VOO   VG  \
2020-03-31 3       NaN   NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN  NaN  NaN   
           2       NaN   NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN  NaN  NaN   
           1       NaN   NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN  NaN  NaN   
2020-03-30 3       NaN   NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN  NaN  NaN   
           2       NaN   NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN  NaN  NaN   
           1       NaN   NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN  NaN  NaN   
2020-03-29 3       NaN   NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN  NaN  NaN   
           2       NaN   NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN  NaN  NaN   
           1       NaN   NaN  NaN  NaN  NaN  NaN  NaN  NaN   NaN  NaN  NaN    

Thus, for each day between the min and max transaction dates in df1, I need to calculate the cumulative sum of all transactions on that date or earlier, grouped by account and ticker.

How could I accomplish this? Thanks in advance.

I suggest the following:

import pandas as pd
import numpy as np

# first I reproduce a similar dataframe
df = pd.DataFrame({"date": pd.date_range("2017-1-1", periods=3).repeat(6),
                   "account": [1, 1, 3, 1, 2, 3, 2, 2, 1, 1, 2, 3, 1, 2, 3, 2, 2, 1],
                   # np.nan rather than np.NaN: the np.NaN alias was removed in NumPy 2.0
                   "quantity": [123, 0.020, np.nan, 650, 345, np.nan, 345, 456, 121, 243, 445, 453, 987, np.nan, 76, 143, 87, 19],
                   "symbol": ['990156937', '990156937', '990156937', 'DOC', 'AER', 'ATVI', 'AER', 'ATVI', 'IIAXX',
                              '990156937', '990156937', '990156937', 'DOC', 'AER', 'ATVI', 'AER', 'ATVI', 'IIAXX']})

Its first five rows look like this:

       date  account  quantity     symbol
0 2017-01-01        1    123.00  990156937
1 2017-01-01        1      0.02  990156937
2 2017-01-01        3       NaN  990156937
3 2017-01-01        1    650.00        DOC
4 2017-01-01        2    345.00        AER
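
To run the rest of this answer on the asker's df1 rather than on the toy data, the frame first needs the same column names. A minimal sketch, assuming df1 has the layout shown in the question (Trade Date as a DatetimeIndex, columns Account, Quantity and Symbol/CUSIP); the three sample rows are copied from the question, and the lowercase names are simply the ones used in this answer:

```python
import numpy as np
import pandas as pd

# Three rows copied from the question, standing in for the real df1
df1 = pd.DataFrame(
    {"Account": [1, 2, 1],
     "Quantity": [np.nan, 0.020, 650.0],
     "Symbol/CUSIP": ["990156937", "IIAXX", "DOC"]},
    index=pd.DatetimeIndex(["2020-03-31", "2020-03-31", "2020-03-20"],
                           name="Trade Date"),
)

# Turn the date index into a column, rename to the names used in this answer,
# and sort oldest first so a later cumulative sum runs forward in time
df = (df1.reset_index()
         .rename(columns={"Trade Date": "date", "Account": "account",
                          "Quantity": "quantity", "Symbol/CUSIP": "symbol"})
         .sort_values("date", ignore_index=True))
print(df)
```

Sorting matters here because df1 in the question is listed newest first, and a cumulative sum follows row order.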

You then reshape to a wide format with unstack:

# Sum the quantities for each (date, account, symbol) combination
df = df.groupby(["date", "account", "symbol"]).agg({"quantity": "sum"})
# Move the innermost index level (symbol) into the columns
df_wide = df.unstack()
# Fill missing combinations with 0 so the cumulative sum carries forward
df_wide = df_wide.fillna(0)
# Group by account to get the cumulative sum per account across dates
df_wide = df_wide.groupby(level="account").cumsum()

The first rows of the result:

                  quantity                            
                   990156937    AER   ATVI    DOC  IIAXX
date       account                                      
2017-01-01 1          123.02    0.0    0.0  650.0    0.0
           2            0.00  345.0    0.0    0.0    0.0
           3            0.00    0.0    0.0    0.0    0.0
2017-01-02 1          366.02    0.0    0.0  650.0  121.0
           2          445.00  690.0  456.0    0.0    0.0
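
Note that this result only has rows for the (date, account) pairs that actually appear in the trades, while the question asks for every day between the first and last trade. One way to fill the gaps is to reindex onto the full date x account grid before taking the cumulative sum. This is only a sketch: it assumes the column names used in this answer, the `trades` sample is made up for illustration, and the negative quantity is assumed to represent a sale.

```python
import numpy as np
import pandas as pd

# Hypothetical trades with gaps between trade dates (not the asker's data)
trades = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-20", "2020-03-20", "2020-03-23", "2020-03-25"]),
    "account": [1, 2, 1, 1],
    "quantity": [650.0, 55.0, np.nan, -150.0],
    "symbol": ["DOC", "QQQ", "990156937", "DOC"],
})

# Net quantity per (date, account), one column per symbol
wide = (trades.groupby(["date", "account", "symbol"])["quantity"].sum()
              .unstack("symbol", fill_value=0))

# Every calendar day between the first and last trade, for every account
all_days = pd.date_range(trades["date"].min(), trades["date"].max(), freq="D")
accounts = sorted(trades["account"].unique())
full_index = pd.MultiIndex.from_product([all_days, accounts],
                                        names=["date", "account"])

# Days without trades contribute 0, then accumulate within each account
holdings = (wide.reindex(full_index, fill_value=0)
                .groupby(level="account").cumsum())
print(holdings)
```

With this layout, a row such as (2020-03-22, account 1) carries forward the 650 DOC bought on 2020-03-20 even though nothing traded on that day.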
