I am an early beginner.
I have the following dataframe (df1) with transaction dates as index, columns = account #, quantity of transaction, and ticker.
            Account  Quantity Symbol/CUSIP
Trade Date
2020-03-31        1       NaN    990156937
2020-03-31        2     0.020        IIAXX
2020-03-24        1       NaN    990156937
2020-03-20        1   650.000          DOC
2020-03-23        1       NaN    990156937
...             ...       ...          ...
2017-11-24        2    55.000          QQQ
2018-01-01        1    10.000         AMZN
2018-01-01        1   250.000          HOS
2017-09-13        1   229.051        VFINX
2017-09-21        1     1.118        VFINX
[266 rows x 3 columns]
I would like to populate a second dataframe (df2) that shows the total quantity on every day between the min and max of the index of df1, grouped by account and by ticker. Below is an empty dataframe showing the layout I am looking for:
df2 = Total Quantity by ticker and account #, on every single day between min and max of df1
990156937 IIAXX DOC AER NaN ATVI H VCSH GOOGL VOO VG \
2020-03-31 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-03-30 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2020-03-29 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Thus, for each day between the min and max of the transaction dates in df1, I need to calculate the cumulative sum of all transactions on that date or earlier, grouped by account and ticker.
How could I accomplish this? Thanks in advance.
I suggest the following:
import pandas as pd
import numpy as np
# First, reproduce a similar dataframe
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
df = pd.DataFrame({"date": pd.date_range("2017-1-1", periods=3).repeat(6),
                   "account": [1, 1, 3, 1, 2, 3, 2, 2, 1, 1, 2, 3, 1, 2, 3, 2, 2, 1],
                   "quantity": [123, 0.020, np.nan, 650, 345, np.nan, 345, 456, 121, 243, 445, 453, 987, np.nan, 76, 143, 87, 19],
                   "symbol": ['990156937', '990156937', '990156937', 'DOC', 'AER', 'ATVI', 'AER', 'ATVI', 'IIAXX',
                              '990156937', '990156937', '990156937', 'DOC', 'AER', 'ATVI', 'AER', 'ATVI', 'IIAXX']})
This is what it looks like:
date account quantity symbol
0 2017-01-01 1 123.00 990156937
1 2017-01-01 1 0.02 990156937
2 2017-01-01 3 NaN 990156937
3 2017-01-01 1 650.00 DOC
4 2017-01-01 2 345.00 AER
You want to convert to a wide format using unstack:
# You groupby date, account and symbol and sum the quantities
df = df.groupby(["date", "account", "symbol"]).agg({"quantity":"sum"})
df_wide = df.unstack()
# Finally groupby account to get the cumulative sum per account across dates
# Fill na with 0 to get cumulative sum right
df_wide = df_wide.fillna(0)
df_wide = df_wide.groupby(df_wide.index.get_level_values("account")).cumsum()
You get the result:
                    quantity
                   990156937    AER   ATVI    DOC  IIAXX
date       account
2017-01-01 1          123.02    0.0    0.0  650.0    0.0
           2            0.00  345.0    0.0    0.0    0.0
           3            0.00    0.0    0.0    0.0    0.0
2017-01-02 1          366.02    0.0    0.0  650.0  121.0
           2          445.00  690.0  456.0    0.0    0.0
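The example above only has rows for dates on which a trade happened. If you also need a row for every calendar day between the first and last trade, one possible approach is to reindex onto a full date range before taking the cumulative sum. The following is only a sketch: the small df1 here is made-up data shaped like the frame in the question, and the names daily, all_days and full_index are my own.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like df1 in the question (values are made up);
# note that 2018-01-02 has no trades at all.
df1 = pd.DataFrame(
    {
        "Account": [1, 2, 1, 1],
        "Quantity": [10.0, 55.0, np.nan, 5.0],
        "Symbol/CUSIP": ["AMZN", "QQQ", "990156937", "AMZN"],
    },
    index=pd.DatetimeIndex(
        ["2018-01-01", "2018-01-01", "2018-01-03", "2018-01-04"],
        name="Trade Date",
    ),
)

# Sum quantities per day, account and symbol (NaN counts as 0 in the sum),
# then move the symbols into columns.
daily = (
    df1.groupby(["Trade Date", "Account", "Symbol/CUSIP"])["Quantity"]
    .sum()
    .unstack("Symbol/CUSIP", fill_value=0)
)

# Build every (calendar day, account) pair between the first and last trade.
all_days = pd.date_range(df1.index.min(), df1.index.max(), freq="D")
full_index = pd.MultiIndex.from_product(
    [all_days, sorted(df1["Account"].unique())],
    names=["Trade Date", "Account"],
)

# Days without trades contribute 0, so the running total simply carries forward.
df2 = daily.reindex(full_index, fill_value=0).groupby(level="Account").cumsum()
print(df2)
```

If you prefer the newest date first, as in the layout in the question, df2.sort_index(ascending=False) should give that ordering after the cumulative sum has been computed.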