I have orders.csv loaded as a DataFrame called orders_df:
Symbol Order Shares
Date
2011-01-10 AAPL BUY 100
2011-01-13 AAPL SELL 200
2011-01-13 IBM BUY 100
2011-01-26 GOOG SELL 200
I end up sorting the DataFrame with orders_df = orders_df.sort_index().
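To make the example reproducible, here is one way to load the sample orders. It assumes orders.csv is comma-separated with a Date header column (the raw file isn't shown), and the rows are deliberately out of date order. The default sort_index() uses quicksort, which doesn't guarantee that the two 2011-01-13 orders keep their file order; kind="stable" does.

```python
import io

import pandas as pd

# Assumed layout of orders.csv, deliberately out of date order
csv_text = """Date,Symbol,Order,Shares
2011-01-26,GOOG,SELL,200
2011-01-10,AAPL,BUY,100
2011-01-13,AAPL,SELL,200
2011-01-13,IBM,BUY,100
"""

# Use the Date column as a DatetimeIndex
orders_df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
# A stable sort keeps same-day orders in their original file order
orders_df = orders_df.sort_index(kind="stable")
print(orders_df)
```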
Then I create an array of symbols like so:
symbols = np.append(orders_df.loc[:, 'Symbol'].unique(), 'SPY')
Here comes my second DataFrame, df_prices:
df_prices = get_data(symbols, orders_df.index, addSPY=False)
df_prices.loc[:, 'CASH'] = 1.0
which prints out:
AAPL IBM GOOG XOM SPY CASH
Date
2011-01-10 150 100 50 400 100 1.0
2011-01-13 250 200 500 100 100 1.0
2011-01-13 250 200 500 100 100 1.0
2011-01-26 100 150 100 300 50 1.0
Now, I initialize a third DataFrame:
df_trades = pd.DataFrame(0, df_prices.index, columns=list(df_prices))
I need to fill this DataFrame with the correct values using the two previous DataFrames. If I BUY AAPL, I want to multiply the Shares from orders_df by the price of AAPL and by -1; if it were a SELL, I wouldn't multiply by -1. I put that value in the CASH column. For the other columns, I simply copy over the Shares of each stock on the days it traded (negative for a SELL, as the table below shows):
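To pin down the rule before vectorizing it, here is a plain loop over the orders. It's a reference sketch, not the vectorized answer: the question's sample prices are hard-coded in place of get_data (whose implementation isn't shown), and, as in the first example, it assumes df_prices has exactly one row per order, in the same order.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2011-01-10", "2011-01-13", "2011-01-13", "2011-01-26"], name="Date")
orders_df = pd.DataFrame(
    {"Symbol": ["AAPL", "AAPL", "IBM", "GOOG"],
     "Order": ["BUY", "SELL", "BUY", "SELL"],
     "Shares": [100, 200, 100, 200]},
    index=dates,
)
# Sample prices from the question, hard-coded in place of get_data
df_prices = pd.DataFrame(
    {"AAPL": [150, 250, 250, 100], "IBM": [100, 200, 200, 150],
     "GOOG": [50, 500, 500, 100], "XOM": [400, 100, 100, 300],
     "SPY": [100, 100, 100, 50], "CASH": 1.0},
    index=dates,
)

df_trades = pd.DataFrame(0.0, index=df_prices.index, columns=list(df_prices))
cash_col = df_trades.columns.get_loc("CASH")
# Row k of df_trades is assumed to belong to order k (one row per order)
rows = orders_df[["Symbol", "Order", "Shares"]].itertuples(index=False)
for k, (sym, order, shares) in enumerate(rows):
    signed = shares if order == "BUY" else -shares  # BUY adds shares, SELL removes them
    df_trades.iloc[k, df_trades.columns.get_loc(sym)] = signed
    # BUY spends cash (negative), SELL brings cash in (positive)
    df_trades.iloc[k, cash_col] = -signed * df_prices[sym].iloc[k]
print(df_trades)
```

A loop like this is slow on large order books, but it gives a known-good result to compare a vectorized version against.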
AAPL IBM GOOG XOM SPY CASH
Date
2011-01-10 100 0 0 0 0 -15000
2011-01-13 -200 0 0 0 0 50000
2011-01-13 0 100 0 0 0 -20000
2011-01-26 0 0 -200 0 0 20000
How do I produce df_trades using vectorized operations?
UPDATE
What if I did:
df_prices = get_data(symbols, orders_df.index, addSPY=False)
df_prices.loc[:, 'CASH'] = 1.0
which prints out:
AAPL IBM GOOG XOM SPY CASH
2011-01-10 340.99 143.41 614.21 72.02 123.19 1.0
2011-01-11 340.18 143.06 616.01 72.56 123.63 1.0
2011-01-12 342.95 144.82 616.87 73.41 124.74 1.0
2011-01-13 344.20 144.55 616.69 73.54 124.54 1.0
2011-01-14 346.99 145.70 624.18 74.62 125.44 1.0
2011-01-18 339.19 146.33 639.63 75.45 125.65 1.0
2011-01-19 337.39 151.22 631.75 75.00 124.42 1.0
How would I produce df_trades then?
FYI, the example values from before aren't valid anymore.
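With a price row for every trading day, df_trades can no longer be filled one row per order, and two orders can share a date. One possible approach (a sketch, not the only way) is to sum signed shares per date and symbol with groupby/unstack, reindex that onto df_prices.index, and compute CASH from the aligned product. The prices are the ones printed in the update; only the first three orders from the question are used, because the 2011-01-26 order falls after the last printed date, and reindex would silently drop any order dated outside df_prices.index.

```python
import numpy as np
import pandas as pd

# First three orders from the question (the 2011-01-26 order is outside these prices)
orders_df = pd.DataFrame(
    {"Symbol": ["AAPL", "AAPL", "IBM"],
     "Order": ["BUY", "SELL", "BUY"],
     "Shares": [100, 200, 100]},
    index=pd.DatetimeIndex(["2011-01-10", "2011-01-13", "2011-01-13"], name="Date"),
)
# Prices printed in the UPDATE: one row per trading day
days = ["2011-01-10", "2011-01-11", "2011-01-12", "2011-01-13",
        "2011-01-14", "2011-01-18", "2011-01-19"]
df_prices = pd.DataFrame(
    {"AAPL": [340.99, 340.18, 342.95, 344.20, 346.99, 339.19, 337.39],
     "IBM": [143.41, 143.06, 144.82, 144.55, 145.70, 146.33, 151.22],
     "GOOG": [614.21, 616.01, 616.87, 616.69, 624.18, 639.63, 631.75],
     "XOM": [72.02, 72.56, 73.41, 73.54, 74.62, 75.45, 75.00],
     "SPY": [123.19, 123.63, 124.74, 124.54, 125.44, 125.65, 124.42],
     "CASH": 1.0},
    index=pd.DatetimeIndex(days, name="Date"),
)

# Signed shares: BUY adds (+), SELL removes (-)
signed = np.where(orders_df["Order"] == "BUY", 1, -1) * orders_df["Shares"].to_numpy()
# One row per date, one column per symbol; same-day orders in a symbol are summed
shares = (
    orders_df.assign(Signed=signed)
    .groupby(["Date", "Symbol"])["Signed"]
    .sum()
    .unstack(fill_value=0)
)
# Spread onto every trading day and price column; days without orders get 0
df_trades = shares.reindex(index=df_prices.index, columns=df_prices.columns, fill_value=0)
# Daily cash flow: minus the value of shares bought, plus the value of shares sold
stock_cols = df_prices.columns.drop("CASH")
df_trades["CASH"] = (-df_trades[stock_cols] * df_prices[stock_cols]).sum(axis=1)
print(df_trades)
```

Because both orders on 2011-01-13 end up in the same row, their cash flows are netted: AAPL and IBM each get their own share count, and CASH holds the combined value.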
Vectorized Solution
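import numpy as np
import pandas as pd

# Assumes df_trades has exactly one row per order, in the same order as orders_df
j = df_trades.columns.get_indexer(orders_df['Symbol'])
i = np.arange(len(df_trades))
# BUY adds shares (+1), SELL removes them (-1)
o = np.where(orders_df['Order'].values == 'BUY', 1, -1)
v = orders_df['Shares'].values * o
# Write into a copy of the values (safe under copy-on-write), then rebuild the frame
t = df_trades.to_numpy(copy=True)
t[i, j] = v
df_trades = pd.DataFrame(t, index=df_trades.index, columns=df_trades.columns)
# Cash flow: pay for shares bought, receive for shares sold
df_trades['CASH'] = -df_trades.drop(columns='CASH').mul(df_prices).sum(axis=1)
df_trades
AAPL IBM GOOG XOM SPY CASH
Date
2011-01-10 100 0 0 0 0 -15000.0
2011-01-13 -200 0 0 0 0 50000.0
2011-01-13 0 100 0 0 0 -20000.0
2011-01-26 0 0 -200 0 0 20000.0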
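The key step above is t[i, j] = v. When both subscripts are integer arrays of the same length, NumPy pairs them element by element, so v[k] lands at (i[k], j[k]) rather than filling a cross product. A small standalone demonstration with made-up values:

```python
import numpy as np
import pandas as pd

t = np.zeros((3, 4), dtype=int)
i = np.array([0, 1, 2])        # one row per order
j = np.array([2, 0, 2])        # column of each order's symbol
v = np.array([100, -200, 50])  # signed shares
# Paired indexing: sets (0, 2), (1, 0) and (2, 2); every other cell stays 0
t[i, j] = v
print(t)

# By contrast, iloc with two arrays selects every row/column combination
df = pd.DataFrame(np.zeros((3, 4), dtype=int))
print(df.iloc[i, j].shape)  # (3, 3)
```

That difference is why the answer drops down to the NumPy array for the assignment instead of using df_trades.iloc[i, j].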