
Vectorized Operations on two Pandas DataFrames to create a new DataFrame

I have orders.csv loaded as a DataFrame called orders_df:

           Symbol Order  Shares
Date                           
2011-01-10   AAPL   BUY    100
2011-01-13   AAPL   SELL   200
2011-01-13    IBM   BUY    100
2011-01-26   GOOG   SELL   200

I sort the data frame with orders_df = orders_df.sort_index().

Then I create a symbols array like so:

symbols = np.append(orders_df.loc[:, 'Symbol'].unique(), 'SPY')
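
For anyone who wants to reproduce the setup, here is a minimal sketch that rebuilds orders_df from the table above instead of reading orders.csv. The file itself is not shown, so the inline data is an assumption that simply copies the printed rows; symbols is then derived exactly as in the line above.

```python
import numpy as np
import pandas as pd

# Inline copy of the printed orders table; stands in for orders.csv (assumption).
orders_df = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "IBM", "GOOG"],
        "Order": ["BUY", "SELL", "BUY", "SELL"],
        "Shares": [100, 200, 100, 200],
    },
    index=pd.to_datetime(
        ["2011-01-10", "2011-01-13", "2011-01-13", "2011-01-26"]
    ).rename("Date"),
)
orders_df = orders_df.sort_index()

# Unique traded symbols in order of first appearance, with SPY appended.
symbols = np.append(orders_df.loc[:, "Symbol"].unique(), "SPY")
print(symbols)
```

Note that the index contains 2011-01-13 twice, once per order placed that day, which is why the later frames also show that date twice.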

Here is my second DataFrame, df_prices:

df_prices = get_data(symbols, orders_df.index, addSPY=False)
df_prices.loc[:, 'CASH'] = 1.0

which prints out:

            AAPL  IBM  GOOG  XOM  SPY  CASH
Date                                       
2011-01-10   150  100    50  400  100   1.0
2011-01-13   250  200   500  100  100   1.0
2011-01-13   250  200   500  100  100   1.0
2011-01-26   100  150   100  300   50   1.0

Now I initialize a third data frame:

df_trades = pd.DataFrame(0, df_prices.index, columns=list(df_prices))

I need to fill this data frame with the correct values using the two previous data frames. If I BUY AAPL, I want to multiply Shares from orders_df by the price of AAPL on that date, times -1. For a SELL, I don't multiply by -1. That value goes in the CASH column of the matching row. For the other columns, I simply copy over the Shares of each stock on the days it traded, positive for a BUY and negative for a SELL. The result should look like this:

            AAPL  IBM  GOOG  XOM  SPY    CASH
Date                                         
2011-01-10   100    0     0    0    0  -15000
2011-01-13  -200    0     0    0    0   50000
2011-01-13     0  100     0    0    0  -20000
2011-01-26     0    0  -200    0    0   20000
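
The CASH figures in this table can be checked by hand: each one is Shares times that day's price for the traded symbol, negated for a BUY. A small sketch of that arithmetic, using the example prices from df_prices above (the array names here are illustrative, not part of the original code):

```python
import numpy as np

# One entry per order, in the same order as the rows of orders_df.
shares = np.array([100, 200, 100, 200])
is_buy = np.array([True, False, True, False])  # BUY, SELL, BUY, SELL
# Price of the traded symbol on the order date: AAPL, AAPL, IBM, GOOG.
price = np.array([150, 250, 200, 100])

# Holdings change: +Shares on a BUY, -Shares on a SELL.
share_delta = np.where(is_buy, shares, -shares)
# Cash moves the opposite way: a BUY spends cash, a SELL brings it in.
cash_delta = -share_delta * price

print(share_delta)
print(cash_delta)
```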

How do I achieve df_trades using vectorized operations?

UPDATE

What if I did:

df_prices = get_data(symbols, orders_df.index, addSPY=False)
df_prices.loc[:, 'CASH'] = 1.0

which prints out

              AAPL     IBM    GOOG    XOM     SPY  CASH
2011-01-10  340.99  143.41  614.21  72.02  123.19   1.0
2011-01-11  340.18  143.06  616.01  72.56  123.63   1.0
2011-01-12  342.95  144.82  616.87  73.41  124.74   1.0
2011-01-13  344.20  144.55  616.69  73.54  124.54   1.0
2011-01-14  346.99  145.70  624.18  74.62  125.44   1.0
2011-01-18  339.19  146.33  639.63  75.45  125.65   1.0
2011-01-19  337.39  151.22  631.75  75.00  124.42   1.0

How would I produce the df_trades then?

Note that the example values shown earlier no longer apply to this data.

Vectorized Solution

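import numpy as np

# Column position of each order's symbol; assumes one df_trades row per order.
j = df_trades.columns.get_indexer(orders_df['Symbol'])
i = np.arange(len(df_trades))
# Signed share counts: +Shares for BUY, -Shares for SELL.
o = np.where(orders_df['Order'].values == 'BUY', 1, -1)
v = orders_df['Shares'].values * o
# Fill a copy of the values, then assign it back positionally.
t = df_trades.to_numpy(copy=True)
t[i, j] = v
df_trades[:] = t

# CASH moves opposite to the shares: buying spends cash, selling adds it.
df_trades['CASH'] = -df_trades.drop(columns='CASH').mul(df_prices).sum(axis=1)
df_trades

            AAPL  IBM  GOOG  XOM  SPY     CASH
Date                                          
2011-01-10   100    0     0    0    0 -15000.0
2011-01-13  -200    0     0    0    0  50000.0
2011-01-13     0  100     0    0    0 -20000.0
2011-01-26     0    0  -200    0    0  20000.0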
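
The indexing above relies on df_trades having exactly one row per order. For the updated question, where df_prices has one row per trading day, one possible approach is to look up each order's row by date and accumulate with np.add.at, which also copes with several orders for the same symbol on the same day. This is a sketch under assumptions rather than part of the original answer: the prices are a subset of the updated table, and the orders are hypothetical ones chosen to fall inside that date range.

```python
import numpy as np
import pandas as pd

# Subset of the updated price table; one row per trading day.
dates = pd.to_datetime(["2011-01-10", "2011-01-11", "2011-01-12", "2011-01-13"])
df_prices = pd.DataFrame(
    {
        "AAPL": [340.99, 340.18, 342.95, 344.20],
        "IBM": [143.41, 143.06, 144.82, 144.55],
        "CASH": 1.0,
    },
    index=dates,
)

# Hypothetical orders (not from the question), all within the price dates.
orders_df = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "IBM"],
        "Order": ["BUY", "SELL", "BUY"],
        "Shares": [100, 200, 100],
    },
    index=pd.to_datetime(["2011-01-10", "2011-01-13", "2011-01-13"]),
)

stocks = df_prices.columns.drop("CASH")
# Row and column position of every order in the trades matrix.
i = df_prices.index.get_indexer(orders_df.index)
j = stocks.get_indexer(orders_df["Symbol"])
assert (i >= 0).all() and (j >= 0).all(), "order with unknown date or symbol"

# +Shares for BUY, -Shares for SELL.
signed = np.where(orders_df["Order"] == "BUY", 1, -1) * orders_df["Shares"].to_numpy()

# np.add.at accumulates repeated (row, column) pairs instead of overwriting.
shares = np.zeros((len(df_prices), len(stocks)))
np.add.at(shares, (i, j), signed)

df_trades = pd.DataFrame(shares, index=df_prices.index, columns=stocks)
# Cash moves opposite to the shares traded that day.
df_trades["CASH"] = -(shares * df_prices[stocks].to_numpy()).sum(axis=1)
print(df_trades)
```

get_indexer returns -1 for a date or symbol it cannot find, so the check before the scatter step matters: without it, a -1 would silently write into the last row or column. It also requires df_prices.index to be unique, which holds when there is one row per trading day.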

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, please contact yoyou2525@163.com.

 