Vectorized operation on two Pandas DataFrames to create a new DataFrame

Question

I have orders.csv loaded as a DataFrame called orders_df:

           Symbol Order  Shares
Date                           
2011-01-10   AAPL   BUY    100
2011-01-13   AAPL   SELL   200
2011-01-13    IBM   BUY    100
2011-01-26   GOOG   SELL   200

I then sort the data frame by date with orders_df = orders_df.sort_index().
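To reproduce the setup without orders.csv, the frame can be built inline. This is only a sketch: the rows are copied from the table above, and the inline construction is a stand-in for however the CSV is actually read.

```python
import pandas as pd

# Hypothetical stand-in for reading orders.csv: same rows as the table above.
orders_df = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "IBM", "GOOG"],
        "Order": ["BUY", "SELL", "BUY", "SELL"],
        "Shares": [100, 200, 100, 200],
    },
    index=pd.to_datetime(
        ["2011-01-10", "2011-01-13", "2011-01-13", "2011-01-26"]
    ).rename("Date"),
)

# sort_index returns a new frame ordered by date; note that 2011-01-13
# appears twice, so the index is not unique.
orders_df = orders_df.sort_index()
print(orders_df)
```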

Then I create symbols like so:

symbols = np.append(orders_df.loc[:, 'Symbol'].unique(), 'SPY')

Here is my second DataFrame, df_prices.

df_prices = get_data(symbols, orders_df.index, addSPY=False)
df_prices.loc[:, 'CASH'] = 1.0

which prints out:

            AAPL  IBM  GOOG  XOM  SPY  CASH
Date
2011-01-10   150  100    50  400  100   1.0
2011-01-13   250  200   500  100  100   1.0
2011-01-13   250  200   500  100  100   1.0
2011-01-26   100  150   100  300   50   1.0
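Since get_data is not shown, the printed prices can be recreated by hand for experimenting. The sketch below simply copies the values from the table above; it is not how get_data works.

```python
import pandas as pd

dates = pd.to_datetime(
    ["2011-01-10", "2011-01-13", "2011-01-13", "2011-01-26"]
).rename("Date")

# Values copied from the printed table; a stand-in for get_data's result.
df_prices = pd.DataFrame(
    {
        "AAPL": [150, 250, 250, 100],
        "IBM": [100, 200, 200, 150],
        "GOOG": [50, 500, 500, 100],
        "XOM": [400, 100, 100, 300],
        "SPY": [100, 100, 100, 50],
    },
    index=dates,
)

# One unit of cash is always worth 1.0.
df_prices.loc[:, "CASH"] = 1.0
print(df_prices)
```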

Now, I initialize a third data frame:

df_trades = pd.DataFrame(0, df_prices.index, columns=list(df_prices))

I need to fill this data frame with the correct values using the two previous data frames. If I BUY AAPL, I want to multiply Shares from orders_df by the price of AAPL, times -1. If it were a SELL, I wouldn't multiply by -1. I put that value in the CASH column of the corresponding row. For the other columns, I simply copy over the Shares of each stock on the days it traded.

            AAPL  IBM  GOOG  XOM  SPY    CASH
Date
2011-01-10   100    0     0    0    0  -15000
2011-01-13  -200    0     0    0    0   50000
2011-01-13     0  100     0    0    0  -20000
2011-01-26     0    0  -200    0    0   20000
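The arithmetic behind the table above can be checked row by row. This is a sketch with the four orders and the price of each traded symbol typed in by hand; the array names are made up for illustration.

```python
import numpy as np

# One entry per order, in table order.
orders = np.array(["BUY", "SELL", "BUY", "SELL"])
shares = np.array([100, 200, 100, 200])
# Price of the traded symbol on the order date: AAPL, AAPL, IBM, GOOG.
prices = np.array([150, 250, 200, 100])

# Share columns: positive for a BUY, negative for a SELL.
signed_shares = np.where(orders == "BUY", shares, -shares)

# CASH column: a BUY spends cash, a SELL receives it.
cash = -signed_shares * prices

print(signed_shares.tolist())  # [100, -200, 100, -200]
print(cash.tolist())           # [-15000, 50000, -20000, 20000]
```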

How do I produce df_trades using vectorized operations?

UPDATE

What if I did:

df_prices = get_data(symbols, orders_df.index, addSPY=False)
df_prices.loc[:, 'CASH'] = 1.0

which prints out:

              AAPL     IBM    GOOG    XOM     SPY  CASH
2011-01-10  340.99  143.41  614.21  72.02  123.19   1.0
2011-01-11  340.18  143.06  616.01  72.56  123.63   1.0
2011-01-12  342.95  144.82  616.87  73.41  124.74   1.0
2011-01-13  344.20  144.55  616.69  73.54  124.54   1.0
2011-01-14  346.99  145.70  624.18  74.62  125.44   1.0
2011-01-18  339.19  146.33  639.63  75.45  125.65   1.0
2011-01-19  337.39  151.22  631.75  75.00  124.42   1.0

How would I produce df_trades then?

FYI, the example values above are no longer valid with these prices.
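One possible reading of the UPDATE case is that df_trades should have one row per trading day in df_prices, with all orders on the same day netted into that row. The sketch below follows that assumption. The inputs are hypothetical: only the three orders that fall inside the printed price rows are used, along with a few rows and columns copied from the table above.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: three orders inside the printed date range, and
# a few rows and columns copied from the UPDATE price table.
orders_df = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "IBM"],
        "Order": ["BUY", "SELL", "BUY"],
        "Shares": [100, 200, 100],
    },
    index=pd.to_datetime(["2011-01-10", "2011-01-13", "2011-01-13"]).rename("Date"),
)
df_prices = pd.DataFrame(
    {
        "AAPL": [340.99, 340.18, 342.95, 344.20],
        "IBM": [143.41, 143.06, 144.82, 144.55],
        "GOOG": [614.21, 616.01, 616.87, 616.69],
    },
    index=pd.to_datetime(["2011-01-10", "2011-01-11", "2011-01-12", "2011-01-13"]),
)
df_prices["CASH"] = 1.0

# Signed share count per order: positive for BUY, negative for SELL.
sign = np.where(orders_df["Order"].to_numpy() == "BUY", 1, -1)
signed = orders_df["Shares"] * sign

# Net shares per (date, symbol), spread into one column per symbol,
# then aligned to every trading day and every price column.
df_trades = (
    orders_df.assign(Signed=signed)
    .groupby(["Date", "Symbol"])["Signed"]
    .sum()
    .unstack(fill_value=0)
    .reindex(index=df_prices.index, columns=df_prices.columns, fill_value=0)
)

# Cash flow per day: buying spends cash, selling receives it.
stocks = df_prices.columns.drop("CASH")
df_trades["CASH"] = -(df_trades[stocks] * df_prices[stocks]).sum(axis=1)
print(df_trades)
```

Days without orders end up as all-zero rows, and an order dated outside df_prices.index would be dropped by the reindex, so that case would need separate handling.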

Answer 1

Vectorized Solution

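# Column position of each order's symbol, and one row position per order
j = np.array([df_trades.columns.get_loc(c) for c in orders_df.Symbol])
i = np.arange(len(df_trades))
# Sign used here: BUY -> -1, SELL -> +1
o = np.where(orders_df.Order.values == 'BUY', -1, 1)
v = orders_df.Shares.values * o
# Write into the underlying array; this relies on .values being a
# writable view of df_trades (single dtype, Copy-on-Write disabled)
t = df_trades.values
t[i, j] = v

# CASH is the row-wise sum of signed shares times price
df_trades.loc[:, 'CASH'] = \
    df_trades.drop(columns='CASH', errors='ignore').mul(df_prices).sum(axis=1)
df_trades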

            AAPL  IBM  GOOG  XOM  SPY     CASH
Date                                          
2011-01-10  -100    0     0    0    0 -15000.0
2011-01-13   200    0     0    0    0  50000.0
2011-01-13     0 -100     0    0    0 -30000.0
2011-01-26     0    0   200    0    0  20000.0
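The output above shows bought shares as negative, while the table in the question shows them as positive. A variant that follows the question's convention, and that fills a plain NumPy array before building the DataFrame instead of writing through .values, might look like the sketch below. It assumes one price row per order, as in the first example; the inputs are retyped from the question's tables.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(
    ["2011-01-10", "2011-01-13", "2011-01-13", "2011-01-26"]
).rename("Date")

# Inputs retyped from the question's first example.
orders_df = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "IBM", "GOOG"],
        "Order": ["BUY", "SELL", "BUY", "SELL"],
        "Shares": [100, 200, 100, 200],
    },
    index=dates,
)
df_prices = pd.DataFrame(
    {
        "AAPL": [150, 250, 250, 100],
        "IBM": [100, 200, 200, 150],
        "GOOG": [50, 500, 500, 100],
        "XOM": [400, 100, 100, 300],
        "SPY": [100, 100, 100, 50],
        "CASH": [1.0, 1.0, 1.0, 1.0],
    },
    index=dates,
)

cols = list(df_prices.columns)
i = np.arange(len(orders_df))                        # row position of each order
j = pd.Index(cols).get_indexer(orders_df["Symbol"])  # column position of each symbol
sign = np.where(orders_df["Order"].to_numpy() == "BUY", 1, -1)

# Scatter the signed share counts into an all-zero array.
t = np.zeros((len(orders_df), len(cols)))
t[i, j] = orders_df["Shares"].to_numpy() * sign

# The CASH column of t is still zero here, so it adds nothing to the sum.
px = df_prices.to_numpy(dtype=float)
t[:, cols.index("CASH")] = -(t * px).sum(axis=1)

df_trades = pd.DataFrame(t, index=df_prices.index, columns=cols)
print(df_trades)
```

Building the array first avoids depending on whether .values hands back a writable view, which varies with the frame's dtypes and the pandas Copy-on-Write setting.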

Answer 1: 2, accepted, 2017-10-18 14:24:36