Vectorized operations on two pandas DataFrames to create a new DataFrame
I have orders.csv loaded as a DataFrame called orders_df:
Symbol Order Shares
Date
2011-01-10 AAPL BUY 100
2011-01-13 AAPL SELL 200
2011-01-13 IBM BUY 100
2011-01-26 GOOG SELL 200
I then sort the data frame by its index with orders_df = orders_df.sort_index().
Then I create a symbols array like so:
symbols = np.append(orders_df.loc[:, 'Symbol'].unique(), 'SPY')
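For anyone who wants to reproduce the setup, here is a minimal sketch of the steps so far. The inline CSV text is a stand-in for orders.csv, and its comma-separated layout with a Date header is an assumption, since the file itself is not shown.

```python
import io

import numpy as np
import pandas as pd

# Inline stand-in for orders.csv (same rows as the table above).
# The comma-separated layout is an assumption; the real file is not shown.
csv_text = """Date,Symbol,Order,Shares
2011-01-10,AAPL,BUY,100
2011-01-13,AAPL,SELL,200
2011-01-13,IBM,BUY,100
2011-01-26,GOOG,SELL,200
"""

orders_df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
# A stable sort keeps same-day orders in their original file order.
orders_df = orders_df.sort_index(kind="mergesort")

# Unique traded symbols in order of first appearance, plus SPY.
symbols = np.append(orders_df.loc[:, "Symbol"].unique(), "SPY")
print(symbols)
```

Note that the index contains 2011-01-13 twice, once per order placed that day.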
Here is my second DataFrame, df_prices:
df_prices = get_data(symbols, orders_df.index, addSPY=False)
df_prices.loc[:, 'CASH'] = 1.0
which prints out:
AAPL IBM GOOG XOM SPY CASH
Date
2011-01-10 150 100 50 400 100 1.0
2011-01-13 250 200 500 100 100 1.0
2011-01-13 250 200 500 100 100 1.0
2011-01-26 100 150 100 300 50 1.0
Now I initialize a third data frame:
df_trades = pd.DataFrame(0, df_prices.index, columns=list(df_prices))
I need to fill this data frame with the correct values using the two previous data frames. If I BUY AAPL, I want to multiply Shares from orders_df by the price of AAPL, times -1. If it were a SELL, I wouldn't multiply by -1. I put that value in the CASH column of the matching row. For the other columns, I simply copy over the Shares of each stock on the days it traded (positive for a BUY, negative for a SELL). The result should look like this:
AAPL IBM GOOG XOM SPY CASH
Date
2011-01-10 100 0 0 0 0 -15000
2011-01-13 -200 0 0 0 0 50000
2011-01-13 0 100 0 0 0 -20000
2011-01-26 0 0 -200 0 0 20000
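To make the sign convention concrete, the stock and CASH entries in the table above can be checked with a short sketch. The arrays are copied by hand from the orders and prices tables, and the variable names are only illustrative.

```python
import numpy as np

# Orders and the matching per-order prices, copied from the tables above.
order = np.array(["BUY", "SELL", "BUY", "SELL"])
shares = np.array([100, 200, 100, 200])
# AAPL, AAPL, IBM, GOOG on their respective order dates.
price = np.array([150, 250, 200, 100])

# Stock columns: bought shares are positive, sold shares are negative.
signed_shares = np.where(order == "BUY", shares, -shares)

# CASH column: buying spends cash, selling brings cash in.
cash = -signed_shares * price
print(signed_shares)
print(cash)
```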
How do I achieve df_trades using vectorized operations?
UPDATE
What if I did:
df_prices = get_data(symbols, orders_df.index, addSPY=False)
df_prices.loc[:, 'CASH'] = 1.0
which prints out:
AAPL IBM GOOG XOM SPY CASH
2011-01-10 340.99 143.41 614.21 72.02 123.19 1.0
2011-01-11 340.18 143.06 616.01 72.56 123.63 1.0
2011-01-12 342.95 144.82 616.87 73.41 124.74 1.0
2011-01-13 344.20 144.55 616.69 73.54 124.54 1.0
2011-01-14 346.99 145.70 624.18 74.62 125.44 1.0
2011-01-18 339.19 146.33 639.63 75.45 125.65 1.0
2011-01-19 337.39 151.22 631.75 75.00 124.42 1.0
How would I produce df_trades then? FYI, the example values shown earlier are no longer valid.
Vectorized Solution
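# Column position in df_trades for each order's symbol
j = np.array([df_trades.columns.get_loc(c) for c in orders_df.Symbol])
# Row position for each order (df_trades has one row per order)
i = np.arange(len(df_trades))
# -1 for BUY, +1 for SELL
o = np.where(orders_df.Order.values == 'BUY', -1, 1)
v = orders_df.Shares.values * o
# Write the signed shares into a copy of the values, then rebuild the frame
t = df_trades.to_numpy(copy=True)
t[i, j] = v
df_trades = pd.DataFrame(t, index=df_trades.index, columns=df_trades.columns)
# CASH is the row-wise sum of signed shares times price
df_trades['CASH'] = \
df_trades.drop(columns='CASH', errors='ignore').mul(df_prices).sum(axis=1)
df_trades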
AAPL IBM GOOG XOM SPY CASH
Date
2011-01-10 -100 0 0 0 0 -15000.0
2011-01-13 200 0 0 0 0 50000.0
2011-01-13 0 -100 0 0 0 -20000.0
2011-01-26 0 0 200 0 0 20000.0
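The update in the question uses one row per trading day rather than one row per order, so several orders can fall on the same row. One way this could be handled, as a sketch rather than a definitive answer, is to sum the signed shares per date and symbol with pivot_table and then reindex to the price table. Unlike the output above, it follows the question's sign convention (bought shares positive, cash spent negative). The prices are the first four rows of the updated table, the 2011-01-26 order is left out because the sample stops at 2011-01-19, and the names tidy, stocks and shares are only illustrative.

```python
import numpy as np
import pandas as pd

# Prices: one row per trading day (first four rows of the updated table).
dates = pd.to_datetime(["2011-01-10", "2011-01-11", "2011-01-12", "2011-01-13"])
df_prices = pd.DataFrame(
    {
        "AAPL": [340.99, 340.18, 342.95, 344.20],
        "IBM": [143.41, 143.06, 144.82, 144.55],
        "GOOG": [614.21, 616.01, 616.87, 616.69],
        "XOM": [72.02, 72.56, 73.41, 73.54],
        "SPY": [123.19, 123.63, 124.74, 124.54],
    },
    index=dates,
)
df_prices["CASH"] = 1.0

# Orders that fall inside the price window above.
orders_df = pd.DataFrame(
    {
        "Symbol": ["AAPL", "AAPL", "IBM"],
        "Order": ["BUY", "SELL", "BUY"],
        "Shares": [100, 200, 100],
    },
    index=pd.to_datetime(["2011-01-10", "2011-01-13", "2011-01-13"]),
)

# Signed share change per order: BUY adds shares, SELL removes them.
sign = np.where(orders_df["Order"].to_numpy() == "BUY", 1, -1)
tidy = orders_df.rename_axis("Date").reset_index()
tidy["Signed"] = tidy["Shares"].to_numpy() * sign

# Sum signed shares per (date, symbol), then match the price table's shape.
stocks = df_prices.columns.drop("CASH")
shares = (
    tidy.pivot_table(index="Date", columns="Symbol", values="Signed",
                     aggfunc="sum", fill_value=0)
    .reindex(index=df_prices.index, columns=stocks, fill_value=0)
    .astype(int)
)

# Cash moves opposite to the value of the shares traded that day.
df_trades = shares.copy()
df_trades["CASH"] = (-shares * df_prices[stocks]).sum(axis=1)
print(df_trades)
```

Days without orders stay at zero in every column, and two orders for the same symbol on the same day are netted before the cash value is computed.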