A faster way to update a DataFrame in Python

Question

I am trying to test buying and selling of three different stocks. I created a class that I plan to plug into an AI system to try to find a strategy. This currently works: you can buy [symbol], sell [symbol], and next to just proceed. Some of the functions take too long. I believe there is a faster and more Pythonic way of doing this. My background is JavaScript.

I am using a dataframe to store trades. Open trades have no closeTimeStamp. For closed trades, profit is sell_price - buy_price. For open short trades, profit is sell_price - current_quote, and for open long trades it is current_quote - buy_price. After I run update_holdings_with_quotes, I can just sum the profit column to get the current value of open and closed trades.

    self.trades = pd.DataFrame(columns=['symbol', 'buy_price', 'sell_price', 'current_quote', 'openTimeStamp', 'closeTimeStamp', 'profit', 'multiplier'])

This function is what is taking so much time.

  def update_holdings_with_quotes(self):
    start = time.time()
    if self.current_contracts > 0:
      quotes = self.quotes
      for symbol in ['/ES', '/NQ', '/YM']:
      # for symbol in self.trades['symbol']:
        current_price = self.quotes.loc[symbol]['lastPriceInDouble']
        multiplier = self.quotes.loc[symbol]['futureMultiplier']
        self.trades.loc[self.trades['closeTimeStamp'].isnull() & (self.trades['symbol'] == symbol), 'current_quote'] = current_price
        self.trades.loc[self.trades['closeTimeStamp'].isnull() & (self.trades['symbol'] == symbol) & (self.trades['action'] == Actions.BUY), 'profit'] =  (current_price - self.trades['buy_price']) * multiplier
        self.trades.loc[self.trades['closeTimeStamp'].isnull() & (self.trades['symbol'] == symbol) & (self.trades['action'] == Actions.SELL), 'profit'] =  (self.trades['sell_price'] - current_price)  * multiplier

      self.current_value = self.initial_value + self.trades['profit'].sum()
      self.current_gain = self.current_value - self.initial_value
    print("update_holdings_with_quotes time: {}".format(time.time() - start))

Basically, I am looping through the three quotes and setting values in my dataframe if the trade has no closeTimeStamp, i.e. the trade is still open. I tried using an array of static symbols, but that didn't speed things up.

I could use something other than a dataframe. I just used it because I thought it would be helpful.

*** I edited the function based on a suggestion to use two dataframes instead of one: one for open trades, and one for closed. That didn't help much.

  def update_holdings_with_quotes(self):
    start = time.time()
    if self.current_contracts > 0:
      quotes = self.quotes
      for symbol in ['/ES', '/NQ', '/YM']:
      # for symbol in self.trades['symbol']:
        current_price = self.quotes.loc[symbol]['lastPriceInDouble']
        multiplier = self.quotes.loc[symbol]['futureMultiplier']
        self.open_trades.loc[(self.open_trades['symbol'] == symbol), 'current_quote'] = current_price
        self.open_trades.loc[(self.open_trades['symbol'] == symbol) & (self.open_trades['action'] == Actions.BUY), 'profit'] =  (current_price - self.open_trades['buy_price']) * multiplier
        self.open_trades.loc[(self.open_trades['symbol'] == symbol) & (self.open_trades['action'] == Actions.SELL), 'profit'] =  (self.open_trades['sell_price'] - current_price)  * multiplier

      self.current_value = self.initial_value + self.open_trades['profit'].sum() + self.closed_trades['profit'].sum()
      self.current_gain = self.current_value - self.initial_value
      # self.logger.info('initial_value={} current_value={} current_contracts={}'.format(self.initial_value, self.current_value, self.current_contracts))
      self.check_status()
    print("update_holdings_with_quotes time: {}".format(time.time() - start))

Answer 1

This is the part where it gets slow:

    self.trades.loc[self.trades['closeTimeStamp'].isnull() & (self.trades['symbol'] == symbol), 'current_quote'] = current_price
    self.trades.loc[self.trades['closeTimeStamp'].isnull() & (self.trades['symbol'] == symbol) & (self.trades['action'] == Actions.BUY), 'profit'] =  (current_price - self.trades['buy_price']) * multiplier
    self.trades.loc[self.trades['closeTimeStamp'].isnull() & (self.trades['symbol'] == symbol) & (self.trades['action'] == Actions.SELL), 'profit'] =  (self.trades['sell_price'] - current_price)  * multiplier

In particular, every loop iteration scans the whole trades DataFrame several times just to build the index masks:

    self.trades['closeTimeStamp'].isnull()
    self.trades['symbol'] == symbol
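As a rough sketch of what avoiding those repeated scans could look like, the masks can be built once and reused. The sample rows, the quote values, and the plain strings "BUY"/"SELL" (standing in for Actions.BUY and Actions.SELL) are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
trades = pd.DataFrame({
    "symbol": ["/ES", "/ES", "/NQ"],
    "action": ["BUY", "SELL", "BUY"],  # stand-in for Actions.BUY / Actions.SELL
    "buy_price": [100.0, float("nan"), 200.0],
    "sell_price": [float("nan"), 110.0, float("nan")],
    "current_quote": [float("nan")] * 3,
    "closeTimeStamp": [pd.NaT, pd.NaT, pd.NaT],
    "profit": [0.0, 0.0, 0.0],
})
quotes = pd.DataFrame(
    {"lastPriceInDouble": [105.0, 190.0], "futureMultiplier": [50.0, 20.0]},
    index=["/ES", "/NQ"],
)

# These do not depend on the symbol, so compute them once, outside the loop.
is_open = trades["closeTimeStamp"].isnull()
is_buy = trades["action"] == "BUY"
is_sell = trades["action"] == "SELL"

for symbol in quotes.index:
    price = quotes.loc[symbol, "lastPriceInDouble"]
    mult = quotes.loc[symbol, "futureMultiplier"]
    # One symbol comparison per iteration instead of three.
    sym_open = is_open & (trades["symbol"] == symbol)
    long_rows = sym_open & is_buy
    short_rows = sym_open & is_sell
    trades.loc[sym_open, "current_quote"] = price
    trades.loc[long_rows, "profit"] = (price - trades.loc[long_rows, "buy_price"]) * mult
    trades.loc[short_rows, "profit"] = (trades.loc[short_rows, "sell_price"] - price) * mult
```

This only removes redundant work; each iteration still walks the full frame once for the symbol comparison, so how much it helps depends on how large trades has grown.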

A straightforward solution would be to split your trades DataFrame into two, closed_trades and open_trades. The latter should stay significantly smaller over time and speed up the lookup tremendously. As the number of tracked stocks grows, it might be sensible to split this further by symbol through subclassing.
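A minimal sketch of the split, assuming each trade is identified by its index label. The helper close_trade and the sample rows are hypothetical and not part of the question's class.

```python
import pandas as pd

def close_trade(open_trades, closed_trades, trade_id, close_time):
    """Move one row from open_trades to closed_trades (hypothetical helper)."""
    row = open_trades.loc[[trade_id]].copy()  # double brackets keep a DataFrame
    row["closeTimeStamp"] = close_time
    open_trades = open_trades.drop(index=trade_id)
    closed_trades = pd.concat([closed_trades, row])
    return open_trades, closed_trades

# Hypothetical data: one trade already closed, two still open.
closed_trades = pd.DataFrame(
    {"symbol": ["/YM"], "profit": [100.0],
     "closeTimeStamp": [pd.Timestamp("2020-03-24")]},
    index=[0],
)
open_trades = pd.DataFrame(
    {"symbol": ["/ES", "/NQ"], "profit": [250.0, -200.0],
     "closeTimeStamp": [pd.NaT, pd.NaT]},
    index=[1, 2],
)

open_trades, closed_trades = close_trade(
    open_trades, closed_trades, trade_id=1, close_time=pd.Timestamp("2020-03-25")
)

# Quote updates now only need to touch open_trades; the total is still one sum each.
total_profit = open_trades["profit"].sum() + closed_trades["profit"].sum()
```

With this layout the per-quote update never needs the closeTimeStamp check, because every row in open_trades is open by construction.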

An alternative would be to track the indices of your open positions in an array. Each time you add a trade, just add maxIndex+1 to the array. If you close a trade, drop its index from the array. Just make sure not to reindex the DataFrame, or the stored indices will no longer point at the right rows.

    import numpy as np
    ix_open = np.array([...])
    ix_closed = np.array([...])

    # after closing a few trades:
    ix_open = np.setdiff1d(ix_open, ix_closed)
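To show how the tracked indices might then be used, here is a small sketch with concrete values filled in. The trade rows and the latest quote dictionary are invented for illustration, and mapping symbols to prices with Series.map is my own addition rather than something the question's class does.

```python
import numpy as np
import pandas as pd

# Hypothetical trades, using the default 0..n-1 index.
trades = pd.DataFrame({
    "symbol": ["/ES", "/NQ", "/ES", "/YM"],
    "buy_price": [100.0, 200.0, 102.0, 300.0],
    "current_quote": [np.nan] * 4,
})
latest = {"/ES": 105.0, "/NQ": 190.0, "/YM": 310.0}  # made-up quotes

ix_open = np.array([0, 1, 2, 3])  # every trade starts out open
ix_closed = np.array([1])         # trade 1 has since been closed
ix_open = np.setdiff1d(ix_open, ix_closed)

# Select the open rows by label instead of building a boolean mask
# over the whole frame, then look up each row's quote by symbol.
trades.loc[ix_open, "current_quote"] = trades.loc[ix_open, "symbol"].map(latest)
```

The closed row keeps its old current_quote (NaN here), and no closeTimeStamp scan is needed to find the open ones.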

Answer 1
0 2020-03-25 02:02:39