
Python Faster way of updating dataframe

I am trying to test buying and selling of three different stocks. I created a class that I plan to plug into an AI system to try to find a strategy. This currently works: you can buy [symbol], sell [symbol], or call next to just proceed. Some of the functions take too long, though. I believe there is a faster and more Pythonic way of doing this. My background is JavaScript.

I am using a dataframe to store trades. Open trades have no closeTimeStamp. For closed trades, profit is sell_price - buy_price. For open trades that are short, profit is sell_price - current_quote; for open trades that are long, profit is current_quote - buy_price. After update_holdings_with_quotes has run, I can just sum the profit column to get the current value of open and closed trades.
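
The profit rule above can be written as one vectorized expression. This is only a minimal sketch: the sample rows are made up, plain strings stand in for Actions.BUY and Actions.SELL, and the multiplier is applied the way the function below applies it.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; "BUY"/"SELL" strings stand in for the Actions enum.
trades = pd.DataFrame({
    "symbol": ["/ES", "/ES", "/NQ"],
    "action": ["BUY", "SELL", "BUY"],
    "buy_price": [100.0, np.nan, 200.0],
    "sell_price": [np.nan, 110.0, 205.0],
    "current_quote": [104.0, 104.0, np.nan],
    "closeTimeStamp": [pd.NaT, pd.NaT, pd.Timestamp("2024-01-02")],
    "multiplier": [50.0, 50.0, 20.0],
})

is_open = trades["closeTimeStamp"].isna()
is_long = trades["action"] == "BUY"

# Open long trades have no sell_price yet, so the current quote is the exit price.
exit_price = trades["sell_price"].where(~(is_open & is_long), trades["current_quote"])
# Open short trades have no buy_price yet, so the current quote is the entry price.
entry_price = trades["buy_price"].where(~(is_open & ~is_long), trades["current_quote"])

trades["profit"] = (exit_price - entry_price) * trades["multiplier"]
total_profit = trades["profit"].sum()
```

Closed trades fall through both `where` calls unchanged, so they keep sell_price - buy_price.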

    self.trades = pd.DataFrame(columns=['symbol', 'buy_price', 'sell_price', 'current_quote', 'openTimeStamp', 'closeTimeStamp', 'profit', 'multiplier'])

This function is what is taking so much time.

  def update_holdings_with_quotes(self):
    start = time.time()
    if self.current_contracts > 0:
      quotes = self.quotes
      for symbol in ['/ES', '/NQ', '/YM']:
      # for symbol in self.trades['symbol']:
        current_price = self.quotes.loc[symbol]['lastPriceInDouble']
        multiplier = self.quotes.loc[symbol]['futureMultiplier']
        self.trades.loc[self.trades['closeTimeStamp'].isnull() & (self.trades['symbol'] == symbol), 'current_quote'] = current_price
        self.trades.loc[self.trades['closeTimeStamp'].isnull() & (self.trades['symbol'] == symbol) & (self.trades['action'] == Actions.BUY), 'profit'] =  (current_price - self.trades['buy_price']) * multiplier
        self.trades.loc[self.trades['closeTimeStamp'].isnull() & (self.trades['symbol'] == symbol) & (self.trades['action'] == Actions.SELL), 'profit'] =  (self.trades['sell_price'] - current_price)  * multiplier

      self.current_value = self.initial_value + self.trades['profit'].sum()
      self.current_gain = self.current_value - self.initial_value
    print("update_holdings_with_quotes time: {}".format(time.time() - start))

Basically I am looping through the three quotes and setting values in my dataframe if the trade has no closeTimeStamp, i.e. the trade is still open. I tried using an array of static symbols, but that didn't speed things up.

I could use something other than a dataframe. I just used it because I thought it would be helpful.

Edit: I changed the function based on a suggestion to use two dataframes instead of one, one for open trades and one for closed trades. That didn't help much.

  def update_holdings_with_quotes(self):
    start = time.time()
    if self.current_contracts > 0:
      quotes = self.quotes
      for symbol in ['/ES', '/NQ', '/YM']:
      # for symbol in self.trades['symbol']:
        current_price = self.quotes.loc[symbol]['lastPriceInDouble']
        multiplier = self.quotes.loc[symbol]['futureMultiplier']
        self.open_trades.loc[(self.open_trades['symbol'] == symbol), 'current_quote'] = current_price
        self.open_trades.loc[(self.open_trades['symbol'] == symbol) & (self.open_trades['action'] == Actions.BUY), 'profit'] =  (current_price - self.open_trades['buy_price']) * multiplier
        self.open_trades.loc[(self.open_trades['symbol'] == symbol) & (self.open_trades['action'] == Actions.SELL), 'profit'] =  (self.open_trades['sell_price'] - current_price)  * multiplier

      self.current_value = self.initial_value + self.open_trades['profit'].sum() + self.closed_trades['profit'].sum()
      self.current_gain = self.current_value - self.initial_value
      # self.logger.info('initial_value={} current_value={} current_contracts={}'.format(self.initial_value, self.current_value, self.current_contracts))
      self.check_status()
    print("update_holdings_with_quotes time: {}".format(time.time() - start))

This is the part where it gets slow:

    self.trades.loc[self.trades['closeTimeStamp'].isnull() & (self.trades['symbol'] == symbol), 'current_quote'] = current_price
    self.trades.loc[self.trades['closeTimeStamp'].isnull() & (self.trades['symbol'] == symbol) & (self.trades['action'] == Actions.BUY), 'profit'] =  (current_price - self.trades['buy_price']) * multiplier
    self.trades.loc[self.trades['closeTimeStamp'].isnull() & (self.trades['symbol'] == symbol) & (self.trades['action'] == Actions.SELL), 'profit'] =  (self.trades['sell_price'] - current_price)  * multiplier

In particular, every pass through the loop scans the whole trades DataFrame several times just to build the index masks:

    self.trades['closeTimeStamp'].isnull()
    self.trades['symbol'] == symbol
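
One way to avoid the repeated scans is to build each mask once and look up the quote for every row with `Series.map`, which removes the per-symbol loop. This is a sketch rather than a drop-in replacement: `update_open_trades` is a hypothetical helper, the sample numbers are invented, the action column holds plain strings instead of Actions values, and quotes is assumed to be indexed by unique symbols.

```python
import numpy as np
import pandas as pd

def update_open_trades(trades: pd.DataFrame, quotes: pd.DataFrame) -> pd.DataFrame:
    """Refresh current_quote and profit for open trades without a per-symbol loop."""
    is_open = trades["closeTimeStamp"].isna()   # computed once, reused below
    is_buy = trades["action"] == "BUY"          # assumption: strings, not Actions.BUY

    # Per-row lookup of price and multiplier, keyed by symbol.
    price = trades["symbol"].map(quotes["lastPriceInDouble"])
    mult = trades["symbol"].map(quotes["futureMultiplier"])

    long_profit = (price - trades["buy_price"]) * mult
    short_profit = (trades["sell_price"] - price) * mult
    profit = long_profit.where(is_buy, short_profit)

    # Only open rows are overwritten; closed rows keep their stored values.
    trades.loc[is_open, "current_quote"] = price[is_open]
    trades.loc[is_open, "profit"] = profit[is_open]
    return trades

# Invented sample data for illustration.
quotes = pd.DataFrame(
    {"lastPriceInDouble": [4500.0, 15000.0, 35000.0],
     "futureMultiplier": [50.0, 20.0, 5.0]},
    index=["/ES", "/NQ", "/YM"],
)
trades = pd.DataFrame({
    "symbol": ["/ES", "/NQ", "/ES"],
    "action": ["BUY", "SELL", "BUY"],
    "buy_price": [4490.0, np.nan, 4400.0],
    "sell_price": [np.nan, 15010.0, 4450.0],
    "current_quote": [np.nan, np.nan, 4450.0],
    "closeTimeStamp": [pd.NaT, pd.NaT, pd.Timestamp("2024-01-02")],
    "profit": [np.nan, np.nan, 2500.0],
})
trades = update_open_trades(trades, quotes)
```

Whether this is actually faster depends on the size of the frame, so it is worth timing against the original loop.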

A straightforward solution would be to split your trades DataFrame into two, closed_trades and open_trades. The latter should stay significantly smaller over time, which speeds up the lookup considerably. As the number of stocks you track grows, it might be sensible to split this further by symbol through subclassing.
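
A minimal sketch of the split, assuming a hypothetical `close_trade` helper that moves one row from open_trades to closed_trades. The column names follow the question; the sample values and the string actions are invented for illustration.

```python
import numpy as np
import pandas as pd

def close_trade(open_trades, closed_trades, trade_id, close_price, close_time):
    """Move one trade from open_trades to closed_trades (hypothetical helper)."""
    row = open_trades.loc[[trade_id]].copy()
    # Closing a long fills in sell_price; closing a short fills in buy_price.
    if row.at[trade_id, "action"] == "BUY":
        row.at[trade_id, "sell_price"] = close_price
    else:
        row.at[trade_id, "buy_price"] = close_price
    row["profit"] = (row["sell_price"] - row["buy_price"]) * row["multiplier"]
    row["closeTimeStamp"] = close_time
    return open_trades.drop(index=trade_id), pd.concat([closed_trades, row])

# Invented sample data: one closed trade and two open ones.
trades = pd.DataFrame({
    "symbol": ["/ES", "/NQ", "/ES"],
    "action": ["BUY", "BUY", "SELL"],
    "buy_price": [4400.0, 15000.0, np.nan],
    "sell_price": [4450.0, np.nan, 4500.0],
    "multiplier": [50.0, 20.0, 50.0],
    "profit": [2500.0, np.nan, np.nan],
    "closeTimeStamp": [pd.Timestamp("2024-01-02"), pd.NaT, pd.NaT],
})

# One-time split by whether the trade has a close timestamp.
is_open = trades["closeTimeStamp"].isna()
open_trades = trades[is_open].copy()
closed_trades = trades[~is_open].copy()

# Close the open /NQ long (index label 1) at 15050.
open_trades, closed_trades = close_trade(
    open_trades, closed_trades, 1, 15050.0, pd.Timestamp("2024-01-03")
)
realized = closed_trades["profit"].sum()
```

Because the original index labels are kept, a trade can still be identified by the same label in either frame.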

An alternative would be to track the indices of your open positions in an array. Each time you add a trade, append maxIndex+1 to the array. When you close a trade, drop its index from the array. Just make sure not to reindex the DataFrame, otherwise the stored indices no longer point at the right rows.

    import numpy as np
    ix_open = np.array([...])
    ix_closed = np.array([...])

    # after closing a few trades:
    ix_open = np.setdiff1d(ix_open, ix_closed)
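
To show how the tracked indices would be used, here is a small sketch with invented values: the index array selects the open rows directly through `.loc`, so no boolean mask over the whole frame is needed. It assumes the DataFrame keeps its default integer index and is never reindexed.

```python
import numpy as np
import pandas as pd

# Invented sample data: four /ES trades with a stale quote.
trades = pd.DataFrame({
    "symbol": ["/ES", "/ES", "/ES", "/ES"],
    "current_quote": [4400.0, 4400.0, 4400.0, 4400.0],
})

ix_open = np.array([0, 1, 2, 3])    # index labels of all open trades
ix_closed = np.array([1, 3])        # hypothetical: trades 1 and 3 were just closed

# Remove the closed labels from the open set.
ix_open = np.setdiff1d(ix_open, ix_closed)

# Update only the rows that are still open, selected by label.
trades.loc[ix_open, "current_quote"] = 4500.0
```

`np.setdiff1d` returns a sorted array of unique values, which is what label-based selection needs here.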

