简体   繁体   English

在数据帧上执行嵌套3循环迭代的最有效方法

[英]Most efficient way to do a nested 3 loop iteration over a dataframe

Hi I have a complicated looping problem in that I have to loop through a dataframe of trading data (one trade per row). 嗨,我有一个复杂的循环问题,因为我必须遍历交易数据的数据框(每行一次交易)。 The trade data has trades against external counterparties and also internal "pending" counterparties and I need to: 贸易数据既可以与外部交易对手进行交易,也可以与内部“待处理”交易对手进行交易,我需要:

(a) find three related trades and (a)找到三个相关行业

(b) change the execution time on the first of three trades to match that of the third trade. (b)更改三笔交易中第一笔的执行时间以匹配第三笔交易的执行时间。

The reason this is three trades is that each trade has a execution time and an entry time (entry time is when the trade flows down to further systems and therefore is equal to or later than execution time). 这是三笔交易的原因是,每笔交易都有一个执行时间和一个进入时间(进入时间是指该交易向下流入其他系统的时间,因此等于或晚于执行时间)。 This must sound overly complicated but that is the way the system works so is a given. 这听起来可能过于复杂,但这就是系统工作的方式。 There is more detail in the code sample below. 下面的代码示例中有更多详细信息。 This works but slowly (hundreds of thousands of trades in the file). 这有效但是很慢(文件中有数十万笔交易)。 My python is rudimentary so I am assuming that there must be a much more efficient way of doing this possibly with .apply or something else? 我的python很基本,所以我假设必须有一种更有效的方法来执行此操作,可能是.apply或其他方法? Anyone have any suggestions? 有人有什么建议吗?

After comments below I have cleaned this up (apologies) and created a minimal working example (should have done that originally) 在下面的评论之后,我已经清理了(抱歉)并创建了一个最小的工作示例(本来应该做到的)

import pandas as pd
import numpy as np
# TradeId - unique trade id
# ExecutionTime - time trade was executed
# EntryTime - time trade entered processing system (equal to or after Execution time)
# Counterparty - counterparty name including external counterparties and internal pending
# TraderName - eg Bob Smith 
# CcyPair - eg GBPUSD
# BaseTrade - notional of the trade in base currency eg 100 GBP
allTradesArrays = {
'TradeId':[101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120],
'ExecutionDateTime':['06/26/2018 12:49:23','06/26/2018 13:20:12','06/26/2018 13:20:12','06/26/2018 13:20:32','06/26/2018 13:22:19','06/26/2018 13:25:13','06/26/2018 13:26:18','06/26/2018 15:50:42','06/26/2018 15:51:12','06/26/2018 15:51:12','06/26/2018 15:54:10','06/26/2018 16:17:09','06/26/2018 18:54:06','06/26/2018 18:54:12','06/26/2018 18:54:12','06/26/2018 18:54:15','06/26/2018 19:42:05','06/26/2018 19:58:25','06/26/2018 20:13:19','06/26/2018 20:13:19'],
'EntryDateTime':['06/26/2018 12:49:23','06/26/2018 13:25:13','06/26/2018 13:25:13','06/26/2018 13:20:33','06/26/2018 13:22:19','06/26/2018 13:25:13','06/26/2018 13:26:18','06/26/2018 15:50:42','06/26/2018 15:52:01','06/26/2018 15:54:10','06/26/2018 15:54:10','06/26/2018 16:17:11','06/26/2018 18:54:07','06/26/2018 18:54:30','06/26/2018 19:58:25','06/26/2018 18:54:16','06/26/2018 19:42:05','06/26/2018 19:58:25','06/26/2018 20:13:19','06/26/2018 20:13:19'],
'Counterparty':['cpty1','PENDING','cpty2','cpty12','cpty3','PENDING','cpty6','cpty2','PENDING','cpty8','PENDING','cpty9','cpty1','PENDING','cpty8','cpty3','cpty5','PENDING','cpty6','cpty2'],
'CcyPair':['GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD'],
'BaseTrade':[5,-6.94037287,-6.94037287,-11.63,-0.77222412,6.94037287,21.88,-5.2,10,10,-10,0.3931098,2.5,-670,-670,2.029,20,670,7.37097,11.056455]
}
allTrades = pd.DataFrame(allTradesArrays)
print()
print('allTrades.info()')
print(allTrades.info())
print()
print('allTrades.head()')
print(allTrades.head())
# Create a dataframe that is a subset of the trades dataframe that 
# contains only trades against the pending counterparty - note the
# pending trades are still in the allTrades df as I need to keep
# allTrades complete for further processing once the time stamp changes 
# are made by this looping
pendingTrades = allTrades[allTrades['Counterparty'] == 'PENDING']
print()
print('pendingTrades.info()')
print(pendingTrades.info())
print()
print('pendingTrades.head()')
print(pendingTrades.head())
# iterate over each trade in pendingTrades as explained below
for pendingTradeIndex1, pendingTrade1 in pendingTrades.iterrows():
    for allTradeIndex, allTrade in allTrades.iterrows():
        if (
            # if we find a trade in allTrades that is: 
            # not the same pendingTrade1 trade that is also in the allTrades 
            pendingTrade1['TradeId'] != allTrade['TradeId']
            # has the same CcyPair
            and pendingTrade1['CcyPair'] == allTrade['CcyPair']
            # has the same notional and sign
            and pendingTrade1['BaseTrade'] == allTrade['BaseTrade'] 
            # has matching execution datetimes
            and pendingTrade1['ExecutionDateTime'] == allTrade['ExecutionDateTime']
            # then we have found the first two of three trades
        ):
            # To find the third trade apply similar logic
            for pendingTradeIndex2, pendingTrade2 in pendingTrades.iterrows():
                if (
                    # If we find the second trade in allTrades that is:
                    # not the same pendingTrade2 trade that is also in the allTrades 
                    pendingTrade2['TradeId'] != allTrade['TradeId']
                    # is the same CcyPair as the devon trade
                    and pendingTrade2['CcyPair'] == allTrade['CcyPair']
                    # has the same notional but opposite sign as the devon trade
                    and pendingTrade2['BaseTrade'] == -1*allTrade['BaseTrade'] 
                    # has matching entry datetimes as the devon trade
                    and pendingTrade2['EntryDateTime'] == allTrade['EntryDateTime']
                    # does not have matching execution datetimes as the devon trade
                    and pendingTrade2['ExecutionDateTime'] != allTrade['ExecutionDateTime']
                    # then we should have found the third of three trades
                    # note this third trade is always a pending counterparty trade
                ):
                    print('pendingTrade1 ' + str(pendingTrade1['TradeId']) + 
                          ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade1['TradeId'],'ExecutionDateTime'].values[0]) +
                          ' EnTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade1['TradeId'],'EntryDateTime'].values[0]) +
                          ' BaseTr=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade1['TradeId'],'BaseTrade'].values[0]))
                    print('allTrade      ' + str(allTrade['TradeId']) + 
                          ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==allTrade['TradeId'],'ExecutionDateTime'].values[0]) +
                          ' EnTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==allTrade['TradeId'],'EntryDateTime'].values[0]) +
                          ' BaseTr=' + 
                          str(allTrades.loc[allTrades['TradeId']==allTrade['TradeId'],'BaseTrade'].values[0]))
                    print('pendingTrade2 ' + str(pendingTrade2['TradeId']) + 
                          ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade2['TradeId'],'ExecutionDateTime'].values[0]) +
                          ' EnTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade2['TradeId'],'EntryDateTime'].values[0]) +
                          ' BaseTr=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade2['TradeId'],'BaseTrade'].values[0]))
                    print('Changing ' + str(pendingTrade1['TradeId']) + ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade1['TradeId'],'ExecutionDateTime'].values[0]) +
                          ' to ' + str(pendingTrade2['TradeId']) + ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade2['TradeId'],'ExecutionDateTime'].values[0]))
                    allTrades.loc[allTrades['TradeId'] == pendingTrade1['TradeId'],'ExecutionDateTime'] = \
                    allTrades.loc[allTrades['TradeId'] == pendingTrade2['TradeId'],'ExecutionDateTime'].values[0]
                    print()

Give itertuples() a try. 尝试一下itertuples() It is supposed to be much quicker than iterrows() 它应该比iterrows()快得多

Stack Overflow: Does iterrows have performance issues 堆栈溢出:迭代是否存在性能问题

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM