Compare Pandas DataFrame Rows and Drop Rows with Overlapping Dates

Question

I have a dataframe filled with trades taken from a trading strategy. The logic in the trading strategy needs to be updated to ensure that a trade isn't taken if the strategy is already in a trade, but that's a different problem. The trade data for many previous trades is read into a dataframe from a csv file.

Here's my problem with the data I have: I need to do a row-by-row comparison of the dataframe to determine whether the EntryDate of row X is earlier than the ExitDate of row X-1.

A sample of my data:

Row 1:
EntryDate  ExitDate
2012-07-25 2012-07-27 

Row 2:
EntryDate  ExitDate
2012-07-26 2012-07-29

Row 2 needs to be deleted because it is a trade that should not have occurred.

I'm having trouble identifying which rows are duplicates and then dropping them. I tried the approach in answer 3 of this question with some luck, but it isn't ideal because I have to manually iterate through the dataframe and read each row's data. My current approach is below and is ugly as can be. I check the dates, and then add them to a new dataframe. Additionally, this approach gives me multiple duplicates in the final dataframe.

for i in range(0,len(df)+1):
    if i+1 == len(df): break #to keep from going past last row
    ExitDate = df['ExitDate'].irow(i)
    EntryNextTrade = df['EntryDate'].irow(i+1)

    if EntryNextTrade>ExitDate: 
        line={'EntryDate':EntryDate,'ExitDate':ExitDate}
        df_trades=df_trades.append(line,ignore_index=True)

Any thoughts or ideas on how to accomplish this more efficiently?

You can click here to see a sampling of my data if you want to try to reproduce my actual dataframe.

Answer 1

You should use some kind of boolean mask to do this kind of operation.

One way is to create a dummy column for the next trade:

df['EntryNextTrade'] = df['EntryDate'].shift(-1)  # shift(-1) aligns each row with the next row's EntryDate

Use this to create the mask:

msk = df['EntryNextTrade'] > df['ExitDate']

And use loc to look at the sub-DataFrame where msk is True, and only the specified columns:

df.loc[msk, ['EntryDate', 'ExitDate']]
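
For reference, here is a minimal, self-contained sketch of the same boolean-mask idea, written against the condition as stated in the question (drop row X when its EntryDate is earlier than the ExitDate of row X-1). The first two rows come from the question's sample; the third row and the names prev_exit and overlaps are made up for illustration, and the sketch assumes the rows are already sorted by EntryDate.

```python
import pandas as pd

# First two rows are the sample from the question; the third row is a
# hypothetical non-overlapping trade added only for illustration.
df = pd.DataFrame(
    {
        "EntryDate": pd.to_datetime(["2012-07-25", "2012-07-26", "2012-08-01"]),
        "ExitDate": pd.to_datetime(["2012-07-27", "2012-07-29", "2012-08-03"]),
    }
)

# ExitDate of the previous row; the first row has no predecessor, so it gets NaT.
prev_exit = df["ExitDate"].shift()

# A row overlaps when it was entered before the previous row's trade exited.
# Comparisons against NaT evaluate to False, so the first row is never flagged.
overlaps = df["EntryDate"] < prev_exit

# Keep only the rows that do not overlap their predecessor.
df_trades = df.loc[~overlaps, ["EntryDate", "ExitDate"]].reset_index(drop=True)
print(df_trades)
```

Note that this compares each row only with the row immediately before it, including rows that end up being dropped. If a trade should instead be checked against the last trade that was actually kept, a single shift is probably not enough and some sequential pass over the rows would be needed.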

Answer 1 · 11 · accepted · 2013-10-16 17:25:08
