
Comparing Pandas Dataframe Rows & Dropping rows with overlapping dates

I have a dataframe filled with trades taken from a trading strategy. The logic in the trading strategy needs to be updated to ensure that a trade isn't taken if the strategy is already in a trade, but that's a different problem. The trade data for many previous trades is read into a dataframe from a csv file.

Here's my problem with the data I have: I need to do a row-by-row comparison of the dataframe to determine whether the EntryDate of row X is earlier than the ExitDate of row X-1.

A sample of my data:

Row 1:
EntryDate  ExitDate
2012-07-25 2012-07-27 

Row 2:
EntryDate  ExitDate
2012-07-26 2012-07-29

Row 2 needs to be deleted because it is a trade that should not have occurred.

I'm having trouble identifying which rows are duplicates and then dropping them. I tried the approach in answer 3 of this question with some luck, but it isn't ideal because I have to manually iterate through the dataframe and read each row's data. My current approach is below and is ugly as can be. I check the dates, and then add the rows to a new dataframe. Additionally, this approach gives me multiple duplicates in the final dataframe.

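for i in range(len(df) - 1):  # stop before the last row, which has no next trade
    EntryDate = df['EntryDate'].iloc[i]  # was undefined in the original snippet
    ExitDate = df['ExitDate'].iloc[i]  # irow() was removed from pandas; iloc replaces it
    EntryNextTrade = df['EntryDate'].iloc[i + 1]

    if EntryNextTrade > ExitDate:
        line = {'EntryDate': EntryDate, 'ExitDate': ExitDate}
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df_trades = pd.concat([df_trades, pd.DataFrame([line])], ignore_index=True)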

Any thoughts or ideas on how to accomplish this more efficiently?

You can click here to see a sampling of my data if you want to try to reproduce my actual dataframe.

You should use some kind of boolean mask to do this kind of operation.

One way is to create a dummy column for the next trade:

df['EntryNextTrade'] = df['EntryDate'].shift(-1)  # shift(-1) puts the next row's EntryDate on each row

Use this to create the mask:

msk = df['EntryNextTrade'] > df['ExitDate']

Then use loc to select the rows where msk is True, restricted to the specified columns:

df.loc[msk, ['EntryDate', 'ExitDate']]
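
The rule stated in the question compares each row with the one before it, so the same masking idea can also be applied to the previous row's ExitDate. What follows is a minimal sketch rather than a definitive solution. It assumes EntryDate and ExitDate are already parsed as datetimes and uses a small made-up sample. The names prev_exit, overlaps and drop_while_in_trade are illustrative and do not come from the original post.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "EntryDate": pd.to_datetime(["2012-07-25", "2012-07-26", "2012-07-30"]),
    "ExitDate": pd.to_datetime(["2012-07-27", "2012-07-29", "2012-08-02"]),
})

# Sort by entry so that "previous row" means "previous trade".
df = df.sort_values("EntryDate").reset_index(drop=True)

# Exit date of the previous row; the first row gets NaT.
prev_exit = df["ExitDate"].shift()

# A row overlaps when it is entered before the previous trade has exited.
# Comparisons against NaT evaluate to False, so the first row is never flagged.
overlaps = df["EntryDate"] < prev_exit

# Keep only the non-overlapping rows.
df_trades = df.loc[~overlaps, ["EntryDate", "ExitDate"]].reset_index(drop=True)


def drop_while_in_trade(trades):
    """Keep a trade only if it starts at or after the exit of the last KEPT trade.

    Assumes `trades` is sorted by EntryDate. This is an explicit loop, not a
    vectorized mask, because each decision depends on the previous decisions.
    """
    keep = []
    last_exit = None
    for entry, exit_ in zip(trades["EntryDate"], trades["ExitDate"]):
        if last_exit is None or entry >= last_exit:
            keep.append(True)
            last_exit = exit_
        else:
            keep.append(False)
    return trades.loc[keep].reset_index(drop=True)
```

The shifted mask only looks at the immediately preceding row. If several short trades fall inside one long trade, it may keep a later one because that trade does not overlap its direct predecessor, even though it still overlaps the long trade. The drop_while_in_trade loop handles that case by comparing against the last trade that was actually kept, at the cost of iterating over the rows.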
