Manipulating Pandas DataFrame with conditions and groupby

[Image: DataFrame]

I have the DataFrame above, with hundreds of Instruments and different Dates. For each Instrument on every Date, I wish to find 2 rows with RecordType=='TRADE' (highlighted in yellow).

The first is the Trade right after the RecordType=='Control' row, and the second is the latest Trade within 30 minutes after the first Trade.
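One way to express the first rule on a single Instrument/Date group is sketched here. It assumes the group's rows are already in time order and that Time holds milliseconds since midnight, as in the conversion described in the question; the function name `first_trade_after_control` and the toy rows are hypothetical.

```python
import pandas as pd

def first_trade_after_control(group: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: return the first TRADE row after the first Control row.

    Assumes the rows of `group` are already in time order.
    """
    # cumsum() > 0 is False before the first Control row and True from it onwards
    seen_control = (group["RecordType"] == "Control").cumsum() > 0
    candidates = group[seen_control & (group["RecordType"] == "TRADE")]
    return candidates.iloc[0]

# Made-up single Instrument/Date group; Time is milliseconds since midnight
group = pd.DataFrame({
    "RecordType": ["TRADE", "Control", "TRADE", "TRADE"],
    "Time": [35000000, 35900000, 35987025, 36500000],
})
print(first_trade_after_control(group)["Time"])  # 35987025
```

The TRADE at 35000000 is skipped because it comes before the Control row.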

In my example, 30 minutes after the first trade (9:59:47 AM) is 10:29:47 AM. If I convert the timestamps to milliseconds since midnight, that is 35987025 + (30 min * 60 * 1000) = 37787025. Therefore, the last Trade before 37787025 is at 37417668 milliseconds, which is highlighted in yellow. The Trades highlighted in red are NOT what I want.
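The millisecond arithmetic above can be sanity-checked with Python's standard `datetime.timedelta`. This only re-derives the numbers quoted in the question; the helper name `ms_to_clock` is hypothetical.

```python
from datetime import timedelta

WINDOW_MS = 30 * 60 * 1000  # 30 minutes in milliseconds = 1,800,000

def ms_to_clock(ms: int) -> str:
    """Hypothetical helper: format milliseconds since midnight as H:MM:SS.ffffff."""
    return str(timedelta(milliseconds=ms))

first_trade_ms = 35987025               # first Trade from the example
cutoff_ms = first_trade_ms + WINDOW_MS  # end of the 30-minute window

print(cutoff_ms)                    # 37787025
print(ms_to_clock(first_trade_ms))  # 9:59:47.025000
print(ms_to_clock(cutoff_ms))       # 10:29:47.025000
print(ms_to_clock(37417668))        # 10:23:37.668000, inside the window
```

The trailing .025 s is why 9:59:47 AM corresponds to 35987025 rather than a round number.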

What is the best way to code this? I know I have to use groupby(['Instrument', 'Date']) for the analysis. Thank you.

It's probably useful for you to know that you can use apply together with groupby. This is not a tested solution, but a rough guide to how to get there:

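import pandas as pd
from datetime import timedelta

def handle_single_group(df):
    # Assumes rows are sorted by Time and Time is a datetime column.
    # Keep only the rows from the first 'Control' record onwards.
    after_control = df[(df.RecordType == 'Control').cumsum() > 0]
    trades = after_control[after_control.RecordType == 'TRADE']
    first_trade = trades.iloc[[0]]  # double brackets keep a 1-row DataFrame

    # timedelta(30) would be 30 *days*; the window is 30 minutes
    latest_time_ok = first_trade['Time'].iloc[0] + timedelta(minutes=30)
    last_trade = trades[trades.Time <= latest_time_ok].iloc[[-1]]
    return pd.concat([first_trade, last_trade], axis=0)

df.groupby(['Instrument', 'Date']).apply(handle_single_group)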

This assumes the trades are ordered by Time. If that's not the case, you can look into argmax (or its label-based counterpart idxmax), or sort each group by Time first.
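For groups that are not in time order, one option along the lines of the argmax hint is to compare Time values directly and use `idxmin`/`idxmax` (the label-returning counterparts of argmin/argmax). This is a sketch under stated assumptions: Time is a datetime column, every group has at least one Control row and one later TRADE, and "after the Control row" is taken to mean a strictly later Time. The function name `handle_unsorted_group` and the toy data (tickers, date, times) are hypothetical; two AAA times only echo the question's example.

```python
import pandas as pd

def handle_unsorted_group(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pick the two TRADE rows by comparing Time, so row order does not matter."""
    # Time of the earliest Control record in this group
    control_time = df.loc[df["RecordType"] == "Control", "Time"].min()
    # Trades strictly later than that Control record
    trades = df[(df["RecordType"] == "TRADE") & (df["Time"] > control_time)]
    first_idx = trades["Time"].idxmin()  # label of the earliest such trade
    cutoff = trades.at[first_idx, "Time"] + pd.Timedelta(minutes=30)
    in_window = trades[trades["Time"] <= cutoff]
    last_idx = in_window["Time"].idxmax()  # label of the latest trade in the window
    return df.loc[[first_idx, last_idx]]

# Made-up data: two instruments on one date, rows deliberately shuffled
rows = [
    ("BBB", "2020-01-02", "TRADE", "09:45:00"),
    ("AAA", "2020-01-02", "TRADE", "10:35:00"),
    ("AAA", "2020-01-02", "Control", "09:59:00"),
    ("AAA", "2020-01-02", "TRADE", "09:58:00"),
    ("AAA", "2020-01-02", "TRADE", "10:23:37"),
    ("BBB", "2020-01-02", "Control", "09:30:00"),
    ("AAA", "2020-01-02", "TRADE", "09:59:47"),
    ("BBB", "2020-01-02", "TRADE", "09:31:00"),
]
df = pd.DataFrame(rows, columns=["Instrument", "Date", "RecordType", "Clock"])
df["Time"] = pd.to_datetime(df["Date"] + " " + df["Clock"])

# Selecting the columns first keeps the grouping columns out of the applied frame
result = df.groupby(["Instrument", "Date"])[["RecordType", "Time"]].apply(handle_unsorted_group)
print(result)
```

For AAA this selects 09:59:47 and 10:23:37 (10:35:00 falls outside the window and 09:58:00 precedes the Control row); for BBB it selects 09:31:00 and 09:45:00.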
