
Pandas time since last trade

I have a pandas DataFrame containing symbols and times at which a transaction took place. There's also a third column indicating whether the transaction was a purchase or a sale. For example,

pd.DataFrame({"col1": ["AAPL", "AAPL", "GOOG", "AAPL", "GOOG"], "col2": [1, 2, 3, 6, 7], "col3": ["BUY", "SELL", "BUY", "BUY", "BUY"]})

I would like to add a new column representing the time since the previous transaction on the same symbol, regardless of type. In the example above, this column would be

[NaN, 1, NaN, 4, 4]

The first entry for each symbol is always NaN because there is no earlier transaction on that symbol. The second entry is 1 because AAPL last traded at time 1, and 2 - 1 = 1 is the difference between the current time and the previous trade time.

I'd also like another column indicating whether the previous transaction on the same symbol was a BUY or a SELL. Again in this example, we'd have

[NaN, BUY, NaN, SELL, BUY]

as that column. Can someone please help me figure out how to do this?

I think I need to do a merge_asof to match the time to the smallest time before it. But since I want it to be independently computed for each symbol, perhaps I also need a groupby? I'm struggling to get the correct logic. Thanks

Assuming this input, with rows already sorted by time (col2):

   col1  col2  col3
0  AAPL     1   BUY
1  AAPL     2  SELL
2  GOOG     3   BUY
3  AAPL     6   BUY
4  GOOG     7   BUY

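You can use groupby.shift and groupby.diff:

# group rows by symbol
g = df.groupby('col1')
# type of the previous transaction on the same symbol
df['col4'] = g['col3'].shift()
# time elapsed since the previous transaction on the same symbol
df['col5'] = g['col2'].diff()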

Output:

   col1  col2  col3  col4  col5
0  AAPL     1   BUY   NaN   NaN
1  AAPL     2  SELL   BUY   1.0
2  GOOG     3   BUY   NaN   NaN
3  AAPL     6   BUY  SELL   4.0
4  GOOG     7   BUY   BUY   4.0
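
For reference, here is a self-contained sketch of the same idea, followed by a cross-check using the merge_asof route mentioned in the question. The column names (symbol, time, side, time_since_prev, prev_side, prev_time) are illustrative assumptions, not names from the original post. The explicit sort is only needed if the rows might not already be in time order.

```python
import pandas as pd

# Illustrative column names; the answer above uses col1/col2/col3 instead.
df = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "GOOG", "AAPL", "GOOG"],
    "time": [1, 2, 3, 6, 7],
    "side": ["BUY", "SELL", "BUY", "BUY", "BUY"],
})

# shift/diff operate on row order, so sort by time first in case the input is unsorted.
df = df.sort_values("time", kind="stable").reset_index(drop=True)

g = df.groupby("symbol")
df["time_since_prev"] = g["time"].diff()  # current time minus previous time, per symbol
df["prev_side"] = g["side"].shift()       # side of the previous trade, per symbol

# Cross-check with merge_asof: match each row to the latest strictly earlier
# row of the same symbol (allow_exact_matches=False excludes the row itself).
lookup = df[["symbol", "time", "side"]].rename(columns={"side": "prev_side"})
lookup["prev_time"] = lookup["time"]
asof = pd.merge_asof(
    df[["symbol", "time", "side"]],
    lookup,
    on="time",
    by="symbol",
    allow_exact_matches=False,
)
asof["time_since_prev"] = asof["time"] - asof["prev_time"]

print(df)
print(asof)
```

On this sample both routes produce the same two columns. They can differ when a symbol has several trades with the same timestamp: groupby.shift always uses the previous row, while merge_asof with allow_exact_matches=False skips rows with an equal time.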
