Use particular column value as key to search in pandas dataframe
I need to fetch rows using a particular column value as the key. Below is my pandas df.
>>> data
OrderID TimeStamp ErrorCode Duration ResponseType \
0 3000000 1488948188555841641 NaN IOC NaN
1 3000000 1488948188556444675 0 NaN NEW_ORDER_CONFIRM
2 3000000 1488948188556448153 2 NaN TRADE_CONFIRM
3 3000001 1488948658787676012 NaN IOC NaN
4 3000001 1488948658787811582 1 NaN NEW_ORDER_CONFIRM
5 3000001 1488948658787824862 2 NaN TRADE_CONFIRM
6 3000002 1488949064945887091 NaN IOC NaN
7 3000003 1488949109654115659 NaN IOC NaN
8 3000003 1488949109654294973 1 NaN NEW_ORDER_CONFIRM
9 3000003 1488949109654299930 16388 NaN CANCEL_ORDER_CONFIRM
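I need to select all the OrderIDs where Duration is IOC (fairly simple), using
orders = data.loc[data.Duration == 'IOC', 'OrderID'].unique()
and then fetch the rows of those selected OrderIDs where Duration is NaN. Each OrderID always has either 3 rows or only a single row (for a single-row order, e.g. OrderID 3000002, no output or an empty row can be returned).
The tricky part is that the ErrorCode in the NEW_ORDER_CONFIRM row is the correct one, while the ErrorCode in the TRADE_CONFIRM or CANCEL_ORDER_CONFIRM row is wrong. I only want the last row in the output, but carrying the correct value.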
EXPECTED O/P ROW 1
OrderID TimeStamp ErrorCode Duration ResponseType \
0 3000000 1488948188555841641 0 IOC TRADE_CONFIRM
I tried doing this in bash: grep IOC loglife| cut -d, -f1 to get OrderID then grep each OrderID & NaN. But I need a more efficient Python solution.
I think you can first get all unique values of the OrderID column where Duration is IOC, and then select all rows with NaN by boolean indexing, with the mask created by isin and isnull:
# unique() can be omitted, but then the solution is a bit slower on a big df
orders = df.loc[df.Duration == 'IOC', 'OrderID'].unique()
df = df[df.OrderID.isin(orders) & df.Duration.isnull()]
print(df)
OrderID TimeStamp ErrorCode Duration ResponseType
1 3000000 1488948188556448153 2.0 NaN TRADE_CONFIRM
3 3000001 1488948658787824862 2.0 NaN TRADE_CONFIRM
6 3000003 1488949109654299930 16388.0 NaN CANCEL_ORDER_CONFIRM
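The filter above stops at the NaN-Duration rows. To get all the way to the expected row from the question (last row per order, but with the ErrorCode taken from NEW_ORDER_CONFIRM), one possible extension is sketched below. It is not part of the original answer and rests on a few assumptions: the rows are already sorted by TimeStamp within each OrderID, each order has at most one NEW_ORDER_CONFIRM row, and Duration can be set back to IOC because every selected order is an IOC order. The names confirms, correct and result are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
data = pd.DataFrame({
    "OrderID": [3000000, 3000000, 3000000, 3000001, 3000001, 3000001,
                3000002, 3000003, 3000003, 3000003],
    "TimeStamp": [1488948188555841641, 1488948188556444675, 1488948188556448153,
                  1488948658787676012, 1488948658787811582, 1488948658787824862,
                  1488949064945887091, 1488949109654115659, 1488949109654294973,
                  1488949109654299930],
    "ErrorCode": [np.nan, 0, 2, np.nan, 1, 2, np.nan, np.nan, 1, 16388],
    "Duration": ["IOC", np.nan, np.nan, "IOC", np.nan, np.nan,
                 "IOC", "IOC", np.nan, np.nan],
    "ResponseType": [np.nan, "NEW_ORDER_CONFIRM", "TRADE_CONFIRM",
                     np.nan, "NEW_ORDER_CONFIRM", "TRADE_CONFIRM",
                     np.nan, np.nan, "NEW_ORDER_CONFIRM", "CANCEL_ORDER_CONFIRM"],
})

# Step 1 (same as the answer): IOC orders, then their rows where Duration is NaN
orders = data.loc[data.Duration == "IOC", "OrderID"].unique()
confirms = data[data.OrderID.isin(orders) & data.Duration.isnull()]

# Step 2: the ErrorCode reported by NEW_ORDER_CONFIRM, indexed by OrderID
# (assumes at most one NEW_ORDER_CONFIRM row per order)
correct = (confirms[confirms.ResponseType == "NEW_ORDER_CONFIRM"]
           .set_index("OrderID")["ErrorCode"])

# Step 3: keep the last confirmation row per order and overwrite its ErrorCode
# (assumes rows are already in TimeStamp order within each OrderID)
result = confirms.drop_duplicates("OrderID", keep="last").copy()
result["ErrorCode"] = result["OrderID"].map(correct)
result["Duration"] = "IOC"  # assumption: restore the value from the IOC row

print(result)
```

OrderID 3000002 has no confirmation rows, so it simply does not appear in result. If the log is not guaranteed to be in time order, sorting with data.sort_values(["OrderID", "TimeStamp"]) before Step 1 would be a reasonable precaution.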