Pandas - compare column ID in rows and delete conditionally
Given a sample DataFrame such as:
Qid Sid L1 L2
id01 id02 74 72
id01 id03 74 68
id02 id01 72 74
id02 id03 72 68
I want to remove reciprocal hits, so the output should be:
Qid Sid L1 L2
id01 id02 74 72
id01 id03 74 68
id02 id03 72 68
My real dataset has thousands of rows; the example above is only meant to illustrate the idea.
Here is another approach:
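Because the real data has thousands of rows, a vectorized key is worth considering. The sketch below assumes that a "reciprocal hit" means any row whose (Qid, Sid) pair already appeared earlier in swapped order, with L1/L2 ignored. It sorts each ID pair with `np.sort` so that (id02, id01) and (id01, id02) produce the same key, then keeps only the first row per key. The names `pair`, `is_repeat` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Qid': ['id01', 'id01', 'id02', 'id02'],
    'Sid': ['id02', 'id03', 'id01', 'id03'],
    'L1': [74, 74, 72, 72],
    'L2': [72, 68, 74, 68],
})

# Sort the two IDs within each row so the pair becomes order-independent.
pair = np.sort(df[['Qid', 'Sid']].to_numpy(dtype=str), axis=1)

# Mark every row whose unordered ID pair was already seen in an earlier row.
is_repeat = pd.DataFrame(pair).duplicated()

# Keep the first occurrence of each pair (row 2 is dropped as the reverse of row 0).
result = df[~is_repeat.to_numpy()]
print(result)
```

Note that this key also drops exact duplicate rows and any later row with the same two IDs, even if its scores differ.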
import pandas as pd
import numpy as np
data = {'Qid':['id01','id01','id02','id02'],'Sid':['id02','id03','id01','id03'],'L1':[74,74,72,72],'L2':[72,68,74,68]}
df = pd.DataFrame(data)
df[['L1','L2']] = df[['L1','L2']].astype(str) # Convert the scores to strings so each row's four values can be sorted together (Python 3 cannot compare str with int).
df['aux'] = df[['Qid','Sid','L1','L2']].values.tolist() # Collect the four columns of each row into a list.
df['aux'] = df['aux'].apply(sorted).astype(str) # Sort each list and turn it into a single string so it is hashable.
df = df.drop_duplicates(subset='aux').drop(columns='aux') # Drop rows whose sorted values repeat an earlier row, i.e. the same Qid, Sid, L1 and L2 values in any order.
print(df)
Output:
Qid Sid L1 L2
0 id01 id02 74 72
1 id01 id03 74 68
3 id02 id03 72 68
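The `aux` key above sorts all four values together, so it no longer knows which value came from which column: for example, (id01, id02, 74, 72) and (id02, id01, 74, 72) would also collide. If a reciprocal hit should additionally require the scores to mirror (L1 of one row equals L2 of the other), a stricter sketch is to merge the frame with a copy whose Qid/Sid and L1/L2 columns are swapped, then drop every row whose mirror appears earlier. The helper columns `pos` and `pos_rev` are hypothetical names introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Qid': ['id01', 'id01', 'id02', 'id02'],
    'Sid': ['id02', 'id03', 'id01', 'id03'],
    'L1': [74, 74, 72, 72],
    'L2': [72, 68, 74, 68],
})

# Record each row's original position so "earlier" is well defined.
left = df.assign(pos=list(range(len(df))))

# Swap the ID columns and the score columns: a row's mirror now has identical values.
swapped = left.rename(columns={'Qid': 'Sid', 'Sid': 'Qid', 'L1': 'L2', 'L2': 'L1'})

# Each match pairs a row (pos) with a row that is its exact mirror (pos_rev).
matches = left.merge(swapped, on=['Qid', 'Sid', 'L1', 'L2'], suffixes=('', '_rev'))

# Drop a row only if its mirror occurs earlier, so the first of each pair is kept.
drop_pos = matches.loc[matches['pos_rev'] < matches['pos'], 'pos']
result = left[~left['pos'].isin(drop_pos)].drop(columns='pos')
print(result)
```

The merge is a hash join on the four key columns, so it should scale to thousands of rows without a Python-level loop.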