Pandas - compare column ID in rows and delete conditionally

Given an example DataFrame such as:

Qid     Sid     L1  L2
id01    id02    74  72
id01    id03    74  68
id02    id01    72  74
id02    id03    72  68

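I want to remove reciprocal hits (rows where the Qid/Sid pair already appears in the opposite order), so the output should be: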

Qid     Sid     L1  L2
id01    id02    74  72
id01    id03    74  68
id02    id03    72  68

My real dataset has thousands of rows; the example above only illustrates the idea.

Here is another idea:

import pandas as pd
data = {'Qid':['id01','id01','id02','id02'],'Sid':['id02','id03','id01','id03'],'L1':[74,74,72,72],'L2':[72,68,74,68]}
df = pd.DataFrame(data)
# Cast the four columns to strings (so mixed values can be sorted together) and collect each row as a list.
df['aux'] = df[['Qid','Sid','L1','L2']].astype(str).values.tolist()
# Sort each list and turn it into one string, so reciprocal rows end up with the same key.
df['aux'] = df['aux'].apply(sorted).astype(str)
# Keep the first row for each key (the same combination of Qid, Sid, L1 and L2), then drop the helper column.
df = df.drop_duplicates(subset='aux').drop(columns='aux')
print(df)

Output:

    Qid   Sid  L1  L2
0  id01  id02  74  72
1  id01  id03  74  68
3  id02  id03  72  68
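
A variation on the same idea, offered only as a sketch: if a reciprocal hit is defined by the unordered Qid/Sid pair alone (an assumption, since the question does not say whether L1 and L2 must also match), the key can be built from just those two columns. The helper name `drop_reciprocal` and its parameters are hypothetical and are not part of pandas.

```python
import numpy as np
import pandas as pd

def drop_reciprocal(df, a="Qid", b="Sid"):
    """Keep the first row of each unordered (a, b) pair (hypothetical helper)."""
    # Sort the two ID values within each row so (x, y) and (y, x) give the same key.
    pairs = np.sort(df[[a, b]].to_numpy(dtype=str), axis=1)
    # duplicated() flags every occurrence of a key after the first one.
    dup = pd.DataFrame(pairs, index=df.index).duplicated()
    return df[~dup]

df = pd.DataFrame({
    "Qid": ["id01", "id01", "id02", "id02"],
    "Sid": ["id02", "id03", "id01", "id03"],
    "L1": [74, 74, 72, 72],
    "L2": [72, 68, 74, 68],
})
result = drop_reciprocal(df)
print(result)
```

On the example frame this keeps the rows with index 0, 1 and 3 and leaves L1 and L2 as integers. Because the sort runs on a NumPy array rather than through a row-wise `apply`, it may scale better to thousands of rows, though that has not been measured here.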

