How to select identical rows from a pandas DataFrame based on certain columns
I'm new to pandas and I'm having a problem selecting rows from a DataFrame.
Here is my DataFrame:
Index  Column1  Column2  Column3   Column4
0      1234     500      NEWYORK   NY
1      5678     700      AUSTIN    TX
2      1234     300      NEWYORK   NY
3      8910     235      RICHMOND  FL
I want to select the rows that have the same values in Column1, Column3 and Column4 (rows that are identical in terms of these three columns). So the output DataFrame should contain the rows with index 0 and 2.
Can anyone help me with a step-by-step procedure for this custom selection?
Use df.duplicated on the key columns to build a boolean mask, and index into df with it:
c = ['Column1', 'Column3', 'Column4']
df = df[df[c].duplicated(keep=False)]
df
   Index  Column1  Column2  Column3  Column4
0      0     1234      500  NEWYORK       NY
2      2     1234      300  NEWYORK       NY
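Since the question asks for a step-by-step procedure, here is a self-contained sketch of the same answer split into its two steps. It assumes "Index" is an ordinary column (the printed output above, with both a row label and an Index column, suggests so) and rebuilds the sample data by hand.

```python
import pandas as pd

# Rebuild the example DataFrame from the question
df = pd.DataFrame({
    'Index': [0, 1, 2, 3],
    'Column1': [1234, 5678, 1234, 8910],
    'Column2': [500, 700, 300, 235],
    'Column3': ['NEWYORK', 'AUSTIN', 'NEWYORK', 'RICHMOND'],
    'Column4': ['NY', 'TX', 'NY', 'FL'],
})

c = ['Column1', 'Column3', 'Column4']

# Step 1: boolean Series, True for every row whose
# (Column1, Column3, Column4) combination appears more than once
mask = df[c].duplicated(keep=False)
print(mask.tolist())  # [True, False, True, False]

# Step 2: use the mask to keep only those rows
result = df[mask]
print(result)
```

Column2 is left out of the subset on purpose, which is why rows 0 and 2 count as duplicates even though their Column2 values differ.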
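keep=False marks every occurrence of a duplicated key combination as True. With the default keep='first', the first occurrence would be marked False and dropped by the filter, so row 0 would be missing from the result.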
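To make the effect of keep concrete, this small sketch (using only the three key columns of the sample data) prints the mask produced by each of the three allowed values:

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': [1234, 5678, 1234, 8910],
    'Column3': ['NEWYORK', 'AUSTIN', 'NEWYORK', 'RICHMOND'],
    'Column4': ['NY', 'TX', 'NY', 'FL'],
})
c = ['Column1', 'Column3', 'Column4']

# keep='first' (the default): the first occurrence is not flagged
first = df[c].duplicated(keep='first').tolist()  # [False, False, True, False]
# keep='last': the last occurrence is not flagged
last = df[c].duplicated(keep='last').tolist()    # [True, False, False, False]
# keep=False: every occurrence is flagged
every = df[c].duplicated(keep=False).tolist()    # [True, False, True, False]

print(first, last, every)
```

Only keep=False selects both rows 0 and 2, which is what the question asks for.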
Earlier I was using the following approach:
d = df.T.to_dict()  # {row_label: {column_name: value}}
keys = ['Column1', 'Column3', 'Column4']
dup = set()
for i in d:
    for j in d:
        # compare each pair of distinct rows on the three key columns
        if i != j and all(d[i][k] == d[j][k] for k in keys):
            dup.add(i)  # store the row label, not just its Column1 value
dup_rows = df.loc[sorted(dup)]
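The pairwise loop above compares every row with every other row, so its cost grows quadratically with the number of rows. As an alternative to both the loop and duplicated (a different technique, not the answer's), a vectorized sketch can count how many rows share each key combination with groupby and transform('size'). Selecting Column2 before transform is an arbitrary choice here; any column works because 'size' only counts rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': [1234, 5678, 1234, 8910],
    'Column2': [500, 700, 300, 235],
    'Column3': ['NEWYORK', 'AUSTIN', 'NEWYORK', 'RICHMOND'],
    'Column4': ['NY', 'TX', 'NY', 'FL'],
})
c = ['Column1', 'Column3', 'Column4']

# For each row, the number of rows in its (Column1, Column3, Column4) group;
# transform returns a Series aligned with df's index
group_sizes = df.groupby(c)['Column2'].transform('size')

# Rows whose key combination occurs more than once
dup_rows = df[group_sizes > 1]
print(dup_rows)
```

One difference to keep in mind: groupby drops rows whose key contains NaN by default (dropna=True), while duplicated treats NaN values as equal, so the two approaches can disagree on data with missing keys.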