
Filtering pandas dataframe based on repeated column values - Python

So, I have a data frame of this type:

               Name   1   2   3   4   5  
               Alex  10  40  20  11  50
               Alex  10  60  20  11  60
                Sam  30  15  50  15  60
                Sam  30  12  50  15  43 
               John  50  18 100   8  32
               John  50  15 100   8  21
                    

I am trying to keep only the columns whose values repeat within every group of rows that share the same name. For example, in this case I want to keep columns 1, 3 and 4, because each pair of rows with the same name has the same value in those columns. A column should be kept only if the values repeat for EACH pair of names, so the whole column should consist of pairs of identical values. Any ideas on how to do that?

Using a simple list inside agg:

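# Collect each column's values per name into a list, then flag the cells
# whose list contains at least one repeated value.
# Note: applymap is deprecated since pandas 2.1; DataFrame.map is its replacement.
cond = df.groupby('Name').agg(list).applymap(lambda x: len(x) != len(set(x)))

# Keep the columns that are flagged for every name.
dupe_cols = cond.columns[cond.all()]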
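For comparison, here is a minimal self-contained sketch of a different technique: counting distinct values per group with `nunique` rather than building lists. It assumes the sample frame from the question, and the names `distinct`, `keep` and `filtered` are only illustrative. For groups of exactly two rows, "one distinct value" and "contains a repeated value" are the same test; for larger groups, this version is stricter, since it requires all values in the group to be equal.

```python
import pandas as pd

# Sample frame from the question; the integer column labels are kept as-is.
df = pd.DataFrame(
    [
        ["Alex", 10, 40, 20, 11, 50],
        ["Alex", 10, 60, 20, 11, 60],
        ["Sam", 30, 15, 50, 15, 60],
        ["Sam", 30, 12, 50, 15, 43],
        ["John", 50, 18, 100, 8, 32],
        ["John", 50, 15, 100, 8, 21],
    ],
    columns=["Name", 1, 2, 3, 4, 5],
)

# Number of distinct values in each column within each Name group.
distinct = df.set_index("Name").groupby(level=0).nunique()

# A column qualifies only if every group holds a single distinct value.
keep = distinct.columns[distinct.eq(1).all()]

# Keep the Name column plus the qualifying columns.
filtered = df[["Name", *keep]]
print(keep.tolist())  # [1, 3, 4]
```

The same final selection step, `df[["Name", *dupe_cols]]`, should also work with the `dupe_cols` index computed above.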

This is the easiest way I can think of:

from collections import Counter

import pandas as pd

data = [[   'Name',   1,   2,   3,   4,   5],
[   'Alex',  10,  40,  20,  11,  50],
[   'Alex',  10,  60,  20,  11,  60],
[    'Sam',  30,  15,  50,  15,  60],
[    'Sam',  30,  12,  50,  15,  43],
[   'John',  50,  18, 100,   8,  32],
[   'John',  50,  15, 100,   8,  21]]

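# The first row of `data` holds the column labels; the rest are the records.
# The index starts at 1 so the row labels match the output shown below.
df = pd.DataFrame(data[1:], columns=data[0], index=range(1, len(data)))

# Collect the positions of rows in which no value occurs exactly twice.
vals = []
for row in range(len(df)):
    tmp = Counter(df.iloc[row])
    if 2 not in tmp.values():
        vals.append(row)

# Keep those rows, then keep only the first remaining row for each name.
ndf = df.iloc[vals]
result = ndf.drop_duplicates(subset='Name', keep='first')
print(result)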

returns

   Name   1   2    3   4   5
1  Alex  10  40   20  11  50
4   Sam  30  12   50  15  43
5  John  50  18  100   8  32
