Filter a pandas DataFrame based on repeated column values - Python

Question

So, I have a data frame of this type:

               Name   1   2   3   4   5  
               Alex  10  40  20  11  50
               Alex  10  60  20  11  60
                Sam  30  15  50  15  60
                Sam  30  12  50  15  43 
               John  50  18 100   8  32
               John  50  15 100   8  21

I am trying to keep only the columns whose values are repeated within every group of rows that share the same Name. For example, in this case I want to keep columns 1, 3 and 4, because they hold the same value in each pair of 'duplicate' rows. A column should be kept only if the values are repeated for EACH pair of names, so the whole column must consist of pairs of identical values. Any ideas on how to do that?

Answer 1

Using a simple list inside agg:

cond = df.groupby('Name').agg(list).applymap(lambda x: len(x) != len(set(x)))

dupe_cols = cond.columns[cond.all()]
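For readers who want to run this end to end, here is a minimal self-contained sketch on the sample data from the question. It is not the code above verbatim: it swaps the list aggregation for `groupby(...).nunique()`, which avoids `applymap` (deprecated in newer pandas releases in favour of `DataFrame.map`). It assumes each name appears exactly twice, as in the question; for larger groups, "exactly one distinct value" is stricter than "contains a duplicate". The names `distinct` and `filtered` are illustrative only.

```python
import pandas as pd

# Sample data from the question; column labels 1..5 are integers.
df = pd.DataFrame(
    {
        "Name": ["Alex", "Alex", "Sam", "Sam", "John", "John"],
        1: [10, 10, 30, 30, 50, 50],
        2: [40, 60, 15, 12, 18, 15],
        3: [20, 20, 50, 50, 100, 100],
        4: [11, 11, 15, 15, 8, 8],
        5: [50, 60, 60, 43, 32, 21],
    }
)

# Number of distinct values per name, for every non-key column.
distinct = df.groupby("Name").nunique()

# Keep a column only if every name has exactly one distinct value in it.
dupe_cols = distinct.columns[(distinct == 1).all()]

# Select the key column plus the columns that passed the check.
filtered = df[["Name", *dupe_cols]]
print(dupe_cols.tolist())  # [1, 3, 4]
```

Selecting `df[["Name", *dupe_cols]]` keeps the original rows and drops only the columns that fail the check.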

Answer 2

This is the easiest way I can think of:

from collections import Counter

import pandas as pd

data = [[   'Name',   1,   2,   3,   4,   5],
[   'Alex',  10,  40,  20,  11,  50],
[   'Alex',  10,  60,  20,  11,  60],
[    'Sam',  30,  15,  50,  15,  60],
[    'Sam',  30,  12,  50,  15,  43],
[   'John',  50,  18, 100,   8,  32],
[   'John',  50,  15, 100,   8,  21]]

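# The first list holds the column labels; the remaining lists are the rows.
df = pd.DataFrame(data[1:], columns=data[0])

# Collect the positions of rows in which no value occurs exactly twice.
vals = []
for row in range(len(df)):
    tmp = Counter(df.iloc[row])
    if 2 not in tmp.values():
        vals.append(row)

ndf = df.iloc[vals]
# Keep the first remaining row for each name.
ndf.drop_duplicates(subset='Name', keep='first')

returns

   Name   1   2    3   4   5
0  Alex  10  40   20  11  50
3   Sam  30  12   50  15  43
4  John  50  18  100   8  32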


Answer 1
1 Accepted 2022-11-17 11:46:27

Answer 2
0 2022-11-17 11:39:52
