Filtering pandas dataframe based on repeated column values - Python
So, I have a data frame of this type:
Name 1 2 3 4 5
Alex 10 40 20 11 50
Alex 10 60 20 11 60
Sam 30 15 50 15 60
Sam 30 12 50 15 43
John 50 18 100 8 32
John 50 15 100 8 21
I am trying to keep only the columns whose values are repeated within every group of rows that share the same name. For example, in this case, I want to keep columns 1, 3, 4 because they have repeated values for each 'duplicate' row.
But I want to keep a column only if the values are repeated for EACH pair of names, so the whole column should consist of pairs of identical values.
Any ideas of how to do that?
Using a simple list inside agg:
cond = df.groupby('Name').agg(list).applymap(lambda x: len(x) != len(set(x)))
dupe_cols = cond.columns[cond.all()]
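For anyone who wants to try this on the question's data, here is a minimal self-contained sketch. It is not the snippet above: it swaps agg(list) plus applymap for groupby().nunique(), partly because applymap has been deprecated in favor of DataFrame.map since pandas 2.1. The string column labels "1" to "5" and the names value_cols, same_in_group, keep_cols and result are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question; string column labels are an assumption.
df = pd.DataFrame(
    {
        "Name": ["Alex", "Alex", "Sam", "Sam", "John", "John"],
        "1": [10, 10, 30, 30, 50, 50],
        "2": [40, 60, 15, 12, 18, 15],
        "3": [20, 20, 50, 50, 100, 100],
        "4": [11, 11, 15, 15, 8, 8],
        "5": [50, 60, 60, 43, 32, 21],
    }
)

# Every column except the grouping key.
value_cols = [c for c in df.columns if c != "Name"]

# Number of distinct values per (name, column); a column qualifies when
# every name group holds exactly one distinct value in it.
same_in_group = df.groupby("Name")[value_cols].nunique().eq(1).all()

keep_cols = same_in_group.index[same_in_group].tolist()
result = df[["Name"] + keep_cols]
print(result)
```

Note that nunique() == 1 is a stricter test than "contains a duplicate": for groups of exactly two rows the two tests agree, but for larger groups this version requires all values in the group to be identical.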
This is the easiest way I can think of:
from collections import Counter
import pandas as pd

data = [['Name', 1, 2, 3, 4, 5],
        ['Alex', 10, 40, 20, 11, 50],
        ['Alex', 10, 60, 20, 11, 60],
        ['Sam', 30, 15, 50, 15, 60],
        ['Sam', 30, 12, 50, 15, 43],
        ['John', 50, 18, 100, 8, 32],
        ['John', 50, 15, 100, 8, 21]]
# The first row holds the column labels; data rows are labeled from 1
df = pd.DataFrame(data[1:], columns=data[0], index=range(1, len(data)))

# Collect the positions of rows in which no value occurs exactly twice
vals = []
for row in range(len(df)):
    tmp = Counter(df.iloc[row])
    if 2 not in tmp.values():
        vals.append(row)

ndf = df.iloc[vals]
# Keep only the first remaining row for each name
ndf.drop_duplicates(subset='Name', keep='first')
returns:
Name 1 2 3 4 5
1 Alex 10 40 20 11 50
4 Sam 30 12 50 15 43
5 John 50 18 100 8 32
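If the goal is to select columns rather than rows, as the question asks, the same Counter idea could be applied per column within each name group. This is only a rough sketch under that assumption, not part of the answer above; the names groups, keep and result are made up for illustration.

```python
from collections import Counter
import pandas as pd

data = [["Name", 1, 2, 3, 4, 5],
        ["Alex", 10, 40, 20, 11, 50],
        ["Alex", 10, 60, 20, 11, 60],
        ["Sam", 30, 15, 50, 15, 60],
        ["Sam", 30, 12, 50, 15, 43],
        ["John", 50, 18, 100, 8, 32],
        ["John", 50, 15, 100, 8, 21]]
# First row is the header, the remaining rows are the data.
df = pd.DataFrame(data[1:], columns=data[0])

# One sub-frame per name.
groups = [g for _, g in df.groupby("Name")]

keep = []
for col in df.columns.drop("Name"):
    # The column qualifies only if, inside every name group,
    # each value occurs at least twice.
    if all(min(Counter(g[col]).values()) >= 2 for g in groups):
        keep.append(col)

result = df[["Name"] + keep]
print(result)
```

On the sample data this should keep columns 1, 3 and 4, which is what the question expects.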