
Python - Pandas filter and group by

I need to find the files whose two cluster columns agree most. I have this data:

Input: [image of the input table; not preserved]

Within each cluster-1 group, I need to keep the files whose cluster-2 value is the one with the maximum count in that group. A file that does not match should not be included in the cluster.

Output: [image of the expected output table; not preserved]

Compare the first Series.mode of cluster-2 per cluster-1 group with the original column, filter on the result, and append the rows that did not match with bin assigned to cluster-1:

print (df)
  file  cluster-1  cluster-2
0    A          1          2
1    D          1          2
2    G          2          4
3    B          3          1
4    E          3          2
5    J          3          1
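
Since the input image is not preserved, the frame printed above can be rebuilt by hand. This is a minimal sketch that simply copies the printed values; the integer dtypes are an assumption based on how the frame prints.

```python
import pandas as pd

# Values copied from the printed frame; integer dtype is assumed
df = pd.DataFrame({
    'file': ['A', 'D', 'G', 'B', 'E', 'J'],
    'cluster-1': [1, 1, 2, 3, 3, 3],
    'cluster-2': [2, 2, 4, 1, 2, 1],
})
print(df)
```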

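import pandas as pd

# True where a row's cluster-2 equals the most frequent cluster-2 of its cluster-1 group
m = (df.groupby('cluster-1')['cluster-2']
      .transform(lambda x: x.mode().iat[0])
      .eq(df['cluster-2']))
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
df = (pd.concat([df[m], df[~m].assign(**{'cluster-1': 'bin'})], ignore_index=True)
          .rename(columns={'cluster-1': 'cluster'})
          .drop('cluster-2', axis=1))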
print (df)
  file cluster
0    A       1
1    D       1
2    G       2
3    B       3
4    J       3
5    E     bin
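
The same steps can be wrapped in a small reusable function. This is only a sketch: the helper name `relabel_by_mode` and its parameters are hypothetical, not part of pandas. It uses `pd.concat`, since `DataFrame.append` is not available in pandas 2.0 and later. Note that `Series.mode` returns its values sorted, so on a tie `.iat[0]` picks the smallest one.

```python
import pandas as pd


def relabel_by_mode(df, c1='cluster-1', c2='cluster-2', other='bin'):
    """Hypothetical helper: keep c1 where c2 equals the group's most
    frequent c2 value, and label the remaining rows with `other`."""
    # Most frequent c2 per c1 group, broadcast back to every row
    top = df.groupby(c1)[c2].transform(lambda s: s.mode().iat[0])
    match = top.eq(df[c2])
    # Matching rows first, then the non-matching rows relabelled
    out = pd.concat(
        [df[match], df[~match].assign(**{c1: other})],
        ignore_index=True,
    )
    return out.rename(columns={c1: 'cluster'}).drop(columns=c2)


df = pd.DataFrame({
    'file': ['A', 'D', 'G', 'B', 'E', 'J'],
    'cluster-1': [1, 1, 2, 3, 3, 3],
    'cluster-2': [2, 2, 4, 1, 2, 1],
})
result = relabel_by_mode(df)
print(result)
```

Because the relabelled rows are concatenated at the end, they lose their original position; sorting by `file` afterwards would be one way to restore a stable order if that matters.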
