Python - Pandas filtering and grouping

Question

I need to find the files whose cluster columns are most similar. I have this data:

Input:

Within each cluster-1 group, I want to keep the files whose cluster-2 value is the most frequent one in that group. A file that does not match should not be included in the cluster.

Output:

Answer 1

Compute the first Series.mode of cluster-2 within each cluster-1 group and compare it with the original cluster-2 column to get a boolean mask. Filter with the mask and, if necessary, add the rows that were filtered out back with cluster-1 set to bin:

print (df)
  file  cluster-1  cluster-2
0    A          1          2
1    D          1          2
2    G          2          4
3    B          3          1
4    E          3          2
5    J          3          1

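import pandas as pd

# True where a row's cluster-2 equals the most frequent cluster-2 of its cluster-1 group
m = (df.groupby('cluster-1')['cluster-2']
       .transform(lambda x: x.mode().iat[0])
       .eq(df['cluster-2']))
# DataFrame.append was removed in pandas 2.0, so pd.concat is used here instead
df = (pd.concat([df[m], df[~m].assign(**{'cluster-1': 'bin'})], ignore_index=True)
        .rename(columns={'cluster-1': 'cluster'})
        .drop('cluster-2', axis=1))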
print (df)
  file cluster
0    A       1
1    D       1
2    G       2
3    B       3
4    J       3
5    E     bin
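
If the original row order matters, a variation on the same mask may be enough. The sketch below is not part of the answer above: it rebuilds the sample frame from the printed data, and the names group_mode and result are only illustrative. It labels the non-matching rows with Series.where instead of moving them to the end of the frame.

```python
import pandas as pd

# Sample data copied from the frame printed in the answer
df = pd.DataFrame({
    'file': ['A', 'D', 'G', 'B', 'E', 'J'],
    'cluster-1': [1, 1, 2, 3, 3, 3],
    'cluster-2': [2, 2, 4, 1, 2, 1],
})

# Most frequent cluster-2 value per cluster-1 group, broadcast to every row.
# mode() returns its values sorted, so .iat[0] picks the smallest one on a tie.
group_mode = (df.groupby('cluster-1')['cluster-2']
                .transform(lambda x: x.mode().iat[0]))
m = group_mode.eq(df['cluster-2'])

# Keep cluster-1 where the row matches its group's mode, otherwise label it 'bin'.
# Casting to object first avoids mixing int and str in an integer column.
result = df[['file']].assign(
    cluster=df['cluster-1'].astype(object).where(m, 'bin'))
print(result)
# cluster column: [1, 1, 2, 3, 'bin', 3]
```

Here file E keeps its position at index 4 rather than being appended last, which may or may not be what the question wants.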

3 Accepted 2021-12-07 08:17:55
