Python: check a DataFrame column: does each group have more than one value?

Question

The following code:

import numpy as np
import pandas as pd

data = [['A', 1, 2, 5, 'blue'],
        ['A', 5, 5, 6, 'blue'],
        ['A', 4, 6, 7, 'blue'],
        ['B', 6, 5, 4, 'yellow'],
        ['B', 9, 9, 3, 'blue'],
        ['B', 7, 9, 1, 'yellow'],
        ['B', 2, 3, 1, 'yellow'],
        ['B', 5, 1, 2, 'yellow'],
        ['C', 2, 10, 9, 'green'],
        ['C', 8, 2, 8, 'green'],
        ['C', 5, 4, 3, 'green'],
        ['C', 8, 5, 3, 'green']]
df = pd.DataFrame(data, columns=['x', 'y', 'z', 'xy', 'color'])

groups = df.groupby('x')['color'].apply(list)
print(groups)

produces the following output:

x
A                        [blue, blue, blue]
B    [yellow, blue, yellow, yellow, yellow]
C              [green, green, green, green]
Name: color, dtype: object

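I now want to check whether each value of 'x' has more than one category. For example, A has only one category, while B has two. I'm not sure whether there is a way to do this.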

Answer 1

Use DataFrameGroupBy.nunique to count the unique values in each group, then filter the index of the resulting Series for values greater than 1:

s = df.groupby('x')['color'].nunique()

x = s.index[s > 1].tolist()
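To make the intermediate results visible, here is a minimal self-contained sketch of the same approach. It rebuilds only the x and color columns from the question; the names has_multiple and multi are just illustrative.

```python
import pandas as pd

# Same x/color values as in the question; the numeric columns are omitted.
df = pd.DataFrame({
    'x': list('AAABBBBBCCCC'),
    'color': ['blue'] * 3
             + ['yellow', 'blue', 'yellow', 'yellow', 'yellow']
             + ['green'] * 4,
})

# Number of distinct colors in each group of x.
s = df.groupby('x')['color'].nunique()

# Boolean Series: True where a group has more than one category.
has_multiple = s > 1

# Labels of the groups that have more than one category.
multi = s.index[has_multiple].tolist()

print(s.to_dict())   # {'A': 1, 'B': 2, 'C': 1}
print(multi)         # ['B']
```

The boolean Series has_multiple answers the yes/no question for every group, while multi lists only the groups that qualify.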

Your own code can be adapted by filtering on the number of unique values in each list:

groups = df.groupby('x')['color'].apply(list)

out = groups[groups.apply(lambda x: len(set(x))) > 1]
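If the goal is to keep the original rows that belong to groups with more than one category, rather than the group labels, one possible variation is to broadcast the per-group count back to the rows with transform. This is a sketch that goes slightly beyond what was asked; counts and rows are illustrative names.

```python
import pandas as pd

# Same x/color values as in the question; the numeric columns are omitted.
df = pd.DataFrame({
    'x': list('AAABBBBBCCCC'),
    'color': ['blue'] * 3
             + ['yellow', 'blue', 'yellow', 'yellow', 'yellow']
             + ['green'] * 4,
})

# For every row, the number of distinct colors in that row's group.
counts = df.groupby('x')['color'].transform('nunique')

# Keep only the rows whose group has more than one color.
rows = df[counts > 1]
print(rows)
```

With the data from the question this keeps the five rows where x is B.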

Edit: to see which values matched, use sets and filter by their length:

groups = df.groupby('x')['color'].apply(set)
print (groups)
x
A            {blue}
B    {yellow, blue}
C           {green}
Name: color, dtype: object

out = groups[groups.str.len() > 1]
print (out)
x
B    {yellow, blue}
Name: color, dtype: object

Or, very similarly, convert to a set first and then to a list:

groups = df.groupby('x')['color'].apply(lambda x: list(set(x)))
print (groups)
x
A            [blue]
B    [yellow, blue]
C           [green]
Name: color, dtype: object

out = groups[groups.str.len() > 1]
print (out)
x
B    [yellow, blue]
Name: color, dtype: object
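Note that a set has no guaranteed iteration order, so the order of the colors printed above may differ between runs. If the order matters, one alternative (not part of the original answer) is the groupby unique method, which keeps values in order of first appearance. A minimal sketch, using map(len) for the length filter:

```python
import pandas as pd

# Same x/color values as in the question; the numeric columns are omitted.
df = pd.DataFrame({
    'x': list('AAABBBBBCCCC'),
    'color': ['blue'] * 3
             + ['yellow', 'blue', 'yellow', 'yellow', 'yellow']
             + ['green'] * 4,
})

# One array of distinct colors per group, in order of first appearance.
groups = df.groupby('x')['color'].unique()

# Keep only the groups with more than one distinct color.
out = groups[groups.map(len) > 1]
print(out)
```

Here only group B remains, with yellow listed before blue because yellow appears first in the data.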

Solution 1
2 2021-06-24 12:07:13