
python: check dataframe columns: is there more than one value for each group?

The following code:

import numpy as np
import pandas as pd

data = [
    ['A', 1, 2, 5, 'blue'],
    ['A', 5, 5, 6, 'blue'],
    ['A', 4, 6, 7, 'blue'],
    ['B', 6, 5, 4, 'yellow'],
    ['B', 9, 9, 3, 'blue'],
    ['B', 7, 9, 1, 'yellow'],
    ['B', 2, 3, 1, 'yellow'],
    ['B', 5, 1, 2, 'yellow'],
    ['C', 2, 10, 9, 'green'],
    ['C', 8, 2, 8, 'green'],
    ['C', 5, 4, 3, 'green'],
    ['C', 8, 5, 3, 'green'],
]
df = pd.DataFrame(data, columns=['x','y','z','xy', 'color'])

groups = df.groupby('x')['color'].apply(list)
print(groups)

produces the following output:

x
A                        [blue, blue, blue]
B    [yellow, blue, yellow, yellow, yellow]
C              [green, green, green, green]
Name: color, dtype: object

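I now want to check whether each 'x' value has more than one category in 'color'. For example, A has only one category, while B has two. I'm not sure whether there is a way to do this.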

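Use GroupBy.nunique (here SeriesGroupBy.nunique, since a single column is selected) to count the unique values in each group, then keep the index values of the resulting Series where the count is greater than 1: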

s = df.groupby('x')['color'].nunique()

x = s.index[s > 1].tolist()
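
The nunique approach above can be sketched as a small self-contained script. The helper name `groups_with_multiple_values` is hypothetical and introduced only for illustration, and the frame is reduced to the two columns the check needs. Note that `nunique` skips missing values by default, so a group containing one colour plus NaN would still count as 1.

```python
import pandas as pd


def groups_with_multiple_values(df, group_col, value_col):
    """Return the group labels whose value_col holds more than one distinct value.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Count distinct values per group (NaN is ignored by default).
    counts = df.groupby(group_col)[value_col].nunique()
    # Keep only the labels whose count exceeds 1.
    return counts.index[counts > 1].tolist()


df = pd.DataFrame({
    'x': list('AAABBBBBCCCC'),
    'color': ['blue', 'blue', 'blue',
              'yellow', 'blue', 'yellow', 'yellow', 'yellow',
              'green', 'green', 'green', 'green'],
})

result = groups_with_multiple_values(df, 'x', 'color')
print(result)  # ['B']
```

If a yes/no answer per group is enough, the boolean Series `df.groupby('x')['color'].nunique() > 1` can be used directly without extracting the index.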

Your own code can be adapted by adding a filter on the number of unique values in each list:

groups = df.groupby('x')['color'].apply(list)

out = groups[groups.apply(lambda x: len(set(x))) > 1]

EDIT: To see the matching values, you can aggregate to sets and filter by their length:

groups = df.groupby('x')['color'].apply(set)
print (groups)
x
A            {blue}
B    {yellow, blue}
C           {green}
Name: color, dtype: object

out = groups[groups.str.len() > 1]
print (out)
x
B    {yellow, blue}
Name: color, dtype: object
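
Here `.str.len()` is applied to sets rather than strings, which can be surprising at first glance. As a sketch of an equivalent filter that makes the intent explicit, the length can be taken with `map(len)` instead. The variable names `sizes` and `out` are just for illustration, and the frame is reduced to the two relevant columns.

```python
import pandas as pd

df = pd.DataFrame({
    'x': list('AAABBBBBCCCC'),
    'color': ['blue', 'blue', 'blue',
              'yellow', 'blue', 'yellow', 'yellow', 'yellow',
              'green', 'green', 'green', 'green'],
})

# One set of distinct colours per group.
groups = df.groupby('x')['color'].apply(set)

# Size of each set, computed element-wise with the built-in len.
sizes = groups.map(len)

# Keep only the groups holding more than one distinct colour.
out = groups[sizes > 1]
print(out)
```

Because sets are unordered, the order of the colours inside each printed set may differ between runs.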

Or, very similarly, convert to a set first and then to a list:

groups = df.groupby('x')['color'].apply(lambda x: list(set(x)))
print (groups)
x
A            [blue]
B    [yellow, blue]
C           [green]
Name: color, dtype: object

out = groups[groups.str.len() > 1]
print (out)
x
B    [yellow, blue]
Name: color, dtype: object
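
The variants above all return one entry per group. If the rows of the original DataFrame are wanted instead, one possible sketch (an addition here, not part of the original answer) is to broadcast the per-group count back onto the rows with `transform('nunique')` and use it as a boolean mask. The names `per_row_counts` and `mixed_rows` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'x': list('AAABBBBBCCCC'),
    'color': ['blue', 'blue', 'blue',
              'yellow', 'blue', 'yellow', 'yellow', 'yellow',
              'green', 'green', 'green', 'green'],
})

# For every row, the number of distinct colours in that row's group.
per_row_counts = df.groupby('x')['color'].transform('nunique')

# Keep the rows that belong to a group with more than one colour.
mixed_rows = df[per_row_counts > 1]
print(mixed_rows)
```

With the question's data this keeps only the five rows of group B, since A and C each contain a single colour.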

