
Python: check DataFrame columns: is there more than one value for each group?

The following code:

import numpy as np
import pandas as pd

data = [['A', 1, 2, 5, 'blue'],
        ['A', 5, 5, 6, 'blue'],
        ['A', 4, 6, 7, 'blue'],
        ['B', 6, 5, 4, 'yellow'],
        ['B', 9, 9, 3, 'blue'],
        ['B', 7, 9, 1, 'yellow'],
        ['B', 2, 3, 1, 'yellow'],
        ['B', 5, 1, 2, 'yellow'],
        ['C', 2, 10, 9, 'green'],
        ['C', 8, 2, 8, 'green'],
        ['C', 5, 4, 3, 'green'],
        ['C', 8, 5, 3, 'green']]
df = pd.DataFrame(data, columns=['x','y','z','xy', 'color'])

groups = df.groupby('x')['color'].apply(list)
print(groups)

produces the following output:

x
A                        [blue, blue, blue]
B    [yellow, blue, yellow, yellow, yellow]
C              [green, green, green, green]
Name: color, dtype: object

I now want to check if there is more than one category for each 'x' value. For example, A has only one category but B has two. I am not sure if there is a way to do that.

Use SeriesGroupBy.nunique to count the unique values per group, then keep the index values of the resulting Series where the count is greater than 1:

s = df.groupby('x')['color'].nunique()

x = s.index[s > 1].tolist()
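If a yes/no answer per group is what you need, the comparison `s > 1` is already a boolean Series indexed by 'x'. Below is a minimal sketch of that idea; the shortened DataFrame and the names `has_multiple` and `mixed` are only illustrative and are not part of the question's code:

```python
import pandas as pd

# Abbreviated stand-in for the question's data (illustrative only).
df = pd.DataFrame({
    'x': ['A', 'A', 'B', 'B', 'B', 'C'],
    'color': ['blue', 'blue', 'yellow', 'blue', 'yellow', 'green'],
})

# Number of distinct colors within each 'x' group.
s = df.groupby('x')['color'].nunique()

# Boolean Series: True where a group has more than one color.
has_multiple = s > 1

# Labels of the groups that have more than one color.
mixed = s.index[has_multiple].tolist()
print(mixed)  # ['B']
```

Here `has_multiple['B']` is True, while the entries for 'A' and 'C' are False.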

Your own code can be adapted by filtering on the number of unique values in each list:

groups = df.groupby('x')['color'].apply(list)

out = groups[groups.apply(lambda x: len(set(x))) > 1]

EDIT: To see the matched values, aggregate to sets and filter by their length:

groups = df.groupby('x')['color'].apply(set)
print (groups)
x
A            {blue}
B    {yellow, blue}
C           {green}
Name: color, dtype: object

out = groups[groups.str.len() > 1]
print (out)
x
B    {yellow, blue}
Name: color, dtype: object

Or, very similarly, convert to sets first and then to lists:

groups = df.groupby('x')['color'].apply(lambda x: list(set(x)))
print (groups)
x
A            [blue]
B    [yellow, blue]
C           [green]
Name: color, dtype: object

out = groups[groups.str.len() > 1]
print (out)
x
B    [yellow, blue]
Name: color, dtype: object

