Python: check DataFrame columns: is there more than one value for each group?
The following code:
import numpy as np
import pandas as pd
data = [['A', 1, 2, 5, 'blue'],
        ['A', 5, 5, 6, 'blue'],
        ['A', 4, 6, 7, 'blue'],
        ['B', 6, 5, 4, 'yellow'],
        ['B', 9, 9, 3, 'blue'],
        ['B', 7, 9, 1, 'yellow'],
        ['B', 2, 3, 1, 'yellow'],
        ['B', 5, 1, 2, 'yellow'],
        ['C', 2, 10, 9, 'green'],
        ['C', 8, 2, 8, 'green'],
        ['C', 5, 4, 3, 'green'],
        ['C', 8, 5, 3, 'green']]
df = pd.DataFrame(data, columns=['x','y','z','xy', 'color'])
groups = df.groupby('x')['color'].apply(list)
print(groups)
produces the following output:
x
A [blue, blue, blue]
B [yellow, blue, yellow, yellow, yellow]
C [green, green, green, green]
Name: color, dtype: object
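I now want to check whether each 'x' value has more than one category. For example, A has only one category, while B has two. I'm not sure whether there is a way to do this.
Use DataFrameGroupBy.nunique to count the unique values in each group, then filter the index of the resulting Series for counts greater than 1: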
s = df.groupby('x')['color'].nunique()
x = s.index[s > 1].tolist()
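As a self-contained illustration of the nunique approach, here is a minimal sketch. The trimmed-down DataFrame and the names counts, has_multiple and multi_groups are assumptions made for this example, not part of the original post.

```python
import pandas as pd

# Hypothetical, trimmed-down version of the question's data.
df = pd.DataFrame({
    'x': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'color': ['blue', 'blue', 'yellow', 'blue', 'yellow', 'green', 'green'],
})

# Number of distinct colors per group.
counts = df.groupby('x')['color'].nunique()

# Boolean Series: True where a group has more than one color.
has_multiple = counts > 1

# Labels of the groups that have more than one color.
multi_groups = counts.index[has_multiple].tolist()

print(counts.to_dict())
print(multi_groups)
```

The boolean Series has_multiple answers the yes/no question for every group at once, while multi_groups keeps only the labels where the answer is yes.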
Your code can be changed by adding a filter on the number of unique values:
groups = df.groupby('x')['color'].apply(list)
out = groups[groups.apply(lambda x: len(set(x))) > 1]
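If the goal is to keep the original rows of the groups that contain more than one category, rather than only the group labels, one possible variation is to broadcast the per-group count back to every row with transform. This is a sketch under the assumption that the rows themselves are wanted; the small DataFrame and the names n_colors and rows are made up for illustration.

```python
import pandas as pd

# Hypothetical, trimmed-down version of the question's data.
df = pd.DataFrame({
    'x': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'color': ['blue', 'blue', 'yellow', 'blue', 'yellow', 'green', 'green'],
})

# transform('nunique') returns one value per original row:
# the number of distinct colors in that row's group.
n_colors = df.groupby('x')['color'].transform('nunique')

# Keep only the rows whose group has more than one color.
rows = df[n_colors > 1]

print(rows)
```

Because the result of transform is aligned with the original index, it can be used directly as a boolean mask on df.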
EDIT: To see the matching values, you can use sets and filter by length:
groups = df.groupby('x')['color'].apply(set)
print (groups)
x
A {blue}
B {yellow, blue}
C {green}
Name: color, dtype: object
out = groups[groups.str.len() > 1]
print (out)
x
B {yellow, blue}
Name: color, dtype: object
Or, very similarly, convert first to a set and then to a list:
groups = df.groupby('x')['color'].apply(lambda x: list(set(x)))
print (groups)
x
A [blue]
B [yellow, blue]
C [green]
Name: color, dtype: object
out = groups[groups.str.len() > 1]
print (out)
x
B [yellow, blue]
Name: color, dtype: object
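Note that sets are unordered, so the order of the values inside each set or list shown above may differ between runs. If a stable order matters, a small variation is to sort the unique values. The sketch below assumes a trimmed-down DataFrame and uses map(len) for the length filter; both are choices made for this example.

```python
import pandas as pd

# Hypothetical, trimmed-down version of the question's data.
df = pd.DataFrame({
    'x': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'color': ['blue', 'blue', 'yellow', 'blue', 'yellow', 'green', 'green'],
})

# Sorted list of the distinct colors in each group (deterministic order).
groups = df.groupby('x')['color'].apply(lambda s: sorted(set(s)))

# Keep only the groups with more than one distinct color.
out = groups[groups.map(len) > 1]

print(groups.to_dict())
print(out.to_dict())
```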