How do I check if all values in a column of a pandas dataframe are equal?
I have a dataframe like this:
name data result
0 x 100
1 x 100
2 x 100
3 x 100
4 x 100
5 y 100
6 y 90
7 y 90
8 y 100
9 y 85
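To reproduce the example, the sample frame above can be rebuilt as follows. This is a sketch that assumes data holds integers and leaves out the result column, since that is the column to be filled in:

```python
import pandas as pd

# Sample frame from the question; 'result' is added later
df = pd.DataFrame({
    'name': ['x'] * 5 + ['y'] * 5,
    'data': [100, 100, 100, 100, 100, 100, 90, 90, 100, 85],
})
print(df)
```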
I want to check whether each group in the name column has the same value in the data column.
So for each group (x, y, ...), if the corresponding data values are all equal, write full in the result column. If the values for a group are not all equal, write nearly in the result column.
I have tried grouping the dataframe:
grouped = df.groupby('name')
dfx = grouped.get_group('x')  # only the rows where name == 'x'
but that doesn't really help with checking whether every value in the group is the same, or with writing the outcome to the result column.
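Grouping on its own only hands back the rows of a group; checking equality is a separate step. A minimal sketch of that extra step, rebuilding the sample frame (the names grouped, x_rows and per_group are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['x'] * 5 + ['y'] * 5,
    'data': [100, 100, 100, 100, 100, 100, 90, 90, 100, 85],
})

grouped = df.groupby('name')

# get_group only returns one group's rows; the equality check is done separately
x_rows = grouped.get_group('x')
print(x_rows['data'].nunique() == 1)  # True: every value is 100

# Looping over the groups gives one answer per group, not yet one per row
per_group = {name: g['data'].nunique() == 1 for name, g in grouped}
print(per_group)
```

This yields one True/False per group, but the result still has to be spread back onto every row of the group, which is what GroupBy.transform does.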
I have also tried creating a function that checks for unique values:
def check_identicals(row):
    # Note: this counts unique values in the whole column, not in the row's group
    if df['data'].nunique() == 1:
        print('Full')
The idea was to then apply that function to each row and write its output to the result column.
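A function applied row by row cannot see the rest of the row's group, so one way to keep the idea is to hand the function a whole group instead. A sketch, reusing the name check_identicals but changing its argument to the group's data values (an assumption, not the original signature):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['x'] * 5 + ['y'] * 5,
    'data': [100, 100, 100, 100, 100, 100, 90, 90, 100, 85],
})

def check_identicals(values):
    # Receives all 'data' values of one group, not a single row
    return 'full' if values.nunique() == 1 else 'nearly'

# transform calls the function once per group and broadcasts the
# returned label to every row of that group
df['result'] = df.groupby('name')['data'].transform(check_identicals)
print(df)
```

A Python function called per group is usually slower than passing the built-in 'nunique' string to transform, but it keeps the labelling logic in one readable place.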
Desired output:
name data result
0 x 100 full
1 x 100 full
2 x 100 full
3 x 100 full
4 x 100 full
5 y 100 nearly
6 y 90 nearly
7 y 90 nearly
8 y 100 nearly
9 y 85 nearly
Use numpy.where with GroupBy.transform and DataFrameGroupBy.nunique. transform('nunique') returns a new Series with the same size as the original DataFrame, holding the number of distinct data values in each row's group, so comparing it with 1 flags the groups whose values are all equal:
import numpy as np

df['result'] = np.where(df.groupby('name')['data'].transform('nunique') == 1, 'full', 'nearly')
print(df)
name data result
0 x 100 full
1 x 100 full
2 x 100 full
3 x 100 full
4 x 100 full
5 y 100 nearly
6 y 90 nearly
7 y 90 nearly
8 y 100 nearly
9 y 85 nearly
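To see why the comparison works, it helps to look at the intermediate Series. A self-contained sketch using the question's sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['x'] * 5 + ['y'] * 5,
    'data': [100, 100, 100, 100, 100, 100, 90, 90, 100, 85],
})

# One value per row: the number of distinct 'data' values in that row's group
counts = df.groupby('name')['data'].transform('nunique')
print(counts.tolist())  # [1, 1, 1, 1, 1, 3, 3, 3, 3, 3]

# Groups with exactly one distinct value are 'full', all others 'nearly'
df['result'] = np.where(counts == 1, 'full', 'nearly')
print(df)
```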
EDIT:
To also detect groups whose values are all missing, use numpy.select with a second condition that checks for missing values per group using transform and GroupBy.all:
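# True where the group's non-missing values are all identical
m1 = df.groupby('name')['data'].transform('nunique') == 1
# True where every value in the group is missing
m2 = df['data'].isna().groupby(df['name']).transform('all')
# Conditions are checked in order; rows matching neither get 'nearly'
df['result'] = np.select([m1, m2], ['full', 'all_missing'], 'nearly')
print(df)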
name data result
0 x 100.0 full
1 x 100.0 full
2 x 100.0 full
3 x 100.0 full
4 x 100.0 full
5 y 100.0 nearly
6 y 90.0 nearly
7 y 90.0 nearly
8 z NaN all_missing
9 z NaN all_missing
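A self-contained sketch of the EDIT version, using the data from the output above plus a hypothetical extra group w that mixes a value with NaN. Because nunique ignores NaN by default, w ends up as full rather than nearly:

```python
import numpy as np
import pandas as pd

# Data from the EDIT output, plus a hypothetical group 'w' mixing 100 and NaN
df = pd.DataFrame({
    'name': ['x'] * 5 + ['y'] * 3 + ['z'] * 2 + ['w'] * 2,
    'data': [100] * 5 + [100, 90, 90] + [np.nan, np.nan] + [100, np.nan],
})

# nunique skips NaN: x -> 1, y -> 2, z -> 0, w -> 1
m1 = df.groupby('name')['data'].transform('nunique') == 1
# True only for groups where every value is NaN (here: z)
m2 = df['data'].isna().groupby(df['name']).transform('all')

# np.select takes the first matching condition; anything else gets 'nearly'
df['result'] = np.select([m1, m2], ['full', 'all_missing'], 'nearly')
print(df)
```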