pandas groupby aggregate keep equal values
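I am trying to build an aggregator that simply returns a value if it is equal to all the other values of that variable within the group, and NaN otherwise.
The goal is to keep meta-information while aggregating sensor data.
I am getting a strange KeyError...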
import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict({'v1': [1, 1, 1, 2, 2, 2],
                             'v2': [1, 2, 3, 4, 5, 6],
                             'v3': [1, 1, 1, 2, 3, 2],
                             'v4': [2, 2, 2, 3, 3, 3]})

def keep_equal(x):
    if (x == x[0]).all():
        return x[0]
    else:
        return np.nan

df = df.groupby(df["v1"], as_index=False, observed=True).agg(keep_equal)
Expected output:
   v1  v2   v3  v4
0   1 NaN    1   2
1   2 NaN  NaN   3
But I get a KeyError:
Traceback (most recent call last):
File "pandas\_libs\index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 998, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
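You need to use iloc to select by position.
x[0] looks up the index label 0, and inside the second group the index labels are 3, 4 and 5, so that lookup raises KeyError: 0.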
import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict({'v1': [1, 1, 1, 2, 2, 2],
                             'v2': [1, 2, 3, 4, 5, 6],
                             'v3': [1, 1, 1, 2, 3, 2],
                             'v4': [2, 2, 2, 3, 3, 3]})

def keep_equal(x):
    # iloc[0] is the first element by position, whatever the index labels are
    if (x == x.iloc[0]).all():
        return x.iloc[0]
    else:
        return np.nan

df = df.groupby(df["v1"], as_index=False, observed=True).agg(keep_equal)
print(df)
Output:
   v1  v2   v3  v4
0   1 NaN  1.0   2
1   2 NaN  NaN   3
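To see why the original version failed only on the second group, here is a minimal sketch of label-based versus position-based lookup. The Series `s` is a hypothetical stand-in for what the aggregation function receives for the group v1 == 2 (values of v2, index labels 3, 4, 5); it is not taken from the original post.

```python
import pandas as pd

# Hypothetical stand-in for the second group's v2 column:
# the values keep their original index labels 3, 4, 5.
s = pd.Series([4, 5, 6], index=[3, 4, 5])

# Label-based lookup: the index is integer-typed and has no label 0,
# so s[0] raises KeyError.
try:
    s[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based lookup: always the first element, regardless of labels.
first_by_position = s.iloc[0]

print(label_lookup_failed)  # True
print(first_by_position)    # 4
```

For the first group the labels happen to be 0, 1, 2, so x[0] works there by coincidence, which is why the error only shows up once the second group is processed.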
If performance matters, use Series.iat to select the first value of the Series:
df = pd.DataFrame.from_dict({'v1': [1, 1, 1, 2, 2, 2],
                             'v2': [1, 2, 3, 4, 5, 6],
                             'v3': [1, 1, 1, 2, 3, 2],
                             'v4': [2, 2, 2, 3, 3, 3]})

def keep_equal(x):
    if (x == x.iat[0]).all():
        return x.iat[0]
    else:
        return np.nan
Or use the underlying 1d numpy array:
def keep_equal(x):
    if (x == x.values[0]).all():
        return x.values[0]
    else:
        return np.nan

df = df.groupby(df["v1"], as_index=False).agg(keep_equal)
print(df)
   v1  v2   v3  v4
0   1 NaN  1.0   2
1   2 NaN  NaN   3
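As a quick sanity check, the three ways of taking the first value (iloc, iat, and the numpy array) can be compared on the same data. This is only a sketch: `make_keep_equal` and `accessors` are hypothetical helper names introduced here for illustration, and `to_numpy()` is used in place of `.values` for the array variant.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'v1': [1, 1, 1, 2, 2, 2],
                   'v2': [1, 2, 3, 4, 5, 6],
                   'v3': [1, 1, 1, 2, 3, 2],
                   'v4': [2, 2, 2, 3, 3, 3]})

def make_keep_equal(first):
    """Build a keep_equal aggregator from a 'get first value' function (hypothetical helper)."""
    def keep_equal(x):
        value = first(x)
        # Return the common value if every element equals it, otherwise NaN
        return value if (x == value).all() else np.nan
    return keep_equal

# Three position-based ways to read the first element of a Series
accessors = {
    'iloc': lambda x: x.iloc[0],
    'iat': lambda x: x.iat[0],
    'numpy': lambda x: x.to_numpy()[0],
}

results = {name: df.groupby('v1', as_index=False).agg(make_keep_equal(first))
           for name, first in accessors.items()}

# All three variants should produce the same aggregated frame
same = all(results['iloc'].equals(r) for r in results.values())
print(same)
```

Whether iat or the numpy array is noticeably faster than iloc depends on the pandas version and the number and size of groups, so it is worth measuring with timeit on your own data before relying on it.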