Collapsing rows with NaN entries in pandas dataframe
I have a pandas DataFrame with rows of data::
# objectID grade OS method
object_id_0001 AAA Mac organic
object_id_0001 AAA Mac NA
object_id_0001 AAA NA organic
object_id_0002 NA NA NA
object_id_0002 ABC Win NA
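That is, there are often multiple entries for the same object ID, but sometimes (in fact, often) the entries contain NA.
So I'm just looking for a way to group on objectID and report the non-NA entries, e.g. the above collapses to::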
object_id_0001 AAA Mac organic
object_id_0002 ABC Win NA
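This works and has for a long time. However, some claim that this is a bug that may be fixed. As currently implemented, first returns the first non-null element of each column, if one exists.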
df.groupby('objectID', as_index=False).first()
objectID grade OS method
0 object_id_0001 AAA Mac organic
1 object_id_0002 ABC Win NaN
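For anyone who wants to reproduce this, here is a minimal self-contained sketch. The way the sample frame is built (with np.nan standing in for the NA entries) and the name collapsed are my own assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; np.nan stands in for the NA entries.
df = pd.DataFrame({
    'objectID': ['object_id_0001'] * 3 + ['object_id_0002'] * 2,
    'grade': ['AAA', 'AAA', 'AAA', np.nan, 'ABC'],
    'OS': ['Mac', 'Mac', np.nan, np.nan, 'Win'],
    'method': ['organic', np.nan, 'organic', np.nan, np.nan],
})

# first() takes, per group and per column, the first non-null value.
collapsed = df.groupby('objectID', as_index=False).first()
print(collapsed)
```

Note that each column is resolved independently, so a collapsed row can combine values that came from different original rows.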
pd.concat
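pd.concat([
    # DataFrame.lookup was removed in pandas 2.0, so read each column at its first non-NA row with .at
    # (assumes a unique index, e.g. the default RangeIndex)
    pd.DataFrame([[d.at[i, c] for c, i in d.notna().idxmax().items()]], columns=d.columns)
    for _, d in df.groupby('objectID')
], ignore_index=True)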
objectID grade OS method
0 object_id_0001 AAA Mac organic
1 object_id_0002 ABC Win NaN
stack
df.set_index('objectID').stack().groupby(level=[0, 1]).head(1).unstack()
grade OS method
objectID
object_id_0001 AAA Mac organic
object_id_0002 ABC Win None
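The stack approach relies on the missing values being gone by the time head(1) runs. Since the default handling of missing values in stack may differ between pandas versions, here is a hedged sketch that drops them explicitly. The sample frame and the names long, first_seen and wide are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; np.nan stands in for the NA entries.
df = pd.DataFrame({
    'objectID': ['object_id_0001'] * 3 + ['object_id_0002'] * 2,
    'grade': ['AAA', 'AAA', 'AAA', np.nan, 'ABC'],
    'OS': ['Mac', 'Mac', np.nan, np.nan, 'Win'],
    'method': ['organic', np.nan, 'organic', np.nan, np.nan],
})

# Long format: one entry per (objectID, column) pair.
long = df.set_index('objectID').stack()

# Drop missing entries explicitly instead of relying on stack's default.
long = long.dropna()

# Keep the first remaining value per (objectID, column), then pivot back to wide.
first_seen = long.groupby(level=[0, 1]).head(1)
wide = first_seen.unstack()
print(wide)
```

One caveat with this route: an objectID whose rows are missing in every column has no entries left after the drop, so it would not appear in the result at all.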
If by chance those are strings ('NA'):
df.mask(df.astype(str).eq('NA')).groupby('objectID', as_index=False).first()
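To illustrate why the mask matters, the sketch below compares the result with and without it. The frame df_str, which stores the literal string 'NA', and the names naive, masked and fixed are hypothetical and only serve the example.

```python
import pandas as pd

# Hypothetical variant of the sample data where missing values are the literal string 'NA'.
df_str = pd.DataFrame({
    'objectID': ['object_id_0001'] * 3 + ['object_id_0002'] * 2,
    'grade': ['AAA', 'AAA', 'AAA', 'NA', 'ABC'],
    'OS': ['Mac', 'Mac', 'NA', 'NA', 'Win'],
    'method': ['organic', 'NA', 'organic', 'NA', 'NA'],
})

# Without masking, 'NA' is an ordinary string, so first() does not skip it.
naive = df_str.groupby('objectID', as_index=False).first()

# Masking turns every 'NA' string into a real missing value before grouping.
masked = df_str.mask(df_str.astype(str).eq('NA'))
fixed = masked.groupby('objectID', as_index=False).first()

print(naive)
print(fixed)
```

If the data comes from a file, it may be simpler to have the reader treat 'NA' as missing at load time, but that depends on how the frame was created.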
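An alternative, more mechanical approach
import numpy as np

def aggregate(s):
    # Unique non-null values of this column within the group
    u = s[s.notnull()].unique()
    if not u.size:
        return np.nan
    return u

df.groupby('objectID').agg(aggregate)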
grade OS method
objectID
object_id_0001 AAA Mac organic
object_id_0002 ABC Win NaN
This will work: bfill + drop_duplicates
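# objectID goes into the index first: grouped bfill in recent pandas returns only the filled columns
df.set_index('objectID').groupby(level=0).bfill().reset_index().drop_duplicates('objectID')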
Out[939]:
objectID grade OS method
0 object_id_0001 AAA Mac organic
3 object_id_0002 ABC Win NaN
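To see why this works, it can help to look at the intermediate backfilled frame. The sketch below is an illustration under my own assumptions: the sample frame and the names filled and result are not from the original answer. It also keeps objectID in the index while filling, since as far as I can tell recent pandas versions do not return the grouping column from a grouped bfill.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; np.nan stands in for the NA entries.
df = pd.DataFrame({
    'objectID': ['object_id_0001'] * 3 + ['object_id_0002'] * 2,
    'grade': ['AAA', 'AAA', 'AAA', np.nan, 'ABC'],
    'OS': ['Mac', 'Mac', np.nan, np.nan, 'Win'],
    'method': ['organic', np.nan, 'organic', np.nan, np.nan],
})

# Backfill within each objectID: every missing cell takes the next
# non-missing value below it in the same group, if there is one.
filled = df.set_index('objectID').groupby(level=0).bfill().reset_index()

# After backfilling, the first row of each group holds the first
# non-missing value of every column, so keeping that row is enough.
result = filled.drop_duplicates('objectID')

print(filled)
print(result)
```

The result keeps the original row positions (0 and 3) as its index; a reset_index(drop=True) at the end would renumber them if that matters.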