Pandas: Aggregating and transposing dataframe based on string
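I have a dataframe that tracks changes to objects identified by an ID. Instead of each row representing one change of state, I would like one row per object, with all of its changes tracked in columns.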
import pandas as pd
import numpy as np
# Input: one row per status change of an object identified by ID
df1=pd.DataFrame({'ID':['1','2','3','1','2','1','4'], 'Original_Status':['Admitted','Admitted','Admitted','Probation','LateAdmission','Admitted','Admitted'],'New_Status':['Probation','LateAdmission','Pass','Admitted','Pass','Pass','Fail']})
# Desired output: one row per ID; the k-th change goes into Original_Status_k / New_Status_k
df2=pd.DataFrame({'ID':['1','2','3','4'],'Original_Status_1':['Admitted','Admitted','Admitted','Admitted'],'New_Status_1':['Probation','LateAdmission','Pass','Fail'],'Original_Status_2':['Probation','LateAdmission',np.nan,np.nan],'New_Status_2':['Admitted','Pass',np.nan,np.nan],'Original_Status_3':['Admitted',np.nan,np.nan,np.nan],'New_Status_3':['Pass',np.nan,np.nan,np.nan]})
   ID  Original_Status     New_Status
0   1         Admitted      Probation
1   2         Admitted  LateAdmission
2   3         Admitted           Pass
3   1        Probation       Admitted
4   2    LateAdmission           Pass
5   1         Admitted           Pass
6   4         Admitted           Fail
I want to turn it into:
   ID  Original_Status_1   New_Status_1  Original_Status_2  New_Status_2  Original_Status_3  New_Status_3
0   1           Admitted      Probation          Probation      Admitted           Admitted          Pass
1   2           Admitted  LateAdmission      LateAdmission          Pass                NaN           NaN
2   3           Admitted           Pass                NaN           NaN                NaN           NaN
3   4           Admitted           Fail                NaN           NaN                NaN           NaN
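In the target layout the suffix `_k` means "the k-th change for this ID", counted in the order the rows appear in df1. A minimal sketch of computing that counter with `groupby().cumcount()`, assuming that row order within each ID is chronological; the column name `change_no` is hypothetical and only used for illustration:

```python
import pandas as pd

# Same long-format input as df1 above: one row per status change.
df1 = pd.DataFrame({
    'ID': ['1', '2', '3', '1', '2', '1', '4'],
    'Original_Status': ['Admitted', 'Admitted', 'Admitted', 'Probation',
                        'LateAdmission', 'Admitted', 'Admitted'],
    'New_Status': ['Probation', 'LateAdmission', 'Pass', 'Admitted',
                   'Pass', 'Pass', 'Fail'],
})

# cumcount() numbers the rows 0, 1, 2, ... within each ID, keeping their
# original order; adding 1 gives the _1, _2, _3 suffixes of the target columns.
# 'change_no' is a hypothetical helper column name.
df1['change_no'] = df1.groupby('ID').cumcount() + 1
print(df1)
```

ID 1 appears three times, so its rows get 1, 2 and 3; that maximum is also the number of column pairs the wide table needs.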
I was able to achieve this result with a loop, but I would prefer a more concise solution if possible.
Here is an ugly approach. It uses a list comprehension to build each row of df2 from one ID group at a time, then adds the column headers.
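# One Series per ID: the group's (Original_Status, New_Status) pairs flattened
# row by row into [orig_1, new_1, orig_2, new_2, ...]
df2 = pd.concat(
    [
        pd.Series(data=g.set_index('ID').to_numpy().flatten(), name=i)
        for i, g in df1.groupby('ID')
    ],
    axis=1,  # one column per ID; shorter Series are padded with NaN
).T.reset_index()  # transpose to one row per ID and move the ID into a column
# df2 now has 1 ID column + 2 columns per change, so shape[1]//2 is the number of changes
c_names = [s.format(i+1) for i in range(df2.shape[1]//2) for s in ['Original_Status_{}', 'New_Status_{}']]
df2.columns = ['ID'] + c_names
df2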
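A loop-free alternative, sketched here as a different technique from the answer above rather than a replacement for it: number each change per ID with `cumcount()`, pivot to one row per ID, then interleave and flatten the resulting MultiIndex columns. It assumes rows within each ID are already in chronological order; the names `k`, `fields`, `order` and `wide` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['1', '2', '3', '1', '2', '1', '4'],
    'Original_Status': ['Admitted', 'Admitted', 'Admitted', 'Probation',
                        'LateAdmission', 'Admitted', 'Admitted'],
    'New_Status': ['Probation', 'LateAdmission', 'Pass', 'Admitted',
                   'Pass', 'Pass', 'Fail'],
})

# Assumption: row order within each ID is chronological.
# Number each change per ID as 1, 2, 3, ...
k = df1.groupby('ID').cumcount() + 1

# Long -> wide: one row per ID, MultiIndex columns (field, change number).
# IDs with fewer changes get NaN in the missing slots.
wide = df1.assign(k=k).pivot(index='ID', columns='k')

# Interleave as Original_Status_1, New_Status_1, Original_Status_2, ...
fields = ['Original_Status', 'New_Status']
order = [(f, i) for i in range(1, int(k.max()) + 1) for f in fields]
wide = wide[order]

# Flatten (field, number) into 'field_number' and turn the ID index back into a column.
wide.columns = [f'{f}_{i}' for f, i in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because the numbering depends only on row order, real data with a timestamp column would need to be sorted by it before calling `cumcount()`.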