Pandas: Aggregating and transposing dataframe based on string

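I have a dataframe that tracks changes to objects identified by an ID. Instead of each row representing one change of state, I would like one row per object, with all of its changes tracked in columns.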

import pandas as pd
import numpy as np
# Input: one row per status change
df1=pd.DataFrame({'ID':['1','2','3','1','2','1','4'], 'Original_Status':['Admitted','Admitted','Admitted','Probation','LateAdmission','Admitted','Admitted'],'New_Status':['Probation','LateAdmission','Pass','Admitted','Pass','Pass','Fail']})

# Desired output: one row per ID, with the changes spread across numbered columns
df2=pd.DataFrame({'ID':['1','2','3','4'],'Original_Status_1':['Admitted','Admitted','Admitted','Admitted'],'New_Status_1':['Probation','LateAdmission','Pass','Fail'],'Original_Status_2':['Probation','LateAdmission',np.nan,np.nan],'New_Status_2':['Admitted','Pass',np.nan,np.nan],'Original_Status_3':['Admitted',np.nan,np.nan,np.nan],'New_Status_3':['Pass',np.nan,np.nan,np.nan]})

    ID  Original_Status     New_Status
0   1   Admitted            Probation
1   2   Admitted            LateAdmission
2   3   Admitted            Pass
3   1   Probation           Admitted
4   2   LateAdmission       Pass
5   1   Admitted            Pass
6   4   Admitted            Fail

Original dataframe

Should become:

    ID  Original_Status_1  New_Status_1  Original_Status_2   New_Status_2  Original_Status_3  New_Status_3
0   1   Admitted           Probation     Probation           Admitted      Admitted           Pass
1   2   Admitted           LateAdmission LateAdmission       Pass          NaN                NaN
2   3   Admitted           Pass          NaN                 NaN           NaN                NaN
3   4   Admitted           Fail          NaN                 NaN           NaN                NaN

New dataframe

I was able to achieve this result using loops, but I would prefer a more concise solution if possible.
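One loop-free way to express this reshape (a sketch, not taken from the original thread) is to number each change within its ID with `groupby().cumcount()`, move that number into the columns with `unstack()`, and then flatten the resulting two-level column names. It assumes the rows of each ID are already in chronological order and that the status columns are named exactly as in `df1`.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['1', '2', '3', '1', '2', '1', '4'],
    'Original_Status': ['Admitted', 'Admitted', 'Admitted', 'Probation',
                        'LateAdmission', 'Admitted', 'Admitted'],
    'New_Status': ['Probation', 'LateAdmission', 'Pass', 'Admitted',
                   'Pass', 'Pass', 'Fail'],
})

# Number each change within its ID: 1, 2, 3, ... in row order.
# Assumption: rows are already in chronological order within each ID.
step = df1.groupby('ID').cumcount() + 1

# Long -> wide: the change number becomes a second column level,
# giving columns like ('Original_Status', 1), ('New_Status', 1), ...
# IDs with fewer changes are padded with NaN by unstack().
wide = df1.set_index(['ID', step]).unstack()

# Reorder so each Original/New pair sits side by side, change 1 first.
pairs = ['Original_Status', 'New_Status']
order = [(col, i) for i in range(1, step.max() + 1) for col in pairs]
wide = wide[order]

# Flatten the two-level column names to 'Original_Status_1' style names
# and turn the ID index back into an ordinary column.
wide.columns = [f'{col}_{i}' for col, i in wide.columns]
result = wide.reset_index()
print(result)
```

The explicit `order` list is used instead of sorting the column MultiIndex so the Original/New interleaving does not depend on how pandas breaks ties when sorting.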

Here is an ugly way to do it. It uses a list comprehension to build the data for df2 one ID at a time, then adds the column headers.

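# Flatten each ID's (Original_Status, New_Status) rows into one Series,
# place the Series side by side (pd.concat pads shorter ones with NaN),
# then transpose so that each ID becomes a row.
df2 = pd.concat(
    [
        pd.Series(data=g.set_index('ID').values.flatten(), name=i)
        for i, g in df1.groupby('ID')
    ],
    axis=1,
).T.reset_index()

# Column names: ID first, then Original_Status_n / New_Status_n pairs
c_names = [s.format(i+1) for i in range(df2.shape[1]//2) for s in ['Original_Status_{}','New_Status_{}']]

df2.columns = ['ID']+c_names
df2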
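The column-naming line depends on two details: in a nested list comprehension the first `for` is the outer loop, so the names come out interleaved pair by pair, and floor division discards the single ID column when counting pairs. A small stand-alone illustration, with the frame width of 7 (one ID column plus three pairs) hard-coded as an assumption matching the example data:

```python
# Assumed width of df2 after reset_index(): 1 ID column + 6 status columns.
n_cols = 7
n_pairs = n_cols // 2  # floor division drops the odd ID column: 3 pairs

templates = ['Original_Status_{}', 'New_Status_{}']
# Outer loop over the change number, inner loop over the two templates,
# so the names alternate Original/New for each change.
c_names = [s.format(i + 1) for i in range(n_pairs) for s in templates]
print(['ID'] + c_names)
```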
