
Split Dataframe into multiple dataframes after a specific value

I am trying to split a dataframe in the following format into multiple dataframes based on a specific value.

Column0              Column1     Column2  Column3
Question             Answer      Reason   30
It is received?      XXX         YYY      27
Deducted             FDF         RES      64
Transferred?         WWW         RRR      64
Transport Services   Passgener   Carrier  30
Distance             KKK         WDF      27
Return               PPP         LMN      64

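In the dataframe above, I want to split off the rows starting at Event2, that is, wherever Code = 30 (a specific color code or header code), into a separate dataframe, and put the remaining rows (the ones above) into another dataframe. There may also be more than two events.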

I have tried a few pieces of code, but most of them were meant for filtering.

Expected output:

Dataframe1:

Question             Answer      Reason   30
It is received?      XXX         YYY      27
Deducted             FDF         RES      64
Transferred?         WWW         RRR      64

Dataframe2:

Transport Services   Passgener   Carrier  30
Distance             KKK         WDF      27
Return               PPP         LMN      64

Please help, as I am new to Python.

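You can groupby on a temporary helper column, built from Code, to separate out the different DataFrames and add them to a dictionary. I am assuming you mean that the data actually looks like this, to match the original schema: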

             Question     Answer   Reason  Code
0     It is received?        XXX      YYY    30
1            Deducted        FDF      RES    64
2        Transferred?        WWW      RRR    64
3  Transport Services  Passgener  Carrier    30
4            Distance        KKK      WDF    27
5              Return        PPP      LMN    64
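To try the steps in this answer, the assumed frame can be built directly. This is only a sketch: the values are copied from the table above, and the variable name df is chosen to match the snippets used here.

```python
import pandas as pd

# Sample data copied from the table above; `df` is the name the other snippets use.
df = pd.DataFrame({
    "Question": ["It is received?", "Deducted", "Transferred?",
                 "Transport Services", "Distance", "Return"],
    "Answer": ["XXX", "FDF", "WWW", "Passgener", "KKK", "PPP"],
    "Reason": ["YYY", "RES", "RRR", "Carrier", "WDF", "LMN"],
    "Code": [30, 64, 64, 30, 27, 64],
})
```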

If so, you can do the following:

# Keep Question only on rows where Code == 30, then forward-fill that label down to the next such row
df['tmp'] = df['Question'].where(df['Code'] == 30).ffill()

To get:

             Question     Answer   Reason  Code                 tmp
0     It is received?        XXX      YYY    30     It is received?
1            Deducted        FDF      RES    64     It is received?
2        Transferred?        WWW      RRR    64     It is received?
3  Transport Services  Passgener  Carrier    30  Transport Services
4            Distance        KKK      WDF    27  Transport Services
5              Return        PPP      LMN    64  Transport Services

From here, you can enumerate the groups in the tmp column and add the results to a dictionary with integer keys:

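questions = {}
# sort=False keeps the groups in their original row order instead of sorting them by label
for e, (event, data) in enumerate(df.groupby('tmp', sort=False)):
    questions[e] = data.drop(columns='tmp')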

questions[0]

          Question Answer Reason  Code
0  It is received?    XXX    YYY    30
1         Deducted    FDF    RES    64
2     Transferred?    WWW    RRR    64

questions[1]

             Question     Answer   Reason  Code
3  Transport Services  Passgener  Carrier    30
4            Distance        KKK      WDF    27
5              Return        PPP      LMN    64
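If the header rows are ordinary data rows, as in the layout posted in the question, a slightly different technique may be easier: count the marker rows with a cumulative sum and group on that count. This is a swapped-in variant, not the helper-column approach above, and it also keeps two blocks apart when their header rows carry the same text. The names split_on_marker, raw, block_id and parts are hypothetical and used only for illustration.

```python
import pandas as pd

def split_on_marker(frame, column, marker):
    """Split `frame` into consecutive blocks, each starting at a row where `column` == `marker`."""
    # True on every marker row; the running total gives each row the number of its block.
    # Rows that come before the first marker get block number 0 and form a block of their own.
    block_id = (frame[column] == marker).cumsum().rename("block")
    # sort=False returns the blocks in order of first appearance.
    return [block for _, block in frame.groupby(block_id, sort=False)]

# The question's original layout: header rows are part of the data and carry the value 30.
raw = pd.DataFrame(
    [
        ["Question", "Answer", "Reason", 30],
        ["It is received?", "XXX", "YYY", 27],
        ["Deducted", "FDF", "RES", 64],
        ["Transferred?", "WWW", "RRR", 64],
        ["Transport Services", "Passgener", "Carrier", 30],
        ["Distance", "KKK", "WDF", 27],
        ["Return", "PPP", "LMN", 64],
    ],
    columns=["Column0", "Column1", "Column2", "Column3"],
)

parts = split_on_marker(raw, "Column3", 30)
print(parts[0])
print(parts[1])
```

Returning a list keeps the blocks in document order; wrapping the result in dict(enumerate(parts)) would give the same integer-keyed dictionary as above.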

