
Split Dataframe into multiple dataframes after a specific value

I am trying to split a DataFrame in the following format into multiple DataFrames based on a specific value.

Column0              Column1     Column2  Column3
Question             Answer      Reason   30
It is received?      XXX         YYY      27
Deducted             FDF         RES      64
Transferred?         WWW         RRR      64
Transport Services   Passgener   Carrier  30
Distance             KKK         WDF      27
Return               PPP         LMN      64

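In the DataFrame above, I want to split the rows starting from Event2, or from Code = 30 (a specific color code or heading code), into a separate DataFrame, and put the remaining rows (those above) into another DataFrame. There may also be more than two events.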

I have tried a few pieces of code, but most of them are meant for filtering.

The expected output is:

Dataframe1:

Question             Answer      Reason   30
It is received?      XXX         YYY      27
Deducted             FDF         RES      64
Transferred?         WWW         RRR      64

Dataframe2:

Transport Services   Passgener   Carrier  30
Distance             KKK         WDF      27
Return               PPP         LMN      64

Please help, as I am new to Python.

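You can groupby on a temporary helper column derived from Code to separate out the different DataFrames, and add them to a dictionary. I am assuming you mean that the data actually looks like this, to match the original schema: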

             Question     Answer   Reason  Code
0     It is received?        XXX      YYY    30
1            Deducted        FDF      RES    64
2        Transferred?        WWW      RRR    64
3  Transport Services  Passgener  Carrier    30
4            Distance        KKK      WDF    27
5              Return        PPP      LMN    64
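To follow along, the assumed table can be rebuilt with something like the sketch below. The values are copied from the table above, and the variable name `df` is chosen to match the snippets in this answer; the construction itself is only an illustration.

```python
import pandas as pd

# Sample data assumed from the table above; column names follow the answer's schema.
df = pd.DataFrame({
    "Question": ["It is received?", "Deducted", "Transferred?",
                 "Transport Services", "Distance", "Return"],
    "Answer": ["XXX", "FDF", "WWW", "Passgener", "KKK", "PPP"],
    "Reason": ["YYY", "RES", "RRR", "Carrier", "WDF", "LMN"],
    "Code": [30, 64, 64, 30, 27, 64],
})
print(df)
```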

If so, you can do the following:

import numpy as np
df['tmp'] = df.apply(lambda x: x.Question if x.Code == 30 else np.nan, axis=1).ffill()

To get:

             Question     Answer   Reason  Code                 tmp
0     It is received?        XXX      YYY    30     It is received?
1            Deducted        FDF      RES    64     It is received?
2        Transferred?        WWW      RRR    64     It is received?
3  Transport Services  Passgener  Carrier    30  Transport Services
4            Distance        KKK      WDF    27  Transport Services
5              Return        PPP      LMN    64  Transport Services
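The same helper column can probably be built without a row-wise `apply`. Below is a minimal sketch, assuming the sample data from this answer and that 30 is the only header code: `Series.where` keeps `Question` on header rows and leaves missing values elsewhere, and `ffill` then carries each header down to the rows below it.

```python
import pandas as pd

# Sample data assumed from the answer's table.
df = pd.DataFrame({
    "Question": ["It is received?", "Deducted", "Transferred?",
                 "Transport Services", "Distance", "Return"],
    "Answer": ["XXX", "FDF", "WWW", "Passgener", "KKK", "PPP"],
    "Reason": ["YYY", "RES", "RRR", "Carrier", "WDF", "LMN"],
    "Code": [30, 64, 64, 30, 27, 64],
})

# Keep Question only where Code == 30 (missing elsewhere), then forward-fill
# so every row carries the Question text of the header row above it.
df["tmp"] = df["Question"].where(df["Code"] == 30).ffill()
print(df)
```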

From here, you can enumerate the groups in the tmp column and add the results to a dictionary with integer keys:

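questions = {}
# sort=False keeps the groups in order of first appearance instead of sorting the keys alphabetically
for e, (event, data) in enumerate(df.groupby('tmp', sort=False)):
    questions[e] = data.drop('tmp', axis=1)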

questions[0]

          Question Answer Reason  Code
0  It is received?    XXX    YYY    30
1         Deducted    FDF    RES    64
2     Transferred?    WWW    RRR    64

questions[1]

             Question     Answer   Reason  Code
3  Transport Services  Passgener  Carrier    30
4            Distance        KKK      WDF    27
5              Return        PPP      LMN    64
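One caveat with grouping on the header text is that two sections sharing the same header would be merged into one group. A possible alternative, sketched below under the same assumptions about the data, numbers the blocks with a running count of header rows and groups on that number. The names `HEADER_CODE`, `block_id` and `frames` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data assumed from the answer's table.
df = pd.DataFrame({
    "Question": ["It is received?", "Deducted", "Transferred?",
                 "Transport Services", "Distance", "Return"],
    "Answer": ["XXX", "FDF", "WWW", "Passgener", "KKK", "PPP"],
    "Reason": ["YYY", "RES", "RRR", "Carrier", "WDF", "LMN"],
    "Code": [30, 64, 64, 30, 27, 64],
})

HEADER_CODE = 30  # assumed marker for the first row of each block

# The running count of header rows gives each block its own integer id.
# Rows before the first header (none in this sample) would get id 0.
block_id = (df["Code"] == HEADER_CODE).cumsum().rename("block")

# Group on the id and collect the pieces as a list of DataFrames.
frames = [part for _, part in df.groupby(block_id)]

for part in frames:
    print(part, end="\n\n")
```

Here `frames[0]` and `frames[1]` correspond to `questions[0]` and `questions[1]` above, and no helper column needs to be dropped afterwards.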

