Split a dataframe into multiple dataframes after a specific value

Question

I am trying to split a dataframe in the format below into multiple dataframes based on a specific value.

Column0              Column1     Column2  Column3
Question             Answer      Reason   30
It is received?      XXX         YYY      27
Deducted             FDF         RES      64
Transferred?         WWW         RRR      64
Transport Services   Passgener   Carrier  30
Distance             KKK         WDF      27
Return               PPP         LMN      64

In the above dataframe, I want to split off the rows starting from Event2, i.e. the rows where Code = 30 (a specific color code or header code), into a separate dataframe, and put the rest (the rows above them) into another dataframe. There may also be more than two events.

I have tried a few snippets, but most of them are meant for filtering.

Expected output is:

Dataframe1:

Question             Answer      Reason   30
It is received?      XXX         YYY      27
Deducted             FDF         RES      64
Transferred?         WWW         RRR      64

Dataframe2:

Transport Services   Passgener   Carrier  30
Distance             KKK         WDF      27
Return               PPP         LMN      64

Please help, as I am new to Python.

Answer 1

You could group by a temporary helper column derived from Code to separate the rows into different DataFrames, and add these to a dictionary. I'm assuming you meant the data to actually look like this, to match the original schema:

             Question     Answer   Reason  Code
0     It is received?        XXX      YYY    30
1            Deducted        FDF      RES    64
2        Transferred?        WWW      RRR    64
3  Transport Services  Passgener  Carrier    30
4            Distance        KKK      WDF    27
5              Return        PPP      LMN    64
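
For anyone who wants to follow along, the assumed frame can be rebuilt roughly as below. This is only a sketch: the variable name `df` matches the one used in the rest of this answer, and the values are copied from the table above.

```python
import pandas as pd

# Rebuild the assumed sample data; every value is copied from the table above.
df = pd.DataFrame({
    "Question": ["It is received?", "Deducted", "Transferred?",
                 "Transport Services", "Distance", "Return"],
    "Answer": ["XXX", "FDF", "WWW", "Passgener", "KKK", "PPP"],
    "Reason": ["YYY", "RES", "RRR", "Carrier", "WDF", "LMN"],
    "Code": [30, 64, 64, 30, 27, 64],
})

# Rows with Code == 30 are the header rows that start each block.
header_rows = df.index[df["Code"] == 30].tolist()
print(header_rows)
```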

With the data in that shape, you can do:

# Keep Question only on header rows (Code == 30), then forward-fill it down each block
df['tmp'] = df['Question'].where(df['Code'] == 30).ffill()

to get:

             Question     Answer   Reason  Code                 tmp
0     It is received?        XXX      YYY    30     It is received?
1            Deducted        FDF      RES    64     It is received?
2        Transferred?        WWW      RRR    64     It is received?
3  Transport Services  Passgener  Carrier    30  Transport Services
4            Distance        KKK      WDF    27  Transport Services
5              Return        PPP      LMN    64  Transport Services

From here, you can enumerate the groups in the tmp column and add each one to a dictionary with integer keys:

questions = {}
# sort=False keeps the groups in order of appearance rather than sorted by header text
for e, (event, data) in enumerate(df.groupby('tmp', sort=False)):
    questions[e] = data.drop('tmp', axis=1)

questions[0]

          Question Answer Reason  Code
0  It is received?    XXX    YYY    30
1         Deducted    FDF    RES    64
2     Transferred?    WWW    RRR    64

questions[1]

             Question     Answer   Reason  Code
3  Transport Services  Passgener  Carrier    30
4            Distance        KKK      WDF    27
5              Return        PPP      LMN    64
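
One caveat with grouping on the header text: if two header rows happen to share the same Question value, their blocks end up in a single group. A possible alternative, sketched below, numbers the blocks with a running count of header rows instead. The function name `split_on_header` and its parameters are hypothetical, chosen for illustration, and the sample frame is the assumed data from earlier in this answer.

```python
import pandas as pd

def split_on_header(frame, code_col="Code", header_code=30):
    """Return a list of DataFrames, each starting at a header row."""
    # Every header row bumps the running count, so each row gets the
    # block number of the most recent header at or above it.
    block_id = (frame[code_col] == header_code).cumsum()
    # Integer block ids sort in order of appearance, so the list is ordered.
    return [part for _, part in frame.groupby(block_id)]

# Assumed sample data, copied from the table in this answer.
df = pd.DataFrame({
    "Question": ["It is received?", "Deducted", "Transferred?",
                 "Transport Services", "Distance", "Return"],
    "Answer": ["XXX", "FDF", "WWW", "Passgener", "KKK", "PPP"],
    "Reason": ["YYY", "RES", "RRR", "Carrier", "WDF", "LMN"],
    "Code": [30, 64, 64, 30, 27, 64],
})

parts = split_on_header(df)
for part in parts:
    print(part, end="\n\n")
```

This handles any number of events without a helper column. Rows that appear before the first header row, if there are any, come back as their own leading block.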

Solution 1
0 Accepted 2016-01-14 10:23:40
