Split Dataframe into multiple dataframes after a specific value
I am trying to split a dataframe in the format below into multiple dataframes based on a specific value.
Column0 Column1 Column2 Column3
Question Answer Reason 30
It is received? XXX YYY 27
Deducted FDF RES 64
Transferred? WWW RRR 64
Transport Services Passgener Carrier 30
Distance KKK WDF 27
Return PPP LMN 64
In the above dataframe, I want to split the rows starting from Event2 or Code = 30 (a specific color code or header code) into a separate dataframe, and the rest (the rows above) into another dataframe. There may also be more than two events.
I have tried a few approaches, but most of them are meant for filtering.
Expected output is:
Dataframe1:
Question Answer Reason 30
It is received? XXX YYY 27
Deducted FDF RES 64
Transferred? WWW RRR 64
Dataframe2:
Transport Services Passgener Carrier 30
Distance KKK WDF 27
Return PPP LMN 64
Please help, as I am new to Python.
You could groupby on a temporary helper column based on Code to separate out different DataFrames and add these to a dictionary. I'm assuming you meant the data to actually look like this to match the original schema:
Question Answer Reason Code
0 It is received? XXX YYY 30
1 Deducted FDF RES 64
2 Transferred? WWW RRR 64
3 Transport Services Passgener Carrier 30
4 Distance KKK WDF 27
5 Return PPP LMN 64
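If so, you can do:
import numpy as np
# Keep Question on header rows (Code == 30), NaN elsewhere, then forward-fill.
# .ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas.
df['tmp'] = df.apply(lambda x: x.Question if x.Code == 30 else np.nan, axis=1).ffill()
to get: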
Question Answer Reason Code tmp
0 It is received? XXX YYY 30 It is received?
1 Deducted FDF RES 64 It is received?
2 Transferred? WWW RRR 64 It is received?
3 Transport Services Passgener Carrier 30 Transport Services
4 Distance KKK WDF 27 Transport Services
5 Return PPP LMN 64 Transport Services
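For anyone who wants to reproduce this step, here is a minimal sketch. It builds the sample frame by hand, which is an assumption, since the post does not show how df was created. It then derives the same tmp column with a vectorized `Series.where` instead of a row-wise apply.

```python
import pandas as pd

# Sample data re-typed from the table above; how the original df was loaded is not shown.
df = pd.DataFrame({
    "Question": ["It is received?", "Deducted", "Transferred?",
                 "Transport Services", "Distance", "Return"],
    "Answer": ["XXX", "FDF", "WWW", "Passgener", "KKK", "PPP"],
    "Reason": ["YYY", "RES", "RRR", "Carrier", "WDF", "LMN"],
    "Code": [30, 64, 64, 30, 27, 64],
})

# Keep Question only on header rows (Code == 30), NaN elsewhere, then forward-fill
df["tmp"] = df["Question"].where(df["Code"] == 30).ffill()
print(df)
```

On a frame this small the two approaches should behave the same; the vectorized form mainly avoids calling a Python lambda once per row.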
From here, you can enumerate the groups in the tmp column and add the result to a dictionary with integer keys:
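questions = {}
# sort=False keeps the groups in order of first appearance rather than alphabetical order
for e, (event, data) in enumerate(df.groupby('tmp', sort=False)):
    questions[e] = data.drop('tmp', axis=1)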
questions[0]
Question Answer Reason Code
0 It is received? XXX YYY 30
1 Deducted FDF RES 64
2 Transferred? WWW RRR 64
questions[1]
Question Answer Reason Code
3 Transport Services Passgener Carrier 30
4 Distance KKK WDF 27
5 Return PPP LMN 64
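One caveat with grouping on the header text: two events whose header rows happen to share the same Question would end up in a single group. A possible variation, sketched here under the same assumption that Code == 30 marks the start of each event, numbers the events with a cumulative sum instead. The function name split_on_header and its parameters are hypothetical and not part of the original answer.

```python
import pandas as pd

def split_on_header(df, column="Code", header_value=30):
    """Split df into a dict of DataFrames, starting a new one at each header row."""
    # True on header rows; cumsum turns that into a running event number
    event_id = (df[column] == header_value).cumsum()
    # Rows before the first header (if any) form their own leading group
    return {i: part for i, (_, part) in enumerate(df.groupby(event_id))}

# Sample data re-typed from the tables above
df = pd.DataFrame({
    "Question": ["It is received?", "Deducted", "Transferred?",
                 "Transport Services", "Distance", "Return"],
    "Answer": ["XXX", "FDF", "WWW", "Passgener", "KKK", "PPP"],
    "Reason": ["YYY", "RES", "RRR", "Carrier", "WDF", "LMN"],
    "Code": [30, 64, 64, 30, 27, 64],
})

questions = split_on_header(df)
print(questions[0])
print(questions[1])
```

Because the grouping key is a separate Series, no helper column has to be added to df and dropped again afterwards.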