Pandas groupby transpose
I have a file from SAP which wasn't the prettiest one when dealing with data. So, using series.str.contains() and boolean masks, I have managed to narrow it down to a dataframe that looks like this:
0 1
0 SUB 123
1 CAT SKU
2 CODE 1000123
3 CODE 1000234
4 SUB 456
5 CAT LIQ
6 CODE1 1000345
7 CODE1 1000534
8 CODE1 1000433
I am looking for a way to separate each SUB into a new entry, like below:
print(expected_df)
SUB CAT CODE CODE1
0 123 SKU 1000123.0 NaN
1 123 SKU 1000234.0 NaN
2 456 LIQ NaN 1000345.0
3 456 LIQ NaN 1000534.0
4 456 LIQ NaN 1000433.0
I just can't seem to get past this step. However, this line:
df[0].eq('SUB').cumsum()
helps to segregate the groups and can be used as a helper series if needed.
Any help in transposing the data as shown would be really appreciated.
Thanks.
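To make the helper series from the question concrete, here is a minimal sketch. It assumes the frame has integer column labels 0 and 1 and holds every value as a string, and it rebuilds the sample data by hand. The explicit loop is only one possible way to use the helper and is not taken from any of the answers; the names group_id, header, rows and result are illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand; string values are an assumption.
df = pd.DataFrame({
    0: ['SUB', 'CAT', 'CODE', 'CODE', 'SUB', 'CAT', 'CODE1', 'CODE1', 'CODE1'],
    1: ['123', 'SKU', '1000123', '1000234', '456', 'LIQ',
        '1000345', '1000534', '1000433'],
})

# Every 'SUB' row starts a new block, so the running count labels the blocks.
group_id = df[0].eq('SUB').cumsum()
print(group_id.tolist())  # [1, 1, 1, 1, 2, 2, 2, 2, 2]

rows = []
for _, grp in df.groupby(group_id):
    pairs = list(zip(grp[0], grp[1]))
    # SUB and CAT are shared by every code row in the block.
    header = {k: v for k, v in pairs if k in ('SUB', 'CAT')}
    for k, v in pairs:
        if k not in header:
            # One output row per code line; other code columns stay NaN.
            rows.append({**header, k: v})

result = pd.DataFrame(rows)
print(result)
```

Because the values are kept as strings here, the code columns are not converted to floats as in the expected output; a numeric conversion could be applied afterwards if needed.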
IIUC (assuming the two columns are named col1 and col2):
df.set_index('col1').groupby(df.col1.eq('SUB').cumsum().values).apply(lambda s: pd.DataFrame({
'SUB': s.loc['SUB'].item(),
'CAT': s.loc['CAT'].item(),
s.index[2]: s.loc[s.index[2]].col2.tolist()
})).reset_index(drop=True)
Outputs
SUB CAT CODE CODE1
0 123 SKU 1000123 NaN
1 123 SKU 1000234 NaN
2 456 LIQ NaN 1000345
3 456 LIQ NaN 1000534
4 456 LIQ NaN 1000433
However, this looks like an XY problem. Maybe it's worth taking a look into how you ended up with this df in the first place.
IIUC
l=[y.set_index('0').T.set_index(['SUB','CAT']).stack() for x , y in df.groupby(df['0'].eq('SUB').cumsum())]
s=pd.concat(l).to_frame('v')
s.assign(key=s.groupby(level=[0,1,2]).cumcount()).set_index('key',append=True).unstack(2)
v
0 CODE CODE1
SUB CAT key
123 SKU 0 1000123 NaN
1 1000234 NaN
456 LIQ 0 NaN 1000345
1 NaN 1000534
2 NaN 1000433
You can try using df.pivot followed by .ffill() and .bfill() on the rows of each 'SUB' group.
df1 = df.pivot(columns='0')
df1.columns = df1.columns.map(lambda x: x[1])
df1.SUB = df1.SUB.ffill()
df1.groupby('SUB').ffill().groupby('SUB').bfill().drop_duplicates()
# 5.89 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
# The version above avoids a lambda for speed; the lambda-based alternative is slower:
# df1.groupby(df1.SUB.ffill()).apply(lambda x: x.ffill().bfill()).drop_duplicates()
# 16 ms ± 1.06 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Out:
SUB CAT CODE CODE1 SUB
2 123 SKU 1000123 NaN 123
3 123 SKU 1000234 NaN 123
6 456 LIQ NaN 1000345 456
7 456 LIQ NaN 1000534 456
8 456 LIQ NaN 1000433 456
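The pivot-then-fill idea can also be spelled out step by step. The following is a sketch of a variant, not the answer's exact code: it assumes string column labels '0' and '1' and string values, passes values='1' to pivot so the columns come out flat, and drops the header rows with dropna instead of drop_duplicates. The names wide, code_cols and out are illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand; string column labels and values are assumptions.
df = pd.DataFrame({
    '0': ['SUB', 'CAT', 'CODE', 'CODE', 'SUB', 'CAT', 'CODE1', 'CODE1', 'CODE1'],
    '1': ['123', 'SKU', '1000123', '1000234', '456', 'LIQ',
          '1000345', '1000534', '1000433'],
})

# One column per label; each row holds a single non-NaN value.
wide = df.pivot(columns='0', values='1')

# Carry each SUB down to the rows that follow it.
wide['SUB'] = wide['SUB'].ffill()
# Carry CAT down, but only within its own SUB block.
wide['CAT'] = wide.groupby('SUB')['CAT'].ffill()

# Every column other than SUB/CAT is treated as a code column.
code_cols = list(wide.columns.difference(['SUB', 'CAT']))
# Keep only the rows that actually carry a code value.
out = wide.dropna(subset=code_cols, how='all')[['SUB', 'CAT'] + code_cols]
print(out)
```

The original row labels (2, 3, 6, 7, 8) are kept, as in the output above; reset_index(drop=True) would renumber them from 0.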