I'm trying to take a format starting with:
Global_Code | Retailer_X_Code | Retailer_Y_Code | Info | ...
-----------------------------------------------------------
'A' 'a' 'a_a' 1 ...
'B' 'b' 'b_b' 2 ...
... ... ... ... ...
And stack the Retailer_X_Code and Retailer_Y_Code into a single Retailer_Name column, indexed by Global_Code. I'd also like to keep other columns in the row, such as Info.
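For reference, a minimal frame with the same shape can be built as follows. This is only a sketch: it assumes the four columns shown above and leaves out the elided ones.

```python
import pandas as pd

# Sample rows copied from the table above; the "..." columns are omitted.
mapping_df = pd.DataFrame({
    'Global_Code': ['A', 'B'],
    'Retailer_X_Code': ['a', 'b'],
    'Retailer_Y_Code': ['a_a', 'b_b'],
    'Info': [1, 2],
})
print(mapping_df)
```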
So starting off with stack(), I get:
stacked_df = mapping_df.stack()
========
Global_Code 'A'
Retailer_X_Code 'a'
Retailer_Y_Code 'a_a'
Info 1
...more columns
Global_Code 'B'
Retailer_X_Code 'b'
Retailer_Y_Code 'b_b'
Info 2
...more columns
Great, but I don't need all the columns, and I want Retailer_X_Code and Retailer_Y_Code to be under one column. So I select these columns, indexed by Global_Code:
stacked_df = mapping_df[['Global_Code', 'Retailer_X_Code', 'Retailer_Y_Code']].set_index('Global_Code').stack().reset_index().rename(columns={'level_1':'Retailer_Name', 0:'Retailer_Code'})
========
Global_Code | Retailer_Name | Retailer_Code
------------------------------------------
'A' 'Retailer_X_Code' 'a'
'A' 'Retailer_Y_Code' 'a_a'
... ... ...
'B' 'Retailer_X_Code' 'b'
'B' 'Retailer_Y_Code' 'b_b'
So far so good. Now I want to grab Info and include it as part of the stacked result. The desired output should look like this:
Global_Code | Retailer_Name | Retailer_Code | Info
------------------------------------------------------
'A' 'Retailer_X_Code' 'a' 1
'A' 'Retailer_Y_Code' 'a_a' 1
... ... ... ...
'B' 'Retailer_X_Code' 'b' 2
'B' 'Retailer_Y_Code' 'b_b' 2
But if I add Info to the selected columns and then try to rename it to Product_Info, I don't get an Info column. Instead, Info is stacked along with the retailer columns: its name shows up under Retailer_Name and its values under Retailer_Code.
stacked_df = mapping_df[['Global_Code', 'Retailer_X_Code', 'Retailer_Y_Code', 'Info']].set_index('Global_Code').stack().reset_index().rename(columns={'level_1':'Retailer_Name', 0:'Retailer_Code', 1: 'Product_Info'})
========
Global_Code | Retailer_Name | Retailer_Code
------------------------------------------
'A' 'Retailer_X_Code' 'a'
'A' 'Retailer_Y_Code' 'a_a'
'A' 'Info' 1
'A' 'Info' 1
'A' 'Info' 1
... ... ...
'B' 'Retailer_X_Code' 'b'
'B' 'Retailer_Y_Code' 'b_b'
'B' 'Info' 2
The same without the column renaming, .rename(columns={'level_1':'Retailer_Name', 0:'Retailer_Pack'}), gives me:
Global_Code | level_1 | 0
------------------------------------------
'A' 'Retailer_X_Code' 'a'
'A' 'Retailer_Y_Code' 'a_a'
'A' 'Info' 1
'A' 'Info' 1
'A' 'Info' 1
... ... ...
'B' 'Retailer_X_Code' 'b'
'B' 'Retailer_Y_Code' 'b_b'
'B' 'Info' 2
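The behaviour can be reproduced on a two-row sample: stack() turns every column that is not part of the index into a row, so Info becomes one more label next to the two retailer columns. A small sketch, assuming a frame with only the four columns shown in the question:

```python
import pandas as pd

# Sample data assumed from the question; the elided columns are left out.
mapping_df = pd.DataFrame({
    'Global_Code': ['A', 'B'],
    'Retailer_X_Code': ['a', 'b'],
    'Retailer_Y_Code': ['a_a', 'b_b'],
    'Info': [1, 2],
})

# Only Global_Code is in the index, so all three remaining columns are stacked.
stacked = mapping_df.set_index('Global_Code').stack().reset_index()

# The former column names, including 'Info', are now values of level_1.
print(stacked['level_1'].unique())
```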
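The way to do it is to put every column you want to keep on each row into the index: .set_index(['Index1', 'Index2']). Only the columns left outside the index get stacked. With two index columns the stacked level is the third one, so reset_index() names it level_2 rather than level_1.
E.g.:
stacked_df = mapping_df[['Global_Code', 'Retailer_X_Code', 'Retailer_Y_Code', 'Info']].set_index(['Global_Code', 'Info']).stack().reset_index().rename(columns={'level_2':'Retailer_Name', 0:'Retailer_Code'})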
Gives:
Global_Code | Info | Retailer_Name | Retailer_Code
------------------------------------------------------
'A' 1 'Retailer_X_Code' 'a'
'A' 1 'Retailer_Y_Code' 'a_a'
... ... ... ...
'B' 2 'Retailer_X_Code' 'b'
'B' 2 'Retailer_Y_Code' 'b_b'
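A self-contained sketch of this approach, assuming a small mapping_df built from the sample rows in the question. As a variation that is not part of the answer above, it names the stacked level with rename_axis and the values with Series.rename, so the result does not depend on the auto-generated level_N and 0 labels.

```python
import pandas as pd

# Sample data assumed from the question; the elided columns are left out.
mapping_df = pd.DataFrame({
    'Global_Code': ['A', 'B'],
    'Retailer_X_Code': ['a', 'b'],
    'Retailer_Y_Code': ['a_a', 'b_b'],
    'Info': [1, 2],
})

stacked_df = (
    mapping_df
    .set_index(['Global_Code', 'Info'])     # columns to keep on every row
    .rename_axis(columns='Retailer_Name')   # name for the level stack() creates
    .stack()
    .rename('Retailer_Code')                # name for the stacked values
    .reset_index()
)
print(stacked_df)
```

If more columns should be carried along, they can be added to the list passed to set_index.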
You can also use wide_to_long. If you want to change the column names afterwards, chain rename.
pd.wide_to_long(df,stubnames='Retailer',i=['Global_Code','Info'],j='Retailer_Name',sep='_',suffix='\\w+').reset_index()
Out[155]:
Global_Code Info Retailer_Name Retailer
0 'A' 1 X_Code 'a'
1 'A' 1 Y_Code 'a_a'
2 'B' 2 X_Code 'b'
3 'B' 2 Y_Code 'b_b'
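A runnable sketch of the same call with the rename chained on, assuming df holds the sample rows from the question. The target name Retailer_Code is taken from the question's desired output.

```python
import pandas as pd

# Sample data assumed from the question; the elided columns are left out.
df = pd.DataFrame({
    'Global_Code': ['A', 'B'],
    'Retailer_X_Code': ['a', 'b'],
    'Retailer_Y_Code': ['a_a', 'b_b'],
    'Info': [1, 2],
})

long_df = (
    pd.wide_to_long(
        df,
        stubnames='Retailer',        # shared prefix of the wide columns
        i=['Global_Code', 'Info'],   # id columns; together they must be unique per row
        j='Retailer_Name',           # new column holding the suffix (X_Code, Y_Code)
        sep='_',
        suffix=r'\w+',               # the suffix is not numeric, so match word characters
    )
    .reset_index()
    .rename(columns={'Retailer': 'Retailer_Code'})
)
print(long_df)
```

Note that Retailer_Name here holds only the suffix (X_Code, Y_Code), not the full original column name.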