
Using stack to index multiple columns by a single column

I'm trying to take a DataFrame that starts in this format:

 Global_Code | Retailer_X_Code | Retailer_Y_Code | Info | ...
  -----------------------------------------------------------
 'A'              'a'              'a_a'            1     ...
 'B'              'b'              'b_b'            2     ...
 ...              ...               ...            ...    ...

I want to stack Retailer_X_Code and Retailer_Y_Code into a single Retailer_Name column, indexed by Global_Code. I'd also like to keep the other columns in the row, such as Info.

Starting off with stack(), I get:

 stacked_df = mapping_df.stack()

 ========

 Global_Code          'A'
 Retailer_X_Code      'a'
 Retailer_Y_Code      'a_a'
 Info                 1
 ...more columns

 Global_Code          'B'
 Retailer_X_Code      'b'
 Retailer_Y_Code      'b_b'
 Info                 2
 ...more columns

Great, but I don't want all the columns; I only want Retailer_X_Code and Retailer_Y_Code to end up under one column. So I select these columns, organized by Global_Code:

stacked_df = mapping_df[['Global_Code', 'Retailer_X_Code', 'Retailer_Y_Code']].set_index('Global_Code').stack().reset_index().rename(columns={'level_1':'Retailer_Name', 0:'Retailer_Code'})

========

Global_Code |  Retailer_Name   | Retailer_Code
------------------------------------------      
'A'           'Retailer_X_Code'   'a'         
'A'           'Retailer_Y_Code'   'a_a'          
...           ...                 ...              
'B'           'Retailer_X_Code'   'b'           
'B'           'Retailer_Y_Code'   'b_b'         

So far so good. Now I want to grab Info and include it as part of the stacked result. The desired output should look like this:

Global_Code |  Retailer_Name   | Retailer_Code | Info
------------------------------------------------------      
'A'           'Retailer_X_Code'   'a'            1
'A'           'Retailer_Y_Code'   'a_a'          1   
...           ...                 ...           ...    
'B'           'Retailer_X_Code'   'b'            2
'B'           'Retailer_Y_Code'   'b_b'          2

But if I add Info to the selected columns and try to rename it to Product_Info, I don't get a separate Info column.

Instead, the Info values are stacked along with the retailer codes, and 'Info' shows up as a label under Retailer_Name.

stacked_df = mapping_df[['Global_Code', 'Retailer_X_Code', 'Retailer_Y_Code', 'Info']].set_index('Global_Code').stack().reset_index().rename(columns={'level_1':'Retailer_Name', 0:'Retailer_Code', 1: 'Product_Info'})

========

Global_Code |  Retailer_Name   | Retailer_Code
------------------------------------------      
'A'           'Retailer_X_Code'   'a'
'A'           'Retailer_Y_Code'   'a_a'
'A'           'Info'              1   
'A'           'Info'              1
'A'           'Info'              1            
...           ...                 ...
'B'           'Retailer_X_Code'   'b'
'B'           'Retailer_Y_Code'   'b_b'
'B'           'Info'              2

The same code without the column renaming, i.e. without .rename(columns={'level_1':'Retailer_Name', 0:'Retailer_Pack'}), gives me:

Global_Code |  level_1          | 0
------------------------------------------      
'A'           'Retailer_X_Code'   'a'
'A'           'Retailer_Y_Code'   'a_a'
'A'           'Info'              1   
'A'           'Info'              1
'A'           'Info'              1            
...           ...                 ...
'B'           'Retailer_X_Code'   'b'
'B'           'Retailer_Y_Code'   'b_b'
'B'           'Info'              2
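The behavior above can be reproduced with a minimal sketch. The sample data is hypothetical and only mirrors the two rows shown in the question. It illustrates that stack() moves every column that is not part of the index into the row labels, which is why 'Info' ends up alongside the retailer columns.

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the question
mapping_df = pd.DataFrame({
    "Global_Code": ["A", "B"],
    "Retailer_X_Code": ["a", "b"],
    "Retailer_Y_Code": ["a_a", "b_b"],
    "Info": [1, 2],
})

# Only Global_Code is in the index, so all three remaining columns are stacked
stacked = mapping_df.set_index("Global_Code").stack()

# The second index level holds the original column names, including 'Info'
stacked_labels = stacked.index.get_level_values(1).unique().tolist()
print(stacked_labels)
```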

The way to do it is to set the index to every column you want to keep alongside the stacked values: .set_index(['Index1', 'Index2']). Columns in the index are not stacked; they are repeated for each stacked row.

E.g.:

stacked_df = mapping_df[['Global_Code', 'Retailer_X_Code', 'Retailer_Y_Code', 'Info']].set_index(['Global_Code', 'Info']).stack().reset_index().rename(columns={'level_2':'Retailer_Name', 0:'Retailer_Code'})

Gives:

Global_Code |  Info | Retailer_Name   | Retailer_Code 
------------------------------------------------------      
'A'              1    'Retailer_X_Code'   'a'           
'A'              1    'Retailer_Y_Code'   'a_a'           
...                   ...                 ...              
'B'              2    'Retailer_X_Code'   'b'           
'B'              2    'Retailer_Y_Code'   'b_b'         
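For anyone who wants to run this end to end, here is a minimal sketch of the same idea. The sample data is an assumption that mirrors the question's table. As a small variation on the one-liner, it names the index levels with rename_axis and the values with Series.rename before calling reset_index, so the result does not depend on the auto-generated level_N and 0 column names.

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the question
mapping_df = pd.DataFrame({
    "Global_Code": ["A", "B"],
    "Retailer_X_Code": ["a", "b"],
    "Retailer_Y_Code": ["a_a", "b_b"],
    "Info": [1, 2],
})

stacked_df = (
    mapping_df
    .set_index(["Global_Code", "Info"])   # columns to keep, not stack
    .stack()                               # remaining columns become rows
    .rename_axis(["Global_Code", "Info", "Retailer_Name"])  # name the new level
    .rename("Retailer_Code")               # name the stacked values
    .reset_index()
)
print(stacked_df)
```

Any further column that should survive as-is (like Info here) would go into the set_index list in the same way.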

You can also use wide_to_long. If you want to change the resulting column names, follow it with rename.

pd.wide_to_long(df,stubnames='Retailer',i=['Global_Code','Info'],j='Retailer_Name',sep='_',suffix='\\w+').reset_index()
Out[155]: 
  Global_Code  Info Retailer_Name Retailer
0         'A'     1        X_Code      'a'
1         'A'     1        Y_Code    'a_a'
2         'B'     2        X_Code      'b'
3         'B'     2        Y_Code    'b_b'
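A self-contained version of the wide_to_long call, with the rename step added, might look like the sketch below. The sample frame is hypothetical and mirrors the question's data, and the target name Retailer_Code is taken from the question's desired output. Note that wide_to_long strips the stub and separator, so the Retailer_Name values are 'X_Code' and 'Y_Code' rather than the full original column names.

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the question
df = pd.DataFrame({
    "Global_Code": ["A", "B"],
    "Retailer_X_Code": ["a", "b"],
    "Retailer_Y_Code": ["a_a", "b_b"],
    "Info": [1, 2],
})

long_df = (
    pd.wide_to_long(
        df,
        stubnames="Retailer",          # shared prefix of the wide columns
        i=["Global_Code", "Info"],     # identifier columns kept per row
        j="Retailer_Name",             # new column holding the suffix
        sep="_",
        suffix=r"\w+",                 # suffixes are non-numeric here
    )
    .reset_index()
    .rename(columns={"Retailer": "Retailer_Code"})
)
print(long_df)
```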
