
Transform rows into columns in Pandas

I am struggling to transform rows into columns in Pandas. Here is the input data:

 id match bookmaker home away  
  1  T1-T2 Bet365   1.5  2.4
  1  T1-T2 Bwin     1.6  2.2
  1  T1-T2 Betfair  1.7  2.3
  2  T1-T3 Bet365   1.2  2.9
  2  T1-T3 Bwin     1.2  2.8
  2  T1-T3 Betfair  1.1  3.0

I need to transform it into a new table like this:

 id match  Bet365_home Bet365_away Bwin_home Bwin_away Betfair_home Betfair_away  
  1  T1-T2         1.5         2.4       1.6       2.2          1.7          2.3
  2  T1-T3         1.2         2.9       1.2       2.8          1.1          3.0 

If you can suggest how to do it in PostgreSQL, that would also be cool!

I don't know the SQL method, but in pandas you want to pivot:

In [233]:
df.pivot(index='id', columns='bookmaker')

Out[233]:
           match                  home                away             
bookmaker Bet365 Betfair   Bwin Bet365 Betfair Bwin Bet365 Betfair Bwin
id                                                                     
1          T1-T2   T1-T2  T1-T2    1.5     1.7  1.6    2.4     2.3  2.2
2          T1-T3   T1-T3  T1-T3    1.2     1.1  1.2    2.9     3.0  2.8
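A minimal self-contained sketch of the same pivot idea, assuming pandas 1.1 or later (where `index` accepts a list of column names). It indexes on both id and match so the match is not repeated per bookmaker, pivots only the odds columns, and then flattens the columns into the Bet365_home-style names the question asks for. The DataFrame is rebuilt from the question's sample; `wide` and `bookmakers` are illustrative names.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'match': ['T1-T2'] * 3 + ['T1-T3'] * 3,
    'bookmaker': ['Bet365', 'Bwin', 'Betfair'] * 2,
    'home': [1.5, 1.6, 1.7, 1.2, 1.2, 1.1],
    'away': [2.4, 2.2, 2.3, 2.9, 2.8, 3.0],
})

# Keep id and match as row keys and spread each bookmaker's odds across columns.
# The resulting columns are a MultiIndex of (side, bookmaker) pairs.
wide = df.pivot(index=['id', 'match'], columns='bookmaker', values=['home', 'away'])

# Select the columns in the requested order: each bookmaker's home, then away.
bookmakers = ['Bet365', 'Bwin', 'Betfair']
wide = wide.loc[:, [(side, b) for b in bookmakers for side in ('home', 'away')]]

# Flatten ('home', 'Bet365') into 'Bet365_home' and turn id/match back into columns.
wide.columns = [f'{b}_{side}' for side, b in wide.columns]
wide = wide.reset_index()
print(wide)
```

Selecting the columns with an explicit list of tuples fixes the output order directly, so no separate swaplevel/reindex step is needed here.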

To group by both id and match, you could use set_index. If you also add bookmaker to the index and then unstack it:

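import pandas as pd

# 'data' is a whitespace-delimited text file holding the table above
df = pd.read_table('data', sep=r'\s+')
# Move id, match and bookmaker into a row MultiIndex, then pivot the
# bookmaker level out into the columns
df = df.set_index(['id', 'match', 'bookmaker']).unstack('bookmaker')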

you will get

            home                away             
bookmaker Bet365 Bwin Betfair Bet365 Bwin Betfair
id match                                         
1  T1-T2     1.5  1.6     1.7    2.4  2.2     2.3
2  T1-T3     1.2  1.2     1.1    2.9  2.8     3.0
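If you don't have the `data` file at hand, the same set_index/unstack step can be tried on the question's sample by reading it from an in-memory string. This is a minimal sketch: `raw` and `wide` are illustrative names, `io.StringIO` stands in for the file, and `pd.read_csv` with `sep=r'\s+'` is used in place of `read_table` (they parse this input the same way). The order of the bookmaker columns may differ from the output shown above, depending on the pandas version.

```python
import io

import pandas as pd

# The question's table as text; io.StringIO stands in for the 'data' file.
raw = """\
id match bookmaker home away
1 T1-T2 Bet365 1.5 2.4
1 T1-T2 Bwin 1.6 2.2
1 T1-T2 Betfair 1.7 2.3
2 T1-T3 Bet365 1.2 2.9
2 T1-T3 Bwin 1.2 2.8
2 T1-T3 Betfair 1.1 3.0
"""

df = pd.read_csv(io.StringIO(raw), sep=r'\s+')

# id, match and bookmaker become a three-level row index;
# unstack moves the bookmaker level into the columns.
wide = df.set_index(['id', 'match', 'bookmaker']).unstack('bookmaker')
print(wide)
```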

The hierarchical (MultiIndex) column index

  home                away             
Bet365 Bwin Betfair Bet365 Bwin Betfair

has more structure than the flat single-level column index:

Bet365_home Bet365_away Bwin_home Bwin_away Betfair_home Betfair_away

It makes selecting or grouping by home or away easier than with a flat column index; for example, df['home'] returns all the home odds at once. In general, I think it is a better format for the DataFrame.
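A short sketch of what the hierarchical columns make easy, assuming the same layout as the set_index/unstack result above (outer level home/away, inner level named `bookmaker`). The names `home`, `bet365` and `best_home` are illustrative; `DataFrame.xs` with `axis=1` and `level=` selects on the inner column level.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'match': ['T1-T2'] * 3 + ['T1-T3'] * 3,
    'bookmaker': ['Bet365', 'Bwin', 'Betfair'] * 2,
    'home': [1.5, 1.6, 1.7, 1.2, 1.2, 1.1],
    'away': [2.4, 2.2, 2.3, 2.9, 2.8, 3.0],
})

# Columns are (side, bookmaker) pairs, as in the set_index/unstack answer
wide = df.set_index(['id', 'match', 'bookmaker']).unstack('bookmaker')

# Select by the outer level: every bookmaker's home odds in one sub-frame
home = wide['home']

# Select by the inner level: home and away odds for a single bookmaker
bet365 = wide.xs('Bet365', axis=1, level='bookmaker')

# Aggregate across bookmakers: the best (highest) home odds for each match
best_home = wide['home'].max(axis=1)

print(home, bet365, best_home, sep='\n\n')
```

With flat names such as Bet365_home, each of these would need string matching on the column names instead.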

However, if you'd like to have a flat column index:

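# Put bookmaker on the outer column level and home/away on the inner one
df = df.swaplevel(0, 1, axis=1)
# Reorder the outer (bookmaker) level to match the requested column order
df = df.reindex(columns='Bet365 Bwin Betfair'.split(), level=0)
# Join each (bookmaker, side) pair into a single name such as 'Bet365_home'
df.columns = ['{}_{}'.format(bet, hw) for bet, hw in df.columns]
pd.options.display.width = 100  # keep all columns on one line when printing
print(df)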

yields

          Bet365_home  Bet365_away  Bwin_home  Bwin_away  Betfair_home  Betfair_away
id match                                                                            
1  T1-T2          1.5          2.4        1.6        2.2           1.7           2.3
2  T1-T3          1.2          2.9        1.2        2.8           1.1           3.0
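For the PostgreSQL part of the question, one common approach is conditional aggregation: group by id and match, and pick each bookmaker's value with `MAX(CASE WHEN ... END)`. The sketch below assumes a table named `odds` (a hypothetical name) with the question's columns, and runs the query against an in-memory SQLite database from Python's standard library so it is self-contained. The SELECT itself is plain SQL that should also work on PostgreSQL, which additionally offers the `FILTER (WHERE ...)` aggregate clause and the `crosstab` function from the `tablefunc` extension.

```python
import sqlite3

# Sample rows from the question: (id, match, bookmaker, home, away)
rows = [
    (1, 'T1-T2', 'Bet365', 1.5, 2.4),
    (1, 'T1-T2', 'Bwin', 1.6, 2.2),
    (1, 'T1-T2', 'Betfair', 1.7, 2.3),
    (2, 'T1-T3', 'Bet365', 1.2, 2.9),
    (2, 'T1-T3', 'Bwin', 1.2, 2.8),
    (2, 'T1-T3', 'Betfair', 1.1, 3.0),
]

conn = sqlite3.connect(':memory:')
# "match" is double-quoted because MATCH is an SQL keyword
conn.execute('CREATE TABLE odds (id INTEGER, "match" TEXT, bookmaker TEXT, home REAL, away REAL)')
conn.executemany('INSERT INTO odds VALUES (?, ?, ?, ?, ?)', rows)

# One output column per (bookmaker, side). Each group has one row per
# bookmaker, so MAX just picks that row's value; the NULLs that CASE
# yields for the other rows are ignored by the aggregate.
query = """
SELECT id, "match",
       MAX(CASE WHEN bookmaker = 'Bet365'  THEN home END) AS bet365_home,
       MAX(CASE WHEN bookmaker = 'Bet365'  THEN away END) AS bet365_away,
       MAX(CASE WHEN bookmaker = 'Bwin'    THEN home END) AS bwin_home,
       MAX(CASE WHEN bookmaker = 'Bwin'    THEN away END) AS bwin_away,
       MAX(CASE WHEN bookmaker = 'Betfair' THEN home END) AS betfair_home,
       MAX(CASE WHEN bookmaker = 'Betfair' THEN away END) AS betfair_away
FROM odds
GROUP BY id, "match"
ORDER BY id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

The bookmaker names are hard-coded in the query, so a new bookmaker needs a new pair of columns; the pandas approaches pick up whatever bookmakers appear in the data.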
