
How to access a column within a column using pandas

All,

I have a dataframe that looks like this: df[['date','PRICE']]

df>>

date                   Price
                 PX_FIRST     PX_LAST

2018-03-05        1.710       -0.511
2018-03-06        1.725       -0.513
2018-03-07        1.745       -0.511
2018-03-08        1.750       -0.512

How can I get a dataframe like the one below? In other words, how can I access PX_FIRST and PX_LAST? When I do df[['date','PRICE']], it doesn't manage to access the individual columns.

  date           PX_FIRST     PX_LAST

2018-03-05        1.710       -0.511
2018-03-06        1.725       -0.513
2018-03-07        1.745       -0.511
2018-03-08        1.750       -0.512

If you need to select the columns under the first-level value Price:

df = df['Price']

Or use DataFrame.xs:

df = df.xs('Price', axis=1)
print (df)
            PX_FIRST  PX_LAST
Date                         
2018-03-05     1.710   -0.511
2018-03-06     1.725   -0.513
2018-03-07     1.745   -0.511
2018-03-08     1.750   -0.512
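
As a rough check that the two selections above are equivalent, here is a minimal sketch. The frame named sample is hypothetical: it is rebuilt from the values printed in the question, assuming Date is the index and the columns are a two-level MultiIndex.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame (Date assumed to be the index).
index = pd.DatetimeIndex(
    ["2018-03-05", "2018-03-06", "2018-03-07", "2018-03-08"], name="Date"
)
columns = pd.MultiIndex.from_product([["Price"], ["PX_FIRST", "PX_LAST"]])
sample = pd.DataFrame(
    [[1.710, -0.511], [1.725, -0.513], [1.745, -0.511], [1.750, -0.512]],
    index=index,
    columns=columns,
)

# Both forms select everything under the first-level label "Price"
# and return a frame whose columns have a single level.
by_key = sample["Price"]
by_xs = sample.xs("Price", axis=1)

print(by_key.columns.tolist())
print(by_key.equals(by_xs))
```

Either form leaves sample itself unchanged, so the original two-level columns stay available.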

If you need to remove the top level of the MultiIndex:

df.columns = df.columns.droplevel(0)

But be careful if there are several columns with different first-level values (Price, Price1) and the same values in the second level:

#create sample data
df = pd.concat([df['Price'], df['Price'] * 0.4], keys=('Price','Price1'), axis=1)
print (df)
              Price           Price1        
           PX_FIRST PX_LAST PX_FIRST PX_LAST
Date                                        
2018-03-05    1.710  -0.511    0.684 -0.2044
2018-03-06    1.725  -0.513    0.690 -0.2052
2018-03-07    1.745  -0.511    0.698 -0.2044
2018-03-08    1.750  -0.512    0.700 -0.2048

Remove first level:

df.columns = df.columns.droplevel(0)
print (df)
            PX_FIRST  PX_LAST  PX_FIRST  PX_LAST
Date                                            
2018-03-05     1.710   -0.511     0.684  -0.2044
2018-03-06     1.725   -0.513     0.690  -0.2052
2018-03-07     1.745   -0.511     0.698  -0.2044
2018-03-08     1.750   -0.512     0.700  -0.2048

If you select column PX_FIRST, it returns a DataFrame because of the duplicated column names:

print (df['PX_FIRST'])
            PX_FIRST  PX_FIRST
Date                          
2018-03-05     1.710     0.684
2018-03-06     1.725     0.690
2018-03-07     1.745     0.698
2018-03-08     1.750     0.700
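
One possible way to avoid the duplicated names is to join both levels into a single label instead of dropping the first one. This is only a sketch: the frames price and wide are hypothetical reconstructions of the sample data above, and the "_" separator is an arbitrary choice.

```python
import pandas as pd

# Hypothetical reconstruction of the two-group sample frame (Price, Price1).
index = pd.DatetimeIndex(
    ["2018-03-05", "2018-03-06", "2018-03-07", "2018-03-08"], name="Date"
)
price = pd.DataFrame(
    {"PX_FIRST": [1.710, 1.725, 1.745, 1.750],
     "PX_LAST": [-0.511, -0.513, -0.511, -0.512]},
    index=index,
)
wide = pd.concat([price, price * 0.4], keys=("Price", "Price1"), axis=1)

# Join the two levels of every column tuple into one flat, unique name.
flat = wide.copy()
flat.columns = ["_".join(col) for col in wide.columns]

print(flat.columns.tolist())
# With unique names, selecting one label gives a Series, not a DataFrame.
print(type(flat["Price1_PX_FIRST"]).__name__)
```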

If you need to select by both levels, use tuples (this works on the frame that still has both levels, before the first level was dropped):

print (df[('Price', 'PX_FIRST')])
Date
2018-03-05    1.710
2018-03-06    1.725
2018-03-07    1.745
2018-03-08    1.750
Name: (Price, PX_FIRST), dtype: float64

IIUC, you have a MultiIndex on the columns:

df.loc[:,pd.IndexSlice['Price']]
Out[1108]: 
            PX_FIRST  PX_LAST
Date                         
2018-03-05     1.710   -0.511
2018-03-06     1.725   -0.513
2018-03-07     1.745   -0.511
2018-03-08     1.750   -0.512
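
pd.IndexSlice can also address the second level. The sketch below assumes a hypothetical frame wide with two first-level groups (Price and Price1), rebuilt from the values shown earlier in the thread.

```python
import pandas as pd

# Hypothetical two-group frame with a two-level column MultiIndex.
index = pd.DatetimeIndex(
    ["2018-03-05", "2018-03-06", "2018-03-07", "2018-03-08"], name="Date"
)
price = pd.DataFrame(
    {"PX_FIRST": [1.710, 1.725, 1.745, 1.750],
     "PX_LAST": [-0.511, -0.513, -0.511, -0.512]},
    index=index,
)
wide = pd.concat([price, price * 0.4], keys=("Price", "Price1"), axis=1)

idx = pd.IndexSlice
# All rows; any first-level label; second-level label "PX_FIRST".
first_only = wide.loc[:, idx[:, "PX_FIRST"]]

# Both levels are kept, so the two columns stay distinguishable.
print(first_only.columns.tolist())
```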

@jezrael You are exactly right: when I drop one level I end up with duplicate column names, and it is hard to distinguish the columns unless I rename them.

The other challenge in your example below

            PX_FIRST  PX_FIRST
Date                          
2018-03-05     1.710     0.684
2018-03-06     1.725     0.690
2018-03-07     1.745     0.698
2018-03-08     1.750     0.700

is that columns "Date", "PX_FIRST" and "PX_FIRST" are on different levels, so when I call df[['Date','PX_FIRST','PX_FIRST']] I get an error "...not in index".

Ideally, I'd be looking to get

 Date          PX_FIRST  PX_LAST                              
2018-03-05     1.710     0.684
2018-03-06     1.725     0.690
2018-03-07     1.745     0.698
2018-03-08     1.750     0.700

where all column names are on the same level and have different names.

Thanks
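
A possible way to get that layout, sketched under the assumption that Date is the index rather than a column (which the printed output suggests, and which would explain the "not in index" error): take the PX_FIRST cross-section on the second level, so the first-level labels become the column names, then move Date into a regular column with reset_index. The frame wide is a hypothetical reconstruction of the sample data.

```python
import pandas as pd

# Hypothetical two-group frame; Date is the index, not a column.
index = pd.DatetimeIndex(
    ["2018-03-05", "2018-03-06", "2018-03-07", "2018-03-08"], name="Date"
)
price = pd.DataFrame(
    {"PX_FIRST": [1.710, 1.725, 1.745, 1.750],
     "PX_LAST": [-0.511, -0.513, -0.511, -0.512]},
    index=index,
)
wide = pd.concat([price, price * 0.4], keys=("Price", "Price1"), axis=1)

# Cross-section on the second level: the remaining column names are the
# first-level labels, which are unique.
px_first = wide.xs("PX_FIRST", axis=1, level=1)

# reset_index turns the Date index into an ordinary column on the same level.
result = px_first.reset_index()

print(result.columns.tolist())
# Selecting by a list of names now works, because Date is a real column.
print(result[["Date", "Price", "Price1"]].shape)
```

The columns can then be renamed with DataFrame.rename if other labels are preferred.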
