
Pandas reindexing data frame issue

Say I have the following data frame,

         A       B
0  1986-87  232131
1  1987-88  564564
2  1988-89  123125
               ...

And so on.

I'm trying to reindex with <myFrame>.set_index('A') so that I get

                B
  1986-87  232131
  1987-88  564564
  1988-89  123125

but I keep getting this instead:

               B
       A       
 1986-87  232131
 1987-88  564564
 1988-89  123125

It's annoying, because I have tried the other reindexing methods too. I'm not sure what the A actually represents: it doesn't appear in <myFrame>.columns or <myFrame>.index, and <myFrame>['B'][0] gives me 232131. So what is A in this reindexed data frame, and how can I either index correctly from the beginning or get rid of this strange A in the incorrectly reindexed data frame?

You need to reset the name/names attribute of the index:

df.index.names = [None]

Example:

In [11]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B']).set_index('A')

In [12]: df
Out[12]: 
   B
A   
1  2
3  4

In [13]: df.index.names = [None]

In [14]: df
Out[14]: 
   B
1  2
3  4

The names describe the index and give it some meaning; they also distinguish between the different levels of the index (in a MultiIndex).
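To make the point about names concrete, here is a minimal sketch (the sample values are taken from the question; the variable names are only illustrative). It shows that the "A" printed above the index values is simply df.index.name, and that rename_axis(None) is another way to get a frame without it.

```python
import pandas as pd

# Sample data modelled on the question; any frame with an "A" column behaves the same.
df = pd.DataFrame(
    {"A": ["1986-87", "1987-88"], "B": [232131, 564564]}
).set_index("A")

# The "A" shown above the index values is the name of the index, not a column.
index_name = df.index.name   # 'A'
columns = list(df.columns)   # ['B']

# rename_axis(None) returns a new frame whose index has no name;
# the original df keeps its name.
unnamed = df.rename_axis(None)
```

Unlike assigning to df.index.names, rename_axis leaves the original frame untouched, so the name is still there if you need it later.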

As @DSM points out, do so at your own peril: this loses information if you later want to reset_index back:

In [15]: df.reset_index()
Out[15]: 
   index  B
0      1  2
1      3  4

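However, you can fill the name back in manually by renaming the resulting column (col_fill only affects MultiIndex columns, so it does not help here):

In [16]: df.reset_index().rename(columns={'index': 'A'})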
Out[16]: 
   A  B
0  1  2
1  3  4
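Another way to get the column name back, sketched here on the assumption that you still know what the original name was, is to put the name back on the index before calling reset_index:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"]).set_index("A")
df.index.names = [None]      # the name is gone, as in the example above

# Put the name back on the index, then reset_index uses it as the column name.
df.index.name = "A"
restored = df.reset_index()
```

With the name restored, the new column is called A rather than the default "index".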

I think your main problem is that you need to actually save the result of set_index, or use inplace=True, for the index to be set:

# Either
df.set_index('A', inplace=True)
# Or:
# df = df.set_index('A')

The output you were seeing was correct: it was a dataframe indexed by A, but you just hadn't stored it in a variable. Once you have stored it, things should work as you expect:

df.index
Out[6]: Index([u'1986-87', u'1987-88', u'1988-89'], dtype=object)

df.loc[u'1987-88']
Out[8]: 
B    564564
Name: 1987-88, dtype: int64
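A small sketch of the difference, using the values from the question (the variable names are illustrative): calling set_index without keeping the result leaves the original frame as it was.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["1986-87", "1987-88", "1988-89"], "B": [232131, 564564, 123125]}
)

# The result is discarded here, so df itself is unchanged.
df.set_index("A")
still_has_a = "A" in df.columns   # True: A is still an ordinary column

# Assigning the result back makes the change stick.
df = df.set_index("A")
value = df.loc["1987-88", "B"]    # label-based lookup on the new index
```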

I have a dataframe that was generated by appending multiple dataframes together into one long frame. As shown in the figure, the default index cycles through 0~7 because each original df has this index. The total number of rows is 240. How can I reindex the new df to 0~239 instead of 30 x 0~7?

I tried df.reset_index(drop=True), but it doesn't seem to work. I also tried df.reindex(np.arange(240)), but it returned an error:

ValueError: cannot reindex from a duplicate axis
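One likely cause, assuming the result of reset_index was not assigned back, is the same as above: reset_index(drop=True) returns a new frame rather than modifying df. reindex fails for a different reason: it matches rows by label, and the labels 0~7 are duplicated. A minimal sketch with made-up data of the same shape (30 frames of 8 rows; the column name x is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the 30 original frames, each indexed 0..7.
parts = [pd.DataFrame({"x": range(8)}) for _ in range(30)]

combined = pd.concat(parts)   # index repeats 0..7 thirty times

# reset_index returns a new frame, so keep the result.
combined = combined.reset_index(drop=True)

# Alternatively, ask concat to build a fresh 0..n-1 index directly.
combined2 = pd.concat(parts, ignore_index=True)
```

Either way the index runs from 0 to 239.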

