Appending a MultiIndex column to a DataFrame's index

Question

I have generated an initial DataFrame called df and then an adjusted DataFrame called df_new.

I want to get from df to df_new using a set_index() operation. My problem is how to handle the hierarchical index on the columns.

import pandas as pd
import numpy as np

# 5x5 frame of ones with two-level column labels and string row labels
df = pd.DataFrame(np.ones((5,5)))
col_idx = pd.MultiIndex.from_tuples([('X','a'),('X','b'),('Y','c'),('Y','d'),('Y','e')])
row_idx = ['a1','a2','a3','a4','a5']
df.columns = col_idx
df.index = row_idx
# overwrite column ('Y','d') with 99 so it is easy to spot
idx = pd.IndexSlice
df.loc[:,idx['Y','d']] = 99
print(df.head())


    X     Y       
    a  b  c   d  e
a1  1  1  1  99  1
a2  1  1  1  99  1
a3  1  1  1  99  1
a4  1  1  1  99  1
a5  1  1  1  99  1

#------------------------------------------------------------------------------------------


df_new = pd.DataFrame(np.ones((5,4)))
col_idx = pd.MultiIndex.from_tuples([('X','a'),('X','b'),('Y','c'),('Y','e')])
row_idx = pd.MultiIndex.from_tuples([('a1',99),('a2',99),('a3',99),('a4',99),('a5',99)])

df_new.columns = col_idx
df_new.index = row_idx
print(df_new.head())

# this is what df_new should look like.
# ('Y','d') got appended to the row index.

       X     Y   
       a  b  c  e
a1 99  1  1  1  1
a2 99  1  1  1  1
a3 99  1  1  1  1
a4 99  1  1  1  1
a5 99  1  1  1  1

Answer 1

You can use tuple notation to refer to a single column of the multi-indexed columns. You also need append=True so that the existing index is kept rather than replaced:

In [34]: df.set_index(('Y', 'd'), append=True)
Out[34]:
           X     Y
           a  b  c  e
   (Y, d)
a1 99      1  1  1  1
a2 99      1  1  1  1
a3 99      1  1  1  1
a4 99      1  1  1  1
a5 99      1  1  1  1
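
The call above can be checked end to end with a short script. This is only a sketch: it rebuilds the frame from the question, and the variable name `moved` is an illustrative choice, not something from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: ones everywhere, two-level columns.
columns = pd.MultiIndex.from_tuples(
    [("X", "a"), ("X", "b"), ("Y", "c"), ("Y", "d"), ("Y", "e")]
)
df = pd.DataFrame(
    np.ones((5, 5)),
    index=["a1", "a2", "a3", "a4", "a5"],
    columns=columns,
)
df[("Y", "d")] = 99  # a tuple selects one column of a MultiIndex

# Move ('Y', 'd') out of the columns and into the row index,
# keeping the existing row labels as the first level.
moved = df.set_index(("Y", "d"), append=True)

print(moved.index.nlevels)   # number of row index levels after the move
print(list(moved.columns))   # remaining column tuples
```

set_index returns a new frame here, so df itself still has all five columns.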

If you want to remove the index level names, you can do:

In [42]: df2 = df.set_index(('Y', 'd'), append=True)

In [43]: df2.index.names = [None, None]

In [44]: df2
Out[44]:
       X     Y
       a  b  c  e
a1 99  1  1  1  1
a2 99  1  1  1  1
a3 99  1  1  1  1
a4 99  1  1  1  1
a5 99  1  1  1  1
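
As a possible variation on the session above, the names can also be cleared with Index.set_names, which returns a new index instead of modifying the names attribute in place. The sketch below rebuilds the question's frame so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
columns = pd.MultiIndex.from_tuples(
    [("X", "a"), ("X", "b"), ("Y", "c"), ("Y", "d"), ("Y", "e")]
)
df = pd.DataFrame(
    np.ones((5, 5)),
    index=["a1", "a2", "a3", "a4", "a5"],
    columns=columns,
)
df[("Y", "d")] = 99

df2 = df.set_index(("Y", "d"), append=True)

# set_names returns a new MultiIndex with both level names cleared;
# assigning it back replaces the row index of df2.
df2.index = df2.index.set_names([None, None])

print(list(df2.index.names))
```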

When you want to add multiple columns to the index, you have to pass a list of column names (in this case, tuples):

df.set_index([('Y', 'd'), ('Y', 'e')], append=True)
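
A small runnable sketch of the multi-column case, again rebuilding the question's frame. The name `moved_two` is illustrative. The point it shows is that a bare tuple is read as one column key, while a list of tuples is read as several keys.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
columns = pd.MultiIndex.from_tuples(
    [("X", "a"), ("X", "b"), ("Y", "c"), ("Y", "d"), ("Y", "e")]
)
df = pd.DataFrame(
    np.ones((5, 5)),
    index=["a1", "a2", "a3", "a4", "a5"],
    columns=columns,
)
df[("Y", "d")] = 99

# Each tuple in the list names one column to move into the row index.
moved_two = df.set_index([("Y", "d"), ("Y", "e")], append=True)

print(moved_two.index.nlevels)   # original level plus the two moved columns
print(list(moved_two.columns))   # columns left behind
```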

Answer 2

The DataFrame.set_index method takes an append keyword argument, so you can simply do this:

df_new = df.set_index(("Y", "d"), append=True)

If you want to add multiple columns, just provide them as a list:

df_new = df.set_index([("Y", "d"), ("Y", "e")], append=True)

Answer 1
1 2015-07-08 12:02:30

Answer 2
1 Accepted 2015-07-08 12:09:42
