Remove a level from a MultiIndex

Question

I need to remove a level (either by position or by name) from a DataFrame's index and create a new DataFrame with the new index. The problem is that I end up with a non-unique index.

I had a look at "Remove a level from a pandas MultiIndex", but the problem is that unique(), as the answer there suggests, reduces the index to an array that doesn't retain the names of the levels.

Other than using unique() and then creating a new Index by stitching the level names onto the array, is there a more elegant solution?

import numpy as np
import pandas as pd

index = [np.array(['foo', 'foo', 'qux']), np.array(['a', 'b', 'a'])]
data = np.random.randn(3, 2)
columns = ["X", "Y"]
df = pd.DataFrame(data, index=index, columns=columns)
df.index.names = ["Level0", "Level1"]
print(df)

                      X         Y
Level0 Level1                    
foo    a      -0.591649  0.831599
       b       0.049961 -1.524291
qux    a      -0.100124 -1.059195

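# idx was not defined in the original snippet; pd.IndexSlice is assumed here
idx = pd.IndexSlice
# Dropping Level1 leaves the label 'foo' in the new index twice
index2 = df.reset_index(level=1, drop=True).index
df2 = pd.DataFrame(index=index2)
print(df2.loc[idx['foo'], :])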

Empty DataFrame
Columns: []
Index: [foo, foo]

Answer 1

If I understand you correctly, you are looking for a way to get the first-level index without duplicated values. The result should be an Index object, obtained without using unique and without explicitly creating the index again.

For your example data frame, you can combine get_level_values and drop_duplicates:

print(df.index.get_level_values(0).drop_duplicates())
Index(['foo', 'qux'], dtype='object', name='Level0')
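
To tie this back to the question, the deduplicated level can be passed straight to the DataFrame constructor. The following is a minimal sketch, assuming the same frame as in the question; the names level0 and df2 are only illustrative, and since the data is random only the index is inspected.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
index = [np.array(["foo", "foo", "qux"]), np.array(["a", "b", "a"])]
df = pd.DataFrame(np.random.randn(3, 2), index=index, columns=["X", "Y"])
df.index.names = ["Level0", "Level1"]

# Take the first level and drop repeated labels; the result is still an Index
level0 = df.index.get_level_values(0).drop_duplicates()

# Build an empty frame on the new, unique index
df2 = pd.DataFrame(index=level0)

print(df2.index.is_unique)
print(df2.index.name)
```

get_level_values also accepts the level name, so get_level_values("Level0") should select the same level.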

Edit

For a more general solution that returns either an Index or a MultiIndex depending on the number of levels, you can use droplevel and drop_duplicates together:

print(df.index.droplevel(-1).drop_duplicates())
Index(['foo', 'qux'], dtype='object', name='Level0')
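
Since the question asks about removing a level by position or by name, here is a small sketch showing that droplevel takes either. The index is rebuilt with from_arrays so the snippet runs on its own; the variable names are illustrative.

```python
import pandas as pd

# Same index as in the question, built without the data
mi = pd.MultiIndex.from_arrays(
    [["foo", "foo", "qux"], ["a", "b", "a"]],
    names=["Level0", "Level1"],
)

# Drop the last level by position, then remove repeated entries
by_position = mi.droplevel(-1).drop_duplicates()

# Drop the same level by name
by_name = mi.droplevel("Level1").drop_duplicates()

print(by_position.equals(by_name))
print(by_name.name)
```

Both calls leave a single remaining level, so the result is a plain Index that keeps the name Level0.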

Here is the example from the linked SO post: an index with 3 levels is reduced to a 2-level MultiIndex with unique values:

tuples = [(0, 100, 1000), (0, 100, 1001), (0, 100, 1002), (1, 101, 1001)]
index_3levels = pd.MultiIndex.from_tuples(tuples, names=["l1", "l2", "l3"])
print(index_3levels)

MultiIndex(levels=[[0, 1], [100, 101], [1000, 1001, 1002]],
           labels=[[0, 0, 0, 1], [0, 0, 0, 1], [0, 1, 2, 1]],
           names=['l1', 'l2', 'l3'])


index2level = index_3levels.droplevel(-1).drop_duplicates()
print(index2level)

MultiIndex(levels=[[0, 1], [100, 101]],
           labels=[[0, 1], [0, 1]],
           names=['l1', 'l2'])

# show unique values of new index
print(index2level.values)
[(0, 100) (1, 101)]
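
The two calls can be wrapped in a small function to make the Index versus MultiIndex behaviour easy to check. This is only a sketch: drop_level_unique is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def drop_level_unique(index, level):
    """Hypothetical helper: drop `level` (name or position), then drop duplicate entries."""
    return index.droplevel(level).drop_duplicates()


tuples = [(0, 100, 1000), (0, 100, 1001), (0, 100, 1002), (1, 101, 1001)]
index_3levels = pd.MultiIndex.from_tuples(tuples, names=["l1", "l2", "l3"])

# 3 levels -> 2 levels: the result is still a MultiIndex
two_levels = drop_level_unique(index_3levels, "l3")

# 2 levels -> 1 level: the result is a plain Index
one_level = drop_level_unique(two_levels, "l2")

print(list(two_levels))
print(list(one_level))
```

In both cases the names of the remaining levels are carried over, which is the part that was lost in the array returned by the approach described in the question.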

Answer 1: accepted, 2017-03-06 11:50:36
