
Remove a level from a MultiIndex

I need to remove a level (either by position or name) from a DataFrame's index and create a new DataFrame with the new index. The problem is that I end up with a non-unique index.

I had a look at "Remove a level from a pandas MultiIndex", but using unique(), as the answer there suggests, reduces the index to an array that doesn't retain the names of the levels.

Other than using unique() and then creating a new Index by stitching the level names back onto the array, is there a more elegant solution?

import numpy as np
import pandas as pd

# Two-level index; 'foo' appears twice in the first level
index = [np.array(['foo', 'foo', 'qux']), np.array(['a', 'b', 'a'])]
data = np.random.randn(3, 2)
columns = ["X", "Y"]
df = pd.DataFrame(data, index=index, columns=columns)
df.index.names = ["Level0", "Level1"]
print(df)

                      X         Y
Level0 Level1                    
foo    a      -0.591649  0.831599
       b       0.049961 -1.524291
qux    a      -0.100124 -1.059195

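idx = pd.IndexSlice  # alias the original snippet used without defining it
# Dropping Level1 leaves 'foo' twice in the remaining index
index2 = df.reset_index(level=1, drop=True).index
df2 = pd.DataFrame(index=index2)
print(df2.loc[idx['foo'], :])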

Empty DataFrame
Columns: []
Index: [foo, foo]

If I understand you correctly, you are looking for a way to get the first-level index without duplicated values. The result should be an Index object, obtained without using unique and without explicitly creating the index again.

For your example DataFrame, you can combine get_level_values and drop_duplicates:

print(df.index.get_level_values(0).drop_duplicates())
Index(['foo', 'qux'], dtype='object', name='Level0')
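Since the question asks about removing a level by position or by name, here is a minimal sketch showing that get_level_values accepts either. The frame is rebuilt with placeholder zeros so the snippet runs on its own; the variable names by_position and by_name are only illustrative.

```python
import numpy as np
import pandas as pd

# Same index shape as the question's example; the data values are placeholders.
index = pd.MultiIndex.from_arrays(
    [["foo", "foo", "qux"], ["a", "b", "a"]], names=["Level0", "Level1"]
)
df = pd.DataFrame(np.zeros((3, 2)), index=index, columns=["X", "Y"])

# Select the level by integer position or by its name, then de-duplicate.
by_position = df.index.get_level_values(0).drop_duplicates()
by_name = df.index.get_level_values("Level0").drop_duplicates()

# Both routes give the same Index, and the level name is retained.
print(by_position.equals(by_name))
print(by_name.name, by_name.is_unique)
```

Unlike the array returned by unique() in the linked answer, the result here is still an Index carrying the name Level0.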

Edit

For a more general solution that returns either an Index or a MultiIndex depending on the number of levels, you can use droplevel and drop_duplicates in conjunction:

print(df.index.droplevel(-1).drop_duplicates())
Index(['foo', 'qux'], dtype='object', name='Level0')
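As a rough sketch of how this ties back to the question, droplevel also takes a level name, and the de-duplicated index can be used directly to build the new DataFrame. The data values and the names new_index and df2 are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["foo", "foo", "qux"], ["a", "b", "a"]], names=["Level0", "Level1"]
)
df = pd.DataFrame(np.arange(6).reshape(3, 2), index=index, columns=["X", "Y"])

# Drop the level by name instead of by position, then remove duplicates.
new_index = df.index.droplevel("Level1").drop_duplicates()

# Build an empty frame on the new index, as the question does.
df2 = pd.DataFrame(index=new_index)

print(new_index.is_unique)
# Selecting 'foo' now matches a single row instead of two.
print(df2.loc[["foo"]].shape)
```

Note that this only de-duplicates the index itself. If the data columns should come along too, the rows that share a label would still need to be combined or filtered in some way, which is outside what the answer covers.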

Here is the example from the linked SO post: an index with 3 levels is reduced to a 2-level MultiIndex with unique values:

tuples = [(0, 100, 1000), (0, 100, 1001), (0, 100, 1002), (1, 101, 1001)]
index_3levels = pd.MultiIndex.from_tuples(tuples, names=["l1", "l2", "l3"])
print(index_3levels)

MultiIndex(levels=[[0, 1], [100, 101], [1000, 1001, 1002]],
           labels=[[0, 0, 0, 1], [0, 0, 0, 1], [0, 1, 2, 1]],
           names=['l1', 'l2', 'l3'])


index2level = index_3levels.droplevel(-1).drop_duplicates()
print(index2level)

MultiIndex(levels=[[0, 1], [100, 101]],
           labels=[[0, 1], [0, 1]],
           names=['l1', 'l2'])

# show unique values of new index
print(index2level.values)
[(0, 100) (1, 101)]
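The same pattern can be sketched with level names, and with a list of levels to drop more than one at once. The names two_levels and one_level are illustrative, not part of the linked post.

```python
import pandas as pd

tuples = [(0, 100, 1000), (0, 100, 1001), (0, 100, 1002), (1, 101, 1001)]
index_3levels = pd.MultiIndex.from_tuples(tuples, names=["l1", "l2", "l3"])

# Dropping one level of three leaves a MultiIndex with two levels.
two_levels = index_3levels.droplevel("l3").drop_duplicates()

# Dropping two levels of three leaves a flat Index named after the survivor.
one_level = index_3levels.droplevel(["l2", "l3"]).drop_duplicates()

print(isinstance(two_levels, pd.MultiIndex), list(two_levels.names))
print(isinstance(one_level, pd.MultiIndex), one_level.name)
```

In both cases the level names survive, which is the property the unique() approach loses.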
