Subset a pandas data frame whose index contains duplicates

Question

For the data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
}).set_index('key')

That looks like this:

        value
key     
1.0     one
2.0     two
3.0     three
4.0     four
5.0     five
NaN     six
NaN     seven

I would like to subset it to:

    value
key     
1   one
1   one
6   NaN

This produces a warning:

df.loc[[1,1,6],]

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

This produces an error:

df.reindex([1, 1, 6])

ValueError: cannot reindex from a duplicate axis

How can I do this, including an index label that is missing, without using apply?

Answer 1

The problem is that your index contains duplicated values (the two NaNs). You should disregard those when reindexing: because they are duplicates, it is ambiguous which row should be used for that label in the new index.
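A quick way to confirm this diagnosis is to ask the index whether it is unique and which labels repeat. This is a minimal sketch using the question's data; `dup_mask` is just an illustrative variable name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven'],
}).set_index('key')

# reindex needs a unique source index; here it is not, because NaN appears twice
print(df.index.is_unique)

# keep=False flags every occurrence of a repeated label, not only the later ones
dup_mask = df.index.duplicated(keep=False)
print(df.index[dup_mask])
```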

df.loc[df.index.dropna()].reindex([1, 1, 6])

    value
key 
1   one
1   one
6   NaN
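Here is the same approach as a self-contained script, assuming the question's data. The key point is that reindex only requires the *source* index to be unique. Repeated labels in the *target* list are fine, and a target label that does not exist (6) simply produces a NaN row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven'],
}).set_index('key')

# Drop the ambiguous NaN labels so the remaining index is unique
clean = df.loc[df.index.dropna()]

# Repeated target labels are allowed; the missing label 6 becomes a NaN row
result = clean.reindex([1, 1, 6])
print(result)
```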

For a more general solution that handles any duplicated labels, not only NaN, use duplicated:

df.loc[~df.index.duplicated(keep=False)].reindex([1, 1, 6])
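The `keep` argument controls which duplicates are dropped. This sketch compares the answer's `keep=False` with `keep='first'` on the question's data; the variable names are illustrative. `keep=False` drops every row whose label repeats, which matches the reasoning above: there is no way to know which copy is correct. `keep='first'` keeps one copy of each label. That also makes the index unique, but it silently picks a row, so it is only appropriate if that choice is acceptable for your data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven'],
}).set_index('key')

# keep=False: remove every row whose label occurs more than once (both NaN rows)
strict_src = df.loc[~df.index.duplicated(keep=False)]

# keep='first': keep the first row for each label (the 'six' row survives)
lenient_src = df.loc[~df.index.duplicated(keep='first')]

# Both sources now have unique indexes, so reindex works on either
strict = strict_src.reindex([1, 1, 6])
lenient = lenient_src.reindex([1, 1, 6])
print(strict)
print(lenient)
```

For the labels requested here, both give the same result, because neither 1 nor 6 was duplicated in the original index.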

If you want to keep the duplicated index labels and still use reindex, it will fail. This has actually been asked a couple of times before.
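If the duplicated rows must stay in the frame, one possible workaround, not taken from the answer above, is to avoid reindex and left-join the wanted labels against the frame. This is a sketch under that assumption. `wanted` is a hypothetical name, and its labels are written as floats so they match the float dtype of the index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven'],
}).set_index('key')

# Hypothetical target labels, float to match df's float index
wanted = pd.Index([1.0, 1.0, 6.0], name='key')

# A left join keeps every wanted label in order, repeats included, and fills
# labels absent from df (6.0) with NaN; df itself keeps its duplicate NaN rows
result = pd.DataFrame(index=wanted).join(df, how='left')
print(result)
```

One caveat: if a wanted label is itself duplicated in `df`, the join returns one row per match, so the output grows. That is the same ambiguity described above, surfaced as extra rows instead of an error.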

Answer 1: score 0, accepted, 2018-08-23 00:03:59