Subset a pandas DataFrame whose index contains duplicates

Question

For the DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
}).set_index('key')

which looks like this:

        value
key     
1.0     one
2.0     two
3.0     three
4.0     four
5.0     five
NaN     six
NaN     seven

I want to subset it to:

    value
key     
1   one
1   one
6   NaN

This produces a warning:

df.loc[[1,1,6],]

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
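As a side note, that warning was a deprecation notice. As far as I know, pandas 1.0 and later no longer warn here but raise KeyError outright when any requested label (here 6) is missing. A minimal check, assuming a recent pandas version, rebuilds the frame and catches the error:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": [1, 2, 3, 4, 5, np.nan, np.nan],
    "value": ["one", "two", "three", "four", "five", "six", "seven"],
}).set_index("key")

try:
    df.loc[[1, 1, 6]]  # label 6 does not exist in the index
    loc_raised = False
except KeyError as exc:
    # Recent pandas versions name the missing label(s) in the message
    loc_raised = True
    print("KeyError:", exc)
```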

This raises an error:

df.reindex([1, 1, 6])

ValueError: cannot reindex from a duplicate axis
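The ValueError comes from the index itself: reindex needs unique labels on the axis being reindexed, and pandas treats the two NaN keys as duplicates of each other. A small diagnostic sketch (newer pandas versions may word the message differently, e.g. "cannot reindex on an axis with duplicate labels", but it is still a ValueError):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": [1, 2, 3, 4, 5, np.nan, np.nan],
    "value": ["one", "two", "three", "four", "five", "six", "seven"],
}).set_index("key")

# NaN labels count as duplicates of each other inside an Index
print(df.index.is_unique)               # False
print(df.index.duplicated(keep=False))  # True only for the two NaN rows

try:
    df.reindex([1, 1, 6])
    reindex_raised = False
except ValueError as exc:
    reindex_raised = True
    print("ValueError:", exc)
```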

How can I do this while referencing index labels that are missing, without using apply?

Answer 1

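The problem is that your index contains the duplicated value NaN. Leave those labels out when reindexing: because they are duplicates, it is ambiguous which of their rows should be used in the new index.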

df.loc[df.index.dropna()].reindex([1, 1, 6])
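A self-contained version of the dropna approach, rebuilding the question's frame. It works here because the labels left after removing NaN (1 to 5) are unique, so reindex is allowed to repeat 1 and fill NaN for the missing 6:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": [1, 2, 3, 4, 5, np.nan, np.nan],
    "value": ["one", "two", "three", "four", "five", "six", "seven"],
}).set_index("key")

# Keep only the rows whose label is not NaN, which makes the index unique ...
clean = df.loc[df.index.dropna()]
# ... so reindex can now repeat labels (1, 1) and insert missing ones (6)
result = clean.reindex([1, 1, 6])
print(result)
```

A boolean mask such as df[df.index.notna()] should perform the same filtering step.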

    value
key 
1   one
1   one
6   NaN

For a general solution, use duplicated:

df.loc[~df.index.duplicated(keep=False)].reindex([1, 1, 6])
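The duplicated variant also handles duplicates that are not NaN. The sketch below uses a hypothetical frame in which label 2 appears twice, to show the trade-off of keep=False: every copy of a duplicated label is dropped, so a requested 2 comes back as NaN. Passing keep="first" instead retains the first row for each label:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: label 2 is duplicated, as are the NaN labels
df = pd.DataFrame({
    "key": [1, 2, 2, np.nan, np.nan],
    "value": ["one", "two", "two-b", "six", "seven"],
}).set_index("key")

wanted = [1, 1, 2, 6]

# keep=False removes *all* rows whose label occurs more than once
dropped_all = df.loc[~df.index.duplicated(keep=False)].reindex(wanted)

# keep="first" keeps the first row of each label (including one NaN row)
kept_first = df.loc[~df.index.duplicated(keep="first")].reindex(wanted)

print(dropped_all)
print(kept_first)
```

Which option fits depends on whether silently picking the first of several rows is acceptable for your data.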

If you want to keep the duplicated index labels and still use reindex, it will fail. This has actually been asked several times before.
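If the duplicated rows themselves must be kept, one alternative that avoids reindex (not part of the answer above, and still without apply) is a left merge against a frame of the wanted keys. A sketch, assuming float keys like the question's index and a hypothetical frame where label 2 has two rows; each requested 2 then returns both rows:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a duplicated non-NaN label (2)
df = pd.DataFrame({
    "key": [1, 2, 2, np.nan, np.nan],
    "value": ["one", "two", "two-b", "six", "seven"],
}).set_index("key")

# Wanted keys as float, matching the dtype of the index
wanted = pd.DataFrame({"key": [1.0, 1.0, 2.0, 6.0]})

# how="left" preserves the order of the wanted keys; unmatched keys get NaN
result = wanted.merge(df.reset_index(), on="key", how="left").set_index("key")
print(result)
```

One caveat: merge matches NaN keys against each other, so a NaN among the wanted keys would pull in every NaN row of df.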

Answer 1: accepted, 0 votes, 2018-08-23 00:03:59