Subset a pandas dataframe that has an index that contains duplicates
For the data frame:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
}).set_index('key')
That looks like this:
     value
key
1.0    one
2.0    two
3.0  three
4.0   four
5.0   five
NaN    six
NaN  seven
I would like to subset it to:
    value
key
1     one
1     one
6     NaN
This produces a warning:
df.loc[[1,1,6],]
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
This produces an error:
df.reindex([1, 1, 6])
ValueError: cannot reindex from a duplicate axis
How can I do this while referencing a missing index label and without using apply?
The problem is that the index contains duplicated values: the two NaN entries. Exclude them before reindexing, because with duplicate labels it is ambiguous which row the new index should take.
df.loc[df.index.dropna()].reindex([1, 1, 6])
    value
key
1     one
1     one
6     NaN
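For reference, here is a self-contained sketch of the same idea that can be run as-is. It assumes a boolean mask built with `Index.notna()` is an acceptable stand-in for the `df.index.dropna()` lookup used above; the names `unique_part` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven'],
}).set_index('key')

# Keep only the rows whose index label is not NaN.
# In this frame the remaining labels (1 to 5) are unique.
unique_part = df.loc[df.index.notna()]

# With a unique index, reindex accepts both repeated labels (1, 1)
# and labels that do not exist (6); the missing one is filled with NaN.
result = unique_part.reindex([1, 1, 6])
print(result)
```

Note that this only helps when NaN is the sole duplicated label; any other repeated label would still make reindex fail.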
For a generalized solution, use duplicated:
df.loc[~df.index.duplicated(keep=False)].reindex([1, 1, 6])
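To see what the mask does, here is a small sketch on a made-up frame (`df2` is hypothetical, not the data from the question) in which an ordinary label is repeated as well as NaN. With `keep=False`, every occurrence of a repeated label is flagged, so all of those rows are dropped before reindexing.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: label 2 appears twice, in addition to the two NaN labels.
df2 = pd.DataFrame(
    {'value': ['one', 'two', 'two again', 'six', 'seven']},
    index=pd.Index([1, 2, 2, np.nan, np.nan], name='key'),
)

# keep=False flags every occurrence of a repeated label,
# not only the second and later ones.
mask = df2.index.duplicated(keep=False)

# Only rows with a label that occurs exactly once survive.
deduped = df2.loc[~mask]

# Label 2 was dropped entirely, so it comes back as NaN, just like the
# label 6 that never existed.
result2 = deduped.reindex([1, 1, 2, 6])
print(result2)
```

The trade-off is that rows with repeated labels are discarded rather than disambiguated, so this suits cases where those rows are not needed in the result.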
If you want to keep the duplicated index labels and still use reindex, it will fail. This has actually been asked a couple of times before.
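The failure can be reproduced with a short sketch like the one below. The exact wording of the error message differs between pandas versions, so the example only relies on the exception type; the variable `error` is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': [1, 2, 3, 4, 5, np.nan, np.nan],
    'value': ['one', 'two', 'three', 'four', 'five', 'six', 'seven'],
}).set_index('key')

# The two NaN labels make the index non-unique.
print(df.index.is_unique)

error = None
try:
    # reindex refuses to work on an axis that has duplicate labels.
    df.reindex([1, 1, 6])
except ValueError as exc:
    error = exc
    print(f"reindex failed: {exc}")
```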