
Pandas reindex and fill missing values: “Index must be monotonic”

In answering this Stack Overflow question, I found some interesting behavior when using a fill method while reindexing a DataFrame.

This old bug report in pandas says that df.reindex(newIndex, method='ffill') should be equivalent to df.reindex(newIndex).ffill(), but that is NOT the behavior I'm witnessing.

Here's a code snippet that illustrates the behavior:

import pandas as pd

df = pd.DataFrame({'values': 2}, index=pd.DatetimeIndex(['2016-06-02', '2016-05-04', '2016-06-03']))
newIndex = pd.DatetimeIndex(['2016-05-04', '2016-06-01', '2016-06-02', '2016-06-03', '2016-06-05'])
print(df.reindex(newIndex).ffill())          # works as expected
print(df.reindex(newIndex, method='ffill'))  # raises ValueError

The first print statement works as expected. The second raises a

ValueError: index must be monotonic increasing or decreasing

What's going on here?


EDIT: Note that the sample df intentionally has a non-monotonic index. The question pertains to the order of operations in df.reindex(newIndex, method='ffill'). My expectation is that it should work as the bug report describes: first reindex with the new index, and then fill.

As you can see, newIndex.is_monotonic is True, and the fill works when called separately but fails when passed as a parameter to reindex.
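To make the observation above concrete, here is a small sketch that checks the ordering of both indexes and captures the error instead of letting it propagate. It uses is_monotonic_increasing and is_monotonic_decreasing, which is how recent pandas versions spell the check (the plain is_monotonic attribute was removed in pandas 2.0). The variable names target_sorted, source_sorted, two_step and error_message are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {"values": 2},
    index=pd.DatetimeIndex(["2016-06-02", "2016-05-04", "2016-06-03"]),
)
newIndex = pd.DatetimeIndex(
    ["2016-05-04", "2016-06-01", "2016-06-02", "2016-06-03", "2016-06-05"]
)

# The target index is sorted, but the frame's own index is not.
target_sorted = newIndex.is_monotonic_increasing
source_sorted = (
    df.index.is_monotonic_increasing or df.index.is_monotonic_decreasing
)
print(target_sorted, source_sorted)

# Two-step version: reindex first, then forward-fill the new rows.
two_step = df.reindex(newIndex).ffill()
print(two_step)

# One-step version: capture the error message rather than crashing.
try:
    df.reindex(newIndex, method="ffill")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The check suggests that the index being complained about is the frame's existing index rather than newIndex.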

Some element of reindex requires the incoming index to be sorted. I'm deducing that when method is passed, it fails to presort the incoming index and subsequently fails. I'm drawing this conclusion based on the fact that this works:

print(df.sort_index().reindex(newIndex.sort_values(), method='ffill'))
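The pandas documentation for reindex notes that method only applies to objects whose index is monotonically increasing or decreasing, which fits the deduction above. The following is a minimal sketch under that reading, not a statement about pandas internals: it sorts only the frame's own index (newIndex is already sorted in this example) and compares the result with the two-step version. The values 10, 20, 30 are hypothetical and chosen only so that the fill direction is visible.

```python
import pandas as pd

# Hypothetical data: distinct values show which row each fill comes from.
df = pd.DataFrame(
    {"values": [10, 20, 30]},
    index=pd.DatetimeIndex(["2016-06-02", "2016-05-04", "2016-06-03"]),
)
newIndex = pd.DatetimeIndex(
    ["2016-05-04", "2016-06-01", "2016-06-02", "2016-06-03", "2016-06-05"]
)

# One step: sort the existing index, then reindex with a fill method.
one_step = df.sort_index().reindex(newIndex, method="ffill")

# Two steps: reindex without a method, then forward-fill the gaps.
two_step = df.reindex(newIndex).ffill()

print(one_step)
print(two_step)

# The filled values agree here; the dtypes can differ, because the
# two-step version passes through NaN and becomes float.
same_values = one_step["values"].tolist() == two_step["values"].tolist()
print(same_values)
```

The two forms agree on this data, but they are not interchangeable in general: the one-step form fills from the original labels, while a trailing ffill() also fills any NaN values that were already in the data.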

It seems that this needs to be done on the columns as well.

In[76]: frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'],columns=['Ohio', 'Texas', 'California'])

In[77]: frame.reindex(index=['a','b','c','d'],method='ffill',columns=states)
---> ValueError: index must be monotonic increasing or decreasing

In[78]: frame.reindex(index=['a','b','c','d'],method='ffill',columns=states.sort())

Out[78]:
   Ohio  Texas  California
a     0      1           2
b     0      1           2
c     3      4           5
d     6      7           8
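One caveat about the session above: states is never defined in the snippet, and if it is a plain Python list, states.sort() sorts in place and returns None, so In[78] effectively passes columns=None. That would explain why Out[78] still shows the original Ohio, Texas and California columns. The sketch below assumes a hypothetical list for states and shows one way to fill along the rows while still selecting columns: apply method='ffill' to the index only, then reindex the columns in a separate call without a fill method.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=["a", "c", "d"],
    columns=["Ohio", "Texas", "California"],
)

# Assumption: `states` is not defined in the original snippet;
# this list is hypothetical and used only for illustration.
states = ["Texas", "Utah", "California"]

# list.sort() works in place and returns None.
sort_result = list(states).sort()
print(sort_result)

# Forward-fill along the (monotonic) row index only ...
filled = frame.reindex(index=["a", "b", "c", "d"], method="ffill")

# ... then pick the columns without any fill method, so the
# unsorted column labels never go through the fill logic.
result = filled.reindex(columns=states)
print(result)
```

Splitting the call this way keeps the fill restricted to the axis that is actually sorted; the new Utah column is simply left as NaN.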
