Different results when filtering a pandas DataFrame by datetime index

Question

I'm trying to filter a pandas DataFrame, and I get different results with a test case and with the real data. With the real data I get NaN values, while the test case gives me what I expect.

Test case:

The test case I created has the following code:

import pandas as pd
df1 = pd.DataFrame([
["2014-08-06 12:10:00", 19.85,  299.96, 17.5,   228.5,  19.63,  571.43],
["2014-08-06 12:20:00", 19.85,  311.55, 17.85,  248.68, 19.78,  547.21],
["2014-08-06 12:30:00", 20.06,  355.27, 18.35,  224.82, 19.99,  410.68],
["2014-08-06 12:40:00", 20.14,  405.95, 18.49,  247.33, 20.5,   552.79],
["2014-08-06 12:50:00", 20.14,  352.87, 18.7,   449.33, 20.86,  616.44],
["2014-08-06 13:00:00", 20.28,  356.96, 18.92,  307.57, 21.15,  471.18]],
columns=["date_time","t1", "1", "t4", "4", "t6", "6"])
df1 = df1.set_index(["date_time"])
df1 = pd.to_datetime(df1)

filter1 = pd.DataFrame(["2014-08-06 12:20:00","2014-08-06 13:00:00"])
df1_filtered = df1.ix[filter1[filter1.columns[0]][0:2]]

As expected, the result is:

>>> df1_filtered
                        t1       1     t4       4     t6       6
2014-08-06 12:20:00  19.85  311.55  17.85  248.68  19.78  547.21
2014-08-06 13:00:00  20.28  356.96  18.92  307.57  21.15  471.18

Real data:

The real data comes from a tab-separated .txt file and looks like this:

Fecha_hora  t1  1   t4  4   t6  6
2014-08-06 12:10:00 19.85   299.96  17.5    228.5   19.63   571.43
2014-08-06 12:20:00 19.85   311.55  17.85   248.68  19.78   547.21
2014-08-06 12:30:00 20.06   355.27  18.35   224.82  19.99   410.68
2014-08-06 12:40:00 20.14   405.95  18.49   247.33  20.5    552.79
2014-08-06 12:50:00 20.14   352.87  18.7    449.33  20.86   616.44
2014-08-06 13:00:00 20.28   356.96  18.92   307.57  21.15   471.18

However, when I read the real data and apply the same filter as before:

df2 = pd.read_csv(r"D:/tmp/data.txt", sep='\t', parse_dates=True, index_col=0)
df2_filtered = df2.ix[filter1[filter1.columns[0]][0:2]]

I get the following result, with every value NaN:

>>> df2_filtered
                     t1   1  t4   4  t6   6
2014-08-06 12:20:00 NaN NaN NaN NaN NaN NaN
2014-08-06 13:00:00 NaN NaN NaN NaN NaN NaN
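To reproduce the real-data case without the file, the same read_csv call can be pointed at an in-memory buffer. This is a minimal sketch assuming a current pandas; io.StringIO stands in for D:/tmp/data.txt, and only three of the question's rows are copied. It shows what parse_dates=True with index_col=0 does to the index: the labels become Timestamps rather than the original strings.

```python
import io

import pandas as pd

# In-memory stand-in for D:/tmp/data.txt (assumption: tab-separated, header on the first line).
data = (
    "Fecha_hora\tt1\t1\tt4\t4\tt6\t6\n"
    "2014-08-06 12:10:00\t19.85\t299.96\t17.5\t228.5\t19.63\t571.43\n"
    "2014-08-06 12:20:00\t19.85\t311.55\t17.85\t248.68\t19.78\t547.21\n"
    "2014-08-06 13:00:00\t20.28\t356.96\t18.92\t307.57\t21.15\t471.18\n"
)

# Same call as in the question, reading from the buffer instead of the file.
df2 = pd.read_csv(io.StringIO(data), sep="\t", parse_dates=True, index_col=0)

# With index_col=0 and parse_dates=True, the index holds Timestamps, not strings.
print(type(df2.index).__name__)                          # DatetimeIndex
print(pd.Timestamp("2014-08-06 12:20:00") in df2.index)  # True
```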

However, I can still retrieve a single row like this:

>>> df2.ix["2014-08-06 12:20:00"]
t1     19.85
1     311.55
t4     17.85
4     248.68
t6     19.78
6     547.21
Name: 2014-08-06 12:20:00

Question:

How can I filter my real data to get the same results as in my test case? Is there a better way to achieve this?

Note: my pandas version is 0.9.0, running under Python 2.5, which means I don't have the loc indexer.

Note 2: I also tried this with Python 2.7 on pythonanywhere.com and got the same discrepancy. However, if I check df1==df2, I get True for every single value.
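A value-by-value comparison says nothing about how the index is stored. One plausible explanation, which the question does not confirm, is that the two frames differ in index type: read_csv(..., parse_dates=True, index_col=0) yields a DatetimeIndex, whereas df1 may have kept plain string labels if pd.to_datetime(df1) did not convert its index. The sketch below, assuming a current pandas and using hypothetical frames str_df and dt_df built from two of the question's rows, checks the index type directly instead of the values.

```python
import pandas as pd

rows = [
    ["2014-08-06 12:20:00", 19.85, 311.55],
    ["2014-08-06 13:00:00", 20.28, 356.96],
]

# Same rows, two index types: plain strings vs. parsed datetimes (hypothetical frames).
str_df = pd.DataFrame(rows, columns=["date_time", "t1", "1"]).set_index("date_time")
dt_df = str_df.copy()
dt_df.index = pd.to_datetime(dt_df.index)

# The cell values are identical ...
print((str_df.to_numpy() == dt_df.to_numpy()).all())  # True

# ... but only one of the two frames has a DatetimeIndex.
print(isinstance(str_df.index, pd.DatetimeIndex))  # False
print(isinstance(dt_df.index, pd.DatetimeIndex))   # True
```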

Answer 1

It hopefully goes without saying, but if at all possible, upgrade your Python/pandas!

In this case, on a recent version (0.20.3), I get missing values in both cases. I need to convert the lookup keys to datetimes, and I'm guessing that will work for you too.

The convenient string-based date indexing only works with scalars and slices, not with a list of strings.
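For illustration, here is a minimal sketch of the scalar and slice forms in a current pandas, using .loc (not available in the asker's pandas 0.9, which only has .ix) on a trimmed copy of the question's data. A single string or a string slice bound is parsed into a timestamp before the lookup.

```python
import pandas as pd

idx = pd.to_datetime([
    "2014-08-06 12:10:00", "2014-08-06 12:20:00",
    "2014-08-06 12:30:00", "2014-08-06 12:40:00",
])
df = pd.DataFrame({"t1": [19.85, 19.85, 20.06, 20.14]}, index=idx)

# Scalar: the string is parsed to a timestamp, and the matching row comes back as a Series.
row = df.loc["2014-08-06 12:20:00"]

# Slice: both string bounds are parsed, and both ends are included.
window = df.loc["2014-08-06 12:20:00":"2014-08-06 12:30:00"]

print(row["t1"])    # 19.85
print(len(window))  # 2
```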

In [174]: lookup = pd.to_datetime(filter1[filter1.columns[0]][0:2])

In [175]: df2.ix[lookup]
Out[175]: 
                        t1       1     t4       4     t6       6
Fecha_hora                                                      
2014-08-06 12:20:00  19.85  311.55  17.85  248.68  19.78  547.21
2014-08-06 13:00:00  20.28  356.96  18.92  307.57  21.15  471.18
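In current pandas, .ix has been removed (it was deprecated in 0.20 and dropped in 1.0), so the same fix is written with .loc. The sketch below assumes a current pandas and a trimmed, already datetime-indexed copy of the question's df2. It converts the keys first, as the answer does, and also shows an index.isin mask as one alternative.

```python
import pandas as pd

# Trimmed copy of the real data, indexed by datetime as read_csv(..., parse_dates=True, index_col=0) would leave it.
idx = pd.to_datetime([
    "2014-08-06 12:10:00", "2014-08-06 12:20:00",
    "2014-08-06 12:30:00", "2014-08-06 13:00:00",
])
df2 = pd.DataFrame(
    {"t1": [19.85, 19.85, 20.06, 20.28], "1": [299.96, 311.55, 355.27, 356.96]},
    index=idx,
)

filter1 = pd.DataFrame(["2014-08-06 12:20:00", "2014-08-06 13:00:00"])

# Convert the string keys to timestamps so they match the DatetimeIndex labels.
lookup = pd.to_datetime(filter1[filter1.columns[0]].tolist())

# Option 1: label selection; in pandas >= 1.0 a missing key raises KeyError.
selected = df2.loc[lookup]

# Option 2: boolean mask; keys that are not in the index are simply skipped.
masked = df2[df2.index.isin(lookup)]

print(selected["t1"].tolist())  # [19.85, 20.28]
print(masked["t1"].tolist())    # [19.85, 20.28]
```

The .loc form returns rows in the order of the keys. The isin mask keeps the frame's own order, which may be preferable when some keys are expected to be absent.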

Answer 1 (score 1, accepted, 2017-08-30 20:48:16)
