
pandas DataFrame selecting list of rows from DateTimeIndex - KeyError. Understanding why

I'm trying to understand why I get this error. I already have a solution for this issue, and it was actually solved here; I just need to understand why it doesn't work as I was expecting.

I would like to understand why this throws a KeyError:

import numpy as np
import pandas as pd
dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))
df.loc[['20130102', '20130103'],:]

with the following feedback:

KeyError: "None of [['20130102', '20130103']] are in the [index]"

As explained here, the solution is just to do:

df.loc[pd.to_datetime(['20130102','20130104']),:]

So the problem is definitely with the way loc takes a list of strings as the argument for selecting from a DatetimeIndex. However, I can see that the following calls are OK for this function:

df.loc['20130102':'20130104',:]

and

df.loc['20130102']

I would like to understand how this works and would appreciate any resources I can use to predict the behavior of this function depending on how it is called. I read Indexing and Selecting Data and Time Series/Date functionality from the pandas documentation but couldn't find an explanation for this.

Typically, when you pass an array-like object to loc, Pandas is going to try to locate each element of that array in the index. If it doesn't find an element, you'll get a KeyError. And you passed an array of strings when the values in the index are Timestamps, so those strings definitely aren't in the index.
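To make the type mismatch concrete, here is a minimal sketch that rebuilds the DataFrame from the question, checks what kind of object the index actually stores, and converts the strings with pd.to_datetime before handing them to loc. The variable names (first_label, wanted, selected, raw_list_accepted) are only illustrative. Whether the raw list of strings is accepted appears to depend on the installed pandas version, so the sketch probes it rather than assuming either outcome.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

# Each label stored in the index is a Timestamp, not a string.
first_label = df.index[0]
print(type(first_label).__name__)

# Converting the strings up front gives loc labels of the same type as the index.
wanted = pd.to_datetime(['20130102', '20130103'])
selected = df.loc[wanted, :]
print(selected)

# Probe the raw string list instead of assuming how this pandas version treats it.
try:
    df.loc[['20130102', '20130103'], :]
    raw_list_accepted = True
except KeyError:
    raw_list_accepted = False
print(raw_list_accepted)
```

Doing the conversion explicitly keeps the selection independent of how much string parsing loc happens to do for lists.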

However, Pandas also tries to make things easier for you. In particular, with a DatetimeIndex, if you were to pass a string scalar

df.loc['20130102']

A    0.0
B    1.0
C    0.0
D    0.0
Name: 2013-01-02 00:00:00, dtype: float64

Pandas will attempt to parse that scalar as a Timestamp and see if that value is in the index.
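As a rough check of that description, the sketch below compares the string lookup with an explicit Timestamp lookup on the same DataFrame. The names label, row_from_string and row_from_timestamp are made up for the example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

# The parsed value is what actually has to be present in the index.
label = pd.Timestamp('20130102')
print(label in df.index)

# Looking up by string and by the equivalent Timestamp selects the same row.
row_from_string = df.loc['20130102']
row_from_timestamp = df.loc[label]
print(row_from_string.equals(row_from_timestamp))
```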

If you were to pass a slice object

df.loc['20130102':'20130104']

              A    B    C    D
2013-01-02  0.0  1.0  0.0  0.0
2013-01-03  0.0  0.0  1.0  0.0
2013-01-04  0.0  0.0  0.0  1.0

Pandas will also attempt to parse the endpoints of the slice object as Timestamps and return an appropriately sliced dataframe.
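The same comparison can be sketched for slices: on this daily index, slicing with strings and slicing with the corresponding Timestamp objects should give the same rows, with both endpoints included. The names by_string and by_timestamp are illustrative only.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

# Slice once with strings and once with explicit Timestamp endpoints.
by_string = df.loc['20130102':'20130104']
by_timestamp = df.loc[pd.Timestamp('20130102'):pd.Timestamp('20130104')]

print(by_string.equals(by_timestamp))
print(len(by_string))  # label-based slices include both endpoints
```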

Your KeyError is simply past the limits of how much helpfulness the Pandas devs had time to code.
