Pythonic way to filter a DataFrame based on dates example
In [90]: list_dates = [datetime.date(2014,2,2),datetime.date(2015,2,2), datetime.date(2013,4,5)]
In [91]: df = DataFrame(list_dates, columns=['Date'])
In [92]: df
Out[92]:
Date
0 2014-02-02
1 2015-02-02
2 2013-04-05
Now I want to get a new DataFrame that contains only the dates from 2014 and 2013:
In [93]: result = DataFrame([date for date in df.Date if date.year in (2014,2013)])
In [94]: result
Out[94]:
0
0 2014-02-02
1 2013-04-05
That works and gives me the DataFrame I want. But why doesn't this work:
In [95]: result1 = df[df.Date.map(lambda x: x.year) == 2014 or p.Date.map(lambda x: x.year) == 2013]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-95-86f01906c89b> in <module>()
----> 1 result1 = df[df.Date.map(lambda x: x.year) == 2014 or p.Date.map(lambda x: x.year) == 2013]
/home/marcos/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
690 raise ValueError("The truth value of a {0} is ambiguous. "
691 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 692 .format(self.__class__.__name__))
693
694 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Or the following:
In [96]: df['year'] = df.Date.map(lambda x: x.year)
In [97]: result2 = df[df.year in (2014, 2013)]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-97-814358a4edff> in <module>()
----> 1 result2 = df[df.year in (2014, 2013)]
/home/marcos/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
690 raise ValueError("The truth value of a {0} is ambiguous. "
691 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 692 .format(self.__class__.__name__))
693
694 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
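I think the problem is that when I use the `in` operator, I am checking whether the whole Series is in the tuple. How can I make the evaluation element-wise so that I get the desired result?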
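The diagnosis above can be reproduced outside the original session. This is a minimal sketch that rebuilds the same three dates; the names `years`, `raised` and `mask` are only illustrative. Both `or` and `in` need a single True/False value, so Python calls `bool()` on a Series and pandas raises the ValueError shown in the tracebacks. The `|` operator, by contrast, combines boolean Series element by element.

```python
import datetime
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"Date": [datetime.date(2014, 2, 2),
                            datetime.date(2015, 2, 2),
                            datetime.date(2013, 4, 5)]})
years = df["Date"].map(lambda d: d.year)

# `or` and `in` implicitly call bool() on a Series, which pandas refuses
# to do because a multi-element Series has no single truth value.
try:
    bool(years == 2014)
    raised = False
except ValueError:
    raised = True

# `|` is the element-wise OR for boolean Series. The parentheses are
# required because `|` binds more tightly than `==`.
mask = (years == 2014) | (years == 2013)
result1 = df[mask]
print(result1)
```

With more than two years to match, chaining `|` gets verbose, which is where a membership test such as `isin` tends to read better.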
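I'd convert the dates to datetime objects with `to_datetime`, which lets you use the `dt` accessor to reach the `year` attribute. Then we can call `isin` and pass a list of the years of interest to filter the df: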
In [68]:
df['Date'] = pd.to_datetime(df['Date'])
In [69]:
df[df['Date'].dt.year.isin([2013,2014])]
Out[69]:
Date
0 2014-02-02
2 2013-04-05
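For reference, the same approach can be written as a self-contained script. This is a sketch under a couple of assumptions: the helper name `filter_by_years` is hypothetical, and it converts a temporary copy of the column rather than overwriting `df['Date']` as the session above does.

```python
import datetime
import pandas as pd

def filter_by_years(frame, column, years):
    """Return the rows of `frame` whose `column` falls in one of `years`.

    Hypothetical helper for illustration; the input frame is not modified.
    """
    # Convert to datetime64 so the .dt accessor is available.
    dates = pd.to_datetime(frame[column])
    # isin() tests membership element by element and returns a boolean Series.
    return frame[dates.dt.year.isin(years)]

df = pd.DataFrame({"Date": [datetime.date(2014, 2, 2),
                            datetime.date(2015, 2, 2),
                            datetime.date(2013, 4, 5)]})

result = filter_by_years(df, "Date", [2013, 2014])
print(result)
```

Because the filter keeps the original index, the surviving rows are labelled 0 and 2; call `reset_index(drop=True)` on the result if a fresh index is wanted.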