Get the latest of each element of a Pandas DataFrame, with range indexing and a date column?
I have a sample DataFrame as such:
import datetime
import pandas as pd

df = pd.DataFrame(data=[('foo', datetime.date(2014, 10, 1)),
                        ('foo', datetime.date(2014, 10, 2)),
                        ('bar', datetime.date(2014, 10, 3)),
                        ('bar', datetime.date(2014, 10, 1))],
                  columns=('name', 'date'))
which looks like this:
name date
0 foo 2014-10-01
1 foo 2014-10-02
2 bar 2014-10-03
3 bar 2014-10-01
I want to restrict the DataFrame to just the latest entry for each element in the name column. How do I do this?
I could awkwardly (at least I think it would be awkward) construct a boolean Series object to do this and pass it to the DataFrame's __getitem__, like this:
df[latest_name]
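For reference, one way such a boolean Series could be built is sketched below. This is only an illustration under assumptions: the name latest_name stands for the mask mentioned above, and the date column is converted with pd.to_datetime so that the per-group maximum is well defined.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    data=[('foo', datetime.date(2014, 10, 1)),
          ('foo', datetime.date(2014, 10, 2)),
          ('bar', datetime.date(2014, 10, 3)),
          ('bar', datetime.date(2014, 10, 1))],
    columns=('name', 'date'))

# Assumption: work on datetime64 values rather than datetime.date objects.
dates = pd.to_datetime(df['date'])

# Hypothetical mask: True where a row's date equals the latest date
# recorded for that row's name.
latest_name = dates == dates.groupby(df['name']).transform('max')

# Boolean indexing keeps only the rows where the mask is True.
print(df[latest_name])
```

Note that if two rows of the same name shared the latest date, this mask would keep both of them.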
How do I most elegantly get the latest entry for each name element?
A coworker just had a very similar question to this.
With a DataFrame object like this:
name date
0 foo 2014-10-01
1 foo 2014-10-02
2 bar 2014-10-03
3 bar 2014-10-01
You can sort by the date and then drop the duplicates, keeping the last one for each name, like this:
last = df.sort_values('date').drop_duplicates(subset='name', keep='last')
# note: older versions of pandas spelled this df.sort(columns=...),
# cols=... and take_last=True; those spellings have since been removed
and last is now:
name date
1 foo 2014-10-02
2 bar 2014-10-03
[2 rows x 2 columns]
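A self-contained sketch of the sort-then-deduplicate step, written with the sort_values, subset and keep spellings from recent pandas versions. The cross-check using groupby and idxmax is a different technique added here only for comparison, and the names dates and by_idxmax are illustrative.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    data=[('foo', datetime.date(2014, 10, 1)),
          ('foo', datetime.date(2014, 10, 2)),
          ('bar', datetime.date(2014, 10, 3)),
          ('bar', datetime.date(2014, 10, 1))],
    columns=('name', 'date'))

# Sort by date, then keep only the last row seen for each name.
last = df.sort_values('date').drop_duplicates(subset='name', keep='last')
print(last)

# Cross-check with a swapped-in technique: find the row label of the
# maximum date within each name group, then select those rows.
dates = pd.to_datetime(df['date'])
by_idxmax = df.loc[dates.groupby(df['name']).idxmax()]
print(by_idxmax)
```

Both approaches select the same two rows here. The idxmax result is ordered by group name rather than by date.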
But if we can discard the old index, it may be preferable to set the date as the index and then just sort by the index:
df = df.set_index('date')
df = df.sort_index()  # sort_index returns a new DataFrame by default, so assign the result
df now returns:
name
date
2014-10-01 foo
2014-10-01 bar
2014-10-02 foo
2014-10-03 bar
Now to just take the last elements:
last_elements_frame = df.drop_duplicates(keep='last')
and last_elements_frame is now:
name
date
2014-10-02 foo
2014-10-03 bar
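The index-based variant can be sketched end to end as follows, assuming a recent pandas where keep='last' replaces take_last=True. The names indexed and restored are illustrative, and the final reset_index step is an optional extra for getting the date back as an ordinary column.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    data=[('foo', datetime.date(2014, 10, 1)),
          ('foo', datetime.date(2014, 10, 2)),
          ('bar', datetime.date(2014, 10, 3)),
          ('bar', datetime.date(2014, 10, 1))],
    columns=('name', 'date'))

# Replace the range index with the date and sort chronologically.
indexed = df.set_index('date').sort_index()

# drop_duplicates compares column values only (the index is ignored),
# so with 'name' as the sole column this keeps the last row per name.
last_elements_frame = indexed.drop_duplicates(keep='last')
print(last_elements_frame)

# Optional: turn the date index back into an ordinary column.
restored = last_elements_frame.reset_index()
print(restored)
```

This relies on name being the only remaining column. With more columns, passing subset='name' to drop_duplicates would be needed to deduplicate on the name alone.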