Get the latest information for each element of a Pandas DataFrame, with a range index and a date column?

Question

I have a sample DataFrame like this:

import datetime
import pandas as pd

df = pd.DataFrame(data=[('foo', datetime.date(2014, 10, 1)),
                        ('foo', datetime.date(2014, 10, 2)),
                        ('bar', datetime.date(2014, 10, 3)),
                        ('bar', datetime.date(2014, 10, 1))],
                  columns=('name', 'date'))

which looks like this:

  name        date
0  foo  2014-10-01
1  foo  2014-10-02
2  bar  2014-10-03
3  bar  2014-10-01

I want to restrict the DataFrame to just the last occurrence of each element in the name column. How do I do this?

I could awkwardly (at least I think it would be awkward) construct a boolean Series and pass it to the DataFrame's __getitem__, like this:

df[latest_name]
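
For illustration only, here is one way such a boolean Series might be built. The question does not show how latest_name would be constructed, so the groupby/transform construction below is an assumption, not the asker's code. It marks every row whose date equals the latest date for its name:

```python
import datetime
import pandas as pd

df = pd.DataFrame(data=[('foo', datetime.date(2014, 10, 1)),
                        ('foo', datetime.date(2014, 10, 2)),
                        ('bar', datetime.date(2014, 10, 3)),
                        ('bar', datetime.date(2014, 10, 1))],
                  columns=['name', 'date'])

# Convert the datetime.date objects to datetime64 so the grouped
# max is an ordinary datetime reduction.
dates = pd.to_datetime(df['date'])

# Hypothetical mask: True where a row's date is the latest one for its name.
# If a name has several rows sharing its latest date, all of them are True.
latest_name = dates == dates.groupby(df['name']).transform('max')

print(df[latest_name])
```

The mask keeps the original row order and index, so nothing needs to be sorted.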

How do I most elegantly get the latest entry for each name element?

Answer 1

A coworker just had a very similar question to this.

With a DataFrame object like this:

  name        date
0  foo  2014-10-01
1  foo  2014-10-02
2  bar  2014-10-03
3  bar  2014-10-01

You can sort by the date and then drop the duplicates, keeping the last row for each name, like this:

last = df.sort_values(by='date').drop_duplicates(subset='name', keep='last')
# note: older versions of pandas spelled this as
# df.sort(columns=('date',)).drop_duplicates(cols=('name',), take_last=True);
# sort, cols and take_last have since been removed from pandas

and last is now:

  name        date
1  foo  2014-10-02
2  bar  2014-10-03

[2 rows x 2 columns]
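
If sorting the whole frame is not wanted, a different technique (not part of the answer above, offered as a rough sketch) is to ask each name group for the index label of its latest date and select those rows. The names dates and latest_idx are made up for this example, and the dates are converted with pd.to_datetime first on the assumption that idxmax is safer on datetime64 values than on plain datetime.date objects:

```python
import datetime
import pandas as pd

df = pd.DataFrame(data=[('foo', datetime.date(2014, 10, 1)),
                        ('foo', datetime.date(2014, 10, 2)),
                        ('bar', datetime.date(2014, 10, 3)),
                        ('bar', datetime.date(2014, 10, 1))],
                  columns=['name', 'date'])

# datetime64 copy of the date column, used only to find the maxima.
dates = pd.to_datetime(df['date'])

# For each name, the index label of the row holding its latest date.
latest_idx = dates.groupby(df['name']).idxmax()

# Select those rows from the untouched frame and restore index order.
last = df.loc[latest_idx.tolist()].sort_index()

print(last)
```

Unlike drop_duplicates, this returns exactly one row per name even when two rows tie on the latest date.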

But it may be preferable to set the date as the index, if we can drop the old index, and then just sort by the index:

df = df.set_index('date')
df = df.sort_index()  # returns a new, sorted DataFrame by default, so assign the result

df is now:

           name
date           
2014-10-01  foo
2014-10-01  bar
2014-10-02  foo
2014-10-03  bar

Now to take just the last row for each name:

last_elements_frame = df.drop_duplicates(keep='last')

and last_elements_frame is now:

           name
date           
2014-10-02  foo
2014-10-03  bar
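
The index-based steps work because drop_duplicates compares column values only and ignores the index, so after sorting by date the last row seen for each name is its latest. A self-contained sketch of the whole sequence follows. It assumes a recent pandas version, and the names by_date and restored, the stable sort, and the final reset_index are additions for this example:

```python
import datetime
import pandas as pd

df = pd.DataFrame(data=[('foo', datetime.date(2014, 10, 1)),
                        ('foo', datetime.date(2014, 10, 2)),
                        ('bar', datetime.date(2014, 10, 3)),
                        ('bar', datetime.date(2014, 10, 1))],
                  columns=['name', 'date'])

# Move date into the index and sort by it; mergesort is stable, so rows
# that share a date keep their original relative order.
by_date = df.set_index('date').sort_index(kind='mergesort')

# Only the name column is compared; the date index is ignored.
last_elements_frame = by_date.drop_duplicates(subset='name', keep='last')

# Optional: turn the date index back into an ordinary column.
restored = last_elements_frame.reset_index()

print(restored)
```

The original range index is lost along the way, which is why this variant only fits when the old index can be dropped.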
