
Select most recent example of multiindex dataframe

I have a problem similar to the one in "Getting the last element of a level in a multiindex". In that question, each group of the multiindex dataframe has a start number that is always the same.

However, my problem is slightly different. I again have two columns: one with an integer (in the MWE below it is a bool) and a second with a datetime index. As in the example above, I want to select the last row for each unique value in the first column. In my example, that means the row with the most recent timestamp. The solution from the question above does not work, since I have no fixed start value for the second column.

MWE:

import pandas as pd

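# 10 timestamps spread evenly over 50 hours
df = pd.DataFrame(range(10), index=pd.date_range(pd.Timestamp("2020.01.01"), pd.Timestamp("2020.01.01") + pd.Timedelta(hours=50), periods=10))
# True for rows whose hour lies strictly between 1 and 9
mask = (df.index.hour > 1) & (df.index.hour < 9)
# Group by the mask, then take a 4-hour rolling mean within each group
df = df.groupby(mask).rolling("4h").mean()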

The resulting dataframe looks like:

                             0
False 2020-01-01 00:00:00  0.0
      2020-01-01 11:06:40  2.0
      2020-01-01 16:40:00  3.0
      2020-01-01 22:13:20  4.0
      2020-01-02 09:20:00  6.0
      2020-01-02 14:53:20  7.0
      2020-01-02 20:26:40  8.0
True  2020-01-01 05:33:20  1.0
      2020-01-02 03:46:40  5.0
      2020-01-03 02:00:00  9.0

Now, for each value in the first column, I want to get the row with the most recent timestamp. I.e., I would like to get the following dataframe:

                             0
False 2020-01-02 20:26:40  8.0
True  2020-01-03 02:00:00  9.0

I would really appreciate ideas like those in the linked question that do this.

Assuming the values in level 1 are sorted, try groupby with tail:

out = df.groupby(level=0).tail(1)

out:

                             0
False 2020-01-02 20:26:40  8.0
True  2020-01-03 02:00:00  9.0
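The tail approach relies on the timestamps in level 1 being in increasing order within each level-0 group. A minimal sketch of how that assumption could be checked before calling tail is shown here; the names `timestamps` and `is_sorted` are only illustrative, and the frame is rebuilt from the MWE so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the frame from the MWE in the question
start = pd.Timestamp("2020-01-01")
index = pd.date_range(start, start + pd.Timedelta(hours=50), periods=10)
df = pd.DataFrame(range(10), index=index)
mask = (df.index.hour > 1) & (df.index.hour < 9)
df = df.groupby(mask).rolling("4h").mean()

# Level-1 timestamps as a Series that keeps the MultiIndex, so they can be grouped by level 0
timestamps = df.index.get_level_values(1).to_series(index=df.index)

# True only if the timestamps are non-decreasing inside every level-0 group
is_sorted = timestamps.groupby(level=0).apply(lambda s: s.is_monotonic_increasing).all()

if is_sorted:
    # Safe to take the last row of each group directly
    out = df.groupby(level=0).tail(1)
    print(out)
```

For the MWE the check passes, because rolling on a grouped frame keeps each group's rows in their original time order.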

If they are not sorted, call sort_index first:

out = df.sort_index(level=1).groupby(level=0).tail(1)

out:

                             0
False 2020-01-02 20:26:40  8.0
True  2020-01-03 02:00:00  9.0
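To illustrate why the sort matters, the following sketch reverses the row order of the MWE frame (an artificial setup chosen only for demonstration; `reversed_df`, `wrong` and `right` are illustrative names) and compares tail with and without sort_index.

```python
import pandas as pd

# Rebuild the frame from the MWE in the question
start = pd.Timestamp("2020-01-01")
index = pd.date_range(start, start + pd.Timedelta(hours=50), periods=10)
df = pd.DataFrame(range(10), index=index)
mask = (df.index.hour > 1) & (df.index.hour < 9)
df = df.groupby(mask).rolling("4h").mean()

# Reverse the rows so level 1 is no longer increasing within each group
reversed_df = df.iloc[::-1]

# Without sorting, tail(1) returns the last row by position, here the oldest timestamps
wrong = reversed_df.groupby(level=0).tail(1)

# Sorting by level 1 first makes the last row of each group the most recent one
right = reversed_df.sort_index(level=1).groupby(level=0).tail(1)

print(wrong)
print(right)
```

On the reversed frame, `wrong` holds the earliest row of each group, while `right` holds the rows with values 8.0 and 9.0 that the question asks for.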
