Return the last date and value of each month in pandas

Question

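I have a df of daily data in pandas and want to return the last value of each month. I thought the simple solution would be .resample("M").apply(lambda ser: ser.iloc[-1,]), but it seems that resample computes the month-end date rather than returning the actual last date that appears in that month. Is this the intended behavior? MWE: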

import pandas as pd
import numpy as np
df = pd.Series(np.arange(100), index=pd.date_range(start="2000-01-02", periods=100)).to_frame()
df.sort_index().resample("M").apply(lambda ser: ser.iloc[-1,])
#             0
#2000-01-31  29
#2000-02-29  58
#2000-03-31  89
#2000-04-30  99

whereas the last date that actually appears in df is 2000-04-10.

Answer 1

You may want to look at groupby + tail:

df.groupby(df.index.month).tail(1)
Out[18]: 
             0
2000-01-31  29
2000-02-29  58
2000-03-31  89
2000-04-10  99

Answer 2

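By using resample with the offset M, you are downsampling to calendar month end (see the pandas documentation on offset aliases) and then passing a function. Your index will therefore always be the last day of the month, and this is indeed the intended behavior. The function you are applying (lambda ser: ser.iloc[-1,]) just says: for the calendar month ending on this day, what is the last value found in the original data?

For example, you could also resample to month start by using the offset MS instead of M. The result would be the same, except that the index would be the first day of each calendar month instead of the last: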

# Resample to month end, as you had originally:
>>> df.sort_index().resample("M").apply(lambda ser: ser.iloc[-1,])
             0
2000-01-31  29
2000-02-29  58
2000-03-31  89
2000-04-30  99

# Resample to month start: same data, except index is month start instead of month end
>>> df.sort_index().resample("MS").apply(lambda ser: ser.iloc[-1,])
             0
2000-01-01  29
2000-02-01  58
2000-03-01  89
2000-04-01  99
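To make the labelling behavior concrete, here is a minimal sketch on the question's data. It passes the offset object pd.offsets.MonthEnd() rather than the "M" string, because newer pandas releases (2.2 and later, as far as I know) deprecate the "M" alias in favor of "ME"; the variable name month_end_last is only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question: 100 daily values starting 2000-01-02.
df = pd.Series(
    np.arange(100),
    index=pd.date_range(start="2000-01-02", periods=100),
).to_frame()

# Resample to month end and take the last observation in each bin.
# The offset object is used instead of the "M" / "ME" string alias.
month_end_last = df.sort_index().resample(pd.offsets.MonthEnd()).last()

# The bin label is the calendar month end, not a date taken from the data.
print(month_end_last.index.max())  # label of the last bin
print(df.index.max())              # last date actually present in df
```

The last bin is labelled with the end of April even though the data stop on 2000-04-10, which is the mismatch described in the question.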

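As Wen pointed out, if you just want to show the actual last dates present in your data, it is better to use groupby. Resampling is useful when you want to upsample or downsample your data to a different time frequency, rather than select real rows from the original time frequency.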
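One way to keep the real dates, sketched here under the assumption that the index is a DatetimeIndex, is to group by the monthly period of each timestamp and take the last row of each group. This is a variation on the groupby + tail idea, not something taken from the answers themselves; last_rows is an illustrative name.

```python
import numpy as np
import pandas as pd

# Same data as in the question: 100 daily values starting 2000-01-02.
df = pd.Series(
    np.arange(100),
    index=pd.date_range(start="2000-01-02", periods=100),
).to_frame()
df = df.sort_index()

# to_period("M") maps each timestamp to its year-month period, so rows
# from different years never share a group. tail(1) keeps the last row
# of every group together with its original index label.
last_rows = df.groupby(df.index.to_period("M")).tail(1)
print(last_rows)
```

Because tail returns rows of the original frame, the final row keeps the date 2000-04-10 instead of a computed month end.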

Answer 3

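As of 2023, you can use the following.

df.groupby([df.index.year, df.index.month]).tail(1)

If you group by month alone, rows from the same calendar month in different years fall into one group, so you only get the last value of each month across all years in the data.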
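The difference only shows up once the data span more than one year. The sketch below uses hypothetical data (400 daily values, so the range runs into early 2001) to compare the two groupings; by_month and by_year_month are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical data spanning two calendar years (400 daily values).
df = pd.Series(
    np.arange(400),
    index=pd.date_range(start="2000-01-02", periods=400),
).to_frame()

# Month only: January 2000 and January 2001 land in the same group,
# so only the later January row survives tail(1).
by_month = df.groupby(df.index.month).tail(1)

# Year and month: one group per calendar month of each year.
by_year_month = df.groupby([df.index.year, df.index.month]).tail(1)

print(len(by_month), len(by_year_month))
```

With this data the month-only grouping yields 12 rows and drops 2000-01-31, while the year-and-month grouping yields 14 rows, one for each month that appears.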


Answer 1
2 2018-07-31 03:20:57

Answer 2
2 accepted 2018-07-31 03:42:52

Answer 3
0 2023-01-05 17:00:42
