pandas Series.cumsum() vs pandas.expanding_sum()

Question

Assuming I have a pandas Series s, what is the difference between s.cumsum() and pd.expanding_sum(s)? (I guess the answer should also be the same for cummax()/cummin() and pd.expanding_max()/pd.expanding_min().)

The docs say:

Note: The output of the rolling_ and expanding_ functions do not return a NaN if there are at least min_periods non-null values in the current window. This differs from cumsum, cumprod, cummax, and cummin, which return NaN in the output wherever a NaN is encountered in the input.

Is this the only difference?

(Assuming this is the only difference, I don't understand why two different methods need to be defined for such similar functionality.)

Answer 1

They are basically the same, but with expanding_sum you will get NaNs until you reach the required minimum number of observations (min_periods).

import pandas as pd

s = pd.Series([1] * 5)

>>> s.cumsum()
0    1
1    2
2    3
3    4
4    5
dtype: int64

>>> pd.expanding_sum(s, min_periods=3)
0   NaN
1   NaN
2     3
3     4
4     5
dtype: float64
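For readers on a recent pandas release: as far as I know, the top-level pd.expanding_sum function was deprecated in pandas 0.18 and later removed, with the method-based Series.expanding() interface taking its place. The following is a minimal sketch of the same comparison under that assumption; the variable names are only illustrative.

```python
import pandas as pd

s = pd.Series([1] * 5)

# Plain cumulative sum: defined from the very first element.
cumulative = s.cumsum()

# Expanding-window sum through the method-based API. Positions that
# have seen fewer than 3 observations are NaN, as with min_periods=3 above.
expanding = s.expanding(min_periods=3).sum()

print(cumulative)
print(expanding)
```

Apart from the two leading NaNs (and the resulting float dtype), the two results hold the same values.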

expanding_sum also lets you conform time-indexed data to a given frequency (via the freq argument) before the sum is computed; the resampling is apparently based on the mean.

s = pd.Series([0, 1] * 5, index=pd.date_range('2015-1-1', periods=10, freq='12H'))

>>> s
2015-01-01 00:00:00    0
2015-01-01 12:00:00    1
2015-01-02 00:00:00    0
2015-01-02 12:00:00    1
2015-01-03 00:00:00    0
2015-01-03 12:00:00    1
2015-01-04 00:00:00    0
2015-01-04 12:00:00    1
2015-01-05 00:00:00    0
2015-01-05 12:00:00    1
Freq: 12H, dtype: int64

>>> pd.expanding_sum(s, min_periods=3, freq='1D')
2015-01-01    NaN
2015-01-02    NaN
2015-01-03    1.5
2015-01-04    2.0
2015-01-05    2.5
Freq: D, dtype: float64
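The freq behaviour above can be approximated in the method-based API by resampling explicitly first. This is only a sketch: it assumes that the old freq argument amounted to a daily resample by mean (which matches the output shown, where each day averages to 0.5), and it builds the 12-hour index from a Timedelta purely to avoid depending on a particular frequency-alias spelling.

```python
import pandas as pd

# Same data as above: alternating 0/1 at 12-hour intervals over five days.
index = pd.date_range("2015-01-01", periods=10, freq=pd.Timedelta(hours=12))
s = pd.Series([0, 1] * 5, index=index)

# Step 1: conform to daily frequency, taking the mean within each day.
daily = s.resample("D").mean()

# Step 2: expanding sum over the daily means, requiring 3 observations.
result = daily.expanding(min_periods=3).sum()

print(result)
```

Making the resampling step explicit also makes it easy to swap the mean for another aggregation, such as sum, if that is what you actually want.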

The documentation contains more information on expanding window moment functions.

Regarding the difference in how the two methods treat NaNs, here is an illustrative example:

s = pd.Series([1] * 5)
s.loc[2] = None

>>> s.cumsum()
0     1
1     2
2   NaN
3     3
4     4
dtype: float64

>>> pd.expanding_sum(s)
0    1
1    2
2    2
3    3
4    4
dtype: float64
