
pandas Series.cumsum() vs pandas.expanding_sum()


Assuming I have a pandas Series s, what is the difference between s.cumsum() and pd.expanding_sum(s)? (I guess the answer should also be the same for cummax()/cummin() and pd.expanding_max()/pd.expanding_min().)

The docs say:

Note: The output of the rolling_ and expanding_ functions do not return a NaN if there are at least min_periods non-null values in the current window. This differs from cumsum, cumprod, cummax, and cummin, which return NaN in the output wherever a NaN is encountered in the input.

Is this the only difference?

(Assuming this is the only difference, I don't understand why two different methods need to be defined for such similar functionality.)

They are basically the same, but with expanding_sum you will get NaNs until the window contains the required minimum number of observations (min_periods).

import pandas as pd

s = pd.Series([1] * 5)

>>> s.cumsum()
0    1
1    2
2    3
3    4
4    5
dtype: int64

>>> pd.expanding_sum(s, min_periods=3)
0   NaN
1   NaN
2     3
3     4
4     5
dtype: float64
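
For readers on a recent pandas release: the top-level pd.expanding_* functions used above were deprecated and later removed in favour of the method-style .expanding() accessor. The following is a minimal sketch of the same comparison under that assumption; the variable names expanding_total and matches_cumsum are illustrative only.

```python
import pandas as pd

s = pd.Series([1] * 5)

# Method-style counterpart of pd.expanding_sum(s, min_periods=3):
# the first two positions are NaN because fewer than 3 observations
# are available in the expanding window.
expanding_total = s.expanding(min_periods=3).sum()
print(expanding_total)

# With the default min_periods=1 and no NaNs in the input, the
# expanding sum carries the same values as cumsum() (as float64).
matches_cumsum = s.expanding().sum().equals(s.cumsum().astype("float64"))
print(matches_cumsum)
```

Note that the expanding result is always float64, while cumsum() keeps the integer dtype of the input when there are no missing values.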

expanding_sum also allows you to pre-conform your time-indexed data to a given frequency through the freq argument, apparently by taking the mean of each period.

s = pd.Series([0, 1] * 5, index=pd.date_range('2015-1-1', periods=10, freq='12H'))

>>> s
2015-01-01 00:00:00    0
2015-01-01 12:00:00    1
2015-01-02 00:00:00    0
2015-01-02 12:00:00    1
2015-01-03 00:00:00    0
2015-01-03 12:00:00    1
2015-01-04 00:00:00    0
2015-01-04 12:00:00    1
2015-01-05 00:00:00    0
2015-01-05 12:00:00    1
Freq: 12H, dtype: int64

>>> pd.expanding_sum(s, min_periods=3, freq='1D')
2015-01-01    NaN
2015-01-02    NaN
2015-01-03    1.5
2015-01-04    2.0
2015-01-05    2.5
Freq: D, dtype: float64
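
The freq argument is likewise unavailable on the newer .expanding() accessor, so the conforming step has to be written out explicitly. The sketch below assumes, as the answer suggests, that the legacy call downsampled by mean; it resamples to daily means first and then applies the expanding sum. The names daily_mean and daily_expanding are illustrative.

```python
import pandas as pd

# Ten observations at 12-hour spacing, alternating 0 and 1.
idx = pd.date_range("2015-01-01", periods=10, freq="12h")
s = pd.Series([0, 1] * 5, index=idx)

# Step 1: conform to daily frequency by averaging each day's values
# (assumed to mirror what freq='1D' did in the legacy function).
daily_mean = s.resample("D").mean()

# Step 2: expanding sum over the daily means, requiring 3 observations.
daily_expanding = daily_mean.expanding(min_periods=3).sum()
print(daily_expanding)
```

Each day averages to 0.5, so the expanding sum over the daily series gives NaN, NaN, 1.5, 2.0, 2.5, in line with the output shown above.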

The documentation contains more information on expanding window moment functions.

Regarding the difference in how the two methods treat NaNs, here is an illustrative example:

s = pd.Series([1] * 5)
s.loc[2] = None

>>> s.cumsum()
0     1
1     2
2   NaN
3     3
4     4
dtype: float64

>>> pd.expanding_sum(s)
0    1
1    2
2    2
3    3
4    4
dtype: float64
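
The same NaN comparison can be reproduced with the method-style API, which is a sketch assuming a pandas version where .expanding() is available. It also shows cumsum(skipna=False), which is not part of the original answer but makes the contrast sharper: there the NaN propagates to every later position.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 1.0, np.nan, 1.0, 1.0])

# cumsum() reports NaN at the position of the missing value,
# then resumes the running total afterwards.
cumulative = s.cumsum()

# expanding().sum() ignores the missing value: position 2 still
# has two non-null observations, so it reports 2.0.
expanding = s.expanding().sum()

# With skipna=False the NaN carries forward to all later positions.
propagated = s.cumsum(skipna=False)

print(pd.DataFrame({"cumsum": cumulative,
                    "expanding": expanding,
                    "cumsum_skipna_false": propagated}))
```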
