pandas Series.cumsum() vs pandas.expanding_sum()
Assuming I have a pandas Series s, what is the difference between s.cumsum() and pd.expanding_sum(s)? (I guess the answer should also be the same for cummax()/cummin() and pd.expanding_max()/pd.expanding_min().)
The docs say:
Note: The output of the rolling_ and expanding_ functions do not return a NaN if there are at least min_periods non-null values in the current window. This differs from cumsum, cumprod, cummax, and cummin, which return NaN in the output wherever a NaN is encountered in the input.
Is this the only difference?
(If it is, I don't understand why two different methods need to exist for such similar functionality.)
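The NaN behaviour described in the docs note can be checked for the max/min variants the question also mentions. This is a minimal sketch using the method-style s.expanding() API, which the pd.expanding_* functions were deprecated in favour of in later pandas releases; the sample values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a missing value in the middle
s = pd.Series([3.0, 1.0, np.nan, 5.0, 2.0])

# cummax()/cummin() skip the NaN when accumulating,
# but still emit NaN at the position where it occurs
running_max = s.cummax()
running_min = s.cummin()

# expanding().max()/.min() ignore the NaN as long as at least
# min_periods (default 1) non-null values are in the window
expanding_max = s.expanding().max()
expanding_min = s.expanding().min()

print(pd.DataFrame({
    "cummax": running_max,
    "expanding_max": expanding_max,
    "cummin": running_min,
    "expanding_min": expanding_min,
}))
```

Only row 2 differs: the cum* columns show NaN there, while the expanding columns carry the previous extreme forward.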
They are basically the same, but with expanding_sum you will get NaNs until you reach the required minimum number of observations.
>>> import pandas as pd
>>> s = pd.Series([1] * 5)
>>> s.cumsum()
0 1
1 2
2 3
3 4
4 5
dtype: int64
>>> pd.expanding_sum(s, min_periods=3)
0 NaN
1 NaN
2 3
3 4
4 5
dtype: float64
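On newer pandas versions, where the pd.expanding_* functions are no longer the recommended API, the same comparison can be sketched with the method-style s.expanding(min_periods=...).sum():

```python
import pandas as pd

s = pd.Series([1] * 5)

# cumsum() produces a value from the first row and keeps the int64 dtype
cumulative = s.cumsum()

# expanding(min_periods=3).sum() returns NaN until three observations
# are available; expanding aggregations return float64
expanding = s.expanding(min_periods=3).sum()

print(cumulative.tolist())
print(expanding.tolist())
```

The dtype change (int64 vs float64) seen in the output above follows from the same behaviour.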
expanding_sum also lets you pre-conform time-indexed data to a given frequency through its freq argument, apparently by taking the mean of each period:
>>> s = pd.Series([0, 1] * 5, index=pd.date_range('2015-1-1', periods=10, freq='12H'))
>>> s
2015-01-01 00:00:00 0
2015-01-01 12:00:00 1
2015-01-02 00:00:00 0
2015-01-02 12:00:00 1
2015-01-03 00:00:00 0
2015-01-03 12:00:00 1
2015-01-04 00:00:00 0
2015-01-04 12:00:00 1
2015-01-05 00:00:00 0
2015-01-05 12:00:00 1
Freq: 12H, dtype: int64
>>> pd.expanding_sum(s, min_periods=3, freq='1D')
2015-01-01 NaN
2015-01-02 NaN
2015-01-03 1.5
2015-01-04 2.0
2015-01-05 2.5
Freq: D, dtype: float64
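Current pandas expanding windows do not take a freq argument, so the closest sketch is to resample explicitly and then apply the expanding sum. This assumes the old freq='1D' behaviour really was a daily mean, as the output above suggests; pd.offsets.Hour(12) is used here only to build the same 12-hour index:

```python
import pandas as pd

idx = pd.date_range("2015-01-01", periods=10, freq=pd.offsets.Hour(12))
s = pd.Series([0, 1] * 5, index=idx)

# Conform to daily frequency first: each day holds one 0 and one 1,
# so each daily mean is 0.5 (assumed to mirror what freq='1D' did)
daily = s.resample("D").mean()

# Then take the expanding sum, requiring three daily observations
result = daily.expanding(min_periods=3).sum()
print(result)
```

Doing the resampling as a separate step also makes the aggregation explicit, so you can swap .mean() for .sum() or .last() if that is what your data needs.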
The documentation contains more information on expanding window moment functions.
Regarding the difference in how the two methods treat NaNs, here is an illustrative example:
>>> s = pd.Series([1] * 5)
>>> s.loc[2] = None
>>> s.cumsum()
0 1
1 2
2 NaN
3 3
4 4
dtype: float64
>>> pd.expanding_sum(s)
0 1
1 2
2 2
3 3
4 4
dtype: float64
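The same NaN comparison can be sketched with the method-style API. The series is created as float from the start (an assumption made here to avoid upcasting an int Series on assignment), and the fillna(0) line is an extra illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0] * 5)
s.loc[2] = np.nan

# cumsum() skips the NaN in the running total but leaves NaN in the output
cumulative = s.cumsum()

# expanding().sum() ignores the NaN (min_periods defaults to 1)
expanding = s.expanding().sum()

# Treating missing values as zero makes cumsum match the expanding sum here
filled = s.fillna(0).cumsum()

print(pd.DataFrame({"cumsum": cumulative, "expanding": expanding, "fillna_cumsum": filled}))
```

The fillna(0) equivalence only holds for this data: if the series started with NaN, expanding().sum() would return NaN for that first row, whereas fillna(0).cumsum() would return 0.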