Resampling a DatetimeIndex with prod() changes NaN to 1

I am working with a rather large dataset. After applying resample in combination with the "prod" (multiplication) aggregation, I noticed that my NaN values were changed to 1, which is not what I intended. Here is an example of what happened:

# build random dataframe with one column containing NaN
import pandas as pd
import numpy as np

index = pd.date_range('1/1/2000', periods=7, freq='d')
df = pd.DataFrame(index = index, columns = ["Score 1", "Score 2", "Score 3"])

df["Score 1"] = np.random.randint(1,20,size=7)
df["Score 2"] = np.random.randint(1,20,size=7)
df["Score 3"] = [1, 2, 3, np.nan, np.nan, np.nan, np.nan]  # np.nan: the np.NaN alias was removed in NumPy 2.0
print(df)

            Score 1     Score 2     Score 3
2000-01-01  6            7          1.0
2000-01-02  2            15         2.0
2000-01-03  8            19         3.0
2000-01-04  14           19         NaN
2000-01-05  17           8          NaN
2000-01-06  15           6          NaN
2000-01-07  12           18         NaN

Now let's say I want to resample my DataFrame from a daily to a 3-day frequency using the "prod" aggregation. I do so with:

# resample returns a new DataFrame, so print the result rather than df
print(df.resample("3d").agg("prod"))

            Score 1     Score 2     Score 3
2000-01-01  96          1995        6.0
2000-01-04  3570        2052        1.0
2000-01-07  12          18          1.0

Looking at the column "Score 3", my NaN values suddenly changed to 1, which surprised me. It looks as if multiplying NaN values with each other yields 1. Does anyone know why exactly a product of NaNs equals 1, and what I can do to keep the NaN value when NaN is multiplied by NaN?

Thanks in advance, any help is highly appreciated.

By default, the pandas.DataFrame.prod function (docs) skips NaN values (skipna=True). An all-NaN input therefore reduces to the empty product, which is 1:

pd.Series([np.nan, np.nan]).prod()
# 1.0

You can circumvent this by setting the corresponding keyword:

pd.Series([np.nan, np.nan]).prod(skipna=False)
# nan
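
To make the difference between the two settings concrete, here is a minimal sketch that compares them on an all-NaN Series and on a Series that mixes a number with NaN. The variable names are only illustrative and are not part of the question.

```python
import numpy as np
import pandas as pd

# Illustrative inputs: one window that is entirely NaN, one that is partly NaN
all_nan = pd.Series([np.nan, np.nan])
mixed = pd.Series([4.0, np.nan, np.nan])

# Default (skipna=True): NaN entries are dropped before multiplying,
# so the all-NaN case becomes the product of nothing
default_all_nan = all_nan.prod()
default_mixed = mixed.prod()

# skipna=False: a single NaN makes the whole product NaN
strict_all_nan = all_nan.prod(skipna=False)
strict_mixed = mixed.prod(skipna=False)

print(default_all_nan, default_mixed, strict_all_nan, strict_mixed)
```

With the default, the all-NaN Series gives 1.0 and the mixed Series gives 4.0. With skipna=False both results are NaN, which is stricter than "keep NaN only when everything is NaN".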

In your case, you could apply that as follows:

print(df)
            Score 1  Score 2  Score 3
2000-01-01       18       19      1.0
2000-01-02        9       18      2.0
2000-01-03       10        4      3.0
2000-01-04        4       15      4.0
2000-01-05       12        1      NaN
2000-01-06        1        3      NaN
2000-01-07        8        9      NaN

print(df.resample("3d").agg(pd.DataFrame.prod, skipna=False))
            Score 1  Score 2  Score 3
2000-01-01     1620     1368      6.0
2000-01-04       48       45      NaN
2000-01-07        8        9      NaN

Note that this sets a resampled time window to NaN if the window contains at least one NaN value; I changed the example df slightly to show that. You can apply a lambda instead, which checks whether at least one element in the window is not NaN:

print(df.resample("3d").apply(lambda x: x.prod() if any(x.notnull()) else np.nan))
            Score 1  Score 2  Score 3
2000-01-01     1620     1368      6.0
2000-01-04       48       45      4.0
2000-01-07        8        9      NaN
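
A possible alternative to the lambda, not mentioned in the original answer, is the min_count parameter of prod, which sets how many non-NaN values a window needs before a product is returned. The sketch below rebuilds the modified example df with fixed values and assumes a pandas version in which the resampler's prod accepts min_count.

```python
import numpy as np
import pandas as pd

# Rebuild the modified example frame with fixed values for reproducibility
index = pd.date_range("2000-01-01", periods=7, freq="D")
df = pd.DataFrame(
    {
        "Score 1": [18, 9, 10, 4, 12, 1, 8],
        "Score 2": [19, 18, 4, 15, 1, 3, 9],
        "Score 3": [1, 2, 3, 4, np.nan, np.nan, np.nan],
    },
    index=index,
)

# min_count=1: a window needs at least one non-NaN value,
# otherwise its product is NaN instead of 1
result = df.resample("3D").prod(min_count=1)
print(result)
```

On this data the result should match the lambda version: windows with at least one valid value keep their product, and the all-NaN window in "Score 3" stays NaN.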
