
How to avoid NaN in a weighted average?

Here is my data frame df:

       str1    str2     str3     str4    
key1     3       4       2        5
key2    NaN      3       4        4
key3    NaN     NaN     NaN       2

and a vector w:

 [0.2, 0.3, 0.5]

I usually use df.T.dot(w) to compute the product of a data frame with a vector. But I would like to know how I can avoid NaN values by re-weighting the vector, so that a result is computed as long as a column is not entirely NaN.

Example for my case:

For the 1st column, I would like the vector to be v = [0.2+0.3+0.5, 0, 0], i.e. compute [3, NaN, NaN] times [1, 0, 0].

For the 2nd column, I would like v = [0.2+0.25, 0.3+0.25, 0].

For the 3rd column, I would like v = [0.2+0.25, 0.3+0.25, 0].

For the 4th column, I would like v unchanged because there is no NaN.

Expected output:

          str1   str2  str3    str4
    0      3     3.45   3.1     3.2

If a NaN value should mean "weight = 0" in the inner product, then modify your data frame like this before doing the computation:

df_without_nans = df.fillna(value=0.0)  # 'value' can be dropped
dot_product = df_without_nans.T.dot(w)
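
As a minimal sketch, here is the same computation on the question's data (the frame construction is reconstructed from the table in the question, so treat it as an assumption). Note that zero-filling does not re-weight anything: the weight sitting on a missing entry is simply lost, so str1 comes out as 3 * 0.2 rather than 3.

```python
import numpy as np
import pandas as pd

# Data frame rebuilt from the table in the question.
df = pd.DataFrame(
    {
        "str1": [3, np.nan, np.nan],
        "str2": [4, 3, np.nan],
        "str3": [2, 4, np.nan],
        "str4": [5, 4, 2],
    },
    index=["key1", "key2", "key3"],
)
w = [0.2, 0.3, 0.5]

# NaN -> 0, so a missing entry contributes nothing to the product,
# but the remaining weights are left as they are (no re-weighting).
dot_product = df.fillna(0.0).T.dot(w)
print(dot_product)
```

With this data the result is roughly 0.6, 1.7, 1.6 and 3.2 for str1 to str4, which matches the question's re-weighted values only for the column that has no NaN.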

I'm not sure there's an easy way to take care of NaN values. You might have to write your own dot product function to handle them. Something like this might work:

df.apply(lambda x: (x * [1, 0, 0]).sum())

The pandas sum method automatically ignores NaN values, so you don't have to find them explicitly yourself. You'll likely replace [1, 0, 0] with a reference to some other array of your weights. I'm not sure how you have the weights arranged now, so I can't say exactly how to integrate them into the above suggestion.
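
One possible shape for such a custom dot product is sketched below. It assumes the rule implied by the question's example: the weight on NaN rows is split equally among the rows that do have a value (0.5 becomes 0.25 + 0.25 for the 2nd column). The function name reweighted_dot is hypothetical, and the data frame is rebuilt from the question's table.

```python
import numpy as np
import pandas as pd


def reweighted_dot(df, w):
    """Weighted sum per column; the weight of NaN rows is shared
    equally among the rows of that column that hold a value.
    Hypothetical helper, not a pandas function."""
    w = np.asarray(w, dtype=float)
    present = df.notna().to_numpy()          # (rows, cols), True where a value exists
    n_present = present.sum(axis=0)          # usable rows per column
    # Total weight that sits on NaN rows, per column.
    missing_w = np.where(present, 0.0, w[:, None]).sum(axis=0)
    # Equal share handed to each usable row (guard against division by zero).
    bonus = missing_w / np.maximum(n_present, 1)
    # Per-column weight vectors: original weight plus bonus, 0 on NaN rows.
    v = np.where(present, w[:, None] + bonus, 0.0)
    result = (df.fillna(0.0).to_numpy() * v).sum(axis=0)
    # A column that is entirely NaN has no defined average.
    result[n_present == 0] = np.nan
    return pd.Series(result, index=df.columns)


df = pd.DataFrame(
    {
        "str1": [3, np.nan, np.nan],
        "str2": [4, 3, np.nan],
        "str3": [2, 4, np.nan],
        "str4": [5, 4, 2],
    },
    index=["key1", "key2", "key3"],
)
w = [0.2, 0.3, 0.5]

print(reweighted_dot(df, w))
```

On the question's data this gives about 3.0, 3.45 and 3.1 for str1 to str3. For str4, which has no NaN, it is the plain weighted sum 5 * 0.2 + 4 * 0.3 + 2 * 0.5 = 3.2. If you would rather rescale the surviving weights proportionally instead of splitting the missing weight equally, the bonus line is the part to change.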
