How to avoid NaN in a weighted average?

Here is my data frame df:

       str1    str2     str3     str4    
key1     3       4       2        5
key2    NaN      3       4        4
key3    NaN     NaN     NaN       2

and a vector w:

 [0.2, 0.3, 0.5]

I usually use df.T.dot(w) to compute the product of a data frame and a vector. I would like to know how to handle the NaN values by re-weighting the vector, so that a result is still computed for every column that is not entirely NaN.

Example for my case:

For the 1st column, I would like the vector to be v = [0.2+0.3+0.5, 0, 0] = [1, 0, 0], and to compute [3, NaN, NaN] times [1, 0, 0], ignoring the NaN entries.

For the 2nd column, I would like v = [0.2+0.25, 0.3+0.25, 0], i.e. the missing weight 0.5 split equally between the two remaining rows.

For the 3rd column, I would like v = [0.2+0.25, 0.3+0.25, 0] as well.

For the 4th column, I would like v unchanged because the column contains no NaN.

Expected output:

          str1   str2  str3    str4
    0      3     3.45   3.1     3.2

If a NaN value should simply mean "weight = 0" in the inner product (without redistributing that weight to the other rows), then fill the NaNs in your data frame like this before doing the computation:

df_without_nans = df.fillna(value=0.0)  # 'value' can be dropped
dot_product = df_without_nans.T.dot(w)
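
To make the effect concrete, here is a small sketch that rebuilds the question's data and applies the same fillna approach. The frame construction is my own reconstruction of the table in the question. Note that the weight of a missing row is simply lost rather than redistributed, so the numbers come out smaller than the ones the question asks for wherever a column contains NaN.

```python
import numpy as np
import pandas as pd

# Data reconstructed from the question; NaN marks a missing value.
df = pd.DataFrame(
    {
        "str1": [3, np.nan, np.nan],
        "str2": [4, 3, np.nan],
        "str3": [2, 4, np.nan],
        "str4": [5, 4, 2],
    },
    index=["key1", "key2", "key3"],
)
w = [0.2, 0.3, 0.5]

# NaN -> 0, so a missing entry adds nothing and its weight is dropped.
dot_product = df.fillna(0.0).T.dot(w)
print(dot_product)
# Roughly: str1 0.6, str2 1.7, str3 1.6, str4 3.2
```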

I'm not sure there's a built-in way to take care of NaN values in a dot product. You might have to write your own dot product function to handle them. Something like this might work:

df.apply(lambda x: (x * [1, 0, 0]).sum())

The pandas sum method skips NaN values by default, so you don't have to explicitly find them yourself. You'll likely want to replace [1, 0, 0] with the re-weighted vector for each column. I'm not sure how your weights are arranged now, so you will need to adapt the suggestion above to fit.
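
One possible way to fill in that per-column weight vector is sketched below. It assumes the rule implied by the question's examples: the total weight of the NaN rows is split equally among the rows that do have a value. The helper name reweighted_average and the frame construction are made up for illustration, and a column that is entirely NaN is returned as NaN.

```python
import numpy as np
import pandas as pd

# Data reconstructed from the question; NaN marks a missing value.
df = pd.DataFrame(
    {
        "str1": [3, np.nan, np.nan],
        "str2": [4, 3, np.nan],
        "str3": [2, 4, np.nan],
        "str4": [5, 4, 2],
    },
    index=["key1", "key2", "key3"],
)
# Weights aligned with the rows of df.
w = pd.Series([0.2, 0.3, 0.5], index=df.index)


def reweighted_average(col, weights):
    """Weighted sum of one column; weight of NaN rows is shared equally."""
    present = col.notna()
    n_present = present.sum()
    if n_present == 0:
        return np.nan  # nothing to average in an all-NaN column
    # Total weight attached to the missing rows.
    missing_weight = weights[~present].sum()
    # Give every non-missing row an equal share of that weight.
    adjusted = weights[present] + missing_weight / n_present
    return (col[present] * adjusted).sum()


# apply() calls the helper once per column and forwards weights=w to it.
result = df.apply(reweighted_average, weights=w)
print(result)
```

With the question's data this gives 3 for str1 (weights [1, 0, 0]), 3.45 for str2 and 3.1 for str3 (weights [0.45, 0.55, 0]). For str4 the weights stay unchanged, so the value is 0.2*5 + 0.3*4 + 0.5*2 = 3.2.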
