Here is my data frame df:
      str1  str2  str3  str4
key1     3     4     2     5
key2   NaN     3     4     4
key3   NaN   NaN   NaN     2
and a vector w:
[0.2, 0.3, 0.5]
I usually use df.T.dot(w) to compute the product of the data frame with the vector. How can I handle the NaN values by re-weighting the vector, so that a result is computed for every column that is not entirely NaN?
Example for my case:
For the 1st column, I would like the vector to be v = [0.2+0.3+0.5, 0, 0] and to compute [3, NaN, NaN] times [1, 0, 0].
For the 2nd column, I would like v = [0.2+0.25, 0.3+0.25, 0].
For the 3rd column, I would like v = [0.2+0.25, 0.3+0.25, 0].
For the 4th column, I would like v unchanged because there is no NaN.
Expected output:
   str1  str2  str3  str4
0     3  3.45   3.1   4.7
If a NaN value should mean "weight = 0" in the inner product, replace the NaNs with zeros before doing the computation:
df_without_nans = df.fillna(value=0.0) # 'value' can be dropped
dot_product = df_without_nans.T.dot(w)
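To see what the zero-fill does on the question's data, here is a minimal sketch. It rebuilds df and w from the question; the names plain, weight_present and normalised are my own. Note that filling with zeros drops the weight of the missing entries rather than redistributing it. The last line shows one possible fix, dividing by the weight that was actually present. That is a proportional rescaling, which is an assumption on my part and not the equal-split rule the question asks for, so the numbers can differ from the expected output.

```python
import numpy as np
import pandas as pd

# Data from the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame(
    {
        "str1": [3, np.nan, np.nan],
        "str2": [4, 3, np.nan],
        "str3": [2, 4, np.nan],
        "str4": [5, 4, 2],
    },
    index=["key1", "key2", "key3"],
)
w = np.array([0.2, 0.3, 0.5])

# Zero-fill: a NaN entry contributes nothing, and its weight is simply lost.
plain = df.fillna(0.0).T.dot(w)

# Total weight that landed on non-NaN entries, per column.
weight_present = df.notna().astype(float).T.dot(w)

# Proportional rescaling (an alternative rule, not the one in the question).
# A column that is entirely NaN gives 0 / 0, i.e. NaN.
normalised = plain / weight_present
```

On this data, plain for str1 is 3 * 0.2 = 0.6, while normalised brings it back to 3.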
I'm not sure there's an easy way to take care of NaN values. You might have to write your own dot-product function to handle them. Something like this might work:
df.apply(lambda x: (x * [1, 0, 0]).sum())
The pandas sum method skips NaN values by default, so you don't have to find them explicitly. You'll likely replace [1, 0, 0] with a reference to an array of your own weights. I'm not sure how you have them arranged now, so I can't say exactly how to integrate them into this suggestion.
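Building on the apply idea, here is a rough sketch of a per-column function that follows the re-weighting described in the question: the weight of the NaN entries is split equally among the non-NaN entries of that column. The function name reweighted_column and the way the weights are passed in are my own choices, and df and w are rebuilt from the question so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Data from the question.
df = pd.DataFrame(
    {
        "str1": [3, np.nan, np.nan],
        "str2": [4, 3, np.nan],
        "str3": [2, 4, np.nan],
        "str4": [5, 4, 2],
    },
    index=["key1", "key2", "key3"],
)
w = np.array([0.2, 0.3, 0.5])


def reweighted_column(col, weights):
    """Weighted sum of one column, with the weight of NaN entries
    split equally among the non-NaN entries (hypothetical helper)."""
    valid = col.notna().to_numpy()
    if not valid.any():
        # A column that is entirely NaN has no defined result.
        return np.nan
    # Weight to redistribute, divided by the number of usable entries.
    share = weights[~valid].sum() / valid.sum()
    # Adjusted vector: original weight plus the share where data exists, 0 elsewhere.
    v = np.where(valid, weights + share, 0.0)
    # NaN * 0 is still NaN, but Series.sum skips NaN by default.
    return (col * v).sum()


# Extra keyword arguments to DataFrame.apply are forwarded to the function.
result = df.apply(reweighted_column, weights=w)
```

On the question's data this gives 3 for str1, 3.45 for str2 and 3.1 for str3, matching the expected output. For str4 the weights are unchanged, so the result is 0.2*5 + 0.3*4 + 0.5*2 = 3.2, not the 4.7 listed in the question.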