How can I calculate the correlation coefficient between 2 numpy arrays when one of them has NaN values?
The arrays below are an excerpt from a longer series.
I tried this:
np.corrcoef(A1, A2)
and got this:
array([[ 1., nan],
[nan, nan]])
array([118.76217 , 119.29147 , 119.737 , 120.0961 , 120.66373 ,
121.325195, 121.86492 , 122.27655 , 122.59397 , 122.97091 ,
123.84733 , 125.23529 , 126.442024, 127.58224 , 128.59303 ,
129.46916 , 130.55403 , 132.379 , 134.57579 , 136.9152 ,
139.08174 , 140.94403 , 142.54588 , 144.08707 , 145.62819 ,
147.26051 , 148.82619 , 150.28763 , 152.11078 , 153.83958 ,
155.80728 , 158.07167 , 160.01866 , 162.40714 , 165.73 ,
168.6646 , 171.11201 , 173.11388 , 174.95331 , 177.12701 ,
179.31892 , 181.48216 , 183.3753 , 185.30406 , 187.08716 ,
189.45274 , 191.74364 , 193.79718 , 196.03215 , 198.83864 ,
202.0072 , 204.65758 , 206.76361 , 208.48698 , 210.4281 ,
212.42377 , 214.2105 , 215.89319 , 218.44202 , 221.37914 ,
224.42348 , 226.92468 , 228.8517 ], dtype=float32)
array([ nan, nan, nan, nan, nan, nan,
nan, nan, 187.253 , 179.628 , 169.1065, 159.6525,
nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan,
nan, nan, nan, 187.253 , 179.1705, 168.649 ,
159.5 , nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan,
nan, nan, nan])
One option is to use masked arrays:
import numpy as np
import numpy.ma as ma
A1 = np.array([118.76217 , 119.29147 , 119.737 , 120.0961 , 120.66373 ,
121.325195, 121.86492 , 122.27655 , 122.59397 , 122.97091 ,
123.84733 , 125.23529 , 126.442024, 127.58224 , 128.59303 ,
129.46916 , 130.55403 , 132.379 , 134.57579 , 136.9152 ,
139.08174 , 140.94403 , 142.54588 , 144.08707 , 145.62819 ,
147.26051 , 148.82619 , 150.28763 , 152.11078 , 153.83958 ,
155.80728 , 158.07167 , 160.01866 , 162.40714 , 165.73 ,
168.6646 , 171.11201 , 173.11388 , 174.95331 , 177.12701 ,
179.31892 , 181.48216 , 183.3753 , 185.30406 , 187.08716 ,
189.45274 , 191.74364 , 193.79718 , 196.03215 , 198.83864 ,
202.0072 , 204.65758 , 206.76361 , 208.48698 , 210.4281 ,
212.42377 , 214.2105 , 215.89319 , 218.44202 , 221.37914 ,
224.42348 , 226.92468 , 228.8517 ])
A2 = np.array([np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, 187.253, 179.628, 169.1065, 159.6525,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, 187.253, 179.1705, 168.649,
159.5, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
np.nan, np.nan, np.nan])
print(ma.corrcoef(ma.masked_invalid(A1), ma.masked_invalid(A2)))
# Prints:
#[[1.0 -0.07135569546454648]
# [-0.07135569546454648 1.0]]
Alternatively, you can store the arrays in a pandas DataFrame and use its NaN-friendly df.corr() method.
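The pandas route could look roughly like the sketch below. The helper name corr_ignoring_nan, the column labels and the short sample arrays are made up for illustration and are not part of the original answer; the sketch relies on DataFrame.corr() computing the Pearson coefficient by default and skipping NaN values pairwise.

```python
import numpy as np
import pandas as pd

def corr_ignoring_nan(a, b):
    """Pearson correlation of a and b over the rows where neither is NaN.

    Hypothetical helper for illustration only.
    """
    df = pd.DataFrame({"A1": a, "A2": b})
    # DataFrame.corr() drops NaN entries pairwise for each pair of columns,
    # so only rows where both columns have a value contribute.
    return df.corr().loc["A1", "A2"]

# Short made-up sample in the same shape as the question's data:
# a1 is complete, a2 has gaps.
a1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
a2 = np.array([np.nan, 2.5, 1.0, np.nan, 4.0, 3.5])

r = corr_ignoring_nan(a1, a2)
print(r)
```

With the question's data, passing A1 and A2 to the same helper should give a single coefficient instead of the NaN-filled matrix returned by np.corrcoef.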
Using numpy's masked array module, this will work:
import numpy as np
import numpy.ma as ma
A = np.asarray([1.12, 2.34, 3.33])
B = np.asarray([1.12, float('Inf') , 3.33])
print(ma.corrcoef(ma.masked_invalid(A), ma.masked_invalid(B)))
Output:
[[1.0 1.0]
[1.0 1.0]]
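Note that masked_invalid masks infinities as well as NaN, so in this small example only two value pairs remain, and the correlation of two distinct points is necessarily 1 or -1. A different technique from the one in this answer, shown here only as a sketch, is to drop the non-finite pairs with a boolean mask and call plain np.corrcoef. The helper name corrcoef_finite is hypothetical.

```python
import numpy as np

def corrcoef_finite(a, b):
    """Pearson correlation using only positions where both inputs are finite.

    Hypothetical helper for illustration; not taken from the original answer.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Keep a position only if neither array holds NaN or +/-inf there.
    keep = np.isfinite(a) & np.isfinite(b)
    return np.corrcoef(a[keep], b[keep])[0, 1]

A = np.asarray([1.12, 2.34, 3.33])
B = np.asarray([1.12, float("inf"), 3.33])

r = corrcoef_finite(A, B)
print(r)
```

This avoids the masked array machinery, at the cost of returning a plain float rather than the full correlation matrix.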