
Fitting a linear regression with scipy.stats; error in array shapes

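I have written some code that reads a data file with pandas and processes the data with numpy. This leaves some NaNs in the numpy array, which I mask out so that I can fit a linear regression with scipy.stats: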

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

def makeArray(band):
    """
    Takes as argument a string as the name of a wavelength band.
    Converts the list of magnitudes in that band into a numpy array,
    replacing invalid values (where invalid == -999) with NaNs.
    Returns the array.
    """
    array_name = band + '_mag'
    array = np.array(df[array_name])
    array[array==-999]=np.nan
    return array

#   Read data file
fields = ['no', 'NED', 'z', 'obj_type','S_21', 'power', 'SI_flag', 
          'U_mag', 'B_mag', 'V_mag', 'R_mag', 'K_mag', 'W1_mag',
          'W2_mag', 'W3_mag', 'W4_mag', 'L_UV', 'Q', 'flag_uv']

magnitudes = ['U_mag', 'B_mag', 'V_mag', 'R_mag', 'K_mag', 'W1_mag',
          'W2_mag', 'W3_mag', 'W4_mag']

df = pd.read_csv('todo.dat', sep = ' ',
                   names = fields, index_col = False)

#   Define axes for processing
redshifts = np.array(df['z'])
y = np.log(makeArray('K'))
mask = np.isnan(y)

plt.scatter(redshifts, y, label = ('K'), s = 2, color = 'r')
slope, intercept, r_value, p_value, std_err = stats.linregress(redshifts, y[mask])
fit = slope*redshifts + intercept

plt.legend()
plt.show()

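However, the lines where I calculate the stats parameters and the line of fit (the third- and fourth-to-last lines) give me the following error: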

Traceback (most recent call last):

  File "<ipython-input-77-ec9f43cdfa9b>", line 1, in <module>
    runfile('C:/Users/Jeremy/Dropbox/Notes/Postgrad/Masters Research/VUW/QSOs/read_csv.py', wdir='C:/Users/Jeremy/Dropbox/Notes/Postgrad/Masters Research/VUW/QSOs')

  File "C:\Users\Jeremy\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "C:\Users\Jeremy\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/Jeremy/Dropbox/Notes/Postgrad/Masters Research/VUW/QSOs/read_csv.py", line 35, in <module>
    slope, intercept, r_value, p_value, std_err = stats.linregress(redshifts, y[mask])

  File "C:\Users\Jeremy\Anaconda3\lib\site-packages\scipy\stats\_stats_mstats_common.py", line 92, in linregress
    ssxm, ssxym, ssyxm, ssym = np.cov(x, y, bias=1).flat

  File "C:\Users\Jeremy\Anaconda3\lib\site-packages\numpy\lib\function_base.py", line 2865, in cov
    X = np.vstack((X, y))

  File "C:\Users\Jeremy\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 234, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)

ValueError: all the input array dimensions except for the concatenation axis must match exactly

The shapes of the variables are as follows:

[Image of the variable shapes not included]

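So I'm not sure what the error means or how to fix it. Is there a way around this? Or is there another module I could use instead of scipy.stats to fit a linear regression?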

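The problem is that y[mask] and redshifts have different lengths. mask = np.isnan(y) is True only at the NaN positions, so y[mask] keeps just those entries, while redshifts still holds every entry.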

Here is a simple example that shows the problem.

import numpy as np

na = np.array
y = na([np.nan, 4, 5, 6, 7, 8, np.nan, 9, 10, np.nan])
mask = np.isnan(y)
print(len(y), len(y[mask]))
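
To make the length mismatch concrete, it can help to look at what the mask actually selects. The sketch below uses a made-up array `x` as a stand-in for `redshifts` (an assumption, not the asker's data) and compares `y[mask]` with its complement `y[~mask]`.

```python
import numpy as np

# Hypothetical stand-in for `redshifts`; it has the same length as y.
x = np.arange(10, dtype=float)
y = np.array([np.nan, 4, 5, 6, 7, 8, np.nan, 9, 10, np.nan])

mask = np.isnan(y)        # True where y is NaN

nan_part = y[mask]        # only the NaN entries
valid_part = y[~mask]     # only the non-NaN entries

# x keeps all of its entries, so it matches neither masked array in length.
print(len(x), len(nan_part), len(valid_part))
```

With this toy data, `y[mask]` holds the three NaNs and `y[~mask]` the seven usable values, so neither lines up with the ten entries of `x`.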

You will have to replace the NaN values in y with some value, something like this:

print('old y: ', y)

for idx, m in enumerate(mask):
    if m:
        y[idx] = 1000 # or whatever value you decide on

print('new y: ', y)
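
The loop above can probably be written more compactly with boolean-index assignment. This is a minimal sketch of the same replacement; `fill_value` and the `y_filled` names are hypothetical, and the value 1000 is only the placeholder used above, not a recommendation.

```python
import numpy as np

y = np.array([np.nan, 4, 5, 6, 7, 8, np.nan, 9, 10, np.nan])
mask = np.isnan(y)

fill_value = 1000  # placeholder, same as in the loop version

# Work on a copy so the original array keeps its NaNs.
y_filled = y.copy()
y_filled[mask] = fill_value

# Equivalent one-liner that builds a new array instead of assigning in place.
y_filled_alt = np.where(mask, fill_value, y)

print('old y: ', y)
print('new y: ', y_filled)
```

Both forms leave the array length unchanged, which is what makes the shapes line up again.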

Complete example code:

import numpy as np

na = np.array
y = na([np.nan, 4, 5, 6, 7, 8, np.nan, 9, 10, np.nan])
mask = np.isnan(y)

print(len(y), len(y[mask]))

print('old y: ', y)

for idx, m in enumerate(mask):
    if m:
        y[idx] = 1000 # or whatever value you decide on

print('new y: ', y)
print(len(y))
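
Note that any substituted value enters the regression as a real data point, so it may distort the fit. An alternative to filling in values, sketched here under assumptions and not part of the approach above, is to drop the NaN positions from both arrays with the inverted mask before calling `stats.linregress`. The array `x` is a made-up stand-in for `redshifts`.

```python
import numpy as np
from scipy import stats

# Hypothetical data: x stands in for `redshifts`, y for the log-magnitudes.
x = np.arange(10, dtype=float)
y = np.array([np.nan, 4, 5, 6, 7, 8, np.nan, 9, 10, np.nan])

valid = ~np.isnan(y)      # True where y holds a usable value

# Apply the same mask to both arrays so their lengths match.
slope, intercept, r_value, p_value, std_err = stats.linregress(x[valid], y[valid])

# The fitted line can still be evaluated at every x, including the dropped points.
fit = slope * x + intercept

print(slope, intercept)
```

In the original script this would correspond to passing `redshifts[~mask]` and `y[~mask]`, assuming `redshifts` itself contains no NaNs.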
