
statsmodels OLS giving a TypeError in Python

I am trying to fit a set of features with statsmodels' OLS linear regression model.

I am adding a few features at a time. With the first two features it works fine, but when I keep adding new features it raises an error.

Traceback (most recent call last):
  File "read_xml.py", line 337, in <module>
    model = sm.OLS(Y, X).fit()
...
  File "D:\pythonprojects\testproj\test_env\lib\site-packages\statsmodels\base\data.py", line 132, in _handle_constant
    if not np.isfinite(ptp_).all():
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

So I changed the type of input using

X = X.astype(float)

Then a different error pops out.

Traceback (most recent call last):
  File "read_xml.py", line 339, in <module>
    print(model.summary())
...
  File "D:\pythonprojects\testproj\test_env\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1824, in sf
    place(output, (1-cond0)+np.isnan(x), self.badvalue)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

My code looks like this.

new_df0 = pd.concat([df_lex[0], summary_df[0]], axis = 0, join = 'inner')
new_df1 = pd.concat([df_lex[1], summary_df[1]], axis = 0, join = 'inner')
data = pd.concat([new_df0, new_df1], axis = 1)
print(data.shape)
X = data.values[0:6,:]
Y = data.values[6,:]
Y = Y.reshape(1,88)
X = X.T
Y = Y.T
X = X.astype(float)
model = sm.OLS(Y, X).fit()
predictions = model.predict(X)
print(model.summary())

The first error is triggered in model = sm.OLS(Y, X).fit(). The second error is triggered in model.summary().

But with some other features, there are no errors.

new_df0 = pd.concat([df_len[0], summary_df[0]], axis = 0, join = 'inner')
new_df1 = pd.concat([df_len[1], summary_df[1]], axis = 0, join = 'inner')

data = pd.concat([new_df0, new_df1], axis = 1)
print(data.shape)
X = data.values[0:2,:]
Y = data.values[2,:]
Y = Y.reshape(1,88)
X = X.T
Y = Y.T
X = X.astype(float)
print(X.shape)
print(Y.shape)

model = sm.OLS(Y, X).fit()
predictions = model.predict(X)
print(model.summary())

It looks like it works when I have only two features, but when a different set of 6 features is added, it gives the errors. My main concern is to understand the error. I have read similar questions related to plots in Python, but here the error is raised inside library functions, not in my own code. Any suggestions for debugging are highly appreciated.

Y = Y.astype(float)

This worked. Note that astype returns a new array, so the result has to be assigned back to Y.
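A minimal sketch of why casting Y matters, assuming (as the tracebacks suggest) that Y is an object-dtype array because it was sliced from data.values of a DataFrame with mixed column types. The array contents here are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for Y = data.values[6, :] when data.values has dtype=object.
Y = np.array([1.0, 2.5, 3.0], dtype=object)

# NumPy ufuncs such as isfinite/isnan are not defined for object arrays,
# which is the TypeError seen inside statsmodels and scipy.
try:
    np.isfinite(Y)
    raised = False
except TypeError as exc:
    raised = True
    message = str(exc)

# astype returns a new array; the original Y is unchanged until reassigned.
Y_float = Y.astype(np.float64)
ok = bool(np.isfinite(Y_float).all())

print(raised, Y.dtype, Y_float.dtype, ok)
```

In the question's code only X is cast, so Y would still carry the object dtype into sm.OLS, which is consistent with the second error appearing later in model.summary().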

Check the dtype of X_opt and y. They probably need to be float64; an object dtype raises this kind of error. So, try:

X_opt = X_opt.astype(np.float64)
y = y.astype(np.float64)

I had the same error and fixed it this way.
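If the cast itself fails, or you want to see why the array ended up as object dtype, one way to debug is to look for cells that are not numeric. The sketch below uses a small made-up DataFrame (the column names and values are hypothetical) and pd.to_numeric with errors="coerce", which turns unparseable entries into NaN so they can be located.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "b" holds strings, so df.values would be dtype=object.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["4.0", "oops", "6"]})
print(df.dtypes)

# Coerce every column to numeric; entries that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# A cell is "bad" if it was present originally but became NaN after coercion.
bad = numeric.isna() & df.notna()
bad_cells = [(int(r), int(c)) for r, c in zip(*np.where(bad.to_numpy()))]
print(bad_cells)

# Once the bad cells are handled, the frame converts cleanly to float64.
clean = numeric.dropna().to_numpy(dtype=np.float64)
print(clean.dtype, clean.shape)
```

Dropping the offending rows is only one option here; whether to drop, fix, or impute them depends on the data.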

Please use

model = sm.OLS(df.Y, df.X, missing='drop').fit()

It looks like there is a NaN value in one of the variables. By default, missing is 'none', so no check for missing values is done, and this might be the cause.
