How to fix numpy TypeError: unsupported operand type(s) for -: 'str' and 'str'

Question

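I have been trying to implement a polynomial regression model in Python, using the Spyder IDE. Everything worked until the end, when I added numpy's arange function and got the following error.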

import pandas as pd 
import matplotlib.pyplot as plt
import numpy as np

dataset = pd.read_csv("Position_Salaries.csv")
X = dataset.iloc[:, 1:2]
y = dataset.iloc[:, 2]

#fitting the linear regression model
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X,y)

#fitting the polynomial linear Regression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg2 = LinearRegression()
lin_reg2.fit(X_poly,y)

#visualising the linear regression results
plt.scatter(X,y ,color = 'red')
plt.plot(X,lin_reg.predict(X), color='blue')
plt.title('linear regression model')
plt.xlabel('positive level')
plt.ylabel('salary')
plt.show()

# the code fails here, on this np.arange line
# visualising the polynomial results
X_grid = np.arange(min(X),max(X), 0.1)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X,y ,color = 'red')
plt.plot(X_grid,lin_reg2.predict( poly_reg.fit_transform(X_grid)), color='blue')
plt.title('linear regression model')
plt.xlabel('positive level')
plt.ylabel('salary')
plt.show()

It should run and execute without any errors!

Error traceback:

TypeError                                 Traceback (most recent call last)

<ipython-input-24-428026f3698c> in <module>()
----> 1 x_grid = np.arange(min(x),max(x),0.1)
      2 print(x_grid, x)
      3 x_grid = x_grid.reshape((len(x_grid),1))
      4 
      5 plt.scatter(x, y, color = 'red')

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Answer 1

If this error occurs at:

np.arange(min(X),max(X), 0.1)

it must be because min(X) and max(X) are strings.

In [385]: np.arange('123','125')                                                                                
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-385-0a55b396a7c3> in <module>
----> 1 np.arange('123','125')

TypeError: unsupported operand type(s) for -: 'str' and 'str'

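Since X is a pandas object (a DataFrame or a Series?), this isn't surprising. When pandas can't use a numeric dtype, it freely falls back to object dtype (and doesn't use the numpy string dtypes):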

X = dataset.iloc[:, 1:2]

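np.arange(np.array('123'),np.array('125')) produces a different error, about 'U3' dtypes.

The fact that the LinearRegression calls work with this X is a bit puzzling, but I don't know how it sanitizes its inputs.

In any case, I'd check min(X) before the arange call, looking at both its value and its type. If it is a string, then explore X in more detail.

In a comment you say: there are two columns and all have integers from 1-10 and 45k to 100k. Is that '45k' an integer or a string?

Let's test this on a dummy dataframe: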

In [392]: df = pd.DataFrame([[1,45000],[2,46000],[3,47000]], columns=('A','B'))                                 
In [393]: df                                                                                                    
Out[393]: 
   A      B
0  1  45000
1  2  46000
2  3  47000
In [394]: min(df)                                                                                               
Out[394]: 'A'
In [395]: max(df)                                                                                               
Out[395]: 'B'

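min and max produce strings, derived from the column names.

In contrast, the fit functions are probably working with the array values of the dataframe: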

In [397]: df.to_numpy()                                                                                         
Out[397]: 
array([[    1, 45000],
       [    2, 46000],
       [    3, 47000]])

Don't assume that things should work! Test, debug, and print suspicious values.

min/max are Python built-ins. The numpy functions operate in a dataframe-aware way:

In [399]: np.min(df)      # delegates to df.min()                                                                                      
Out[399]: 
A        1
B    45000
dtype: int64
In [400]: np.max(df)                                                                                            
Out[400]: 
A        3
B    47000
dtype: int64

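Though those aren't appropriate inputs for arange either.

What exactly are you trying to produce with this arange call?

arange over the range of a single column of the dataframe works: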

In [405]: np.arange(np.min(df['A']), np.max(df['A']),.1)                                                        
Out[405]: 
array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
       2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])
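
Putting the pieces of this answer together, a minimal sketch of the grid-and-predict step might look like the following. The data is a made-up stand-in for Position_Salaries.csv (the real file is not shown), and the column names Level and Salary are assumed from the other answers.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data standing in for Position_Salaries.csv.
dataset = pd.DataFrame({
    "Level": np.arange(1, 11),
    "Salary": [45000, 50000, 60000, 80000, 110000,
               150000, 200000, 300000, 500000, 1000000],
})
X = dataset[["Level"]].to_numpy()   # 2-D array of shape (10, 1)
y = dataset["Salary"].to_numpy()

poly_reg = PolynomialFeatures(degree=4)
lin_reg2 = LinearRegression().fit(poly_reg.fit_transform(X), y)

# Take the bounds from the column's values, not from the DataFrame itself.
level = dataset["Level"]
X_grid = np.arange(level.min(), level.max(), 0.1).reshape(-1, 1)

# transform (rather than fit_transform) reuses the fitted feature expansion.
y_grid = lin_reg2.predict(poly_reg.transform(X_grid))
print(X_grid.shape, y_grid.shape)
```

X_grid and y_grid can then be passed to plt.plot in place of the arguments used in the question.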

Answer 2

Try this code. It worked for me, as I am also taking the Udemy course.

X_grid = np.arange(min(X['Level']), max(X['Level']), 0.01, dtype=float)
X_grid = X_grid.reshape((len(X_grid), 1))

#plotting
plt.scatter(X,y, color = 'red')
plt.plot(X_grid, lin_reg2.predict(poly_reg.fit_transform(X_grid)), color = 'blue')
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Position Level')
plt.ylabel('Salary')

Answer 3

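You need to make sure your inputs are of the right type. It looks to me like the OP's values are all of type str. Maybe try converting them to floats with float(x) or some similar function?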
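
If a column really did hold numbers stored as text, one way to do that conversion for a whole column is pandas' to_numeric. This is only a sketch on a hypothetical Series, not the OP's data:

```python
import pandas as pd

# Hypothetical column in which the numbers were read in as text.
levels = pd.Series(["1", "2", "3"], name="Level")

# pd.to_numeric parses the text; errors="coerce" turns entries that
# cannot be parsed into NaN instead of raising an exception.
numeric = pd.to_numeric(levels, errors="coerce")
print(numeric.min(), numeric.max())
```

Any NaN in the result points at an entry that was not a plain number, which is worth inspecting before fitting anything.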

Answer 4

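You should check what is in X and y. They may be Series objects containing strings. What you want is to extract the values in X and y and convert them to float/int before doing anything with them.

Something like: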

X = dataset.iloc[:, 1:2].astype(float)
y = dataset.iloc[:, 2].astype(float)
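
One caveat worth checking: converting the dtype does not change what the built-in min returns for a DataFrame, because iterating over a DataFrame yields its column labels. A small sketch on made-up data (the column names are assumptions) shows the difference:

```python
import pandas as pd

# Hypothetical data standing in for the question's CSV.
dataset = pd.DataFrame({"Position": ["A", "B", "C"],
                        "Level": [1, 2, 3],
                        "Salary": [45000, 46000, 47000]})
X = dataset.iloc[:, 1:2].astype(float)

label = min(X)           # built-in min sees the column labels
low = X["Level"].min()   # Series.min() works on the values
high = X["Level"].max()
print(repr(label), low, high)
```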

Answer 5

Use this:

x = dataset.iloc[:, 1:2].values

y = dataset.iloc[:, -1:].values

because x and y should contain only the numeric values.

Using dataset.iloc[].values means that the Level and Salary names are not included in the x and y data.
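
As a rough illustration on made-up data, once x is a plain numpy array its own min() and max() methods return scalars that arange accepts:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's CSV.
dataset = pd.DataFrame({"Position": ["A", "B", "C"],
                        "Level": [1, 2, 3],
                        "Salary": [45000, 46000, 47000]})
x = dataset.iloc[:, 1:2].values    # ndarray of shape (3, 1), no column labels
y = dataset.iloc[:, -1:].values

# ndarray.min()/max() reduce over the whole array and return scalars.
x_grid = np.arange(x.min(), x.max(), 0.1).reshape(-1, 1)
print(x.shape, x_grid.shape)
```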

Answer 6

Replace:

X = dataset.iloc[:, 1:2] and y = dataset.iloc[:, 2]

with:

X = dataset.iloc[:, 1:2].values and y = dataset.iloc[:, 2].values

Answer 7

Check that you are taking the values from the dataset. Remember, it is:

x = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values

not:

x = dataset.iloc[:, 1:-1]
y = dataset.iloc[:, -1]

Without ".values" you get the strings ("str") that the error message shows.

Answer 8

Try the following code:

X_grid = np.arange(float(min(X['Level'])), float(max(X['Level'])), 0.01, dtype=float)

8 answers

Answer 1
1 2019-07-06 00:49:09

Answer 2
1 2020-04-08 10:55:06

Answer 3
0 2019-07-05 18:19:18

Answer 4
0 2019-07-05 18:21:11

Answer 5
0 2020-04-30 05:12:22

Answer 6
0 2020-06-14 08:38:18

Answer 7
0 2021-06-16 10:05:20

Answer 8
0 2021-07-04 14:29:26
