Using a list of values from train_test_split() as training data

Question

I am trying to run a linear regression on some data. This is what the data looks like.

X = df['vectors'] looks like this:

0      [-1.86135, 1.3202, 0.023501, -2.9511, 1.62135,...
1      [0.5487195, 0.27389452, 0.49712706, 0.6853927,...
2      [-1.3525691, -0.8444542, 2.8269022, -1.4456564...
3      [1.0730275, -0.14970247, -1.1424525, -1.953272...
4      [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...

When I run a linear regression model on it:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
lm = LinearRegression()
lm.fit(X_train, y_train)

I get this error:

TypeError                                 Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

How can I convert the values in X to scalars? I was thinking of taking the mean of each vector, but I'm not sure how to go about it.

Answer 1

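From the looks of it, X is a pandas.Series object.

Since all the lists in the rows of X have the same length, you can reshape X into an ndarray with the same number of rows as X and as many columns as there are elements in each list.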

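# Import numpy and the scikit-learn pieces used below
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Reshape: explode() puts every list element on its own row,
# reshape(len(X), -1) folds them back into one row per original list
X = np.array(X.explode()).reshape(len(X), -1)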

# Do the same as before
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
lm = LinearRegression()
lm.fit(X_train, y_train)
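
To see what the reshape does, here is a minimal sketch on toy data. The Series below is a made-up stand-in for df['vectors'] (three rows of four floats each), and the trailing astype(float) is an addition not in the answer above; it only turns the object-dtype result into a plain float array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['vectors']: 3 rows, each a list of 4 floats
X = pd.Series([[1.0, 2.0, 3.0, 4.0],
               [5.0, 6.0, 7.0, 8.0],
               [0.0, 0.0, 0.0, 0.0]])

# explode() yields one element per row (12 rows here);
# reshape(len(X), -1) restores one row per original list
X_2d = np.array(X.explode()).reshape(len(X), -1).astype(float)

print(X_2d.shape)  # (3, 4)
```

This only works when every list has the same length; otherwise the reshape raises a ValueError or silently misaligns the rows.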

Answer 2

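Try converting the lists to an array with numpy.array and then making it two-dimensional, because sklearn works with arrays and expects 2-D input of shape (n_samples, n_features).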
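
A rough sketch of that suggestion, assuming the column holds equal-length lists. The data here is synthetic (random vectors and a made-up linear target), and going through tolist() is just one possible way to build the 2-D array, not necessarily what the answerer had in mind.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data (assumption): 10 rows of 3-element lists
rng = np.random.default_rng(0)
vectors = rng.normal(size=(10, 3))
df = pd.DataFrame({"vectors": vectors.tolist()})
y = vectors @ np.array([1.0, -2.0, 0.5])  # made-up target, exactly linear in the features

# Convert the Series of lists into a 2-D array of shape (n_samples, n_features)
X = np.array(df["vectors"].tolist())

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
lm = LinearRegression()
lm.fit(X_train, y_train)

print(X.shape, lm.coef_.shape)  # (10, 3) (3,)
```

Each list element becomes its own feature column, so the model learns one coefficient per vector component rather than collapsing each vector to a single scalar.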

Answer 1: score 1, accepted, 2021-05-13 01:23:43

Answer 2: score 0, 2021-05-13 00:57:17