Linear regression with sklearn arrays

Question

I'm just trying to set up a simple linear regression test based on the following example.

Here is my code:

# Normalize customer data
x_array = np.array(CustomerRFM['recency'])
normalized_X = preprocessing.normalize([x_array])
y_array = np.array(CustomerRFM['monetary_value'])
normalized_Y = preprocessing.normalize([y_array])

print('normalized_X: ' + str(np.count_nonzero(normalized_X)))
print('normalized_Y: ' + str(np.count_nonzero(normalized_Y)))

X_train, X_test = train_test_split(normalized_X, test_size=0.2)
Y_train, Y_test = train_test_split(normalized_Y, test_size=0.2)

print('X_train: ' + str(np.count_nonzero(X_train)))
print('Y_train: ' + str(np.count_nonzero(Y_train)))

regr = LinearRegression()
regr.fit(X_train, Y_train)

I added the four print() lines because I am getting a strange issue. The console output of these four lines is:

normalized_X: 4304
normalized_Y: 4338
X_train: 0
Y_train: 0

For some reason, when I split the data into training and testing sets, I get no values.

I get the following error on the regr.fit() line:

ValueError: Found array with 0 sample(s) (shape=(0, 4339)) while a minimum of 1 is required.

This tells me there is something wrong with the X values, but I don't know what.

UPDATE: Change to print(array.shape)

If I change my code to use

print('normalized_X: ' + str(normalized_X.shape))
print('normalized_Y: ' + str(normalized_Y.shape))

and this:

print('X_train: ' + str(X_train.shape))
print('Y_train: ' + str(Y_train.shape))

I get:

normalized_X: (1, 4339)
normalized_Y: (1, 4339)

and this:

X_train: (0, 4339)
Y_train: (0, 4339)

Answer 1

It looks like you're using preprocessing.normalize incorrectly. By wrapping x_array in square brackets ([x_array]), you're creating an array of shape (1, 4339).
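
To make the shape issue concrete, here is a small sketch. The five values are a made-up stand-in for CustomerRFM['recency'] (the real column has 4339 entries); only the shapes matter here.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical stand-in for CustomerRFM['recency'] (5 values instead of 4339).
x_array = np.array([3.0, 10.0, 25.0, 1.0, 7.0])

# Wrapping the 1-D array in a list yields ONE row (sample) with five columns (features).
wrapped = preprocessing.normalize([x_array])
print(wrapped.shape)  # (1, 5)

# reshape(-1, 1) yields five rows (samples) with one column (feature) each.
column = x_array.reshape(-1, 1)
print(column.shape)  # (5, 1)
```

The second layout, one row per customer, is the one scikit-learn estimators and train_test_split work with.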

According to the docs, preprocessing.normalize expects an array of shape [n_samples, n_features]. In your example, n_samples is 1 and n_features is 4339, which I don't think is what you want! You're then asking train_test_split to split a data set of one sample, so it understandably returns an empty array.

1 2019-01-01 19:04:48