Linear regression with Python (Pandas and NumPy)

Question

I am trying to implement linear regression using Python.

I took the following steps:

import pandas as p
import numpy as n
data = p.read_csv("...path\Housing.csv", usecols=[1]) # I want the first col
data1 = p.read_csv("...path\Housing.csv", usecols=[3]) # I want the 3rd col
x = data
y = data1

Then I try to obtain the coefficients with the following call:

regression_coeff = n.polyfit(x,y,1)

And then I get the following error:

raise TypeError("expected 1D vector for x")
TypeError: expected 1D vector for x

I can't get my head around this: when I print x and y, I can clearly see that they are both 1D vectors.

Can someone please help?

The dataset can be found here: DataSets

The original code is:

import pandas as p
import numpy as n

data = p.read_csv('...\housing.csv', usecols = [1])
data1 = p.read_csv('...\housing.csv', usecols = [3])

x = data
y = data1
regression = n.polyfit(x, y, 1)

Answer 1

This should work:

np.polyfit(data.values.flatten(), data1.values.flatten(), 1)

data is a DataFrame and its values are 2D:

>>> data.values.shape
(546, 1)

flatten() turns it into a 1D array:

>>> data.values.flatten().shape
(546,)

which is what polyfit() needs.
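
The shape difference described above can be reproduced without the real file. The following is a minimal sketch, assuming a small made-up DataFrame in place of the Housing.csv columns (the values and column names are placeholders, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two single-column DataFrames (made-up values).
data = pd.DataFrame({"price": [42000.0, 38500.0, 49500.0, 60500.0]})
data1 = pd.DataFrame({"bedrooms": [3, 2, 3, 3]})

print(data.values.shape)            # two-dimensional: one column, several rows
print(data.values.flatten().shape)  # one-dimensional

# Passing the single-column DataFrames directly reproduces the error.
try:
    np.polyfit(data, data1, 1)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# Flattening both inputs first lets the fit go through.
coeffs = np.polyfit(data.values.flatten(), data1.values.flatten(), 1)
print(coeffs)  # slope first, then intercept
```

So printing a single-column DataFrame can look like a 1D vector even though its underlying array still carries a second axis of length 1.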

Simpler alternative:

df = pd.read_csv("Housing.csv")
np.polyfit(df['price'], df['bedrooms'], 1)
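
Selecting a column by name works because it returns a Series, which is already one-dimensional. Here is a small sketch of that, assuming an in-memory CSV with made-up rows in place of Housing.csv; it also shows one way the returned coefficients might be used with np.polyval:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory stand-in for Housing.csv; the rows are made up.
csv_text = """price,lotsize,bedrooms
40000,5000,2
50000,6000,3
60000,7000,4
70000,8000,5
"""
df = pd.read_csv(io.StringIO(csv_text))

x = df["price"]     # a Series, so already 1D
y = df["bedrooms"]
print(x.ndim)

# polyfit returns the highest-degree coefficient first: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# Evaluate the fitted line at the original x values.
fitted = np.polyval([slope, intercept], x)
print(fitted)
```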

Answer 2

Python is telling you that the data is not in the right format: x must be a 1D array, but in your case it is a 2D pandas DataFrame. You can convert your data to a NumPy array and squeeze it to fix the problem.

import pandas as pd
import numpy as np

data = pd.read_csv('../Housing.csv', usecols = [1])
data1 = pd.read_csv('../Housing.csv', usecols = [3])
data = np.squeeze(np.array(data))
data1 = np.squeeze(np.array(data1))

x = data
y = data1
regression = np.polyfit(x, y, 1)
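
To illustrate what squeezing does to the shape, here is a short sketch on a made-up single-column frame (not the real dataset). It also shows one edge case worth knowing: np.squeeze removes every length-1 axis, so a frame with a single row collapses to a 0-d array unless an axis is given.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame with made-up values.
frame = pd.DataFrame({"price": [42000.0, 38500.0, 49500.0]})

arr = np.array(frame)
print(arr.shape)              # 2D: rows by one column
print(np.squeeze(arr).shape)  # 1D after dropping the length-1 axis

# Edge case: with only one row, both axes have length 1.
one_row = np.array(frame.iloc[:1])
print(np.squeeze(one_row).shape)          # 0-d: every axis was removed
print(np.squeeze(one_row, axis=1).shape)  # still 1D: only the column axis removed
```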

Answer 3

pandas.read_csv() returns a DataFrame, which has two dimensions, while np.polyfit wants a 1D vector for both x and y for a single fit. You can simply convert the output of read_csv() to a pd.Series to match the np.polyfit() input format using .squeeze():

data = pd.read_csv('../Housing.csv', usecols = [1]).squeeze()
data1 = pd.read_csv('../Housing.csv', usecols = [3]).squeeze()
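
As a rough end-to-end sketch of this approach, the example below assumes an in-memory CSV with made-up rows and a leading id column (both hypothetical, standing in for Housing.csv). It passes "columns" to squeeze(), a variation on the call above that only drops the column axis, so the result stays a Series even for a one-row file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for Housing.csv; the rows are made up.
csv_text = "id,price,lotsize,bedrooms\n1,40000,5000,2\n2,50000,6000,3\n3,60000,7000,4\n"

# usecols is zero-based: index 1 is price, index 3 is bedrooms in this layout.
x = pd.read_csv(io.StringIO(csv_text), usecols=[1]).squeeze("columns")
y = pd.read_csv(io.StringIO(csv_text), usecols=[3]).squeeze("columns")

print(type(x).__name__, x.shape)  # a Series with a single axis

regression = np.polyfit(x, y, 1)
print(regression)  # slope first, then intercept
```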

Answer 1
Score 5, accepted, 2016-04-01 14:25:42

Answer 2
Score 2, 2016-04-01 14:39:15

Answer 3
Score 2, 2016-04-01 14:43:51
