How do I denormalize the sklearn diabetes dataset?

Question

There is a nice example of linear regression in sklearn using a diabetes dataset.

I copied the notebook version and played with it a bit in JupyterLab. Of course, it works just like the example. But I wondered what I was really seeing.

There is a chart with unlabeled axes.
I wondered what the label (dependent variable) was.
I wondered which of the 10 independent variables was being used.

So I played around with the nice features provided by IPython/Jupyter:

diabetes.DESCR

Diabetes dataset
================
Notes
-----
Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of 
n = 442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

Data Set Characteristics:
:Number of Instances: 442
:Number of Attributes: First 10 columns are numeric predictive values
:Target: Column 11 is a quantitative measure of disease progression one year after baseline
:Attributes:
:Age:
:Sex:
:Body mass index:
:Average blood pressure:
:S1:
:S2:
:S3:
:S4:
:S5:
:S6:

Note: Each of these 10 feature variables have been mean centered and scaled by the standard
deviation times `n_samples` (i.e. the sum of squares of each column totals 1).
Source URL:
http://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) 
"Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(http://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)

From the Source URL, we are led to the original raw data, which is a tab-separated, unnormalized copy of the data. That page also explains what the "S" features are in the problem domain.

Interestingly, sex is coded as one of [1, 2], leaving the reader to guess what the two values mean.

But my real question is whether there is a way within sklearn to determine:

How can the data be denormalized?
Is there a way to denormalize the coefficients and intercept so that one could express the fit algebraically?

Or is this just a demonstration of linear regression?

Answer 1

There is no way to denormalize data without some information about the data prior to the normalization. However, note that the sklearn.preprocessing classes MinMaxScaler, StandardScaler, etc. do include inverse_transform methods, so if a fitted scaler were also provided with the example, it would be easy to do. As it stands, as you say, this is just a regression demonstration.
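To illustrate the inverse_transform point, here is a minimal sketch using StandardScaler. It runs on synthetic data that I made up for illustration (the array X_raw, the target y, and the true coefficients 3, -2 and 7 are all assumptions, not the diabetes data). Note also that StandardScaler divides by the standard deviation, which is a different scaling from the one described in diabetes.DESCR. The sketch shows two things: recovering the raw features from the scaled ones, and rewriting coefficients fitted on scaled features so the fit can be expressed algebraically in raw units.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for raw (unnormalized) features; NOT the diabetes data.
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=[50.0, 25.0], scale=[10.0, 4.0], size=(200, 2))
y = 3.0 * X_raw[:, 0] - 2.0 * X_raw[:, 1] + 7.0 + rng.normal(scale=0.5, size=200)

# Scale the features, keeping the fitted scaler so the transform can be undone.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_raw)

# inverse_transform recovers the raw values from the scaled ones.
X_back = scaler.inverse_transform(X_scaled)

# Fit on the scaled features.
model = LinearRegression().fit(X_scaled, y)

# Because x_scaled = (x_raw - mean_) / scale_, the fit
#   y = intercept_ + sum_j coef_[j] * (x_raw[j] - mean_[j]) / scale_[j]
# can be rewritten with coefficients and an intercept in raw units.
coef_raw = model.coef_ / scaler.scale_
intercept_raw = model.intercept_ - np.sum(model.coef_ * scaler.mean_ / scaler.scale_)

# Predictions from the raw-unit formula agree with the model's own predictions.
y_from_raw = X_raw @ coef_raw + intercept_raw
print(np.allclose(y_from_raw, model.predict(X_scaled)))
```

None of this helps with the bundled diabetes arrays themselves, since no fitted scaler ships with them. As far as I know, newer scikit-learn releases (1.1 and later) accept load_diabetes(scaled=False), which returns the unscaled feature values directly; check the documentation for your installed version before relying on it.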

Accepted answer (2018-09-13 18:58:54)