How do I denormalize the sklearn diabetes dataset?
There is a nice example of linear regression in sklearn using a diabetes dataset.
I copied the notebook version and played with it a bit in JupyterLab. Of course, it works just like the example. But I wondered what I was really seeing.
So I played around with the nice features provided by IPython/Jupyter:
diabetes.DESCR
Diabetes dataset
================
Notes
-----
Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of
n = 442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.
Data Set Characteristics:
:Number of Instances: 442
:Number of Attributes: First 10 columns are numeric predictive values
:Target: Column 11 is a quantitative measure of disease progression one year after baseline
:Attributes:
:Age:
:Sex:
:Body mass index:
:Average blood pressure:
:S1:
:S2:
:S3:
:S4:
:S5:
:S6:
Note: Each of these 10 feature variables have been mean centered and scaled by the standard
deviation times `n_samples` (i.e. the sum of squares of each column totals 1).
Source URL:
http://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004)
"Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(http://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)
The Source URL leads to the original raw data, which is a tab-separated, unnormalized copy of the data. It also explains what the "S" features mean in the problem domain.
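For reference, the scaling described in the DESCR note can be reproduced by hand. The sketch below is plain Python on made-up numbers (the `ages` list and the `normalize_column` helper are hypothetical, not part of sklearn or the dataset). It assumes the divisor is the population standard deviation times the square root of `n_samples`, which is what makes each column's sum of squares come out to 1.

```python
import math


def normalize_column(values):
    """Mean-center a column and scale it so its squared entries sum to 1."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    # Population standard deviation (divides by n, not n - 1).
    std = math.sqrt(sum(c * c for c in centered) / n)
    # Assumed divisor: std * sqrt(n), so that the sum of squares is 1.
    scale = std * math.sqrt(n)
    return [c / scale for c in centered]


# Hypothetical raw ages, used only for illustration.
ages = [59.0, 48.0, 72.0, 24.0, 50.0]
normalized = normalize_column(ages)

sum_of_squares = sum(v * v for v in normalized)
column_mean = sum(normalized) / len(normalized)
print(sum_of_squares, column_mean)
```

The function returns only the scaled values; the mean and the scale factor are thrown away, which is exactly the information the published feature matrix no longer carries.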
But my real question is whether there is a way within sklearn to determine how to denormalize the data, or is this just a demonstration of linear regression?
There is no way to denormalize data without some information about the data prior to the normalization, such as each column's original mean and scale. However, note that the sklearn.preprocessing classes MinMaxScaler, StandardScaler, etc. do include inverse_transform methods, so if the fitted scaler were also provided in the example it would be easy to do. As it stands, as you say, this is just a regression demonstration.
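To illustrate the `inverse_transform` point, here is a minimal sketch, assuming you hold the raw data and fit the scaler yourself. The `raw` array is made up for illustration and is not taken from the diabetes dataset. Note that `StandardScaler` scales to unit variance, which is not quite the same scaling as the one the DESCR note describes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical raw measurements (age, BMI), used only for illustration.
raw = np.array([
    [59.0, 32.1],
    [48.0, 21.6],
    [72.0, 30.5],
    [24.0, 25.3],
])

scaler = StandardScaler()
scaled = scaler.fit_transform(raw)  # learns mean_ and scale_ per column

# The fitted scaler keeps the per-column statistics, so the step is reversible.
restored = scaler.inverse_transform(scaled)

print(scaler.mean_, scaler.scale_)
print(np.allclose(restored, raw))
```

The round trip works only because `scaler` still holds `mean_` and `scale_`. The bundled diabetes features ship without such an object, so the same call is not available for them. As far as I know, newer scikit-learn releases (1.1 and later) also accept `load_diabetes(scaled=False)` to return the unscaled feature values directly; check the documentation for your installed version.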