Fit model to all variables in Python (Scikit Learn)

This was asked about a different package elsewhere, but is there a way in Scikit Learn to include all variables, or all variables minus some specified number, as in R?

To give an example of what I mean, say I have a regression y = x1 + x2 + x3 + x4. In R I can evaluate this regression by running:

result = lm(y ~ ., data=DF)
summary(result)

I would have to imagine there's a similar way to condense formulas in Python, since writing out all the variables for larger data sets would be kind of silly.

Is there a way in Scikit Learn to include all variables or all variables minus some specified number?

Yes. With sklearn + pandas, to fit using all variables except one, and use that one as the label, you can simply do:

model.fit(df.drop('y', axis=1), df['y'])

This works for most sklearn models.

This is the pandas + sklearn equivalent of R's ~ and - notation when not using patsy.
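As a minimal sketch of the pattern above, the following builds a small synthetic DataFrame and fits a LinearRegression on every column except y. The column names x1 to x4, the coefficients, and the random seed are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data: column names and coefficients are illustrative assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["x1", "x2", "x3", "x4"])
df["y"] = 1.0 + 2.0 * df["x1"] - 3.0 * df["x2"] + 0.5 * df["x3"] + 4.0 * df["x4"]

# Similar in spirit to R's `y ~ .`: use every column except the label.
X = df.drop(columns="y")
y = df["y"]

model = LinearRegression().fit(X, y)

# Pair each coefficient with its column name so the order is unambiguous.
coefs = pd.Series(model.coef_, index=X.columns)
print(coefs)
print(model.intercept_)
```

Because the synthetic label has no noise, the fitted coefficients should recover the values used to build y.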

To exclude multiple variables, you can do:

df.drop(['v1', 'v2'], axis=1)
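A hedged sketch of how this might look in a full fit, roughly mirroring R's `y ~ . - v1 - v2`. The DataFrame and its column names are hypothetical. Note that the label column has to be dropped from the feature matrix as well, otherwise the model would be trained on its own target.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: v1 and v2 are the columns we want to leave out.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["v1", "v2", "v3", "v4"])
df["y"] = 3.0 * df["v3"] - 2.0 * df["v4"]

excluded = ["v1", "v2"]

# Drop the excluded predictors and the label in one call.
X = df.drop(columns=excluded + ["y"])
model = LinearRegression().fit(X, df["y"])

used = list(X.columns)
print(used)
print(model.coef_)
```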

We can try the following workaround. Let's use the iris dataset, treat the species label as numeric, and fit a linear regression model to see how to use all the independent predictors in both R and Python (sklearn):

In R

summary(lm(as.numeric(Species)~., iris))[c('coefficients', 'r.squared')]

$coefficients
                Estimate Std. Error   t value     Pr(>|t|)
(Intercept)   1.18649525 0.20484104  5.792273 4.150495e-08
Sepal.Length -0.11190585 0.05764674 -1.941235 5.416918e-02
Sepal.Width  -0.04007949 0.05968881 -0.671474 5.029869e-01
Petal.Length  0.22864503 0.05685036  4.021874 9.255215e-05
Petal.Width   0.60925205 0.09445750  6.450013 1.564180e-09

$r.squared
[1] 0.9303939

In Python (sklearn with patsy)

from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
import pandas as pd
from patsy import dmatrices

iris = load_iris()
# Turn names such as 'sepal length (cm)' into 'sepal_length'
names = [f_name.replace(" (cm)", "").replace(" ", "_") for f_name in iris.feature_names]
iris_df = pd.DataFrame(iris.data, columns=names)
iris_df['species'] = iris.target

# patsy does not support '.' in formulas (at least on Windows, Python 2.7),
# so build the right-hand side from every column except the label
y, X = dmatrices('species ~ ' + '+'.join(iris_df.columns.difference(['species'])),
                  iris_df, return_type="dataframe")

model = LinearRegression()
model.fit(X, y)

# Outputs below are as reported in the original answer
print(model.score(X, y))
# 0.930422367533

print(model.intercept_, model.coef_)
# [ 0.19208399] [[0.22700138  0.60989412 -0.10974146 -0.04424045]]

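As we can see, the models learned in R and in Python with patsy are similar: the coefficients are nearly identical, though listed in a different order. The intercepts differ by about 1 because as.numeric(Species) codes the classes as 1 to 3 in R, whereas iris.target uses 0 to 2.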
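For comparison, the same fit can be sketched without patsy by combining `load_iris(as_frame=True)` with `drop`. This assumes scikit-learn 0.23 or later, where `as_frame` is available; the label column is then named `target` rather than `species`. Labeling the coefficients with their column names sidesteps the ordering difference noted above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

# With as_frame=True, .frame holds the four features plus a 'target' column.
iris_df = load_iris(as_frame=True).frame

# All columns except the label, as in R's `target ~ .`
X = iris_df.drop(columns="target")
y = iris_df["target"]

model = LinearRegression().fit(X, y)

r2 = model.score(X, y)
# Index the coefficients by feature name so their order is explicit.
coefs = pd.Series(model.coef_, index=X.columns)
print(r2)
print(model.intercept_)
print(coefs)
```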
