
F_regression from sklearn.feature_selection

I found the `f_regression` technique for feature selection in the sklearn feature selection module, but I was not able to understand the principle it uses. The description given was:

Univariate linear regression tests.
Quick linear model for testing the effect of a single regressor, sequentially for many regressors. This is done in 3 steps:

    1. The regressor of interest and the data are orthogonalized wrt constant regressors.
    2. The cross correlation between data and regressors is computed.
    3. It is converted to an F score then to a p-value.

I am not able to understand this; can someone please explain it in layman's terms?

The language in the docs is a little obtuse. I believe "data" refers to the response. First, the chosen regressor and the response are orthogonalized with respect to the remaining (constant) regressors; this removes the effect of the mean and reduces any multicollinearity that may be present. Then, the correlation between the chosen regressor and the response is calculated. In a univariate setting, the correlation coefficient is the square root of R^2, which can be written in terms of the F-statistic used in testing the overall significance of a model (see also https://stats.stackexchange.com/questions/56881/whats-the-relationship-between-r2-and-f-test ). So next, the correlation is converted to an F-statistic, the corresponding p-value is calculated, and F and p are returned. If there is more than one regressor, this is done for all regressors, one at a time.
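The steps above can be sketched numerically. The snippet below is a minimal illustration (using `make_regression` to build a toy dataset, an assumption of mine, not from the question): it calls `f_regression` and then reproduces one F-score by hand from the Pearson correlation, using the relation F = r^2 / (1 - r^2) * (n - 2) for a single regressor plus an intercept.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import f_regression

# Toy regression data (hypothetical example): 5 features, 3 informative.
X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       noise=10.0, random_state=0)

# f_regression returns one F-statistic and one p-value per feature.
F, p = f_regression(X, y)

# Manual check for feature 0, following the 3 documented steps:
j = 0
# Step 1: "orthogonalize wrt constant regressors" = subtract the mean.
xc = X[:, j] - X[:, j].mean()
yc = y - y.mean()
# Step 2: cross-correlation between the regressor and the response.
r = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))
# Step 3: convert r to an F-score (1 numerator df, n - 2 denominator df).
n = X.shape[0]
F_manual = r**2 / (1 - r**2) * (n - 2)

print(F[j], F_manual)  # the two values should agree
```

The key point this makes concrete: each feature is scored independently, so `f_regression` measures only the marginal (univariate) linear relationship with the response, not a feature's contribution within the full multivariate model.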

