Apply a function to all possible combinations of columns in a DataFrame in Python: a better way

What I am trying to do is run a linear regression using statsmodels.api on every possible pairwise combination of columns in a DataFrame.

I was able to do it with the following code:

For the DataFrame df:

import statsmodels.api as sm
import numpy as np
import pandas as pd

#generate example Dataframe
df = pd.DataFrame(abs(np.random.randn(50, 4)*10), columns=list('ABCD'))

#extract all possible combinations of columns by column index number
i, j = np.tril_indices(df.shape[1], -1)

#loop over the pairs, creating a model variable and running the regression for each pairwise combination
for idx,item in enumerate(list(zip(i, j))):
    exec("model" + str(idx) +" = sm.OLS(df.iloc[:,"+str(item[0])+"],df.iloc[:,"+str(item[1])+"])")
    exec("regre_result" + str(idx) +" = model" + str(idx)+".fit()")

regre_result0.summary()

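                            OLS Regression Results
==============================================================================
Dep. Variable:                      B   R-squared:                       0.418
Model:                            OLS   Adj. R-squared:                  0.406
Method:                 Least Squares   F-statistic:                     35.17
Date:                Tue, 09 Jan 2018   Prob (F-statistic):           3.00e-07
Time:                        14:16:25   Log-Likelihood:                -174.29
No. Observations:                  50   AIC:                             350.6
Df Residuals:                      49   BIC:                             352.5
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
A              0.7189      0.121      5.930      0.000       0.475       0.962
==============================================================================
Omnibus:                       14.290   Durbin-Watson:                   1.828
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               16.289
Skew:                           1.101   Prob(JB):                     0.000290
Kurtosis:                       4.722   Cond. No.                         1.00
==============================================================================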

It works, but I imagine there is an easier way to achieve similar results. Can anybody point me to the best way to do it?

Why are you doing it this way, with exec and loads of variables, instead of just appending to a list?

You can also use itertools.combinations to get all pairs of columns.

Try something like this:

In [1]: import itertools
In [2]: import pandas as pd
In [3]: daf = pd.DataFrame(columns=list('ABCD'))
In [4]: list(itertools.combinations(daf.columns, 2))
Out[4]: [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
In [5]: col_pairs = list(itertools.combinations(daf.columns, 2))
In [6]: models = []
In [7]: results = []
In [8]: for a,b in col_pairs:
     ...:     model = get_model(df[a],df[b])
     ...:     models.append(model)
     ...:     result = get_result(model)
     ...:     results.append(result)
In [9]: results[0].summary()

Here get_model would call sm.OLS and get_result would call fit (or you can just call those directly in the loop without wrapping them in external functions). But don't do it this crazy exec way; best practice is to avoid using exec.
