Apply a function on all possible combinations of columns in a dataframe in Python: a better way
What I am trying to do is apply a linear regression, using statsmodels.api, to all possible pairwise column combinations of a DataFrame.
I was able to do it with the following code, for the dataframe df:
import statsmodels.api as sm
import numpy as np
import pandas as pd
#generate example Dataframe
df = pd.DataFrame(abs(np.random.randn(50, 4)*10), columns=list('ABCD'))
#extract all possible combinations of columns by column index number
i, j = np.tril_indices(df.shape[1], -1)
#loop over the pairs, creating one model variable and one result variable per pairwise combination
for idx,item in enumerate(list(zip(i, j))):
    exec("model" + str(idx) +" = sm.OLS(df.iloc[:,"+str(item[0])+"],df.iloc[:,"+str(item[1])+"])")
    exec("regre_result" + str(idx) +" = model" + str(idx)+".fit()")
regre_result0.summary()
                 OLS Regression Results
Dep. Variable:                 B    R-squared:              0.418
Model:                       OLS    Adj. R-squared:         0.406
Method:            Least Squares    F-statistic:            35.17
Date:           Tue, 09 Jan 2018    Prob (F-statistic):  3.00e-07
Time:                   14:16:25    Log-Likelihood:       -174.29
No. Observations:             50    AIC:                    350.6
Df Residuals:                 49    BIC:                    352.5
Df Model:                      1
Covariance Type:       nonrobust

       coef   std err        t    P>|t|   [0.025   0.975]
A    0.7189     0.121    5.930    0.000    0.475    0.962

Omnibus:           14.290    Durbin-Watson:        1.828
Prob(Omnibus):      0.001    Jarque-Bera (JB):    16.289
Skew:               1.101    Prob(JB):          0.000290
Kurtosis:           4.722    Cond. No.              1.00
It works, but I imagine there is an easier way to achieve similar results. Can anybody point me to the best way to do this?
Why are you doing it this way, with exec and loads of variables, instead of just appending to a list?
You can also use itertools.combinations to get all pairs of columns.
Try something like this:
In [1]: import itertools
In [2]: import pandas as pd
In [3]: daf = pd.DataFrame(columns=list('ABCD'))
In [4]: list(itertools.combinations(daf.columns, 2))
Out[4]: [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
In [5]: col_pairs = list(itertools.combinations(daf.columns, 2))
In [6]: models = []
In [7]: results = []
In [8]: for a, b in col_pairs:
   ...:     model = get_model(df[a], df[b])
   ...:     models.append(model)
   ...:     result = get_result(model)
   ...:     results.append(result)
In [9]: results[0].summary()
Here get_model would call sm.OLS and get_result would call fit (or just call those directly in the loop without wrapping them in separate functions). But don't do it this crazy exec way; best practice is to avoid exec.