
How to iterate through a dataframe and pass columns to the glm function in Python?

I have a dataframe with 7 variables:

         RACA   pca   pp  pcx  psc     lp     csc
0     BARBUDA  1915  470  150  140  87.65   91.41
1     BARBUDA  1345  305  100  110  79.32   98.28
2     BARBUDA  1185  295   80   85  62.19   83.12
3     BARBUDA  1755  385  120  130  80.65   90.01
4     BARBUDA  1570  325  120  120  77.96   87.99
5    CANELUDA  1640  365  110  115  81.38   87.26
6    CANELUDA  1960  525  135  145  89.21   99.37
7    CANELUDA  1715  410  100  120  79.35   99.84
8    CANELUDA  1615  380  100  110  76.32   99.27
9    CANELUDA  2230  500  165  160  90.22   99.56
10   CANELUDA  1570  400  105   95  85.24   83.95
11  COMERCIAL  1815  380  145   90  73.32   92.81
12  COMERCIAL  2475  345  180  140  71.77  105.64
13  COMERCIAL  1870  295  125  125  72.36   97.89
14  COMERCIAL  2435  565  185  160  73.24  107.39
15  COMERCIAL  1705  315  115  125  72.03   96.11
16  COMERCIAL  2220  495  165  150  87.63   96.89
17     PELOCO  1145  250   75   85  50.57   77.90
18     PELOCO   705   85   55   50  38.26   78.09
19     PELOCO  1140  195   80   75  66.15   96.35
20     PELOCO  1355  250   90   95  50.60   91.39
21     PELOCO  1095  220   80   80  53.03   84.57
22     PELOCO  1580  255  125  120  59.30   95.57

I want to fit a GLM for every dependent variable, pca through csc. In R this is quite simple, but I don't know how to get it working in Python. I tried to write a for loop and pass the column name to the formula, but so far it hasn't worked:

for column in df:
    col = str(column)
    model = sm.formula.glm(paste(col,"~ RACA"), data=df).fit()
    print(model.summary())

I am using pandas and statsmodels:

import pandas as pd
import statsmodels.api as sm

I imagine it must be simple, but I haven't been able to figure it out yet.

I was able to figure out a solution. I don't know if it's the most efficient or elegant one, but it gives the results I wanted:

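# Loop over the dependent variables only (columns pca through csc);
# iterating a DataFrame yields its column names.
for col in df.loc[:, 'pca':'csc']:
    # Build the formula string, e.g. "pca ~ RACA"
    formula = f"{col} ~ RACA"
    # With no family given, glm uses the Gaussian family
    model = sm.formula.glm(formula=formula, data=df).fit()
    print(model.summary())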
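
One possible variation on the loop above, offered as a sketch rather than the definitive way to do it: keep each fitted result in a dict keyed by column name, so the models can be inspected or compared later instead of only printed. The names `raw`, `responses`, `models`, and `coefs` are made up for this illustration, and the data is just a subset of the rows shown in the question so that the example runs on its own.

```python
import io

import pandas as pd
import statsmodels.formula.api as smf

# A subset of the rows from the question, just to keep the sketch short.
raw = """RACA pca pp pcx psc lp csc
BARBUDA 1915 470 150 140 87.65 91.41
BARBUDA 1345 305 100 110 79.32 98.28
BARBUDA 1185 295 80 85 62.19 83.12
CANELUDA 1640 365 110 115 81.38 87.26
CANELUDA 1960 525 135 145 89.21 99.37
CANELUDA 1715 410 100 120 79.35 99.84
COMERCIAL 1815 380 145 90 73.32 92.81
COMERCIAL 2475 345 180 140 71.77 105.64
COMERCIAL 1870 295 125 125 72.36 97.89
PELOCO 1145 250 75 85 50.57 77.90
PELOCO 705 85 55 50 38.26 78.09
PELOCO 1140 195 80 75 66.15 96.35
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Dependent variables: every column from pca through csc.
responses = df.loc[:, "pca":"csc"].columns

# Fit one GLM per response and keep the fitted results by column name.
# With no family argument, statsmodels uses the Gaussian family.
models = {
    col: smf.glm(formula=f"{col} ~ RACA", data=df).fit()
    for col in responses
}

# One column of coefficients per response, for side-by-side comparison.
coefs = pd.DataFrame({col: res.params for col, res in models.items()})
print(coefs.round(2))
```

With the results stored this way, a single summary is still available on demand, for example `print(models["pca"].summary())`.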

I am open to suggestions on how I could improve this. Thank you!
