
Robust 2-Way ANOVA in Python

I need to run a robust ANOVA from Python. The function I want to use is t2way from the R package WRS2. I tried with rpy2, but I'm stuck with an error:

>>> import pandas as pd
>>> import rpy2.robjects.packages as rpackages
>>> from rpy2.robjects import pandas2ri
>>> pandas2ri.activate()
>>> df = pd.read_csv("https://github.com/lawrence009/dsur/raw/master/data/goggles.csv")
>>> rdf = pandas2ri.py2rpy(df)
>>> WRS2 = rpackages.importr('WRS2')
>>> WRS2.t2way("attractiveness ~ gender*alcohol", data = rdf)

RRuntimeError: Error in x[[grp[i]]] : 
  attempt to select less than one element in get1index

I'm looking for either a way to make this work with rpy2, or (even better) a port of WRS2 to the Python environment. Any help would be much appreciated.
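For context on what a port would have to reproduce: WRS2's t2way is a two-way ANOVA on trimmed means (20% trimming by default, set by its tr argument), and its standard errors are built from Winsorized variances. The sketch below covers only those two building blocks in pure Python. It does not compute the t2way test statistic or its p-values. The function names are hypothetical, and scipy users may prefer scipy.stats.trim_mean for the first one.

```python
import math
from statistics import mean, variance


def trimmed_mean(xs, tr=0.2):
    """Mean after dropping g = floor(tr * n) observations from each end of the sorted data."""
    ys = sorted(xs)
    g = math.floor(tr * len(ys))
    return mean(ys[g:len(ys) - g])


def winsorized_variance(xs, tr=0.2):
    """Sample variance after replacing the g = floor(tr * n) smallest values with the
    smallest retained value (ys[g]) and the g largest with the largest retained value."""
    ys = sorted(xs)
    n = len(ys)
    g = math.floor(tr * n)
    lo, hi = ys[g], ys[n - g - 1]
    return variance([min(max(y, lo), hi) for y in ys])


# Toy data: one outlier drags the ordinary mean far more than the 20% trimmed mean
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(mean(data))                  # 14.5
print(trimmed_mean(data))          # 5.5
print(winsorized_variance(data))   # 42.5 / 9, about 4.72
```

With tr = 0 both functions reduce to the ordinary mean and sample variance. A larger tr discards (or pulls in) more of each tail.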

Here is my solution for this particular problem. The first issue, which already shows up in plain R, is that after importing the data frame you have to convert the alcohol and gender columns to factors with as.factor.

In R, the script would be:

library(WRS2)
# read.csv (not read.csv2, whose default is dec = ",") matches this comma-separated file
df <- read.csv("https://github.com/lawrence009/dsur/raw/master/data/goggles.csv", header = TRUE)
df$attractiveness <- as.numeric(df$attractiveness)
df$alcohol <- as.factor(df$alcohol)
df$gender <- as.factor(df$gender)
t2way(attractiveness ~ gender*alcohol, data = df)
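Why the factor conversion matters: an R character column has no levels, so nlevels() returns 0 for it. Also, R's 1:0 evaluates to c(1, 0) rather than an empty sequence. My assumption, not verified against the WRS2 source, is that t2way computes the number of cells p as the product of the two level counts and then loops over 1:p. With p = 0 it would then index a list with 0, which R rejects with an "attempt to select less than one element" error. The simplified sketch below emulates those R semantics in Python. All names in it are hypothetical.

```python
def r_seq(start, end):
    """Emulate R's `start:end`: unlike range(), it counts DOWN when end < start."""
    step = 1 if end >= start else -1
    return list(range(start, end + step, step))


def r_elem(xs, i):
    """Emulate R's 1-based `xs[[i]]`, which rejects an index below 1."""
    if i < 1:
        raise IndexError("attempt to select less than one element")
    return xs[i - 1]


def visit_cells(cells, n_levels_a, n_levels_b):
    """Hypothetical, simplified model of the suspected t2way loop:
    p <- J * K; for (i in 1:p) ... x[[i]] ..."""
    p = n_levels_a * n_levels_b  # ASSUMPTION: cell count from the two level counts
    return [r_elem(cells, i) for i in r_seq(1, p)]


# Six cells, e.g. a 2 x 3 gender-by-alcohol design (placeholder contents)
cells = [[f"cell {k}"] for k in range(1, 7)]
print(len(visit_cells(cells, 2, 3)))  # factors: 2 * 3 = 6 cells, prints 6
print(r_seq(1, 0))                    # character columns give p = 0, prints [1, 0]
try:
    visit_cells(cells, 0, 0)
except IndexError as err:
    print("error:", err)              # the loop reaches index 0 and fails
```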

In Python I didn't find a way to change the data type of the columns, but I came up with this workaround. First, create an .R file named my_t2way.R that contains:

my_t2way <- function(path){
    library(WRS2)
    # Read the comma-separated file and set the column types t2way needs
    df <- read.csv(path, header = TRUE)
    df$attractiveness <- as.numeric(df$attractiveness)
    df$alcohol <- as.factor(df$alcohol)
    df$gender <- as.factor(df$gender)
    f <- t2way(attractiveness ~ gender*alcohol, data = df)
    # Collect the test statistics and p-values into a data frame for Python
    result <- data.frame(factor = c('gender', 'alcohol', 'gender:alcohol'),
                         value = c(f$Qa, f$Qb, f$Qab),
                         p.value = c(f$A.p.value, f$B.p.value, f$AB.p.value))
    return(result)
}

Then you can run the following commands from Python:

import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri

# Enable automatic conversion between R data frames and pandas DataFrames
pandas2ri.activate()

# Load the R script and fetch the function defined in it
r = robjects.r
r['source']('my_t2way.R')
my_t2way_r = robjects.globalenv['my_t2way']

# Run the test; the CSV is read and processed on the R side
csv_url = "https://github.com/lawrence009/dsur/raw/master/data/goggles.csv"
df_result_r = my_t2way_r(csv_url)

Admittedly, this solution only works for this particular case, but I think it could easily be extended to other data frames.

If the issue is that some columns in the data frame are not factors (as suggested in the other answer), casting them to factors is quite easy:

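import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.packages import importr

rdf = pandas2ri.py2rpy(df)

base = importr('base')

for cn in ('alcohol', 'gender'):
    # Replace each grouping column with its factor version
    i = rdf.colnames.index(cn)
    rdf[i] = base.as_factor(rdf[i])
    # We could also do it with
    # rdf[i] = ro.FactorVector(rdf[i])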
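Conceptually, the conversion adds a set of levels. An R factor stores the sorted distinct values as its levels and each observation as a 1-based integer code into those levels, which is presumably what t2way relies on to count the groups. A rough pure-Python analogue follows. The helper is hypothetical, and R sorts levels using the current locale's collation, so for non-ASCII labels the order may differ from Python's code-point sort.

```python
def as_factor(values):
    """Rough analogue of R's as.factor(): sorted distinct values become the levels,
    and each value is stored as a 1-based integer code into those levels."""
    levels = sorted(set(values))
    code_of = {level: i for i, level in enumerate(levels, start=1)}
    return [code_of[v] for v in values], levels


codes, levels = as_factor(["Male", "Female", "Male"])
print(levels)   # ['Female', 'Male']
print(codes)    # [2, 1, 2]
```

R's nlevels() corresponds to len(levels) here. A plain character column has no levels at all, which is the situation the conversion above fixes.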

To be on the safe side, it is recommended to create an R formula object. Some R functions will accept strings and assume that they are formulas, but this is up to the package author and not always the case.

WRS2.t2way(ro.Formula('attractiveness ~ gender*alcohol'), data = rdf)

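Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.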

© 2020-2024 STACKOOM.COM