Robust 2-Way ANOVA in Python

Question

I need to run a robust ANOVA from Python. The function I want to use is t2way from the R package WRS2. I tried with rpy2, but I'm stuck with an error:

>>> import pandas as pd
>>> import rpy2.robjects.packages as rpackages
>>> from rpy2.robjects import pandas2ri
>>> pandas2ri.activate()
>>> df = pd.read_csv("https://github.com/lawrence009/dsur/raw/master/data/goggles.csv")
>>> rdf = pandas2ri.py2rpy(df)
>>> WRS2 = rpackages.importr('WRS2')
>>> WRS2.t2way("attractiveness ~ gender*alcohol", data = rdf)

RRuntimeError: Error in x[[grp[i]]] : 
  attempt to select less than one element in get1index

I'm looking for either a way to make this work with rpy2, or (even better) a port of WRS2 to Python. Any help would be much appreciated.

Answer 1

Here is my solution for this particular problem. The first issue is that, after importing the data frame in R, you have to convert the alcohol and gender columns to factors with as.factor.

In R, the script would be:

library(WRS2)
df <- read.csv2("https://github.com/lawrence009/dsur/raw/master/data/goggles.csv",header = TRUE, sep=',')
df[ , c('attractiveness')] <- as.numeric(df[ , c('attractiveness')])
df[ , c('alcohol')] <- as.factor(df[ , c('alcohol')])
df[ , c('gender')] <- as.factor(df[ , c('gender')])
t2way(attractiveness ~ gender*alcohol, data = df)

In Python, I didn't find a way to change the data type of the columns, but I came up with this workaround. First, create an R file named my_t2way.R that contains:

my_t2way <- function(df1){
    library(WRS2)
    df <- read.csv2(df1,header = TRUE, sep=',')
    df[ , c('attractiveness')] <- as.numeric(df[ , c('attractiveness')])
    df[ , c('alcohol')] <- as.factor(df[ , c('alcohol')])
    df[ , c('gender')] <- as.factor(df[ , c('gender')])
    f <- t2way(attractiveness ~ gender*alcohol, data = df) 
    df1 = data.frame(factor=c('gender','alcohol','gender:alcohol'),
                     value = c(f$Qa,f$Qb,f$Qab),
                    p.value = c(f$A.p.value,f$B.p.value,f$AB.p.value))
    return(df1)
}

Then you can run the following commands from Python:

import pandas as pd
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri

# Enable rpy2's pandas conversion
pandas2ri.activate()

# Define the R script and load the instance in Python
r = robjects.r
r['source']('my_t2way.R')

# Load the function we have defined in R
my_t2way_r = robjects.globalenv['my_t2way']

# Read and process the data
df1 = "https://github.com/lawrence009/dsur/raw/master/data/goggles.csv"
df_result_r = my_t2way_r(df1)

Admittedly, this solution only works for this particular case, but I think it could easily be extended to other data frames.
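As one possible starting point for that extension, here is a minimal sketch of a pure-Python helper that builds the formula string from column names instead of hard-coding it. The name build_two_way_formula and the simplified name check are my own assumptions for illustration; they are not part of rpy2 or WRS2.

```python
import re

# Simplified check for a plain R column name (an assumption for this sketch;
# R's real rules for syntactic names are slightly stricter).
_R_NAME = re.compile(r"^[A-Za-z.][A-Za-z0-9._]*$")


def build_two_way_formula(columns, response, factor_a, factor_b):
    """Return 'response ~ factor_a*factor_b' after validating the names.

    Hypothetical helper: `columns` is any collection of available
    column names, e.g. list(df.columns) for a pandas data frame.
    """
    names = (response, factor_a, factor_b)
    if len(set(names)) != 3:
        raise ValueError("response and factors must be three distinct columns")
    for name in names:
        if name not in columns:
            raise KeyError(f"column not found: {name!r}")
        if not _R_NAME.match(name):
            raise ValueError(f"not a plain R name: {name!r}")
    return f"{response} ~ {factor_a}*{factor_b}"


# Column names as used in this thread for goggles.csv
columns = ["gender", "alcohol", "attractiveness"]
formula = build_two_way_formula(columns, "attractiveness", "gender", "alcohol")
print(formula)  # attractiveness ~ gender*alcohol
```

The resulting string could then be passed on to the R side, for example as an extra argument to a generalized version of my_t2way, so that the column names are no longer fixed inside the R function.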

Answer 2

If the issue is that some columns in the data frame are not factors (as suggested in the other answer), casting them to factors is quite easy:

import rpy2.robjects as ro
from rpy2.robjects.packages import importr

base = importr('base')
rdf = pandas2ri.py2rpy(df)

for cn in ('alcohol', 'gender'):
    i = rdf.colnames.index(cn)
    rdf[i] = base.as_factor(rdf[i])
    # We could also do it with
    # rdf[i] = ro.FactorVector(rdf[i])

To be on the safe side, it is recommended to create an R formula object. Some R functions will accept strings and assume that they are formulas, but this is up to the package author and is not always the case.

WRS2.t2way(ro.Formula('attractiveness ~ gender*alcohol'), data = rdf)
