
How to use rpy2 to test significance using a for loop?

I am attempting to run t-tests in R (via the rpy2 package) on several variables from a pandas DataFrame. I am using the rpy2 magic functions in a Jupyter notebook so that Python can interact with R. The interaction works, except for the loop.

Here is the DataFrame:

df.head()
Out[60]: 
              ID Category  Num Vert_Horizon Description  Fem_Valence_Mean  \
0  Animals_001_h  Animals    1            h  Dead Stork              2.40   
1  Animals_002_v  Animals    2            v        Lion              6.31   
2  Animals_003_h  Animals    3            h       Snake              5.14   
3  Animals_004_v  Animals    4            v        Wolf              4.55   
4  Animals_005_h  Animals    5            h         Bat              5.29   

   Fem_Valence_SD  Fem_Av/Ap_Mean  Fem_Av/Ap_SD  Arousal_Mean       ...        \
0            1.30            3.03          1.47          6.72       ...         
1            2.19            5.96          2.24          6.69       ...         
2            1.19            5.14          1.75          5.34       ...         
3            1.87            4.82          2.27          6.84       ...         
4            1.56            4.61          1.81          5.50       ...         

   Luminance  Contrast  JPEG_size80   LABL   LABA   LABB  Entropy  \
0     126.05     68.45       263028  51.75  -0.39  16.93     7.86   
1     123.41     32.34       250208  52.39  10.63  30.30     6.71   
2     135.28     59.92       190887  55.45   0.25   4.41     7.83   
3     122.15     75.10       282350  49.84   3.82   1.36     7.69   
4     131.81     59.77       329325  54.26  -0.34  -0.95     7.82   

   Classification  valence_median_split  temp_selection  
0                           Low_Valence             OUT  
1                          High_Valence             NaN  
2                           Low_Valence             OUT  
3                           Low_Valence             OUT  
4                           Low_Valence             OUT  

[5 rows x 35 columns]

Here is how I attempted to do it:

%Rpush df

Variables = 'All_Valence_Mean', 'Male_Valence_Mean', 'Fem_Valence_Mean'

for var in Variables:
    %R var + '_Sig' <- t.test(var ~ valence_median_split, data = df, var.equal = TRUE)

I want each result to be saved in a variable named after "var" with "_Sig" appended. That part is not crucial; what I am really after is getting the command to treat "var" as each variable name from the list in turn.

Here is the error that I got:

Error in model.frame.default(formula = var ~ valence_median_split, data = df) : 
  invalid type (list) for variable 'var'

/anaconda3/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error in model.frame.default(formula = var ~ valence_median_split, data = df) : 
  invalid type (list) for variable 'var'

  warnings.warn(x, RRuntimeWarning)

If you are more comfortable with R, push as much of the logic as you can to R. For example, the following cell stores the t-test results in an R list named results; the -o results option copies that list back to Python, so you can access it in subsequent notebook cells.

%%R -i df -o results

Variables <- c("All_Valence_Mean", "Male_Valence_Mean",
               "Fem_Valence_Mean")
results <- list()

for (var in Variables) {
    results[[paste0(var, '_Sig')]] <- t.test(
        as.formula(paste(var, '~ valence_median_split')),
        data = df, var.equal = TRUE)
}
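The R loop works because paste() turns each column name into formula text before as.formula() parses it. In the original %R line, by contrast, the Python loop value never reaches R: R evaluates the literal symbol var. If you prefer to drive the loop from Python, one option is to build each complete R statement as a string first. A minimal sketch, assuming the column and grouping names from the question; build_ttest_commands is a hypothetical helper, not part of rpy2:

```python
def build_ttest_commands(variables, group="valence_median_split", data="df"):
    """Hypothetical helper (not part of rpy2): one R assignment per variable."""
    commands = []
    for var in variables:
        # Interpolate the Python string *before* the code is sent to R,
        # so R sees the real column name instead of the symbol `var`.
        commands.append(
            f"{var}_Sig <- t.test({var} ~ {group}, data = {data}, var.equal = TRUE)"
        )
    return commands


Variables = ("All_Valence_Mean", "Male_Valence_Mean", "Fem_Valence_Mean")
for cmd in build_ttest_commands(Variables):
    print(cmd)
```

Each generated string could then be evaluated with rpy2.robjects.r(cmd), assuming df has already been pushed to R (for example with %Rpush df). Column names that are not syntactic R names, such as Fem_Av/Ap_Mean, would need to be wrapped in backticks inside the formula.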

If you are more comfortable with Python, keep as much as you can in Python:

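import rpy2.robjects as ro
from rpy2.robjects import Formula, pandas2ri
from rpy2.robjects.conversion import localconverter
from rpy2.robjects.packages import importr

stats = importr('stats')

# Convert the pandas DataFrame to an R data.frame (rpy2 3.x API).
with localconverter(ro.default_converter + pandas2ri.converter):
    r_df = ro.conversion.py2rpy(df)

Variables = ('All_Valence_Mean', 'Male_Valence_Mean', 'Fem_Valence_Mean')
results = dict()

for var in Variables:
    # var.equal reaches t.test through `...`, so importr's `_` -> `.` name
    # translation does not apply to it; pass the R argument name explicitly.
    results['%s_Sig' % var] = stats.t_test(
        Formula('%s ~ valence_median_split' % var),
        data=r_df, **{'var.equal': True})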
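The var.equal = TRUE argument selects the classic Student two-sample t-test with a pooled variance; R's default (var.equal = FALSE) is the Welch test. To make the difference concrete, here is a stdlib-only sketch of the pooled t statistic and its degrees of freedom. It is an illustrative cross-check with a hypothetical helper name, pooled_t, not something rpy2 provides, and it stops short of the p-value, which needs the t distribution's CDF:

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Hypothetical helper: Student's t statistic with pooled variance.

    Mirrors what t.test(..., var.equal = TRUE) computes for two samples.
    Returns (t, degrees_of_freedom).
    """
    nx, ny = len(x), len(y)
    dof = nx + ny - 2
    # Pooled variance: the two sample variances weighted by their own dof.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / dof
    # Standard error of the difference in means under equal variances.
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, dof


t, dof = pooled_t([1, 2, 3], [4, 5, 6])  # toy data, not from the question
print(round(t, 3), dof)
```

With the toy samples above this gives t of about -3.674 on 4 degrees of freedom. R's formula interface subtracts the group means in the order of the grouping factor's levels, so the sign R reports can differ from this sketch's x-minus-y convention.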

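Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.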

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM