How to use rpy2 to test significance using a for loop?
I am attempting to run a t-test in R (with the help of the rpy2 package) on some variables from a pandas dataframe. I am using magic functions in a Jupyter notebook to get Python to interact with R. The interaction is successful, except for the loop.
Here is the dataframe:
df.head()
Out[60]:
ID Category Num Vert_Horizon Description Fem_Valence_Mean \
0 Animals_001_h Animals 1 h Dead Stork 2.40
1 Animals_002_v Animals 2 v Lion 6.31
2 Animals_003_h Animals 3 h Snake 5.14
3 Animals_004_v Animals 4 v Wolf 4.55
4 Animals_005_h Animals 5 h Bat 5.29
Fem_Valence_SD Fem_Av/Ap_Mean Fem_Av/Ap_SD Arousal_Mean ... \
0 1.30 3.03 1.47 6.72 ...
1 2.19 5.96 2.24 6.69 ...
2 1.19 5.14 1.75 5.34 ...
3 1.87 4.82 2.27 6.84 ...
4 1.56 4.61 1.81 5.50 ...
Luminance Contrast JPEG_size80 LABL LABA LABB Entropy \
0 126.05 68.45 263028 51.75 -0.39 16.93 7.86
1 123.41 32.34 250208 52.39 10.63 30.30 6.71
2 135.28 59.92 190887 55.45 0.25 4.41 7.83
3 122.15 75.10 282350 49.84 3.82 1.36 7.69
4 131.81 59.77 329325 54.26 -0.34 -0.95 7.82
Classification valence_median_split temp_selection
0 Low_Valence OUT
1 High_Valence NaN
2 Low_Valence OUT
3 Low_Valence OUT
4 Low_Valence OUT
[5 rows x 35 columns]
Here is how I attempted to do it:
%Rpush df
Variables = 'All_Valence_Mean', 'Male_Valence_Mean', 'Fem_Valence_Mean'
for var in Variables:
    %R var + '_Sig' <- t.test(var ~ valence_median_split, data = df, var.equal = TRUE)
I am attempting to save each result to a variable named after 'var' with a "_Sig" suffix added. This part is not crucial; what I'm really after is getting this command to recognize "var" as a variable in a list of variables.
Here is the error that I got:
Error in model.frame.default(formula = var ~ valence_median_split, data = df) :
invalid type (list) for variable 'var'
Error in model.frame.default(formula = var ~ valence_median_split, data = df) :
invalid type (list) for variable 'var'
Error in model.frame.default(formula = var ~ valence_median_split, data = df) :
invalid type (list) for variable 'var'
/anaconda3/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error in model.frame.default(formula = var ~ valence_median_split, data = df) :
invalid type (list) for variable 'var'
warnings.warn(x, RRuntimeWarning)
If you are more comfortable with R, push as much of the logic as you can to R. For example, this will store the results in results, which you will be able to access from Python in subsequent notebook cells.
%%R -i df -o results
Variables <- c("All_Valence_Mean", "Male_Valence_Mean",
               "Fem_Valence_Mean")
results <- list()
for (var in Variables) {
    results[[paste0(var, '_Sig')]] <- t.test(
        as.formula(paste(var, '~ valence_median_split')),
        data = df, var.equal = TRUE)
}
If you are more comfortable with Python, keep as much as you can in Python:
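from rpy2.robjects import Formula
from rpy2.robjects.packages import importr

stats = importr('stats')  # R's stats package, which provides t.test

Variables = ('All_Valence_Mean', 'Male_Valence_Mean',
             'Fem_Valence_Mean')
results = dict()
for var in Variables:
    # Build the formula from the column name, so R sees e.g.
    # "Fem_Valence_Mean ~ valence_median_split" rather than the literal name "var".
    # Note: df must be convertible to an R data.frame here (e.g. via rpy2's pandas conversion).
    results['%s_Sig' % var] = stats.t_test(
        Formula('%s ~ valence_median_split' % var),
        data=df, var_equal=True)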
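Both versions above rely on the same step: the formula is assembled as text before R evaluates it, so the value of the loop variable ends up in the formula instead of the bare name var. Here is a minimal sketch of just that string-building step, in plain Python with no R or rpy2 involved. The helper name build_formulas is hypothetical and only for illustration.

```python
def build_formulas(variables, group="valence_median_split"):
    """Map '<name>_Sig' to the R formula text '<name> ~ <group>'.

    Hypothetical helper: it only builds strings and never calls R.
    """
    return {f"{name}_Sig": f"{name} ~ {group}" for name in variables}


variables = ("All_Valence_Mean", "Male_Valence_Mean", "Fem_Valence_Mean")
formulas = build_formulas(variables)

# Each value is the text that would be handed to Formula(...) or as.formula(...).
for key, text in formulas.items():
    print(key, "->", text)
```

Column names that are not syntactically valid R names (for example ones containing "/") would need backtick quoting inside the formula text, which this sketch does not handle.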