Call R from JAVA to get Chi-squared statistic and p-value

I have two 4*4 matrices in JAVA, where one matrix holds observed counts and the other expected counts.

I need an automated way to calculate the p-value from the chi-square statistic between these two matrices; however, JAVA has no such function as far as I am aware.

I can calculate the chi-square and its p-value by reading the two matrices into R as .csv files and then using the chisq.test function as follows:

obs<-read.csv("obs.csv")
exp<-read.csv("exp.csv")
chisq.test(obs,exp)

where the format of the .csv files would be as follows:

A, C, G, T
A, 197.136, 124.32, 63.492, 59.052
C, 124.32, 78.4, 40.04, 37.24
G, 63.492, 40.04, 20.449, 19.019
T, 59.052, 37.24, 19.019, 17.689

Given these commands, R will give output of the form:

X-squared = 20.6236, df = 9, p-value = 0.01443

which includes the p-value I was looking for.

Does anyone know of an efficient way to automate the process of:

1) Outputting my matrices from JAVA into .csv files
2) Uploading the .csv files into R
3) Calling chisq.test on the .csv files in R
4) Returning the outputted p-value back into JAVA?

Thanks for any help....

There are (at least) two ways of going about this.


Command Line & Scripts

You can execute R scripts from the command line with Rscript.exe. E.g. in your script you would have:

# Parse arguments.
# ...
# ...

chisq.test(obs, exp)

Rather than creating CSVs in Java and having R read them, you should be able to pass the values straight to R. I don't see the need to create CSVs and pass data that way, UNLESS your matrices are quite big. There are limits on the size of command line arguments you can pass (they vary across operating systems, I think).

You can pass arguments into R scripts and parse them using the commandArgs() function or with various packages (e.g. optparse or getopt). See this thread for more information.
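As a rough illustration of passing a small matrix directly on the command line, the Java sketch below flattens a matrix row by row into string arguments, prefixed by the row count. The class name `MatrixArgs` and the argument layout (row count first, then values in row-major order) are assumptions made for this example; the R script would have to read the values back with commandArgs(trailingOnly = TRUE) and rebuild the matrix itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a matrix into Rscript command-line arguments.
final class MatrixArgs {

    // Builds: Rscript <script> <nrow> <v11> <v12> ... (values in row-major order).
    static List<String> buildCommand(String script, double[][] matrix) {
        List<String> command = new ArrayList<>();
        command.add("Rscript");
        command.add(script);
        command.add(Integer.toString(matrix.length));
        for (double[] row : matrix) {
            for (double value : row) {
                command.add(Double.toString(value));
            }
        }
        return command;
    }

    public static void main(String[] args) {
        double[][] observed = {{1.5, 2.0}, {3.0, 4.5}};
        // Only prints the command line; nothing is executed here.
        System.out.println(String.join(" ", buildCommand("my_script.R", observed)));
    }
}
```

For a 4*4 matrix this yields 17 arguments after the script name, which is well within typical command-line limits.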

There are several ways of calling and reading from the command line in Java. I don't know enough about it to give you advice, but a bit of googling will give you a result. Calling a script from the command line is done like this:

Rscript my_script.R
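One way to issue that call from Java is with ProcessBuilder from the standard library. The sketch below is only an outline under some assumptions: Rscript is on the PATH, the script path is supplied by the caller, and the class and method names (`RscriptRunner`, `command`, `runAndCapture`) are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for running an R script as an external process.
final class RscriptRunner {

    // Command equivalent to: Rscript <script> <scriptArgs...>
    static List<String> command(String script, String... scriptArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("Rscript");
        cmd.add(script);
        cmd.addAll(Arrays.asList(scriptArgs));
        return cmd;
    }

    // Runs the command and returns everything it printed (stdout and stderr merged).
    static String runAndCapture(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true);
        Process process = builder.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command failed with exit code " + exitCode + ":\n" + output);
        }
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: RscriptRunner <script.R> [script arguments...]");
            return;
        }
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        System.out.println(runAndCapture(command(args[0], rest)));
    }
}
```

Reading the output before calling waitFor() avoids the process blocking on a full output buffer.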

JRI

JRI lets you talk to R straight from Java. Here's an example of how you would pass a double array to R and have R sum it (this is Java now):

// Start R session.
Rengine re = new Rengine (new String [] {"--vanilla"}, false, null);

// Check if the session is working.
if (!re.waitForR()) {
    return;
}

re.assign("x", new double[] {1.5, 2.5, 3.5});
REXP result = re.eval("(sum(x))");
System.out.println(result.asDouble());
re.end();

The function assign() here is the same as doing this in R:

x <- c(1.5, 2.5, 3.5)

You should be able to work out how to extend this to work with a matrix.
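One possible way to extend this to a matrix, sketched under assumptions: flatten the Java matrix into a double[] in column-major order (the order in which R's matrix() fills values by default), assign that vector as above, and reshape it on the R side. The helper names below are hypothetical, and the JRI calls appear only in comments because they need a live R session.

```java
import java.util.Arrays;

// Hypothetical helper for moving a rectangular Java matrix into R through JRI.
final class JriMatrixSketch {

    // Flattens the matrix column by column, matching R's default fill order.
    static double[] toColumnMajor(double[][] matrix) {
        int rows = matrix.length;
        int cols = matrix[0].length;
        double[] flat = new double[rows * cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                flat[c * rows + r] = matrix[r][c];
            }
        }
        return flat;
    }

    // R expression that rebuilds the matrix from the assigned vector.
    static String reshapeExpression(String matrixName, String vectorName, int rows) {
        return matrixName + " <- matrix(" + vectorName + ", nrow = " + rows + ")";
    }

    public static void main(String[] args) {
        double[][] observed = {{1.0, 2.0}, {3.0, 4.0}};
        double[] flat = toColumnMajor(observed);
        String expr = reshapeExpression("obs", "x", observed.length);
        System.out.println(Arrays.toString(flat));
        System.out.println(expr);
        // With a running Rengine `re` as in the example above, one would then call:
        //   re.assign("x", flat);
        //   re.eval(expr);
    }
}
```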


I think JRI is quite difficult at the beginning, so if you want to get this done quickly, the command line option is probably best. I would say the JRI approach is less messy once you get it set up, though. And if you have a lot of back and forth between R and Java, it is definitely better than calling multiple scripts.

  1. Link to JRI.
  2. Recommended Eclipse plugin to set up JRI.

Check this page: JRI

Description from their site:

JRI is a Java/R Interface, which allows to run R inside Java applications as a single thread. Basically it loads R dynamic library into Java and provides a Java API to R functionality. It supports both simple calls to R functions and a full running REPL.

RCaller 2.2 can do what you want to do. Suppose the frequency matrix is given as in your question. The resulting p.value and df variables can be calculated and returned using the code below:

double[][] data = new double[][]{
    {197.136, 124.32, 63.492, 59.052},
    {124.32, 78.4, 40.04, 37.24},
    {63.492, 40.04, 20.449, 19.019},
    {59.052, 37.24, 19.019, 17.689}
};
RCaller caller = new RCaller();
Globals.detect_current_rscript();
caller.setRscriptExecutable(Globals.Rscript_current);
RCode code = new RCode();

code.addDoubleMatrix("mydata", data);
code.addRCode("result <- chisq.test(mydata)");
code.addRCode("mylist <- list(pval = result$p.value, df=result$parameter)");

caller.setRCode(code);
caller.runAndReturnResult("mylist");

double pvalue = caller.getParser().getAsDoubleArray("pval")[0];
double df = caller.getParser().getAsDoubleArray("df")[0];
System.out.println("Pvalue is : " + pvalue);
System.out.println("Df is : " + df);

The output is:

Pvalue is : 1.0
Df is : 9.0

You can find the technical details here.

Rserve is another way to get your data from Java to R and back. It is a server which takes R scripts as string inputs. You can use some string parsing and conversion in Java to convert the matrices into strings that can be input into R.
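The matrix-to-string conversion mentioned above could look roughly like the following sketch, which renders a Java matrix as an R matrix(...) expression. The class and method names are hypothetical, and the sketch assumes all values are finite (Java's textual form of infinities is not valid R syntax).

```java
// Hypothetical helper: renders a Java matrix as R source text.
final class RMatrixLiteral {

    // Produces e.g. "matrix(c(1.5, 2.0, 3.0, 4.5), nrow = 2, byrow = TRUE)".
    static String toRMatrix(double[][] matrix) {
        StringBuilder sb = new StringBuilder("matrix(c(");
        boolean first = true;
        for (double[] row : matrix) {
            for (double value : row) {
                if (!first) {
                    sb.append(", ");
                }
                sb.append(value);
                first = false;
            }
        }
        sb.append("), nrow = ").append(matrix.length).append(", byrow = TRUE)");
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] observed = {{1.5, 2.0}, {3.0, 4.5}};
        // Script text that could be handed to an Rserve connection's eval().
        String script = "obs <- " + toRMatrix(observed) + "; chisq.test(obs)$p.value";
        System.out.println(script);
    }
}
```

Because the values are written row by row, byrow = TRUE is needed so that R does not fill the matrix column-wise.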

import org.rosuda.REngine.REXP;
import org.rosuda.REngine.Rserve.RConnection;


public class RtestScript {

private String emailTestScript = "open <- c('O', 'O', 'N', 'N', 'O', 'O', 'N', 'N', 'N', 'O', " +
        " 'O', 'N', 'N', 'O', 'O', 'N', 'N', 'N', 'O');" +
        "testgroup <- c('A', 'A', 'A','A','A','A','A','A','A','A', 'B'," +
        "'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B');" +
        "emailTest <- data.frame(open, testgroup);" +
        "emailTable<- table(emailTest$open, emailTest$testgroup);" +
        "emailResults<- prop.test(emailTable, correct=FALSE);" +
        "print(emailResults$p.value);";

public void executeRscript() {
    try {
        //Make sure to type in library(Rserve); Rserve() in Rstudio before running this
        RConnection testConnection = new RConnection();

        REXP testExpression = testConnection.eval(emailTestScript);
        System.out.println("P value: " + testExpression.asString());
    } catch(Exception e) {
        e.printStackTrace();
    }
}
}

Here is some more information on Rserve. Incidentally, this is also how Tableau communicates with R through its R connection.

https://cran.r-project.org/web/packages/Rserve/index.html

1) Outputting my matrices from JAVA into .csv files

Use any of the CSV libraries; I would recommend http://opencsv.sourceforge.net/
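For a matrix this small, the standard library alone may be enough. The sketch below is not opencsv; it is a plain-Java alternative that writes the layout shown in the question (a header with the column labels, then one labelled row per line). It assumes a square matrix that uses the same labels for rows and columns; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: writes a labelled square matrix as CSV.
final class MatrixCsvWriter {

    // Header holds only the column labels, so it has one field fewer than the data rows.
    static List<String> toCsvLines(String[] labels, double[][] matrix) {
        List<String> lines = new ArrayList<>();
        lines.add(String.join(",", labels));
        for (int i = 0; i < matrix.length; i++) {
            StringBuilder row = new StringBuilder(labels[i]);
            for (double value : matrix[i]) {
                row.append(',').append(value);
            }
            lines.add(row.toString());
        }
        return lines;
    }

    // Writes the lines to a fresh temporary file with a unique name and returns its path.
    static Path writeTempCsv(String prefix, String[] labels, double[][] matrix) throws IOException {
        Path file = Files.createTempFile(prefix, ".csv");
        Files.write(file, toCsvLines(labels, matrix), StandardCharsets.UTF_8);
        return file;
    }

    public static void main(String[] args) {
        String[] labels = {"A", "C"};
        double[][] observed = {{197.136, 124.32}, {124.32, 78.4}};
        toCsvLines(labels, observed).forEach(System.out::println);
    }
}
```

Calling writeTempCsv("obs", labels, observed) returns a path that can be passed to the R script as an argument.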

2) Uploading the .csv files into R 3) Calling the chisq.test on the .csv files into R

Steps 2 and 3 are pretty much the same. You are better off creating a parameterized script to be run in R:

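# Read the two CSV paths passed after the script name.
args <- commandArgs(trailingOnly = TRUE)
obs <- read.csv(args[1])
exp <- read.csv(args[2])
chisq.test(obs, exp)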

So you can run

Rscript your_script.r path_to_csv1 path_to_csv2

and use unique names for the csv files, for example:

UUID.randomUUID().toString().replace("-","")

And then you use

Runtime.getRuntime().exec(command, environments, dataDir);

4) Returning the outputted p-value back into JAVA? If you use getRuntime().exec() to invoke R, the only way to get the result back is to read R's output.
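Once R's output has been captured as text, the p-value still has to be pulled out of it. A minimal sketch, assuming the output contains a line in the format shown in the question ("p-value = 0.01443") or R's "p-value < 2.2e-16" form; the class name `PValueParser` is made up for this example.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the p-value from chisq.test's printed summary.
final class PValueParser {

    // Matches "p-value = 0.01443" as well as "p-value < 2.2e-16".
    private static final Pattern P_VALUE =
            Pattern.compile("p-value\\s*[=<]\\s*([0-9.]+(?:[eE][+-]?[0-9]+)?)");

    // Returns the first p-value found, or empty if the text contains none.
    static OptionalDouble parse(String rOutput) {
        Matcher matcher = P_VALUE.matcher(rOutput);
        if (!matcher.find()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(matcher.group(1)));
    }

    public static void main(String[] args) {
        String sample = "X-squared = 20.6236, df = 9, p-value = 0.01443";
        System.out.println(parse(sample));
    }
}
```

In the "<" case the parsed number is only an upper bound. A more robust alternative is to have the R script print just the number, for example with cat(chisq.test(...)$p.value), and parse that single value.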

I would also recommend taking a look at Apache's Statistics Lib and How to calculate PValue from ChiSquare. Maybe you can live without R at all :)

I recommend simply using a Java library that does a ChiSquare test for you. There are enough of them:

This is not a complete list, but it is what I found in 5 minutes of searching.
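To give a feel for what such a library computes, here is a self-contained sketch in plain Java. It is not any particular library's API; all names are hypothetical. It computes the Pearson statistic for independence in a contingency table (which is what chisq.test does when given a single matrix) and the upper-tail chi-squared probability via a textbook series and continued-fraction evaluation of the regularized incomplete gamma function, with a Lanczos approximation for log-gamma.

```java
// Hypothetical stand-alone sketch of a chi-squared test; not a library API.
final class ChiSquareSketch {

    // Pearson statistic for independence: expected = rowSum * colSum / total.
    static double independenceStatistic(double[][] table) {
        int rows = table.length, cols = table[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += table[i][j];
                colSum[j] += table[i][j];
                total += table[i][j];
            }
        }
        double stat = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double expected = rowSum[i] * colSum[j] / total;
                double diff = table[i][j] - expected;
                stat += diff * diff / expected;
            }
        }
        return stat;
    }

    // Upper-tail probability of a chi-squared variable with df degrees of freedom.
    static double pValue(double chi2, int df) {
        if (chi2 <= 0.0) {
            return 1.0;
        }
        return upperRegularizedGamma(df / 2.0, chi2 / 2.0);
    }

    // Q(a, x) = 1 - P(a, x), for a > 0 and x > 0.
    static double upperRegularizedGamma(double a, double x) {
        double prefactor = Math.exp(-x + a * Math.log(x) - logGamma(a));
        if (x < a + 1.0) {
            // Power series for the lower function P(a, x).
            double term = 1.0 / a, sum = term, ap = a;
            for (int n = 0; n < 1000; n++) {
                ap += 1.0;
                term *= x / ap;
                sum += term;
                if (Math.abs(term) < Math.abs(sum) * 1e-15) break;
            }
            return 1.0 - sum * prefactor;
        }
        // Continued fraction for Q(a, x), evaluated with the modified Lentz method.
        final double tiny = 1e-300;
        double b = x + 1.0 - a, c = 1.0 / tiny, d = 1.0 / b, h = d;
        for (int i = 1; i <= 1000; i++) {
            double an = -i * (i - a);
            b += 2.0;
            d = an * d + b;
            if (Math.abs(d) < tiny) d = tiny;
            c = b + an / c;
            if (Math.abs(c) < tiny) c = tiny;
            d = 1.0 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1.0) < 1e-14) break;
        }
        return prefactor * h;
    }

    // Lanczos approximation of ln(Gamma(x)) for x > 0.
    static double logGamma(double x) {
        double[] cof = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = x;
        double tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double coef : cof) {
            y += 1.0;
            ser += coef / y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }

    public static void main(String[] args) {
        // Statistic and degrees of freedom taken from the R output in the question.
        System.out.println(pValue(20.6236, 9));
    }
}
```

For an r x c table the degrees of freedom are (r - 1) * (c - 1), i.e. 9 for a 4*4 matrix. For production use, a tested statistics library is still the safer choice.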
