
Rfast segmentation fault on independence test

I am having trouble using the g2Test function from the Rfast package in R: it triggers a segmentation fault even though, as far as I can tell, the input parameters are correct.

More specifically, I am able to run the example code from the manual page:

nvalues <- 3
nvars <- 10
nsamples <- 5000
data <- matrix( sample( 0:(nvalues - 1), nvars * nsamples, replace = TRUE ), nsamples, nvars )
dc <- rep(nvalues, nvars)

res<-g2Test( data, 1, 2, 3, c(3, 3, 3) )

But I'm not able to make it run on my data. The function g2Test takes as input a matrix of numbers, the column indices of the two variables to test and of the variable(s) to condition on (in the example we test the dependence between the first and second columns, conditioned on the third), and a vector with the number of unique values per column.

My code follows the same principles, reading the data from the ALARM CSV file:

library(readr)
library(Rfast)

# open the file
path <-  "datasets/alarm.csv"
dataset <- read.csv(path)
# search for the indexes of the column I'm interested in and the amount of unique values per column
c1 <- "PVS"
c2 <- "ACO2"
s <- c("VALV", "VLNG", "VTUB",   "VMCH")
n <- colnames(dataset) 
col_c1 <- match(c1, n)
col_c2 <- match(c2, n)
cols_c3 <- c()
uni <- c(length(unique(dataset[c1])[[1]])[[1]],length(unique(dataset[c2])[[1]])[[1]])
if (!s[1]=="()"){
 for(v in s){
   idx <- match(v, n)
   cols_c3 <- append(cols_c3,idx)
   uni <- append(uni,length(unique(dataset[v])[[1]])[[1]])
 }
}
# transforming the string data frame into an integer matrix
for (nn in n){
  dataset[nn] <- unclass(as.factor(dataset[nn][[1]]))
}
ds <- as.matrix(dataset)
colnames(ds) <- NULL

# running the G2 test
res <- g2Test(ds, col_c1, col_c2, cols_c3, uni)

But it results in a segmentation fault:

 *** caught segfault ***
address 0x1f103f96a, cause 'memory not mapped'

Traceback:
 1: g2Test(ds, col_c1, col_c2, cols_c3, uni)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

The same happens if I condition on just one variable instead of several.

I really don't understand why this happens, since my case seems to be the same as the example in the reference, just with different data. I would really appreciate any help debugging this issue; please tell me if I need to provide further information.

First, I'm sorry that I missed that you had originally included your data!

Alright, I wish I had realized this sooner (as you will, too...). The columns have to be consecutive and the values must start at zero. So what does that mean? You have to rearrange the columns so that col_c1 is the first column, col_c2 is the second column, and so on. You also have to subtract one from every value (since the lowest value is currently 1).
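To make those two requirements concrete, here is a minimal base R sketch on made-up toy data (not the ALARM file). The helper name `prepare_g2_input` and the toy columns are my own assumptions for illustration; it only reorders the columns and recodes the values, and it does not call Rfast.

```r
# Toy data frame standing in for the real CSV (values are made up).
df <- data.frame(
  A = c("low", "high", "low", "normal"),
  B = c("yes", "no", "no", "yes"),
  C = c("x", "y", "x", "z"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: move `first_cols` to the front and recode every
# column as integer codes 0, 1, ..., k - 1.
prepare_g2_input <- function(df, first_cols) {
  stopifnot(all(first_cols %in% colnames(df)))
  rest <- setdiff(colnames(df), first_cols)
  df <- df[, c(first_cols, rest), drop = FALSE]
  # as.integer(factor(x)) yields codes starting at 1, so subtract 1
  codes <- lapply(df, function(col) as.integer(factor(col)) - 1L)
  m <- do.call(cbind, codes)
  dimnames(m) <- NULL
  m
}

# C becomes column 1, A column 2, B column 3
m <- prepare_g2_input(df, c("C", "A"))
print(m)
```

With a matrix built this way, the columns of interest sit at positions 1, 2, ... and every column has a minimum of zero, which is the layout described above.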

This is what I did (and how I checked it):

# there was no PVS, I assume this was PVSAT
c1 <- "PVSAT"
# c1 <- "PVS"

# there was no ACO2, I assume this was ARTCO2
c2 <- "ARTCO2"
# c2 <- "ACO2"

# there are no columns with these names...
# for VALV - VENTALV; for VLNG - VENTLUNG; for VTUB - VENTTUBE; for VMCH - VENTMACH
s <- c("VENTALV", "VENTLUNG", "VENTTUBE", "VENTMACH")
# s <- c("VALV", "VLNG", "VTUB", "VMCH")
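A side note on why the missing names are easy to overlook: `match()` returns `NA` for a name it cannot find and gives no warning, so a misspelled column only shows up later. A small sketch of a check you could run first; the vectors `available` and `wanted` are toy stand-ins I made up, not the real column list.

```r
# Toy column names standing in for colnames(dataset).
available <- c("HISTORY", "CVP", "PVSAT", "ARTCO2", "VENTALV")
wanted <- c("PVS", "ARTCO2", "VALV")

# match() silently returns NA for names that are absent.
idx <- match(wanted, available)
print(idx)

# Names that would produce NA indices.
missing_names <- setdiff(wanted, available)
if (length(missing_names) > 0) {
  message("Not found: ", paste(missing_names, collapse = ", "))
}
```

Stopping as soon as `missing_names` is non-empty keeps `NA` indices from ever reaching the test function.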

This next chunk is exactly as you wrote it:

n <- colnames(dataset) 

col_c1 <- match(c1, n)
col_c2 <- match(c2, n)

cols_c3 <- c()

uni <- c(length(unique(dataset[c1])[[1]])[[1]],length(unique(dataset[c2])[[1]])[[1]])

if (!s[1]=="()"){
  for(v in s){
    idx <- match(v, n)
    cols_c3 <- append(cols_c3,idx)
    uni <- append(uni,length(unique(dataset[v])[[1]])[[1]])
  }
}
# transforming the string data frame into an integer matrix
for (nn in n){
  dataset[nn] <- unclass(as.factor(dataset[nn][[1]]))
}

ds <- as.matrix(dataset)

This is where I made the minimum zero:

# look at the number of unique values before changing, as a means of validation
sapply(1:ncol(ds), function(x) length(unique(ds[, x])))
# look at the minimum, as a means of validation
sapply(1:ncol(ds), function(x) min(ds[,x]))
# the minimum value must be zero
ds <- ds - 1
# check
sapply(1:ncol(ds), function(x) min(ds[,x]))
sapply(1:ncol(ds), function(x) length(unique(ds[, x])))

# looked as expected
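If you would rather not eyeball the output, the same validation can be written as a check that returns TRUE or FALSE per column. This is only a sketch on a toy matrix; `is_zero_based` is a hypothetical helper name, and it assumes "valid" means the column contains exactly the values 0, 1, ..., k - 1.

```r
# Toy matrix: column 1 is already zero-based, column 2 still starts at 1.
m <- cbind(c(0, 1, 2, 1), c(1, 2, 1, 2))

# Hypothetical check: TRUE when a column contains exactly 0, 1, ..., k - 1.
is_zero_based <- function(col) {
  vals <- sort(unique(col))
  identical(as.numeric(vals), as.numeric(seq_along(vals) - 1))
}

ok <- apply(m, 2, is_zero_based)
print(ok)
```

Wrapping the result in `stopifnot(all(ok))` right before the test call would turn a badly coded column into an ordinary R error.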

Next, I rearranged the columns. I did this before removing the names so I could use the names to ensure the order was correct.

# the data must be consecutive numbers
# catch names before and after
n2 <- dimnames(ds)
# some of the results from this:
# [[2]]
#  [1] "HISTORY"      "CVP"          "PCWP"         "HYPOVOLEMIA"

# create the list of column indices other than those getting called in g2Test
tellMe <- c(1:ncol(ds))
tellMe <- tellMe[-c(col_c1, col_c2, sort(cols_c3))] 

# rearrange using the indices
ds <- ds[, c(col_c1, col_c2, sort(cols_c3), tellMe)]

# check it
(n3 <- dimnames(ds))
# some of the results from this
# [[2]]
#  [1] "PVSAT"        "ARTCO2"       "VENTMACH"     "VENTTUBE"

All that's left is removing the names (just as you did) and then calling the function. Since the column indices changed, your original index objects (col_c1, col_c2, cols_c3) no longer point at the right columns, so I pass the new positions directly.

colnames(ds) <- NULL

# running the G2 test
# res <- g2Test(ds, col_c1, col_c2, sort(cols_c3), uni)
res2 <- g2Test(ds, 1, 2, c(3,4,5,6), c(3, 3, 4, 4, 4, 4))
# $statistic
# [1] 19.78506
# 
# $df
# [1] 1024
#  
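Instead of typing the level counts by hand, they can be computed from the matrix itself. The sketch below uses toy data in the same layout as the manual example (three columns with values 0, 1, 2), not the ALARM data, and it only calls `g2Test` when Rfast is installed. Converting `dc` with `as.numeric()` is my assumption, made so that it matches the `c(3, 3, 3)` form used in the manual example.

```r
set.seed(1)
# Toy data in the layout of the manual example: values 0, 1, 2.
toy <- matrix(sample(0:2, 200 * 3, replace = TRUE), nrow = 200, ncol = 3)

# Number of distinct values per column, computed instead of typed in.
dc <- apply(toy, 2, function(col) length(unique(col)))
print(dc)

# Only call g2Test when Rfast is available on this machine.
if (requireNamespace("Rfast", quietly = TRUE)) {
  res <- Rfast::g2Test(toy, 1, 2, 3, as.numeric(dc))
  print(res$statistic)
}
```

On the real data the same `apply()` call over the first six columns of `ds` should give the counts that were passed by hand above.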
