简体   繁体   中英

Create chi square matrix in R

thank you for helping me. I know enough about R to be dangerous, but not enough to be good. I'm trying to create a chi square matrix (exactly like a correlation matrix, but with chi sq) from a dataframe. I've searched extensively (including this page: Chi-square p value matrix in r which didn't help) even trying a loop, and I'm not getting there. Here's what I have so far:

R Studio version 1.2.5033 on Windows 10

data - 14 columns of 3401 rows, I'm only using 9 columns for the chi sq, 8 variables are binary, 1 is categorical

I've gotten the chi sq results I need, but I want to combine it in a matrix. As it is a standard matrix where columns get progressively smaller, the columns are not the same length, so most bind commands won't work. I do have plyr, and I am trying to do it that way, but am failing. Here's the code:

CA <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 0)
Pos <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
Mon <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 1)
Sc <- c(0, 0, 0, 0, 1, 1, 0, 0, 1, 0)
ood <- c(1, 1, 1, 1, 0, 1, 1, 0, 0, 1)
Eco <- c(1, 2, 4, 6, 7, 3, 2, 5, 7, 7)
Orp <- c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0)
BC <- c(1, 1, 1, 0, 0, 0, 0, 1, 0, 1)
SA <- c(0, 0, 1, 1, 1, 1, 1, 0, 0, 1)
MV <- c(1, 1, 0, 1, 1, 0, 1, 1, 0, 0) 
Ad <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
YR <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
KC <- c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1)
H <- c(1, 1, 1, 1, 1, 1, 1, 0, 0 ,1)
data <- data.frame(CA, Pos, Mon, Sc, ood, Eco, Orp, BC, SA, MV, Ad, YR, KC, H)

Or, here is the dput() data:

structure(list(CA = c(0, 0, 1, 1, 1, 1, 0, 1, 0), Pos = c(0, 
0, 0, 0, 1, 1, 0, 1, 0), Mon = c(1, 0, 1, 1, 0, 1, 1, 1, 1), 
    Sc = c(0, NA, 0, 0, 1, 1, 1, 1, 0), ood = c(1, 1, 1, 0, 1, 
    0, 1, 1, 1), Eco = c(7, 6, 7, 0, 1, 6, 5, 1, 3), Orp = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0), BC = c(1, 0, 1, 1, 1, 1, 1, 1, 1
    ), SA = c(0, 1, 0, 0, 0, 0, 0, 0, 0), MV = c(15, 13, 16, 
    12, 16, 14, 11, 18, 12), Ad = c(2, 2, 2, 3, 3, 2, 2, 2, 2
    ), YR = c(2, 2, 1, 1, 2, 2, 2, 1, 2), KC = c(2, 2, 1, 1, 
    1, 1, 1, 1, 1), H = c(0, 1, 1, 0, 0, 1, 1, 0, 1)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L), spec = structure(list(
    cols = list(CA = structure(list(), class = c("collector_double", 
    "collector")), Pos = structure(list(), class = c("collector_double", 
    "collector")), Mon = structure(list(), class = c("collector_double", 
    "collector")), Sc = structure(list(), class = c("collector_double", 
    "collector")), ood = structure(list(), class = c("collector_double", 
    "collector")), Eco = structure(list(), class = c("collector_double", 
    "collector")), Orp = structure(list(), class = c("collector_double", 
    "collector")), BC = structure(list(), class = c("collector_double", 
    "collector")), SA = structure(list(), class = c("collector_double", 
    "collector")), MV = structure(list(), class = c("collector_double", 
    "collector")), Ad = structure(list(), class = c("collector_double", 
    "collector")), YR = structure(list(), class = c("collector_double", 
    "collector")), KC = structure(list(), class = c("collector_double", 
    "collector")), H = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1), class = "col_spec"))
library(tidyverse)
library(plyr)
library(reshape2)

chisqstat <- function(x, y){round(chisq.test(x, y)$statistic, 2)} #chi sq formula, pulling stat
chisqp <- function(x, y){round(chisq.test(x, y)$p.value, 4)} #chi sq formula pulling pvalue

chisqdata <- list(apply(data[c(2:7, 9, 14)], 2, chisqstat, y = data$CA), #chi sq statistic for all values with child abuse
                       apply(data[c(2:7, 9, 14)], 2, chisqp, y = data$CA), #chi sq pvalue for all values with child abuse
                       apply(data[c(3:7, 9, 14)], 2, chisqstat, y = data$Pos),
                       apply(data[c(3:7, 9, 14)], 2, chisqp, y = data$Pos),
                       apply(data[c(4:7, 9, 14)], 2, chisqstat, y = data$Mon),
                       apply(data[c(4:7, 9, 14)], 2, chisqp, y = data$Mon),
                       apply(data[c(5:7, 9, 14)], 2, chisqstat, y = data$Sc),
                       apply(data[c(5:7, 9, 14)], 2, chisqp, y = data$Sc),
                       apply(data[c(6:7, 9, 14)], 2, chisqstat, y = data$ood),
                       apply(data[c(6:7, 9, 14)], 2, chisqp, y = data$ood),
                       apply(data[c(7, 9, 14)], 2, chisqstat, y = data$Eco),
                       apply(data[c(7, 9, 14)], 2, chisqp, y = data$Eco),
                       apply(data[c(9, 14)], 2, chisqstat, y = data$Orp),
                       apply(data[c(9, 14)], 2, chisqp, y = data$Orp),
                       apply(data[c(14)], 2, chisqstat, y = data$SA),
                       apply(data[c(14)], 2, chisqp, y = data$SA))
do.call(rbind.fill.matrix(), chisqdata)

When I do each line as:

as.matrix(apply(data[c(2:7, 9, 14], 2, chisqstat, y = data$CA))

I get exactly what I want with row names so I'm wondering if I need to use as.matrix. I know I can do a full matrix where the info above and below the diagonal is the same, but I find that visually overwhelming. I want to get something like this (this is correlations not chi square):

       CA   Pos   Mon    Sc   ood   Eco
Pos  0.20                              
Mon -0.20  0.60                        
Sc   0.22 -0.22 -0.65                  
ood -0.22  0.22  0.65 -0.52            
Eco  0.00 -0.18 -0.18  0.38 -0.58      
Orp  0.41  0.00 -0.41  0.36  0.09  0.04

I'm sure I'm missing something obvious, so your help is greatly appreciated. Or maybe there is a much nicer way to do this in general, again, thank you.

It looks like your dput data added is a tibble , which is probably from how your data was read in:

> class(data)
[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

While your previous example was a plain data.frame (I called it df ).

This plays a role in how your columns are extracted in the chisqmatrix code from here . Specifically:

m[i,j] = chisq.test(x[,i],x[,j],)$p.value

With a simple data.frame , selecting the first column gives you a numeric vector, which is what you want:

> df[,1]
[1] 0 0 1 1 1 1 0 1 0
> class(df[,1])
[1] "numeric"

While with a tibble , you will get another tibble :

> data[,1]
# A tibble: 9 x 1
     CA
  <dbl>
1     0
2     0
3     1
4     1
5     1
6     1
7     0
8     1
9     0
> class(data[,1])
[1] "tbl_df"     "tbl"        "data.frame"

So, two ways to deal with this. One way is to use double brackets (eg, df[[1]] ) which will extract a numeric vector:

> data[[1]]
[1] 0 0 1 1 1 1 0 1 0

Or, you can add drop = TRUE to coerce to a vector (eg, df[,1, drop = TRUE] ):

> data[,1,drop=T]
[1] 0 0 1 1 1 1 0 1 0

Because with a tibble the [ defaults to drop = FALSE :

https://tibble.tidyverse.org/reference/subsetting.html

So, here is the same function to reproduce your desired result:

chisqmatrix <- function(x) {
  names = colnames(x);  num = length(names)
  m = matrix(nrow=num,ncol=num,dimnames=list(names,names))
  for (i in 1:(num-1)) {
    for (j in (i+1):num) {
      #browser()
      m[j,i] = chisq.test(x[, i, drop = TRUE],x[, j, drop = TRUE])$p.value
    }
  }
  return (m)
}

mat <- chisqmatrix(data[c("CA", "Pos", "Mon", "Sc", "ood", "Eco")])
mat[-1, -ncol(mat)]

           CA       Pos       Mon        Sc       ood
Pos 0.2356799        NA        NA        NA        NA
Mon 1.0000000 1.0000000        NA        NA        NA
Sc  1.0000000 0.1441270 1.0000000        NA        NA
ood 0.5303348 1.0000000 1.0000000 1.0000000        NA
Eco 0.4220127 0.2399069 0.6669878 0.1562356 0.2959326

Note: I removed "Orp" as this will give an error from the sample data where there is only a single level; so you can replace with:

chisqmatrix(data[c("CA", "Pos", "Mon", "Sc", "ood", "Eco", "Orp")])

if using your full data.

Please let me know if this works for you.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM