
How to select rows from an array using R?

Suppose I have an array named sims_add_dom with dim(100,5,100):

sims_add_dom <- structure(list(marker = 1:10, coeff_a = c(0.1814993012, -1.2206119381, 
-0.298198096, 0.1131342646, 1.2563355045, 0.7464163985, 0.0002634054, 
0.1154037559, 0.3739666234, 1.8235592343), Pvalue_a = c(0.7449502, 
0.001649993, 0.4299404, 0.7704995, 0.07119358, 0.1737651, 0.9996618, 
0.7814851, 0.5222457, 1.616549e-05), coeff_d = c(-2.36629627, 
2.54339395, 0.16246537, -0.14700687, -0.82243816, 0.9682112, 
NA, -0.55876864, -2.18497032, -4.78780087), Pvalue_d = c(0.3925707, 
0.00146736, 0.820999, 0.8413498, 0.7667223, 0.7268808, NA, 0.3931673, 
0.2660354, 2.889129e-06)), class = "data.frame", row.names = c(NA, 
-10L))

Now I want to select rows based on conditions on the variables Pvalue_a and Pvalue_d: if Pvalue_a < 0.05 or Pvalue_d < 0.05, select those rows and their associated values.

What I actually want to know is how many estimators are significant under this condition. I have searched Google and Stack Overflow but did not find a straightforward answer to my question.

I would be very grateful if someone could help me solve this problem. Thank you for your help.

Example dataset:

[image: screenshot of the example dataset]

Return the rows that meet the condition:

> sims_add_dom[which(sims_add_dom$Pvalue_a < 0.05 | sims_add_dom$Pvalue_d < 0.05), c("Pvalue_a", "Pvalue_d")]
       Pvalue_a     Pvalue_d
2  1.649993e-03 1.467360e-03
10 1.616549e-05 2.889129e-06

Count the rows that meet the condition:

> sum(sims_add_dom$Pvalue_a < 0.05 | sims_add_dom$Pvalue_d < 0.05, na.rm = TRUE)
[1] 2
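
The which() wrapper in the subsetting call matters because marker 7 has a missing Pvalue_d, so the comparison returns NA for that row rather than FALSE. The sketch below shows the difference on a small made-up data frame (df and its values are assumptions for illustration, not the asker's data):

```r
# Hypothetical data frame; row 3 has a missing p-value
df <- data.frame(
  marker   = 1:4,
  Pvalue_a = c(0.70, 0.001, 0.99, 0.20),
  Pvalue_d = c(0.40, 0.002, NA,   0.01)
)

# FALSE | NA is NA, so row 3 is neither kept nor dropped cleanly
keep <- df$Pvalue_a < 0.05 | df$Pvalue_d < 0.05
keep
# FALSE  TRUE    NA  TRUE

# Indexing with a logical vector that contains NA yields an all-NA row
with_na <- df[keep, ]

# which() returns only the positions that are TRUE, so the NA row is dropped
without_na <- df[which(keep), ]

# sum() needs na.rm = TRUE for the same reason
n_sig <- sum(keep, na.rm = TRUE)
```

Here with_na has three rows (one of them all NA), while without_na has the two rows that actually satisfy the condition.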

Here is another solution using the dplyr package. First, get the rows that meet the condition:

library(dplyr)

sims_add_dom %>%
  filter(Pvalue_a < 0.05 | Pvalue_d < 0.05)

Then count the number of rows that meet the condition:

#Count how many rows meet the condition
sims_add_dom %>%
  filter(Pvalue_a < 0.05 | Pvalue_d < 0.05) %>%
  count()
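
If the goal is a count per p-value column rather than per row, one possible base R sketch is to compare both columns at once and sum each column. The p-values below are copied from the dput() output in the question; the names pvals, n_sig and n_both are introduced here only for illustration:

```r
# P-values taken from the question's dput() output
pvals <- data.frame(
  Pvalue_a = c(0.7449502, 0.001649993, 0.4299404, 0.7704995, 0.07119358,
               0.1737651, 0.9996618, 0.7814851, 0.5222457, 1.616549e-05),
  Pvalue_d = c(0.3925707, 0.00146736, 0.820999, 0.8413498, 0.7667223,
               0.7268808, NA, 0.3931673, 0.2660354, 2.889129e-06)
)

# pvals < 0.05 is a logical matrix; colSums() counts the TRUE values
# per column, and na.rm = TRUE skips the missing Pvalue_d
n_sig <- colSums(pvals < 0.05, na.rm = TRUE)
n_sig
# Pvalue_a Pvalue_d
#        2        2

# Rows where both p-values are below 0.05
n_both <- sum(pvals$Pvalue_a < 0.05 & pvals$Pvalue_d < 0.05, na.rm = TRUE)
```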

Note that for replicate to return an array, the function's output must be a matrix. That is what your screenshot shows, but not what your dput() output contains, which is a data frame.
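
To illustrate the point about replicate, here is a minimal sketch with a hypothetical function one_run() that returns a small 2 x 5 matrix. The names one_run, arr and the V1 to V5 column labels are assumptions made for this example:

```r
set.seed(42)

# Hypothetical stand-in for one simulation run: a 2 x 5 numeric matrix
one_run <- function() {
  matrix(rnorm(10), nrow = 2,
         dimnames = list(NULL, paste0("V", 1:5)))
}

# When every run returns a matrix of the same shape, replicate()
# stacks the runs along a third dimension
arr <- replicate(3, one_run())
dim(arr)
# 2 5 3

# Column names survive, so one column can be pulled across all runs
v1 <- arr[, "V1", ]   # 2 x 3 matrix: rows by replicate
```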

Something to simulate data:

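func = function(){
  # Two simulated responses and a 50 x 100 matrix of binary markers
  a = rnorm(50)
  d = rnorm(50)
  markers = matrix(as.numeric(runif(50*100)>0.5),
                   nrow=50)

  # Fit both responses against each marker in turn
  res = sapply(seq_len(ncol(markers)),function(i){
    fit = lm(cbind(a,d)~markers[,i])
    # Stack the two coefficient tables: rows 1-2 belong to a, rows 3-4 to d
    res = do.call(rbind,coefficients(summary(fit)))
    # Column 1 is the estimate and column 4 the p-value of the marker term
    c(i,res[2,1],res[2,4],res[4,1],res[4,4])
  })
  res = t(res)
  colnames(res) = c("marker","coeff_a","Pvalue_a","coeff_d","Pvalue_d")
  return(res)
}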

sims_add_dom = replicate(10,func())

I did only 10 reps, but the structure is similar:

dim(sims_add_dom)
[1] 100   5  10

Now, to get the rows with p < 0.05 for a or d:

sig = lapply(seq(dim(sims_add_dom)[3]), function(x){
  M = sims_add_dom[ , , x]
  # drop = FALSE keeps a matrix even when only one row matches
  M[M[,"Pvalue_a"]<0.05 | M[,"Pvalue_d"]<0.05, , drop = FALSE]
})

head(sig[[1]])
     marker    coeff_a  Pvalue_a    coeff_d   Pvalue_d
[1,]      7 -0.1579199 0.6422984 -0.6462950 0.01672552
[2,]      9 -0.0648256 0.8474612  0.6091641 0.02316872
[3,]     17  0.3098400 0.3558238 -0.5721621 0.03352941
[4,]     54  0.3766042 0.2591446  0.5370216 0.04593391
[5,]     77 -0.2054801 0.5413273  0.5974129 0.02611847

To get the number of significant rows in each replicate:

sapply(sig,nrow)
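
If only the counts are needed, the loop can be skipped by indexing the array directly. This is a sketch under the assumption that the second dimension carries the column names Pvalue_a and Pvalue_d, as in the simulated array above; arr is a small hypothetical stand-in filled with uniform random values, not real p-values from a model:

```r
set.seed(1)

# Hypothetical stand-in: 6 markers x 2 p-value columns x 3 replicates
arr <- array(runif(6 * 2 * 3), dim = c(6, 2, 3),
             dimnames = list(NULL, c("Pvalue_a", "Pvalue_d"), NULL))

# Each slice arr[, "Pvalue_a", ] is a 6 x 3 matrix (markers by replicates),
# so the comparison gives one logical value per marker and replicate
is_sig <- arr[, "Pvalue_a", ] < 0.05 | arr[, "Pvalue_d", ] < 0.05

# Column sums give the number of significant markers in each replicate
n_sig <- colSums(is_sig, na.rm = TRUE)

# The same counts computed replicate by replicate, for comparison
n_loop <- sapply(seq(dim(arr)[3]), function(k) {
  M <- arr[, , k]
  sum(M[, "Pvalue_a"] < 0.05 | M[, "Pvalue_d"] < 0.05, na.rm = TRUE)
})
```

Both n_sig and n_loop hold one count per replicate; the vectorised form avoids building the intermediate list of matrices.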
