Suppose I have an array named sims_add_dom with dim(100,5,100). Here is a sample of it from dput():
sims_add_dom <- structure(list(marker = 1:10, coeff_a = c(0.1814993012, -1.2206119381,
-0.298198096, 0.1131342646, 1.2563355045, 0.7464163985, 0.0002634054,
0.1154037559, 0.3739666234, 1.8235592343), Pvalue_a = c(0.7449502,
0.001649993, 0.4299404, 0.7704995, 0.07119358, 0.1737651, 0.9996618,
0.7814851, 0.5222457, 1.616549e-05), coeff_d = c(-2.36629627,
2.54339395, 0.16246537, -0.14700687, -0.82243816, 0.9682112,
NA, -0.55876864, -2.18497032, -4.78780087), Pvalue_d = c(0.3925707,
0.00146736, 0.820999, 0.8413498, 0.7667223, 0.7268808, NA, 0.3931673,
0.2660354, 2.889129e-06)), class = "data.frame", row.names = c(NA,
-10L))
Now I want to select rows based on conditions on the variables Pvalue_a and Pvalue_d: if Pvalue_a < 0.05 or Pvalue_d < 0.05, select those rows and their associated values.
Ultimately, I want to know how many estimates are significant under this condition. I have searched Google and Stack Overflow but did not find a straightforward answer to my question.
I would be very grateful if someone could help me solve this problem. Thank you for your help.
Using the example dataset above, return the rows that meet the condition:
> sims_add_dom[which(sims_add_dom$Pvalue_a < 0.05 | sims_add_dom$Pvalue_d < 0.05), c("Pvalue_a", "Pvalue_d")]
       Pvalue_a     Pvalue_d
2  1.649993e-03 1.467360e-03
10 1.616549e-05 2.889129e-06
Count the rows that meet the condition:
> sum(sims_add_dom$Pvalue_a < 0.05 | sims_add_dom$Pvalue_d < 0.05, na.rm = TRUE)
[1] 2
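If you also want to know how many estimates are significant for each term separately, colSums() on the logical matrix from the comparison does it in one call. A minimal sketch, assuming the example data from the question; only the two P-value columns are rebuilt here, and the names pvals, sig_mat, per_term, either and both are illustrative:

```r
# Rebuild the two P-value columns from the dput() in the question
pvals <- data.frame(
  Pvalue_a = c(0.7449502, 0.001649993, 0.4299404, 0.7704995, 0.07119358,
               0.1737651, 0.9996618, 0.7814851, 0.5222457, 1.616549e-05),
  Pvalue_d = c(0.3925707, 0.00146736, 0.820999, 0.8413498, 0.7667223,
               0.7268808, NA, 0.3931673, 0.2660354, 2.889129e-06)
)

# Comparing a data.frame gives a logical matrix; NA p-values stay NA
sig_mat <- pvals < 0.05

per_term <- colSums(sig_mat, na.rm = TRUE)  # Pvalue_a 2, Pvalue_d 2
either <- sum(sig_mat[, "Pvalue_a"] | sig_mat[, "Pvalue_d"], na.rm = TRUE)  # a OR d
both   <- sum(sig_mat[, "Pvalue_a"] & sig_mat[, "Pvalue_d"], na.rm = TRUE)  # a AND d

per_term
either
both
```

In this sample the same two markers (rows 2 and 10) are significant for both terms, so all counts are 2.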
Here is another solution using the dplyr package. First, get the rows that meet the condition:
library(dplyr)
sims_add_dom %>%
  filter(Pvalue_a < 0.05 | Pvalue_d < 0.05)
Then count the number of rows that meet the condition:
# Count how many rows meet the condition
sims_add_dom %>%
  filter(Pvalue_a < 0.05 | Pvalue_d < 0.05) %>%
  count()
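For comparison, here is a base R sketch of the same filter-then-count steps using subset(). Like filter(), subset() drops rows where the condition evaluates to NA (row 7 in the sample). Note that count() returns a one-row data frame with a column n, whereas nrow() gives a plain integer. The data frame is rebuilt from the question's P-values, and the names pvals and hits are illustrative:

```r
# P-value columns from the question's dput() sample
pvals <- data.frame(
  Pvalue_a = c(0.7449502, 0.001649993, 0.4299404, 0.7704995, 0.07119358,
               0.1737651, 0.9996618, 0.7814851, 0.5222457, 1.616549e-05),
  Pvalue_d = c(0.3925707, 0.00146736, 0.820999, 0.8413498, 0.7667223,
               0.7268808, NA, 0.3931673, 0.2660354, 2.889129e-06)
)

# Keep rows where either p-value is below 0.05 (NA conditions are dropped)
hits <- subset(pvals, Pvalue_a < 0.05 | Pvalue_d < 0.05)
hits
nrow(hits)  # 2
```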
Note that for replicate() to return an array, each replicate must return a matrix. That is what you showed in the screenshot, but your dput() output is a data.frame.
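To see the difference, here is a small sketch comparing what replicate() builds from a matrix versus from a data.frame. The sizes (2 rows, 3 columns, 4 replicates) and the names arr and lst are arbitrary choices for illustration:

```r
# Each replicate returns a 2 x 3 numeric matrix, so replicate() stacks
# them into a 3-D array: rows x columns x replicates
arr <- replicate(4, matrix(runif(6), nrow = 2))
dim(arr)      # 2 3 4

# Each replicate returns a 2-row, 3-column data.frame; replicate() cannot
# build a numeric 3-D array and instead returns a 3 x 4 matrix of list cells
lst <- replicate(4, data.frame(x = runif(2), y = runif(2), z = runif(2)))
dim(lst)      # 3 4
is.list(lst)  # TRUE
```

Indexing such as x[ , , 1] therefore only works as expected when the simulated function returns a matrix.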
Here is a function to simulate data with that structure:
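func = function(){
  # Two simulated responses for 50 individuals
  a = rnorm(50)
  d = rnorm(50)
  # 100 random 0/1 marker columns
  markers = matrix(as.numeric(runif(50*100)>0.5),
                   nrow=50)
  # Fit both responses on each marker; keep the marker effect and its p-value
  res = sapply(1:ncol(markers),function(i){
    fit = lm(cbind(a,d)~markers[,i])
    # coef(summary(fit)) gives one 2 x 4 table per response; stack them
    res = do.call(rbind,coefficients(summary(fit)))
    # Rows 2 and 4 are the marker effect on a and on d;
    # column 1 is the estimate, column 4 the p-value
    c(i,res[2,1],res[2,4],res[4,1],res[4,4])
  })
  # sapply() returns 5 x 100; transpose to 100 markers x 5 columns
  res = t(res)
  colnames(res) = c("marker","coeff_a","Pvalue_a","coeff_d","Pvalue_d")
  return(res)
}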
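The indices in c(i, res[2,1], res[2,4], res[4,1], res[4,4]) rely on how coef(summary(fit)) lays out a multi-response lm: one 2 x 4 coefficient table per response, which do.call(rbind, ...) stacks into a 4 x 4 matrix. A small sketch to check this layout; the seed, the sample size and the predictor x are arbitrary choices for illustration:

```r
set.seed(42)
x <- rep(c(0, 1), times = 10)   # a 0/1 predictor, like one marker column
a <- rnorm(20)
d <- rnorm(20)

fit <- lm(cbind(a, d) ~ x)      # one model, two responses
tabs <- coef(summary(fit))      # list: "Response a", "Response d"
stacked <- do.call(rbind, tabs) # rows: (Intercept), x, (Intercept), x

stacked
# Row 2 is the x effect on a and row 4 the x effect on d;
# column 1 is the estimate and column 4 is Pr(>|t|)
```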
sims_add_dom = replicate(10,func())
I used only 10 replicates instead of 100, but the structure is the same:
dim(sims_add_dom)
[1] 100 5 10
Now, to keep the rows with p < 0.05 for either a or d in each replicate:
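sig = lapply(seq(dim(sims_add_dom)[3]), function(x){
  M = sims_add_dom[ , , x]   # 100 x 5 matrix for replicate x
  # which() drops NA conditions; drop = FALSE keeps a matrix even for one match
  M[which(M[,"Pvalue_a"]<0.05 | M[,"Pvalue_d"]<0.05), , drop = FALSE]
})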
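Subsetting a matrix with M[cond, ] silently drops to a plain vector when exactly one row matches, and nrow() of a vector is NULL. Adding drop = FALSE keeps a one-row matrix, so counting rows later with nrow() stays reliable. A small sketch of the pitfall; the matrix M and its values are made up:

```r
cols <- c("marker", "coeff_a", "Pvalue_a", "coeff_d", "Pvalue_d")
M <- matrix(c(1,  0.3, 0.01, -0.2, 0.60,
              2, -0.1, 0.40,  0.5, 0.30),
            nrow = 2, byrow = TRUE, dimnames = list(NULL, cols))

keep <- M[, "Pvalue_a"] < 0.05 | M[, "Pvalue_d"] < 0.05  # only row 1 is TRUE

one_row_dropped <- M[keep, ]                # named numeric vector of length 5
one_row_kept    <- M[keep, , drop = FALSE]  # 1 x 5 matrix

nrow(one_row_dropped)  # NULL
nrow(one_row_kept)     # 1

# With a NULL in the mix, sapply() can no longer simplify and returns a list
counts <- sapply(list(one_row_dropped, one_row_kept), nrow)
```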
head(sig[[1]])   # values differ between runs: the data are random and no seed is set
     marker    coeff_a  Pvalue_a    coeff_d   Pvalue_d
[1,]      7 -0.1579199 0.6422984 -0.6462950 0.01672552
[2,]      9 -0.0648256 0.8474612  0.6091641 0.02316872
[3,]     17  0.3098400 0.3558238 -0.5721621 0.03352941
[4,]     54  0.3766042 0.2591446  0.5370216 0.04593391
[5,]     77 -0.2054801 0.5413273  0.5974129 0.02611847
To get the number of significant rows in each replicate:
sapply(sig,nrow)
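If you only need the counts, a loop-free alternative is to take the Pvalue_a and Pvalue_d columns across all replicates at once; each is a markers x replicates matrix, so combining them with | gives a logical matrix you can sum per column. A minimal sketch on a tiny hand-made array (toy, with 3 markers and 2 replicates, is purely illustrative) that uses the same column names as the simulated array:

```r
# Toy array: 3 markers x 5 columns x 2 replicates
cols <- c("marker", "coeff_a", "Pvalue_a", "coeff_d", "Pvalue_d")
toy <- array(NA_real_, dim = c(3, 5, 2), dimnames = list(NULL, cols, NULL))
toy[, "marker", ]   <- 1:3
toy[, "Pvalue_a", ] <- c(0.01, 0.50, 0.90,   0.20, 0.03, 0.60)
toy[, "Pvalue_d", ] <- c(0.40, 0.02, 0.70,   0.80, 0.04, NA)

# markers x replicates: TRUE where a OR d is significant (NA if undecidable)
is_sig <- toy[, "Pvalue_a", ] < 0.05 | toy[, "Pvalue_d", ] < 0.05

per_rep <- colSums(is_sig, na.rm = TRUE)  # 2 1  (significant markers per replicate)
total   <- sum(is_sig, na.rm = TRUE)      # 3  (across all replicates)
per_rep
total
```

On the simulated data the same expression would be sims_add_dom[, "Pvalue_a", ] < 0.05 | sims_add_dom[, "Pvalue_d", ] < 0.05, which relies on replicate() keeping the column names from func().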