How to apply a function to each column of a data frame?
I have the following data frame with 345 rows and 237 columns in R:
snp1 snp2 snp3 ... snp237
0 1 2 ... 0
0 1 1 ... 1
1 1 2 ... 2
1 0 0 ... 0
... ... ... ...
2 2 1 ... 0
I want to apply the following calculation to each column:
D=(number of 0)/(number of rows)
H=(number of 1)/(number of rows)
R=(number of 2)/(number of rows)
p=D+(0.5*H)
q=R+(0.5*H)
Lastly, I want to store the "p" and "q" for each snp in a vector. Is it possible to calculate "p" and "q" for every snp with a single R command?
The desired output is:
snp1 snp2 snp3 ... snp237
p1 p2 p3 ... p237
q1 q2 q3 ... q237
Thanks in advance.
#DATA
set.seed(42)
d = data.frame(snp1 = sample(0:2, 10, TRUE),
snp2 = sample(0:2, 10, TRUE),
snp3 = sample(0:2, 10, TRUE))
#Function
foo = function(x){
len = length(x)
D = sum(x == 0)/len
H = sum(x == 1)/len
R = sum(x == 2)/len
p = D + 0.5 * H
q = R + 0.5 * H
return(c(p = p, q = q))
}
#Run foo for each column
sapply(d, foo)
# snp1 snp2 snp3
#p 0.35 0.4 0.35
#q 0.65 0.6 0.65
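Since the question asks for "p" and "q" as vectors, here is a minimal sketch of how the matrix returned above could be split into two named vectors. It uses vapply instead of sapply so the shape of the result is checked; that swap is my suggestion, not part of the answer above. The toy data frame is hypothetical and deterministic (no sampling), and foo is the same function as in the answer.

```r
# Hypothetical toy data; the real data frame has 345 rows and 237 columns
d <- data.frame(snp1 = c(0, 0, 1, 1, 2),
                snp2 = c(1, 1, 1, 0, 2),
                snp3 = c(2, 1, 2, 0, 0))

# Same logic as foo in the answer above
foo <- function(x) {
  len <- length(x)
  D <- sum(x == 0) / len   # proportion of 0s
  H <- sum(x == 1) / len   # proportion of 1s
  R <- sum(x == 2) / len   # proportion of 2s
  c(p = D + 0.5 * H, q = R + 0.5 * H)
}

# vapply fails loudly if foo ever returns something other than two numbers
res <- vapply(d, foo, FUN.VALUE = c(p = 0, q = 0))

# One named vector per quantity, one element per snp
p <- res["p", ]
q <- res["q", ]

p
# snp1 snp2 snp3
#  0.6  0.5  0.5
q
# snp1 snp2 snp3
#  0.4  0.5  0.5
```

As long as every value is 0, 1 or 2 (and there are no NAs), D + H + R is 1, so p + q should be 1 for every column, which makes a quick sanity check.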
Here is an option with tidyverse. Create a function (f1) based on the logic in OP's code to return a list of length 2, then use that in summarise_all to apply the function on each of the columns of the dataset.
library(dplyr)
library(tidyr)
f1 <- function(x) {
H <- 0.5 * mean(x == 1)
list(list(p = mean(x == 0) + H,
q = mean(x == 2) + H))
}
df1 %>%
summarise_all(f1) %>%
unnest
# snp1 snp2 snp3
#1 0.75 0.625 0.375
#2 0.25 0.375 0.625
#DATA (df1 must be defined before running the pipeline above)
df1 <- structure(list(snp1 = c(0L, 0L, 1L, 1L), snp2 = c(1L, 1L, 1L,
0L), snp3 = c(2L, 1L, 2L, 0L)), class = "data.frame", row.names = c(NA,
-4L))
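For anyone who wants to cross-check those numbers without loading any packages, here is a rough base R sketch on the same df1. It is not the approach of the answer above; it relies on the fact that comparing a data frame to a number gives a logical matrix, and that colMeans of a logical matrix gives the proportion of TRUE per column.

```r
# Same four-row data as df1 above
df1 <- data.frame(snp1 = c(0L, 0L, 1L, 1L),
                  snp2 = c(1L, 1L, 1L, 0L),
                  snp3 = c(2L, 1L, 2L, 0L))

# Per-column proportions of 0s, 1s and 2s
D <- colMeans(df1 == 0)
H <- colMeans(df1 == 1)
R <- colMeans(df1 == 2)

# Stack p and q into a 2 x ncol matrix, one column per snp
res <- rbind(p = D + 0.5 * H,
             q = R + 0.5 * H)
res
#   snp1  snp2  snp3
# p 0.75 0.625 0.375
# q 0.25 0.375 0.625
```

This works on all columns at once, so it should scale to the 237-column case without an explicit loop, assuming every column is numeric.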