How to apply a function on every row of a data frame?
I would like to calculate multiple combined p-values using the metap package.
My data frame tt has 3 p-value columns (G, E and B):
> dput(head(tt))
structure(list(RS = c("rs2089177", "rs4360974", "rs6502526",
"rs8069906", "rs9905280", "rs4313843"), G = c(0.9986, 0.9738,
0.9744, 0.7184, 0.7205, 0.9804), E = c(0.7153, 0.7838, 0.7839,
0.4918, 0.4861, 0.8522), B = c(0.604716, 0.430228, 0.42916, 0.521452,
0.465758, 0.474313)), class = c("data.table", "data.frame"), row.names = c(NA,
-6L), .internal.selfref = <pointer: 0x10200eee0>)
And a data frame df with the corresponding weight for each p-value in tt:
> dput(head(df))
structure(list(wg = c(40.6324993078201, 40.6324993078201, 40.6324993078201,
40.6324993078201, 40.6324993078201, 40.6324993078201), we = c(35.3977400408557,
35.3977400408557, 35.3977400408557, 35.3977400408557, 35.3977400408557,
35.3977400408557), wb = c(580.643608420863, 580.643608420863,
580.643608420863, 580.643608420863, 580.643608420863, 580.643608420863
), RS = c("rs2089177", "rs4360974", "rs6502526", "rs8069906",
"rs9905280", "rs4313843")), row.names = c(NA, 6L), class = "data.frame")
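The RS column is the same in df and tt.
How can I use the sumz() function to create a new data frame that looks the same as tt, except that it has an additional column, say named "META", holding the meta p-value calculated for each row?
Here is an example of what the p-value would be for the first row: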
> sumz(c(0.9986,0.7153,0.604716), weights = c(40.6325,35.39774,580.6436), na.action = na.fail)
p = 0.6940048
This is the function I am referring to: https://www.rdocumentation.org/packages/metap/versions/1.1/topics/sumz
I tried merging the two data frames and applying the function on each row:
> head(q)
ID P G E wb wg we
1: rs1029830 0.0979931 0.0054060 0.39160 580.6436 40.6325 35.39774
2: rs1029832 0.1501820 0.0028140 0.39320 580.6436 40.6325 35.39774
3: rs11078374 0.1701250 0.0009805 0.49730 580.6436 40.6325 35.39774
4: rs1124961 0.1710150 0.7252000 0.05737 580.6436 40.6325 35.39774
5: rs1135237 0.1493650 0.6851000 0.06354 580.6436 40.6325 35.39774
6: rs11867934 0.0757972 0.0006140 0.00327 580.6436 40.6325 35.39774
helper <- function(x) {
p <- sumz(x[2:4], weights = x[5:7])$p
p
}
q$META <- apply(q, MARGIN = 1, helper)
But I get this error:
Error in sumz(x[2:4], weights = x[5:7]) :
Must have at least two valid p values
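First, since you say RS is the same between the two, that makes me cautious: "how sure are we that the rows will always line up correctly?" To be defensive, I'd say "not 100%", and join/merge them together so that they are guaranteed to be in the correct order.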
quux <- tt[df, on="RS"]
quux
# RS G E B wg we wb
# 1: rs2089177 0.9986 0.7153 0.604716 40.6325 35.39774 580.6436
# 2: rs4360974 0.9738 0.7838 0.430228 40.6325 35.39774 580.6436
# 3: rs6502526 0.9744 0.7839 0.429160 40.6325 35.39774 580.6436
# 4: rs8069906 0.7184 0.4918 0.521452 40.6325 35.39774 580.6436
# 5: rs9905280 0.7205 0.4861 0.465758 40.6325 35.39774 580.6436
# 6: rs4313843 0.9804 0.8522 0.474313 40.6325 35.39774 580.6436
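The reason for joining rather than pairing rows by position can be sketched in base R. This is only an illustration: tt_demo and df_demo are hypothetical three-row stand-ins (values copied from the question, weights rounded), with the key order shuffled on purpose, and merge() is used here in place of the data.table join above.

```r
# Hypothetical stand-ins for the question's tt and df (first three rows only).
tt_demo <- data.frame(
  RS = c("rs2089177", "rs4360974", "rs6502526"),
  G  = c(0.9986, 0.9738, 0.9744),
  E  = c(0.7153, 0.7838, 0.7839),
  B  = c(0.604716, 0.430228, 0.42916),
  stringsAsFactors = FALSE
)
df_demo <- data.frame(
  wg = rep(40.6325, 3),
  we = rep(35.39774, 3),
  wb = rep(580.6436, 3),
  RS = c("rs6502526", "rs2089177", "rs4360974"),  # deliberately shuffled
  stringsAsFactors = FALSE
)

# Positional pairing would be unsafe here: the keys do not line up row by row.
aligned <- identical(tt_demo$RS, df_demo$RS)

# merge() matches rows on the key, so the row order of either input does not matter.
merged_demo <- merge(tt_demo, df_demo, by = "RS")
```

If aligned is FALSE, binding the columns side by side would silently attach weights to the wrong SNP, whereas the merged result keeps each RS with its own weights.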
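From here, it's just a matter of applying the function row by row, pairing the p-values in each row with the weights in the same row: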
quux$META <- sapply(seq_len(nrow(quux)), function(rn) {
unlist(sumz(as.matrix(quux[,.(G,E,B)])[rn,], weights = as.vector(quux[,.(wg,we,wb)])[rn,],
na.action=na.fail)["p"])
})
quux
# RS G E B wg we wb META
# 1: rs2089177 0.9986 0.7153 0.604716 40.6325 35.39774 580.6436 0.9863582
# 2: rs4360974 0.9738 0.7838 0.430228 40.6325 35.39774 580.6436 0.9294546
# 3: rs6502526 0.9744 0.7839 0.429160 40.6325 35.39774 580.6436 0.9300445
# 4: rs8069906 0.7184 0.4918 0.521452 40.6325 35.39774 580.6436 0.6379392
# 5: rs9905280 0.7205 0.4861 0.465758 40.6325 35.39774 580.6436 0.6055061
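# 6: rs4313843 0.9804 0.8522 0.474313 40.6325 35.39774 580.6436 0.9605584
Caution: these META values differ from the weighted result in the question (row 1 is 0.9863582 here, while the weighted sumz call in the question returns 0.6940048). The weights are passed as a one-row data.table rather than a numeric vector, so sumz apparently does not apply them; using as.matrix(quux[,.(wg,we,wb)])[rn,] for the weights should pass a plain numeric vector instead.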
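To check a META value by hand, the weighted sum-of-z (Stouffer) combination can be sketched in base R. The helper weighted_stouffer below is a hypothetical stand-in written for illustration, not metap's own code; it assumes the usual formula z = sum(w * z_i) / sqrt(sum(w^2)) with z_i taken from the upper tail of the normal distribution.

```r
# Hypothetical helper: weighted Stouffer (sum of z) combination in base R.
# With the default w, every p-value gets the same weight.
weighted_stouffer <- function(p, w = rep(1, length(p))) {
  z <- sum(w * qnorm(p, lower.tail = FALSE)) / sqrt(sum(w^2))
  pnorm(z, lower.tail = FALSE)
}

# First row of the question's data, with the rounded weights used in its sumz() call.
p_row1 <- c(0.9986, 0.7153, 0.604716)
w_row1 <- c(40.6325, 35.39774, 580.6436)

weighted_p   <- weighted_stouffer(p_row1, w_row1)  # compare with p = 0.6940048 in the question
unweighted_p <- weighted_stouffer(p_row1)          # compare with META = 0.9863582 in row 1
```

Comparing both numbers against a computed column is a quick way to see whether the weights actually reached the combining function.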
Or, in a data.table-centric way:
mysumz <- function(x, w) sumz(unlist(x), weights = unlist(w), na.action = na.fail)[["p"]]
quux[, META := mysumz(.(G,E,B), .(wg,we,wb)), by = seq_len(nrow(quux))]
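(Borrowed from https://stackoverflow.com/a/36802640.) The helper function is needed because each call to mysumz receives x and w as lists, but sumz needs vectors. If you want to verify this, first call debugonce(mysumz), then run quux[, META := ...] and inspect x and w to see how it works.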
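As for the error in the question, one likely contributor (an assumption, not verified against the asker's session) is that apply() converts its input to a matrix first, and a single character column such as ID turns every value in the row into a character string. A minimal sketch with a hypothetical two-row q_demo:

```r
# Hypothetical stand-in for the question's merged table q (two rows, fewer columns).
q_demo <- data.frame(
  ID = c("rs1029830", "rs1029832"),
  P  = c(0.0979931, 0.1501820),
  G  = c(0.0054060, 0.0028140),
  stringsAsFactors = FALSE
)

# apply() calls as.matrix() on its input; with a character column present,
# each row reaches the function as a character vector.
row_types <- apply(q_demo, 1, function(x) class(x))

# Restricting to the numeric columns keeps the values numeric.
num_types <- apply(q_demo[, c("P", "G")], 1, function(x) class(x))
```

Selecting only the numeric columns before calling apply(), or indexing rows explicitly as in the answer above, avoids handing strings to sumz.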