
How to apply a function on every row of a data frame?

I would like to combine multiple p-values using the metap package.

My data frame tt has three p-value columns (G, E and B):

    > dput(head(tt))
structure(list(RS = c("rs2089177", "rs4360974", "rs6502526", 
"rs8069906", "rs9905280", "rs4313843"), G = c(0.9986, 0.9738, 
0.9744, 0.7184, 0.7205, 0.9804), E = c(0.7153, 0.7838, 0.7839, 
0.4918, 0.4861, 0.8522), B = c(0.604716, 0.430228, 0.42916, 0.521452, 
0.465758, 0.474313)), class = c("data.table", "data.frame"), row.names  = c(NA, 
-6L), .internal.selfref = <pointer: 0x10200eee0>)

I also have a data frame df holding the corresponding weight for each p-value in tt:

   > dput(head(df))
structure(list(wg = c(40.6324993078201, 40.6324993078201, 40.6324993078201, 
 40.6324993078201, 40.6324993078201, 40.6324993078201), we = c(35.3977400408557, 
35.3977400408557, 35.3977400408557, 35.3977400408557, 35.3977400408557, 
35.3977400408557), wb = c(580.643608420863, 580.643608420863, 
580.643608420863, 580.643608420863, 580.643608420863, 580.643608420863
), RS = c("rs2089177", "rs4360974", "rs6502526", "rs8069906", 
"rs9905280", "rs4313843")), row.names = c(NA, 6L), class = "data.frame")

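The RS column is identical in df and tt.

How can I use the sumz() function to create a new data frame that looks the same as tt, except that it has an additional column, say named "META", holding the meta p-value computed for each row?

Here is an example of what the p-value for the first row should be: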

 > sumz(c(0.9986,0.7153,0.604716), weights = c(40.6325,35.39774,580.6436), na.action = na.fail)
p =  0.6940048

This is the function I am referring to: https://www.rdocumentation.org/packages/metap/versions/1.1/topics/sumz

I tried merging the two data frames and applying the function to each row:

> head(q)
       ID         P         G       E       wb      wg       we
1:  rs1029830 0.0979931 0.0054060 0.39160 580.6436 40.6325 35.39774
2:  rs1029832 0.1501820 0.0028140 0.39320 580.6436 40.6325 35.39774
3: rs11078374 0.1701250 0.0009805 0.49730 580.6436 40.6325 35.39774
4:  rs1124961 0.1710150 0.7252000 0.05737 580.6436 40.6325 35.39774
5:  rs1135237 0.1493650 0.6851000 0.06354 580.6436 40.6325 35.39774
6: rs11867934 0.0757972 0.0006140 0.00327 580.6436 40.6325 35.39774


helper <- function(x) {
   p <- sumz(x[2:4], weights = x[5:7])$p
   p
}

q$META <- apply(q, MARGIN = 1, helper)

But I get this error:

 Error in sumz(x[2:4], weights = x[5:7]) : 
  Must have at least two valid p values 
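
A likely cause of this error is that apply() converts its input to a matrix before looping over rows, and a single character column (here ID) turns every cell into a string, so sumz() no longer receives numeric p-values. The following is a minimal base-R sketch of that coercion, not a diagnosis confirmed by the original post; q_demo is a hypothetical toy data frame built from two rows of the table above.

```r
# Toy data frame shaped like the merged table in the question (hypothetical name).
q_demo <- data.frame(
  ID = c("rs1029830", "rs1029832"),
  P  = c(0.0979931, 0.1501820),
  G  = c(0.0054060, 0.0028140),
  E  = c(0.39160, 0.39320),
  stringsAsFactors = FALSE
)

# apply() calls as.matrix() first; one character column makes every cell a string.
row_types_all <- apply(q_demo, 1, function(x) class(x))

# Restricting the input to the numeric columns keeps each row numeric.
row_types_num <- apply(q_demo[, c("P", "G", "E")], 1, function(x) class(x))

print(row_types_all)
print(row_types_num)
```

Under this assumption, passing only the numeric columns to apply(), or indexing rows explicitly as in the answer below, avoids the problem.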

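First, since you say RS is the same in both, that makes me cautious: "how sure are we that the rows always line up correctly?" To be defensive, I'd say "not 100%", and join/merge the two tables so that the rows are guaranteed to be matched up correctly.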

quux <- tt[df, on="RS"]
quux
#           RS      G      E        B      wg       we       wb
# 1: rs2089177 0.9986 0.7153 0.604716 40.6325 35.39774 580.6436
# 2: rs4360974 0.9738 0.7838 0.430228 40.6325 35.39774 580.6436
# 3: rs6502526 0.9744 0.7839 0.429160 40.6325 35.39774 580.6436
# 4: rs8069906 0.7184 0.4918 0.521452 40.6325 35.39774 580.6436
# 5: rs9905280 0.7205 0.4861 0.465758 40.6325 35.39774 580.6436
# 6: rs4313843 0.9804 0.8522 0.474313 40.6325 35.39774 580.6436

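From here, it is just a matter of going row by row and calling the function on that row's p-values together with that same row's weights: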

quux$META <- sapply(seq_len(nrow(quux)), function(rn) {
  # unlist() turns each one-row selection into a plain numeric vector, as sumz() expects
  sumz(unlist(quux[rn, .(G, E, B)]),
       weights = unlist(quux[rn, .(wg, we, wb)]),
       na.action = na.fail)[["p"]]
})
quux
# quux now has an extra META column with one combined p-value per row.
# For the first row (rs2089177) this should reproduce the value from the
# question's example, p = 0.6940048.

Or, in a data.table-centric way:

mysumz <- function(x, w) sumz(unlist(x), weights = unlist(w), na.action = na.fail)[["p"]]
quux[, META := mysumz(.(G,E,B), .(wg,we,wb)), by = seq_len(nrow(quux))]

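(Borrowed from https://stackoverflow.com/a/36802640 .) The secondary function is required because each call to mysumz receives x and w as lists, whereas sumz needs vectors. If you want to verify this, first call debugonce(mysumz), then run quux[, META := ...] and inspect x and w to see how it works.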
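
As a cross-check that needs no row-wise loop, the combined p-value can also be computed by hand for all rows at once. This is only a sketch, assuming sumz() implements the weighted sum-of-z (Stouffer) formula its documentation describes; stouffer_rows is a hypothetical helper, not part of metap, and unlike sumz() it does no filtering of p-values equal to 0 or 1 and no NA handling.

```r
# Hypothetical helper (not part of metap): weighted Stouffer combination per row.
# P and W are numeric matrices of the same shape (p-values and their weights).
stouffer_rows <- function(P, W) {
  Z <- qnorm(P, lower.tail = FALSE)              # p-values -> z-scores
  z_comb <- rowSums(Z * W) / sqrt(rowSums(W^2))  # weighted sum of z per row
  pnorm(z_comb, lower.tail = FALSE)              # combined z -> p-value
}

# First two rows of tt and df from the question.
P <- cbind(G = c(0.9986, 0.9738),
           E = c(0.7153, 0.7838),
           B = c(0.604716, 0.430228))
W <- cbind(wg = c(40.6325, 40.6325),
           we = c(35.39774, 35.39774),
           wb = c(580.6436, 580.6436))

meta <- stouffer_rows(P, W)
print(meta[1])  # compare with the question's example for row 1 (p = 0.6940048)
```

If the first value agrees with the question's sumz() example, the same call on as.matrix(quux[, .(G, E, B)]) and as.matrix(quux[, .(wg, we, wb)]) would give a quick sanity check on the META column.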

