How can I divide values in a column by a specific row in R?
Here is a subset of my large data set:
gene feature reads
A anot 2
A 3ss_A 3
A 3ss_B 5
B 5ss_A 1
B anot 4
C 3ss_A 2
C 3ss_B 8
C anot 3
C 5ss_A 6
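Within each gene, I want to divide the reads for the 3ss and 5ss features by the reads of that gene's "anot" feature. Each gene has multiple features (not all shown here), but exactly one "anot" feature.
The expected output is: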
gene feature reads ratio
A anot 2 1
A 3ss_A 3 1.5
A 3ss_B 5 2.5
B 5ss_A 1 0.25
B anot 4 1
C 3ss_A 2 0.666666667
C 3ss_B 8 2.666666667
C anot 3 1
C 5ss_A 6 2
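How can I do this in R? Thanks.
Here are several alternatives:
1) ave Use ave as follows. The function fun is passed the vector of row numbers for one gene and returns the vector of ratios for those rows. No packages are used.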
fun <- function(ix) with(DF[ix, ], reads / reads[feature == "anot"])
transform(DF, ratio = ave(1:nrow(DF), gene, FUN = fun))
giving:
gene feature reads ratio
1 A anot 2 1.0000000
2 A 3ss_A 3 1.5000000
3 A 3ss_B 5 2.5000000
4 B 5ss_A 1 0.2500000
5 B anot 4 1.0000000
6 C 3ss_A 2 0.6666667
7 C 3ss_B 8 2.6666667
8 C anot 3 1.0000000
9 C 5ss_A 6 2.0000000
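To make it clearer what ave passes to fun, here is a minimal sketch that performs the same grouping by hand. The names idx and ratio are introduced only for this illustration; split and unsplit are base R functions.

```r
# Same input as in the question
DF <- read.table(text = "gene feature reads
A anot 2
A 3ss_A 3
A 3ss_B 5
B 5ss_A 1
B anot 4
C 3ss_A 2
C 3ss_B 8
C anot 3
C 5ss_A 6", header = TRUE)

fun <- function(ix) with(DF[ix, ], reads / reads[feature == "anot"])

# Row numbers grouped by gene: these are the vectors that ave() hands to fun
idx <- split(seq_len(nrow(DF)), DF$gene)
print(idx)

# Apply fun to each group by hand, then put the results back in row order
ratio <- unsplit(lapply(idx, fun), DF$gene)
print(ratio)
```

The vector ratio should match the ratio column produced by the ave call in 1).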
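1a) ave Here is another way to use ave. It replaces every non-anot reads value with NA, then uses na.omit within each gene to pick out the single non-NA value, and divides reads by it: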
transform(DF, ratio =
reads / ave(ifelse(feature == "anot", reads, NA), gene, FUN = na.omit))
giving:
gene feature reads ratio
1 A anot 2 1.0000000
2 A 3ss_A 3 1.5000000
3 A 3ss_B 5 2.5000000
4 B 5ss_A 1 0.2500000
5 B anot 4 1.0000000
6 C 3ss_A 2 0.6666667
7 C 3ss_B 8 2.6666667
8 C anot 3 1.0000000
9 C 5ss_A 6 2.0000000
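1b) ave Here is another ave variant. It is particularly concise, but it assumes that the reads value of anot is always non-negative (which is the case in the question's example). It creates a vector that equals reads on anot rows and zero elsewhere, then takes the maximum within each gene: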
transform(DF, ratio = reads / ave((feature == "anot") * reads, gene, FUN = max))
giving:
gene feature reads ratio
1 A anot 2 1.0000000
2 A 3ss_A 3 1.5000000
3 A 3ss_B 5 2.5000000
4 B 5ss_A 1 0.2500000
5 B anot 4 1.0000000
6 C 3ss_A 2 0.6666667
7 C 3ss_B 8 2.6666667
8 C anot 3 1.0000000
9 C 5ss_A 6 2.0000000
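To illustrate why the non-negativity assumption in 1b) matters, here is a small sketch on made-up data. DFneg and its values are hypothetical and are not part of the question; read counts would not normally be negative, so this only shows the failure mode.

```r
# Hypothetical data: the anot value for gene A is negative
DFneg <- data.frame(
  gene    = c("A", "A", "B", "B"),
  feature = c("anot", "3ss_A", "anot", "5ss_A"),
  reads   = c(-2, 3, 4, 1)
)

# 1b) takes max(-2, 0) = 0 for gene A, so the ratios for A are not finite
r1b <- with(DFneg, reads / ave((feature == "anot") * reads, gene, FUN = max))

# 1a) picks out the anot value itself and is unaffected by its sign
r1a <- with(DFneg,
            reads / ave(ifelse(feature == "anot", reads, NA), gene, FUN = na.omit))

print(data.frame(DFneg, r1b, r1a))
```

For gene B, where the anot value is positive, both variants agree.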
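2) by Another alternative that uses no packages is by. Here the function funby takes the subset of rows of DF for one gene and returns that subset with the ratio column appended.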
funby <- function(x) transform(x, ratio = reads / reads[feature == "anot"])
do.call("rbind", by(DF, DF$gene, funby))
giving:
gene feature reads ratio
A.1 A anot 2 1.0000000
A.2 A 3ss_A 3 1.5000000
A.3 A 3ss_B 5 2.5000000
B.4 B 5ss_A 1 0.2500000
B.5 B anot 4 1.0000000
C.6 C 3ss_A 2 0.6666667
C.7 C 3ss_B 8 2.6666667
C.8 C anot 3 1.0000000
C.9 C 5ss_A 6 2.0000000
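3) rep / table This also uses no packages. It assumes DF is sorted by gene (which is the case in the question's example). It repeats each anot reads value once for every row of its gene, then divides reads by the result.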
transform(DF, ratio = reads / rep(reads[feature == "anot"], table(gene)))
giving:
gene feature reads ratio
1 A anot 2 1.0000000
2 A 3ss_A 3 1.5000000
3 A 3ss_B 5 2.5000000
4 B 5ss_A 1 0.2500000
5 B anot 4 1.0000000
6 C 3ss_A 2 0.6666667
7 C 3ss_B 8 2.6666667
8 C anot 3 1.0000000
9 C 5ss_A 6 2.0000000
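If the input is not already sorted by gene, one option is to sort it first. The following is a minimal sketch under that assumption; the row permutation and the names shuffled, sorted and res are made up for the illustration.

```r
# Same input as in the question
DF <- read.table(text = "gene feature reads
A anot 2
A 3ss_A 3
A 3ss_B 5
B 5ss_A 1
B anot 4
C 3ss_A 2
C 3ss_B 8
C anot 3
C 5ss_A 6", header = TRUE)

# Reorder the rows with a fixed permutation to mimic unsorted input
shuffled <- DF[c(9, 1, 5, 3, 7, 2, 8, 4, 6), ]

# Sort by gene so that each gene's rows are contiguous, then apply 3)
sorted <- shuffled[order(shuffled$gene), ]
res <- transform(sorted,
                 ratio = reads / rep(reads[feature == "anot"], table(gene)))
print(res)
```

Note that the result is in sorted order rather than the original row order; the ave-based alternatives keep the original order without any sorting.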
4) dplyr Using the dplyr package:
library(dplyr)
DF %>%
group_by(gene) %>%
mutate(ratio = reads / reads[feature == "anot"]) %>%
ungroup()
giving:
Source: local data frame [9 x 4]
gene feature reads ratio
(fctr) (fctr) (int) (dbl)
1 A anot 2 1.0000000
2 A 3ss_A 3 1.5000000
3 A 3ss_B 5 2.5000000
4 B 5ss_A 1 0.2500000
5 B anot 4 1.0000000
6 C 3ss_A 2 0.6666667
7 C 3ss_B 8 2.6666667
8 C anot 3 1.0000000
9 C 5ss_A 6 2.0000000
5) data.table Using the data.table package:
library(data.table)
DT <- as.data.table(DF)
DT[, ratio := reads / reads[feature == "anot"], by = "gene"]
giving:
> DT
gene feature reads ratio
1: A anot 2 1.0000000
2: A 3ss_A 3 1.5000000
3: A 3ss_B 5 2.5000000
4: B 5ss_A 1 0.2500000
5: B anot 4 1.0000000
6: C 3ss_A 2 0.6666667
7: C 3ss_B 8 2.6666667
8: C anot 3 1.0000000
9: C 5ss_A 6 2.0000000
Note: The input DF in reproducible form is:
Lines <- "gene feature reads
A anot 2
A 3ss_A 3
A 3ss_B 5
B 5ss_A 1
B anot 4
C 3ss_A 2
C 3ss_B 8
C anot 3
C 5ss_A 6"
DF <- read.table(text = Lines, header = TRUE)
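All of the alternatives above rely on the question's statement that each gene has exactly one anot row. A minimal sketch of how that could be checked before dividing is shown here on deliberately broken data; DFbad, n_anot and bad_genes are hypothetical names and values used only for this illustration.

```r
# Hypothetical input: gene B has no "anot" row, gene C has two
DFbad <- data.frame(
  gene    = c("A", "A", "B", "C", "C", "C"),
  feature = c("anot", "3ss_A", "5ss_A", "anot", "anot", "3ss_A"),
  reads   = c(2, 3, 1, 3, 5, 2)
)

# Count the "anot" rows within each gene
n_anot <- tapply(DFbad$feature == "anot", DFbad$gene, sum)
print(n_anot)

# Genes that violate the "exactly one anot row" assumption
bad_genes <- names(n_anot)[n_anot != 1]
print(bad_genes)
```

Running the same check on DF from the note above should report no offending genes.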
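You could try something like:
anot_reads <- yourdata[yourdata$feature == "anot", ]$reads
names(anot_reads) <- yourdata[yourdata$feature == "anot", ]$gene
# look up by name; as.character() avoids indexing by factor codes if gene is a factor
yourdata$ratio <- yourdata$reads / anot_reads[as.character(yourdata$gene)]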
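A closely related lookup can be written with match instead of a named vector. This is a swapped-in variant rather than the code above, shown as a sketch on the question's data; the name anot is introduced only for this example.

```r
# Same input as in the question
DF <- read.table(text = "gene feature reads
A anot 2
A 3ss_A 3
A 3ss_B 5
B 5ss_A 1
B anot 4
C 3ss_A 2
C 3ss_B 8
C anot 3
C 5ss_A 6", header = TRUE)

# One row per gene: its "anot" feature
anot <- DF[DF$feature == "anot", ]

# match() gives, for every row of DF, the position of its gene in anot
DF$ratio <- DF$reads / anot$reads[match(DF$gene, anot$gene)]
print(DF)
```

Because the lookup is by gene value, this sketch does not depend on the rows being sorted, and it should behave the same whether gene is a character or a factor column.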
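In base R you can use:
# loop over the genes; assumes the rows of df are already grouped by gene
genes <- unique(as.character(df$gene))
df$ratio <- unlist(lapply(genes,
function(l) with(subset(df, gene == l), reads / reads[feature == "anot"])))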
gene feature reads ratio
1 A anot 2 1.0000000
2 A 3ss_A 3 1.5000000
3 A 3ss_B 5 2.5000000
4 B 5ss_A 1 0.2500000
5 B anot 4 1.0000000
6 C 3ss_A 2 0.6666667
7 C 3ss_B 8 2.6666667
8 C anot 3 1.0000000
9 C 5ss_A 6 2.0000000
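This translates to: for each value of gene, subset df and divide reads by the reads value where feature == "anot". Then unlist the result and store it as a new column in the data.frame.
There may well be a shorter option, though.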