Calculate the difference between pairs of consecutive rows in a data frame - R
I have a data.frame in which each gene name is repeated, with values for two conditions:
df <- data.frame(gene=c("A","A","B","B","C","C"),
                 condition=c("control","treatment","control","treatment","control","treatment"),
                 count=c(10, 2, 5, 8, 5, 1),
                 sd=c(1, 0.2, 0.1, 2, 0.8, 0.1))
gene condition count sd
1 A control 10 1.0
2 A treatment 2 0.2
3 B control 5 0.1
4 B treatment 8 2.0
5 C control 5 0.8
6 C treatment 1 0.1
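I want to work out whether "count" increased or decreased after treatment, and then label the genes and/or group them accordingly. That is (pseudocode):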
for each unique(gene) do
    if df[geneRow2,3] - df[geneRow1,3] > 0 then gene is "up"
    else gene is "down"
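For comparison, the loop in the pseudocode can be written almost line for line in base R. This is a minimal sketch, assuming every gene has exactly one "control" row and one "treatment" row. It labels a gene "up" when its treatment count is larger than its control count (as in the desired output below), so a difference of zero ends up as "down". The names `regulation` and `groups` are just illustrative choices.

```r
df <- data.frame(gene = c("A","A","B","B","C","C"),
                 condition = c("control","treatment","control","treatment","control","treatment"),
                 count = c(10, 2, 5, 8, 5, 1),
                 sd = c(1, 0.2, 0.1, 2, 0.8, 0.1))

# Assumes exactly one "control" and one "treatment" row per gene
df$regulation <- NA_character_
for (g in unique(df$gene)) {
  rows <- df$gene == g
  ctrl <- df$count[rows & df$condition == "control"]
  trt  <- df$count[rows & df$condition == "treatment"]
  # "up" if the count rose after treatment, otherwise "down"
  df$regulation[rows] <- if (trt - ctrl > 0) "up" else "down"
}

# Group the rows by label: a list with elements "down" and "up"
groups <- split(df, df$regulation)
```

`groups$up` and `groups$down` then correspond to the "up-regulated" and "down-regulated" tables shown below.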
What it should look like in the end (the last column is optional):
up-regulated
gene condition count sd regulation
B control 5 0.1 up
B treatment 8 2.0 up
down-regulated
gene condition count sd regulation
A control 10 1.0 down
A treatment 2 0.2 down
C control 5 0.8 down
C treatment 1 0.1 down
I've been racking my brain over this, including playing around with ddply, but I can't find a solution. Please help a hapless biologist!
Cheers.
A plyr solution would look something like:
library(plyr)
reg.fun <- function(x) {
  # treatment minus control: positive means the count went up after treatment
  reg.diff <- x$count[x$condition=='treatment'] - x$count[x$condition=='control']
  x$regulation <- ifelse(reg.diff > 0, 'up', 'down')
  x
}
ddply(df, .(gene), reg.fun)
gene condition count sd regulation
1 A control 10 1.0 down
2 A treatment 2 0.2 down
3 B control 5 0.1 up
4 B treatment 8 2.0 up
5 C control 5 0.8 down
6 C treatment 1 0.1 down
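ddply() is doing split-apply-combine here: split the rows by gene, run a function on each piece, and bind the results back together. If you would rather not depend on plyr, the same pattern can be sketched with base R's split(), lapply() and do.call(rbind, ...). `label_gene` is a hypothetical helper written for this sketch; like the desired output in the question, it calls a gene "up" when the treatment count exceeds the control count, and it assumes one row of each condition per gene.

```r
df <- data.frame(gene = c("A","A","B","B","C","C"),
                 condition = c("control","treatment","control","treatment","control","treatment"),
                 count = c(10, 2, 5, 8, 5, 1),
                 sd = c(1, 0.2, 0.1, 2, 0.8, 0.1))

# Hypothetical helper: label every row of one gene's sub-data-frame
label_gene <- function(x) {
  # treatment minus control; assumes one row of each condition per gene
  delta <- x$count[x$condition == "treatment"] - x$count[x$condition == "control"]
  x$regulation <- if (delta > 0) "up" else "down"
  x
}

# split by gene, apply the helper to each piece, bind the pieces back together
res <- do.call(rbind, lapply(split(df, df$gene), label_gene))
rownames(res) <- NULL  # rbind() would otherwise name the rows "A.1", "A.2", ...

# One data frame per label, like the grouped layout in the question
by_label <- split(res, res$regulation)
```

The final split() gives the "up" and "down" groups from the question as two separate data frames.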
You could also consider doing this with a different package and/or with the data in a different shape:
df.w <- reshape(df, direction='wide', idvar='gene', timevar='condition')
library(data.table)
DT <- data.table(df.w, key='gene')
DT[, regulation:=ifelse(count.treatment-count.control > 0, 'up', 'down'), by=gene]
DT
gene count.control sd.control count.treatment sd.treatment regulation
1: A 10 1.0 2 0.2 down
2: B 5 0.1 8 2.0 up
3: C 5 0.8 1 0.1 down
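In the wide layout each gene occupies exactly one row, so the comparison does not actually need grouping (the `by=gene` above is harmless but not required), and the same result can be sketched without data.table at all. The sketch below assumes the column names that reshape() produces for this data (`count.control`, `count.treatment`). It labels the wide table with a single vectorised ifelse() and then uses merge() to attach the label back onto the original long-format rows; `df.long` is an illustrative name.

```r
df <- data.frame(gene = c("A","A","B","B","C","C"),
                 condition = c("control","treatment","control","treatment","control","treatment"),
                 count = c(10, 2, 5, 8, 5, 1),
                 sd = c(1, 0.2, 0.1, 2, 0.8, 0.1))

# Wide layout: one row per gene with count.control, sd.control, count.treatment, sd.treatment
df.w <- reshape(df, direction = "wide", idvar = "gene", timevar = "condition")

# One vectorised comparison; no grouping needed since each gene is a single row
df.w$regulation <- ifelse(df.w$count.treatment > df.w$count.control, "up", "down")

# Attach the per-gene label back onto the original long-format rows (sorted by gene)
df.long <- merge(df, df.w[, c("gene", "regulation")], by = "gene")
```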
Something like this:
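# diff() within each gene is treatment - control; a negative difference picks "down"
df$up.down <- with(df, ave(count, gene,
                           FUN=function(diffs) c("up", "down")[1+(diff(diffs) < 0)]))
# split the data frame into a list of "down" and "up" data frames
spltdf <- split(df, df$up.down)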
> df
gene condition count sd up.down
1 A control 10 1.0 down
2 A treatment 2 0.2 down
3 B control 5 0.1 up
4 B treatment 8 2.0 up
5 C control 5 0.8 down
6 C treatment 1 0.1 down
> spltdf
$down
gene condition count sd up.down
1 A control 10 1.0 down
2 A treatment 2 0.2 down
5 C control 5 0.8 down
6 C treatment 1 0.1 down
$up
gene condition count sd up.down
3 B control 5 0.1 up
4 B treatment 8 2.0 up
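The ave() approach relies on each gene's control row coming before its treatment row, because diff() computes "second minus first". If the rows might arrive in a different order, a minimal safeguard is to sort first. This sketch assumes the condition labels are exactly "control" and "treatment", which happen to sort alphabetically in that order; the data are the question's rows in shuffled order.

```r
# The question's data, deliberately shuffled
df <- data.frame(gene = c("B","A","C","B","C","A"),
                 condition = c("treatment","control","treatment","control","control","treatment"),
                 count = c(8, 10, 1, 5, 5, 2),
                 sd = c(2, 1, 0.1, 0.1, 0.8, 0.2))

# Sort so that, within each gene, the control row precedes the treatment row
df <- df[order(df$gene, df$condition), ]

# Same ave() labelling as above: diff() is now reliably treatment - control
df$up.down <- with(df, ave(count, gene,
                           FUN = function(diffs) c("up", "down")[1 + (diff(diffs) < 0)]))
spltdf <- split(df, df$up.down)
```

Note that with this formula a difference of exactly zero is labelled "up".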