
Using a for loop in R to combine column values based on preceding column conditions

I am working with a big dataset containing multiple observations per gene, taken on varying dates and with varying expression levels. Data used

I would like to sum all the 'expression' column values if:

  1. They belong to the same gene (column 'gene' i = column 'gene' i+1)

AND

  2. They are measured on the same date (column 'date' i = column 'date' i+1)

The output should look something like this (each gene should have one observation per date, i.e. the sum of all the expression levels of that gene on that date): The_desired_output

I have tried writing a for loop, but I am relatively new to R and am having trouble building a data frame from the loop. An alternative solution might be better.

Thanks a lot!

How big is "big"? If you really have a large dataset, you are much better off with data.table.

Here is an example with 10 million rows.

# made-up example data: YOU should provide this
set.seed(1)    # for reproducible example
df <- data.frame(gene=sample(1:1e6, 1e7, replace=TRUE),
                 expression=rpois(1e7, 5),
                 date=sample(43000:44000, 1e7, replace=TRUE))

library(tictoc)       # for timing functions
library(dplyr)
library(data.table)

# dplyr: group by gene and date, then sum expression within each group
tic()
result.1 <- df %>% group_by(gene, date) %>% summarise(expression = sum(expression))
toc()
## 40.83 sec elapsed

# data.table: the same grouped sum; setDT() converts df by reference,
# and keyby sorts the result by gene and date
tic()
result.2 <- setDT(df)[, .(expression=sum(expression)), keyby=.(gene, date)]
toc()
## 3.03 sec elapsed

So data.table is about 13 times faster in this example.
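To make the grouped sum concrete, here is a minimal sketch on a five-row toy table. The data and the names `toy` and `summed` are made up for illustration, and it uses base R's `aggregate()` rather than dplyr or data.table so that it runs without extra packages. The rows are deliberately not sorted, to show that grouping does not need matching rows to be adjacent, unlike a row i versus row i+1 comparison in a loop.

```r
# Hypothetical toy data: two genes, two dates, rows in no particular order
toy <- data.frame(
  gene       = c("A", "B", "A", "A", "B"),
  date       = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01",
                         "2020-01-02", "2020-01-01")),
  expression = c(1, 3, 2, 5, 4)
)

# Sum expression within each (gene, date) combination
summed <- aggregate(expression ~ gene + date, data = toy, FUN = sum)

# Sort by gene, then date, and reset the row names for readability
summed <- summed[order(summed$gene, summed$date), ]
rownames(summed) <- NULL

print(summed)
# gene A: 3 on 2020-01-01 and 5 on 2020-01-02; gene B: 7 on 2020-01-01
```

For a table this small any of the three approaches is fine; the timing difference above only starts to matter at millions of rows.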
