Calculate cumulative sums of columns using a loop

Question

I have a dataframe with gene expression data by lane (column). I would like to write a loop that takes the sum of each row but progressively includes one more column each time. So on each pass I add a new column to the end of the dataframe containing the row sums over one more column than the previous pass. In the example below I did this by hand with the apply() function, but this is very inefficient and not feasible for a large data set. I experimented with the cumsum() function but couldn't get it to work for this. Very possibly I missed something obvious, but any guidance would be great!

#Example dataframe

c1 <- c('G1', 'G2', 'G3')
c2 <- c(5, 3, 1)
c3 <- c(3, 7, 1)
c4 <- c(6, 3, 4)
c5 <- c(6, 4, 3)
df <- data.frame(c1, c2, c3, c4, c5)

#Calculate cumulative sums
sum.2.3 <- apply(df[,2:3],1,sum)
sum.2.4 <- apply(df[,2:4],1,sum)
sum.2.5 <- apply(df[,2:5],1,sum)

df <- cbind(df, sum.2.3, sum.2.4, sum.2.5)

Answer 1

If the problem is writing the loop, you can call apply inside it.

Code

start_col <- 2

end_col <- ncol(df)

for(i in (start_col+1):end_col){
  
  var_name <- paste("sum",start_col,i,sep = ".")
  
  df[,var_name] <- apply(df[,start_col:i],1,sum)
  
}

Output

  c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5
1 G1  5  3  6  6       8      14      20
2 G2  3  7  3  4      10      13      17
3 G3  1  1  4  3       2       6       9
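As a possible variation on the loop above, a running total avoids re-summing the earlier columns on every pass. This is only a sketch, assuming every column from start_col onward is numeric; the name running is introduced here for illustration and is not part of the original answer.

```r
# Example data from the question
df <- data.frame(c1 = c("G1", "G2", "G3"),
                 c2 = c(5, 3, 1),
                 c3 = c(3, 7, 1),
                 c4 = c(6, 3, 4),
                 c5 = c(6, 4, 3))

start_col <- 2
end_col <- ncol(df)  # fixed before any new columns are appended

# 'running' (illustrative name) holds the row-wise total so far
running <- df[[start_col]]

for (i in (start_col + 1):end_col) {
  running <- running + df[[i]]  # add the next column element-wise
  df[[paste("sum", start_col, i, sep = ".")]] <- running
}

df
```

Each pass does one vector addition, so the work grows linearly with the number of columns rather than re-adding all earlier columns each time.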

Answer 2

You can use Reduce():

Reduce(`+`, df[-1], accumulate = TRUE)[-1]

[[1]]
[1]  8 10  2

[[2]]
[1] 14 13  6

[[3]]
[1] 20 17  9

Assign into the data frame:

df[paste0("sum.2.", 3:5)] <-  Reduce(`+`, df[-1], accumulate = TRUE)[-1]

Gives:

  c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5
1 G1  5  3  6  6       8      14      20
2 G2  3  7  3  4      10      13      17
3 G3  1  1  4  3       2       6       9
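To see why the trailing [-1] is there, it may help to look at what accumulate = TRUE returns. The sketch below uses the question's data; the name steps is introduced only for illustration.

```r
# accumulate = TRUE keeps every intermediate result, not just the final one
Reduce(`+`, 1:4, accumulate = TRUE)
# [1]  1  3  6 10

# Example data from the question
df <- data.frame(c1 = c("G1", "G2", "G3"),
                 c2 = c(5, 3, 1),
                 c3 = c(3, 7, 1),
                 c4 = c(6, 3, 4),
                 c5 = c(6, 4, 3))

# A data frame is a list of columns, so `+` adds whole columns element-wise
steps <- Reduce(`+`, df[-1], accumulate = TRUE)

length(steps)  # 4 elements: c2, c2+c3, c2+c3+c4, c2+c3+c4+c5
steps[[1]]     # just c2 itself, which is why the answer drops it with [-1]
```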

Answer 3

No loop needed.

df <- data.frame(
    c1 = c('G1', 'G2', 'G3'),
    c2 = c(5, 3, 1),
    c3 = c(3, 7, 1),
    c4 = c(6, 3, 4),
    c5 = c(6, 4, 3))

cbind(df, setNames(as.data.frame(t(apply(df[,-1], 1, cumsum))[,-1]), paste0("sum.2.", 3:5)))

#>   c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5
#> 1 G1  5  3  6  6       8      14      20
#> 2 G2  3  7  3  4      10      13      17
#> 3 G3  1  1  4  3       2       6       9
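The t() in the one-liner above is easy to overlook. When the function passed to apply over rows returns a vector, apply stores each result as a column, so the matrix comes back transposed. A small sketch of the intermediate steps, with cs as an illustrative name:

```r
# Example data from the question
df <- data.frame(c1 = c("G1", "G2", "G3"),
                 c2 = c(5, 3, 1),
                 c3 = c(3, 7, 1),
                 c4 = c(6, 3, 4),
                 c5 = c(6, 4, 3))

# cumsum over each row; each row's result becomes a COLUMN of cs
cs <- apply(df[, -1], 1, cumsum)
dim(cs)  # 4 3: four cumulative steps by three genes

# Transpose back to one row per gene, then drop the first column,
# which is only c2 on its own
t(cs)[, -1]
```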

Answer 4

Using rowCumsums from matrixStats:

library(matrixStats)
df[paste0("sum.2.", 3:5)] <- rowCumsums(as.matrix(df[2:5]))[,-1]

Output

> df
  c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5
1 G1  5  3  6  6       8      14      20
2 G2  3  7  3  4      10      13      17
3 G3  1  1  4  3       2       6       9
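The hard-coded 3:5 in the column names only fits this four-lane example. For a larger data set the names could be derived from the column positions instead. The helper below is hypothetical (add_row_cumsums, cols and prefix are names invented for this sketch) and uses base R's Reduce rather than matrixStats, so it has no package dependency.

```r
# Hypothetical helper: append row-wise cumulative sums over the columns 'cols'
add_row_cumsums <- function(data, cols = 2:ncol(data), prefix = "sum") {
  # Assumption: at least two columns, all of them numeric
  stopifnot(length(cols) >= 2,
            all(vapply(data[cols], is.numeric, logical(1))))

  # Running element-wise totals; drop the first, which is the first column alone
  sums <- Reduce(`+`, data[cols], accumulate = TRUE)[-1]

  # Build names such as sum.2.3, sum.2.4, ... from the column positions
  names(sums) <- paste(prefix, cols[1], cols[-1], sep = ".")

  data[names(sums)] <- sums
  data
}

# Example data from the question
df <- data.frame(c1 = c("G1", "G2", "G3"),
                 c2 = c(5, 3, 1),
                 c3 = c(3, 7, 1),
                 c4 = c(6, 3, 4),
                 c5 = c(6, 4, 3))

out <- add_row_cumsums(df)
out
```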

Answer 5

You can combine the mutate function from the dplyr package with the base rowSums function.

library(dplyr)

c1 <- c('G1', 'G2', 'G3')
c2 <- c(5, 3, 1)
c3 <- c(3, 7, 1)
c4 <- c(6, 3, 4)
c5 <- c(6, 4, 3)
df <- data.frame(c1, c2, c3, c4, c5)

df <- df %>% 
  dplyr::mutate(sum.2.3 = rowSums(across(c2:c3)),
                sum.2.4 = rowSums(across(c2:c4)),
                sum.2.5 = rowSums(across(c2:c5)))

Result

  c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5
1 G1  5  3  6  6       8      14      20
2 G2  3  7  3  4      10      13      17
3 G3  1  1  4  3       2       6       9

5 answers

Answer 1
2 2023-01-18 00:52:03

Answer 2
2 2023-01-18 01:07:48

Answer 3
1 2023-01-18 01:12:09

Answer 4
1 2023-01-18 03:21:26

Answer 5
-1 2023-01-18 00:56:08
