Looping over the rows of a data frame in R and checking an if/else statement in a function

Question

A reproducible example:

example <- structure(list(seqnames = c("chr1", "chr1", "chr1", "chr1", "chr1", 
"chr1"), start = c(14660L, 661861L, 662360L, 700216L, 703359L, 
713320L), end = c(14736L, 661929L, 662414L, 700326L, 703430L, 
713395L), width = c(77L, 69L, 55L, 111L, 72L, 76L), strand = c("+", 
"+", "-", "-", "-", "-")), row.names = c(NA, -6L), class = "data.frame")

It looks like this:

  seqnames  start    end width strand
1     chr1  14660  14736    77      +
2     chr1 661861 661929    69      +
3     chr1 662360 662414    55      -
4     chr1 700216 700326   111      -
5     chr1 703359 703430    72      -
6     chr1 713320 713395    76      -

My function checks the strand of each row. For rows on the + strand it sets end to start + 100; for rows on the - strand it sets start to end - 100.

extension <- function(peak_df) {
  if(peak_df['strand']=='+'){
    peak_df['end'] = peak_df['start'] + 100
  }
  else if (peak_df['strand']=='-') {
    peak_df['start'] = peak_df['end'] - 100
  }
}

I then want to apply this function to every row of example. If I use apply, I get the following error:

> apply(pk.df,1, extension)
Error in peak_df["end"] - 100 : non-numeric argument to binary operator

If I use sapply, I get a different error:

> sapply(pk.df,extension)
Error in if (peak_df["strand"] == "+") { : 
  missing value where TRUE/FALSE needed

Is this because my function is not vectorized?

Answer 1

Using dplyr:

library(dplyr)

example %>% mutate(end = case_when(strand == '+' ~ start  + 100, TRUE ~ end+0), 
                   start = case_when(strand == '-' ~ end - 100, TRUE ~ start+0))
  seqnames  start    end width strand
1     chr1  14660  14760    77      +
2     chr1 661861 661961    69      +
3     chr1 662314 662414    55      -
4     chr1 700226 700326   111      -
5     chr1 703330 703430    72      -
6     chr1 713295 713395    76      -

Answer 2

Use str to inspect what apply actually passes to your function for each row:

apply(example,1, function(x) {str(x)})

 Named chr [1:5] "chr1" " 14660" " 14736" " 77" "+"
 - attr(*, "names")= chr [1:5] "seqnames" "start" "end" "width" ...
 Named chr [1:5] "chr1" "661861" "661929" " 69" "+"
 - attr(*, "names")= chr [1:5] "seqnames" "start" "end" "width" ...
 Named chr [1:5] "chr1" "662360" "662414" " 55" "-"
 - attr(*, "names")= chr [1:5] "seqnames" "start" "end" "width" ...
 Named chr [1:5] "chr1" "700216" "700326" "111" "-"
 - attr(*, "names")= chr [1:5] "seqnames" "start" "end" "width" ...
 Named chr [1:5] "chr1" "703359" "703430" " 72" "-"
 - attr(*, "names")= chr [1:5] "seqnames" "start" "end" "width" ...
 Named chr [1:5] "chr1" "713320" "713395" " 76" "-"
 - attr(*, "names")= chr [1:5] "seqnames" "start" "end" "width" ...
NULL

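Inside apply the values are character, because apply converts the data frame to a matrix first and a matrix can hold only one type. That is why the subtraction fails with "non-numeric argument to binary operator". Add as.numeric: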

extension <- function(peak_df) {
  if(peak_df['strand']=='+'){
    peak_df['end'] = as.numeric(peak_df['start']) + 100
  }
  else if (peak_df['strand']=='-') {
    peak_df['start'] = as.numeric(peak_df['end']) - 100
  }
}
apply(example, 1, extension)

[1]  14760 661961 662314 700226 703330 713295

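It now runs without error. Note that the result is a vector, not a data frame: the function returns the value of its last assignment, so you get the new end for + rows and the new start for - rows.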
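If the goal is to get the whole modified data frame back from a row-by-row loop, one option is to index the rows explicitly instead of going through apply, so the columns keep their integer type. This is only a minimal sketch: extend_rows and its size argument are hypothetical names of mine, not something from the question.

```r
# Same data as `example` in the question, rebuilt so the block runs on its own.
example <- data.frame(
  seqnames = rep("chr1", 6),
  start = c(14660L, 661861L, 662360L, 700216L, 703359L, 713320L),
  end = c(14736L, 661929L, 662414L, 700326L, 703430L, 713395L),
  width = c(77L, 69L, 55L, 111L, 72L, 76L),
  strand = c("+", "+", "-", "-", "-", "-")
)

# extend_rows is a hypothetical helper name (assumption, not from the thread).
extend_rows <- function(peak_df, size = 100L) {
  for (i in seq_len(nrow(peak_df))) {
    if (peak_df$strand[i] == "+") {
      # + strand: end becomes start + size
      peak_df$end[i] <- peak_df$start[i] + size
    } else if (peak_df$strand[i] == "-") {
      # - strand: start becomes end - size
      peak_df$start[i] <- peak_df$end[i] - size
    }
  }
  peak_df  # return the whole modified data frame, not just the last assignment
}

result <- extend_rows(example)
```

For six rows the loop is fine; on large peak tables a vectorized version will usually be faster.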

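Also, sapply works column by column, not row by row. Each column reaches your function as a plain vector without names, so peak_df['strand'] is NA and the if condition fails with the "missing value" error. Take a look at sapply(example, function(x){print(x)}):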

[1] "chr1" "chr1" "chr1" "chr1" "chr1" "chr1"
[1]  14660 661861 662360 700216 703359 713320
[1]  14736 661929 662414 700326 703430 713395
[1]  77  69  55 111  72  76
[1] "+" "+" "-" "-" "-" "-"
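Both error messages can be reproduced in isolation. The sketch below uses a small made-up data frame (df is my own toy example, not the question's data) to show the matrix coercion behind the apply error and the NA lookup behind the sapply error.

```r
# Toy data frame with one integer and one character column (illustrative only).
df <- data.frame(start = c(1L, 20L), strand = c("+", "-"))

# apply() operates on as.matrix(df); with mixed column types
# every cell is coerced to character.
m <- as.matrix(df)
is_char <- is.character(m)

# sapply() hands over each column as an unnamed vector.
# Looking up an element by a name that does not exist gives NA,
# and if (NA) stops with "missing value where TRUE/FALSE needed".
col <- df$start
by_name <- col["strand"]
lookup_is_na <- is.na(by_name)
```

Here both is_char and lookup_is_na come out TRUE, which matches the two errors in the question.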

Answer 3

You should write a vectorized version of the extension function.

extension <- function(peak_df) {
  inds <- peak_df$strand == "+"
  peak_df$end[inds] = peak_df$start[inds] + 100
  peak_df$start[!inds] = peak_df$end[!inds] - 100
  peak_df
}
extension(example)


#  seqnames  start    end width strand
#1     chr1  14660  14760    77      +
#2     chr1 661861 661961    69      +
#3     chr1 662314 662414    55      -
#4     chr1 700226 700326   111      -
#5     chr1 703330 703430    72      -
#6     chr1 713295 713395    76      -
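One thing to be aware of: !inds treats every row that is not "+" as a - strand row. If your data might contain other strand values, a variant that tests both strands explicitly leaves those rows untouched. This is a sketch under that assumption; extension_strict, the size argument, the toy peaks data and the "*" strand value are all my own illustrative choices.

```r
# extension_strict is a hypothetical name (assumption, not from the thread).
extension_strict <- function(peak_df, size = 100L) {
  plus  <- peak_df$strand == "+"
  minus <- peak_df$strand == "-"
  # Compute both new columns from the original coordinates before assigning.
  new_end   <- ifelse(plus,  peak_df$start + size, peak_df$end)
  new_start <- ifelse(minus, peak_df$end - size,   peak_df$start)
  peak_df$end   <- new_end
  peak_df$start <- new_start
  peak_df
}

# Toy input: the third row has a strand that is neither "+" nor "-".
peaks <- data.frame(
  start  = c(1000L, 2000L, 3000L),
  end    = c(1050L, 2050L, 3050L),
  strand = c("+", "-", "*")
)
out <- extension_strict(peaks)
```

With this input the + row gets end 1100, the - row gets start 1950, and the "*" row keeps 3000 and 3050.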
