How can I extract rows from above and below a specific row in an R dataframe?
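I am currently working with some Fastq sequencing data. I have a data frame with three columns and a few hundred rows. The first column contains the raw sequencing reads, and the other columns contain information about those reads. I want to return every row whose third column contains the string "FALSE", plus the row directly above it and the two rows directly below it. I think of it as something like grep -A -B in the shell.
I've looked around and found a similar question, but the answers there are based on row names rather than on a string within the row. My row names are just sequential numbers.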
    FastqOutput   BARCODE   Dulplicated
1   ReadName1     NA        NA
2   ReadSeq1      TGTGTTAT  FALSE
3   +             NA        NA
4   Ascii_score1  NA        NA
5   ReadName2     NA        NA
6   ReadSeq2      TGCTTTAT  FALSE
7   +             NA        NA
8   Ascii_score2  NA        NA
9   ReadName3     NA        NA
10  ReadSeq3      TGCTTTAT  TRUE
11  +             NA        NA
12  Ascii_score3  NA        NA
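If the Dulplicated column holds character values, you can do:
# positions of the rows whose Dulplicated value is "FALSE"
inds <- which(df$Dulplicated == "FALSE")
# each match, the row above it, and the two rows below it
df[sort(unique(c(inds, inds - 1, inds + 1, inds + 2))), ]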
# FastqOutput BARCODE Dulplicated
#1 ReadName1 <NA> NA
#2 ReadSeq1 TGTGTTAT FALSE
#3 + <NA> NA
#4 Ascii_score1 <NA> NA
#5 ReadName2 <NA> NA
#6 ReadSeq2 TGCTTTAT FALSE
#7 + <NA> NA
#8 Ascii_score2 <NA> NA
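The index arithmetic above can run past the end of the data frame when a match sits in one of the last two rows, in which case base `[` returns all-NA rows for the out-of-range indices. Below is a minimal sketch of a bounds-safe variant. The helper name `context_rows` and its `before`/`after` arguments are hypothetical, chosen to mirror grep -B/-A, and `demo` is a small stand-in for the question's data rather than the real `df`.

```r
# Hypothetical helper: indices of each hit plus `before` rows above and
# `after` rows below, clipped to the valid range 1..n (like grep -B/-A).
context_rows <- function(hits, n, before = 1L, after = 2L) {
  offsets <- seq(-before, after)
  # outer() pairs every hit with every offset, so nothing relies on recycling
  idx <- as.vector(outer(hits, offsets, `+`))
  sort(unique(idx[idx >= 1L & idx <= n]))
}

# Stand-in data: a logical column with NAs, as in the question
flag <- c(NA, FALSE, NA, NA, NA, FALSE, NA, NA, NA, TRUE, NA, NA)
demo <- data.frame(id = seq_along(flag), Dulplicated = flag)

hits <- which(demo$Dulplicated == "FALSE")  # which() drops the NAs
rows <- context_rows(hits, nrow(demo))
demo[rows, ]
```

With the real data, `df[context_rows(inds, nrow(df)), ]` would play the same role as the indexing expression above.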
Or similarly, using dplyr::slice:
library(dplyr)
df %>% slice(sort(unique(c(inds, inds - 1, inds + 1, inds + 2))))
Data
df <- structure(list(FastqOutput = structure(c(5L, 8L, 1L, 2L, 6L,
9L, 1L, 3L, 7L, 10L, 1L, 4L), .Label = c("+", "Ascii_score1",
"Ascii_score2", "Ascii_score3", "ReadName1", "ReadName2", "ReadName3",
"ReadSeq1", "ReadSeq2", "ReadSeq3"), class = "factor"), BARCODE =
structure(c(NA, 2L, NA, NA, NA, 1L, NA, NA, NA, 1L, NA, NA), .Label = c("TGCTTTAT",
"TGTGTTAT"), class = "factor"), Dulplicated = c(NA, FALSE, NA,
NA, NA, FALSE, NA, NA, NA, TRUE, NA, NA)), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
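We can use data.table:
library(data.table)
# setDT() converts df by reference; .I holds the row numbers
setDT(df)[df[, {
  i1 <- .I[which(!as.logical(Dulplicated))]  # rows where Dulplicated is FALSE
  sort(unique(c(outer(i1, -1:2, `+`))))      # each match with offsets -1..2
}]]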
# FastqOutput BARCODE Dulplicated
#1: ReadName1 <NA> NA
#2: ReadSeq1 TGTGTTAT FALSE
#3: + <NA> NA
#4: Ascii_score1 <NA> NA
#5: ReadName2 <NA> NA
#6: ReadSeq2 TGCTTTAT FALSE
#7: + <NA> NA
#8: Ascii_score2 <NA> NA
Or, more compactly (n = -1:2 covers one row above and two rows below each match):
setDT(df)[df[, Reduce(`|`, shift(!as.logical(Dulplicated), n = -1:2))]]
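The compact form builds one shifted copy of the match vector per offset and ORs them together into a keep/drop mask. A minimal base R sketch of the same idea, without data.table, might look like this. The helper `shift_lgl` is hypothetical and pads with FALSE instead of NA, and `flag` is a stand-in for the Dulplicated column.

```r
# Hypothetical helper: shift a logical vector by k positions, padding with FALSE.
# k > 0 moves values down (a lag); k < 0 moves them up (a lead).
shift_lgl <- function(x, k) {
  n <- length(x)
  if (k > 0) {
    c(rep(FALSE, min(k, n)), head(x, max(n - k, 0)))
  } else if (k < 0) {
    c(tail(x, max(n + k, 0)), rep(FALSE, min(-k, n)))
  } else {
    x
  }
}

flag <- c(NA, FALSE, NA, NA, NA, FALSE, NA, NA, NA, TRUE, NA, NA)
hit <- !is.na(flag) & !flag   # TRUE only where the value is FALSE

# Lags 1 and 2 mark the two rows below a hit; lead 1 marks the row above.
keep <- Reduce(`|`, lapply(-1:2, function(k) shift_lgl(hit, k)))
which(keep)
```

The resulting `keep` vector can be used directly as a row selector, for example `df[keep, ]`, and never produces out-of-range indices because it always has one entry per row.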