Conditional statement for groups of rows of an R dataframe

Question

Imagine I have this dataframe:

position    coverage
   1           30
   2            2
   3            1
   4            8
   5            2
   6            3
   7            20
   8            40
   .             .
   .             .
  100            0
  101            2
  102            3
  103            4
  104            1
  105           40

I would like to find the regions where the coverage stays below 10 for at least 4 consecutive positions.

In this case, my desired output is:

start      end
  2         6
 100       104

I tried a for loop, but I don't know how to make it work on a group of rows instead of row by row. How can I achieve this output?

Answer 1

We can use rleid from data.table. Create a grouping index from the runs of 'coverage' values below 10, keep the 'position' values of the groups that have at least 4 rows and in which all 'coverage' values are below 10, then use 'grp' to get the first and last 'position' of each group.

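library(data.table)
# grp is the run id of consecutive rows on the same side of the threshold;
# note that setDT() converts df1 to a data.table by reference
setDT(df1)[, position[.N >= 4 & all(coverage < 10)],
         .(grp = rleid(coverage < 10))][,
      .(start = first(V1), end = last(V1)), grp][, grp := NULL][]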
#    start end
#1:     2   6
#2:   100 104
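
To make the grouping step concrete, here is a minimal base-R sketch of the run ids that rleid(coverage < 10) assigns to the sample data. The cumsum construction is only a stand-in for rleid used for illustration, not the data.table implementation, and the names below and run_id are made up for this example.

```r
# Coverage values from the sample data
coverage <- c(30, 2, 1, 8, 2, 3, 20, 40, 0, 2, 3, 4, 1, 40)

# TRUE where coverage is under the threshold
below <- coverage < 10

# A new run starts wherever `below` differs from the previous element;
# the cumulative sum of those change points numbers the runs 1, 2, 3, ...
run_id <- cumsum(c(TRUE, below[-1] != below[-length(below)]))

print(run_id)
# 1 2 2 2 2 2 3 3 4 4 4 4 4 5
```

Runs 2 and 4 are the ones that are both below the threshold and at least 4 rows long, which is why only they survive the filter.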

Or with dplyr (rleid still comes from data.table, so it is called with its namespace here)

library(dplyr)
df1 %>%
   group_by(grp = data.table::rleid(coverage < 10)) %>%
   filter(all(coverage < 10), n() >= 4) %>%
   summarise(start = first(position), end = last(position)) %>%
   select(-grp)

Or with rle from base R

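# Runs of consecutive rows with coverage below / not below 10
rl <- rle(df1$coverage < 10)
# Split positions by run, take each run's range, keep the qualifying runs
do.call(rbind, lapply(split(df1$position,
   rep(seq_along(rl$values), rl$lengths)), range)[rl$values & rl$lengths >= 4])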
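
If a data frame with named start and end columns is preferred over the matrix that rbind returns, the same rle idea can be wrapped in a small helper. This is only a sketch and not part of the original answer: the function name low_coverage_regions and its threshold and min_len arguments are hypothetical, and it assumes that coverage has no NA values and that the rows are already sorted by position.

```r
# Sample data, same values as df1 in the data section
df1 <- data.frame(
  position = c(1:8, 100:105),
  coverage = c(30, 2, 1, 8, 2, 3, 20, 40, 0, 2, 3, 4, 1, 40)
)

# Hypothetical helper: regions of at least `min_len` consecutive rows
# whose coverage is below `threshold`
low_coverage_regions <- function(df, threshold = 10, min_len = 4) {
  rl <- rle(df$coverage < threshold)
  run_end   <- cumsum(rl$lengths)          # last row index of each run
  run_start <- run_end - rl$lengths + 1    # first row index of each run
  keep <- rl$values & rl$lengths >= min_len
  data.frame(start = df$position[run_start[keep]],
             end   = df$position[run_end[keep]])
}

regions <- low_coverage_regions(df1)
print(regions)
# start: 2, 100; end: 6, 104
```

Because the run boundaries come from cumsum on the run lengths, no split or lapply is needed, and changing the threshold or the minimum length is a matter of passing different arguments.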

data

df1 <- structure(list(position = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 100L, 
101L, 102L, 103L, 104L, 105L), coverage = c(30L, 2L, 1L, 8L, 
2L, 3L, 20L, 40L, 0L, 2L, 3L, 4L, 1L, 40L)), class = "data.frame", 
row.names = c(NA, 
-14L))
