How to combine rows of a dataframe in R based on values from two different rows?

Question

I have a dataframe that looks like this:

> humcon
  seqnames    start      end    TAS Proxy.start Proxy.end Assembly_NCBI
1        6 28179560 28239932 rs1635  rs78270345 rs4711167     GRCh38.p7
2        6 28239933 28294888 rs1635  rs78270345 rs4711167     GRCh38.p7
3        3 52833805 52847601 rs3617      rs3617 rs2071044     GRCh38.p7
4       15 91426560 91426560 rs4702      rs4702    rs4702     GRCh38.p7
5       19 45382034 45382034 rs6859      rs6859    rs6859     GRCh38.p7
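To make the example reproducible, the printed data can be pasted back into R with `read.table()`. This is a sketch: the column types are inferred by `read.table()` (integer coordinates, character IDs), which is an assumption about how the original data frame was stored.

```r
# Rebuild the example data frame from the printed output.
# Assumption: seqnames/start/end are integers and the rsIDs and
# assembly name are plain character strings (not factors).
humcon <- read.table(text = "
seqnames start end TAS Proxy.start Proxy.end Assembly_NCBI
6 28179560 28239932 rs1635 rs78270345 rs4711167 GRCh38.p7
6 28239933 28294888 rs1635 rs78270345 rs4711167 GRCh38.p7
3 52833805 52847601 rs3617 rs3617 rs2071044 GRCh38.p7
15 91426560 91426560 rs4702 rs4702 rs4702 GRCh38.p7
19 45382034 45382034 rs6859 rs6859 rs6859 GRCh38.p7
", header = TRUE, stringsAsFactors = FALSE)

# Inspect the inferred column types
str(humcon)
```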

I am trying to combine the first two rows so that the combined row has the start of the first row and end of the second row like this:

> humcon
  seqnames    start      end    TAS Proxy.start Proxy.end Assembly_NCBI
1        6 28179560 28294888 rs1635  rs78270345 rs4711167     GRCh38.p7
2        3 52833805 52847601 rs3617      rs3617 rs2071044     GRCh38.p7
3       15 91426560 91426560 rs4702      rs4702    rs4702     GRCh38.p7
4       19 45382034 45382034 rs6859      rs6859    rs6859     GRCh38.p7

Does anyone know how I can do this please?

Answer 1

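You can do this with dplyr: group by every column except start and end, then keep the smallest start and the largest end within each group. Rows 1 and 2 share all their other columns, so they collapse into a single row: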

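library(dplyr)

humcon %>%
  # Group by every column except start and end
  group_by_at(vars(-start, -end)) %>%
  # Within each group, keep the smallest start and the largest end
  summarise(start = min(start), end = max(end)) %>%
  ungroup() %>%
  # summarise() puts the grouping columns first; restore the original column order
  select(names(humcon))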
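The grouping approach merges any rows with identical annotation columns, even if their intervals are far apart. If you only want to merge rows whose intervals touch (as rows 1 and 2 do: 28239932 is followed by 28239933), one option is to number runs of contiguous intervals and summarise per run. This is a sketch under stated assumptions: coordinates are 1-based closed intervals, so "contiguous" means the next start equals the previous end + 1, and it needs dplyr 1.0 or later for `across()` and `.groups`.

```r
library(dplyr)

# Example data rebuilt from the question's printed output
humcon <- read.table(text = "
seqnames start end TAS Proxy.start Proxy.end Assembly_NCBI
6 28179560 28239932 rs1635 rs78270345 rs4711167 GRCh38.p7
6 28239933 28294888 rs1635 rs78270345 rs4711167 GRCh38.p7
3 52833805 52847601 rs3617 rs3617 rs2071044 GRCh38.p7
15 91426560 91426560 rs4702 rs4702 rs4702 GRCh38.p7
19 45382034 45382034 rs6859 rs6859 rs6859 GRCh38.p7
", header = TRUE, stringsAsFactors = FALSE)

merged <- humcon %>%
  # Only rows sharing every annotation column (incl. seqnames) can merge
  group_by(across(-c(start, end))) %>%
  # Order intervals by start within each group
  arrange(start, .by_group = TRUE) %>%
  # Begin a new run at the first row, or when a gap separates it from the previous row
  # (assumption: adjacent means start == previous end + 1)
  mutate(run = cumsum(is.na(lag(end)) | start != lag(end) + 1L)) %>%
  group_by(run, .add = TRUE) %>%
  # Collapse each run to its outermost coordinates
  summarise(start = min(start), end = max(end), .groups = "drop") %>%
  # Drop the helper column and restore the original column order
  select(all_of(names(humcon)))

merged
```

Rows come back sorted by the grouping columns (so seqnames 3 appears before 6); add an `arrange()` call at the end if the original order matters.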

solution1
1 2018-04-11 14:39:14