
How to combine rows of a dataframe in R based on values from two different rows?

I have a dataframe that looks like this:

> humcon
  seqnames    start      end    TAS Proxy.start Proxy.end Assembly_NCBI
1        6 28179560 28239932 rs1635  rs78270345 rs4711167     GRCh38.p7
2        6 28239933 28294888 rs1635  rs78270345 rs4711167     GRCh38.p7
3        3 52833805 52847601 rs3617      rs3617 rs2071044     GRCh38.p7
4       15 91426560 91426560 rs4702      rs4702    rs4702     GRCh38.p7
5       19 45382034 45382034 rs6859      rs6859    rs6859     GRCh38.p7
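To try the answer below, the printed table can be rebuilt as a data frame. This is a sketch that assumes seqnames and the coordinates are integers and the rs identifiers are plain character columns; the printout alone does not show the real column types.

```r
# Rebuild the example table shown above.
# Assumed types: integer seqnames/coordinates, character identifiers.
humcon <- data.frame(
  seqnames      = c(6L, 6L, 3L, 15L, 19L),
  start         = c(28179560L, 28239933L, 52833805L, 91426560L, 45382034L),
  end           = c(28239932L, 28294888L, 52847601L, 91426560L, 45382034L),
  TAS           = c("rs1635", "rs1635", "rs3617", "rs4702", "rs6859"),
  Proxy.start   = c("rs78270345", "rs78270345", "rs3617", "rs4702", "rs6859"),
  Proxy.end     = c("rs4711167", "rs4711167", "rs2071044", "rs4702", "rs6859"),
  Assembly_NCBI = "GRCh38.p7",  # recycled to all five rows
  stringsAsFactors = FALSE
)
```

Note that the first two rows are contiguous: row 1 ends at 28239932 and row 2 starts at 28239933.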

I am trying to combine the first two rows, which have identical values in every column except start and end, so that the combined row takes its start from the first row and its end from the second row, like this:

> humcon
  seqnames    start      end    TAS Proxy.start Proxy.end Assembly_NCBI
1        6 28179560 28294888 rs1635  rs78270345 rs4711167     GRCh38.p7
2        3 52833805 52847601 rs3617      rs3617 rs2071044     GRCh38.p7
3       15 91426560 91426560 rs4702      rs4702    rs4702     GRCh38.p7
4       19 45382034 45382034 rs6859      rs6859    rs6859     GRCh38.p7

Does anyone know how I can do this please?

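You can do this with dplyr: group by every column except start and end, then keep the minimum start and the maximum end of each group.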

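library(dplyr)  # across(), relocate() and .groups need dplyr >= 1.0.0
humcon %>%
  # Group by every column except start and end
  group_by(across(-c(start, end))) %>%
  # Keep the smallest start and the largest end of each group;
  # .groups = "drop" returns an ungrouped result, sorted by the grouping columns
  summarise(start = min(start), end = max(end), .groups = "drop") %>%
  # Put start and end back next to seqnames, as in the desired output
  relocate(start, end, .after = seqnames)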
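The grouped summary merges every set of rows that share the non-coordinate columns, even when those rows are far apart or their intervals do not touch. If only neighbouring rows whose intervals are contiguous should be joined (here 28239932 + 1 = 28239933), a base R sketch along these lines is one option. merge_adjacent() is a hypothetical helper, not part of the original answer, and it assumes the rows are already sorted by position and that there is at least one column besides start and end.

```r
# Hypothetical helper: merge runs of consecutive rows whose intervals touch
# (next start == previous end + 1) and whose other columns are identical.
# Assumes rows are sorted by position and at least one non-coordinate column exists.
merge_adjacent <- function(df) {
  n <- nrow(df)
  if (n < 2) return(df)
  key_cols <- setdiff(names(df), c("start", "end"))
  # One string per row built from every non-coordinate column
  key <- do.call(paste, c(df[key_cols], sep = "\t"))
  same_keys <- key[-1] == key[-n]
  touches <- df$start[-1] == df$end[-n] + 1
  # A new run starts at row 1 and wherever a row does not extend the previous one
  run_id <- cumsum(c(TRUE, !(same_keys & touches)))
  first <- !duplicated(run_id)                   # first row of each run
  last  <- !duplicated(run_id, fromLast = TRUE)  # last row of each run
  out <- df[first, , drop = FALSE]
  out$end <- df$end[last]  # widen each kept row to the end of its run
  rownames(out) <- NULL
  out
}

# Small example built from the first three rows of the question
example <- data.frame(
  seqnames = c(6L, 6L, 3L),
  start    = c(28179560L, 28239933L, 52833805L),
  end      = c(28239932L, 28294888L, 52847601L),
  TAS      = c("rs1635", "rs1635", "rs3617"),
  stringsAsFactors = FALSE
)
merged <- merge_adjacent(example)
print(merged)
```

Here the first two rows collapse into one row spanning 28179560 to 28294888, and the seqnames 3 row is kept unchanged. Unlike the grouped summary, this keeps the original row order and never joins rows that are not next to each other.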

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM