I have a dataframe that looks like this:
> humcon
  seqnames    start      end    TAS Proxy.start Proxy.end Assembly_NCBI
1        6 28179560 28239932 rs1635  rs78270345 rs4711167     GRCh38.p7
2        6 28239933 28294888 rs1635  rs78270345 rs4711167     GRCh38.p7
3        3 52833805 52847601 rs3617      rs3617 rs2071044     GRCh38.p7
4       15 91426560 91426560 rs4702      rs4702    rs4702     GRCh38.p7
5       19 45382034 45382034 rs6859      rs6859    rs6859     GRCh38.p7
I am trying to combine the first two rows so that the combined row has the start of the first row and end of the second row like this:
> humcon
  seqnames    start      end    TAS Proxy.start Proxy.end Assembly_NCBI
1        6 28179560 28294888 rs1635  rs78270345 rs4711167     GRCh38.p7
2        3 52833805 52847601 rs3617      rs3617 rs2071044     GRCh38.p7
3       15 91426560 91426560 rs4702      rs4702    rs4702     GRCh38.p7
4       19 45382034 45382034 rs6859      rs6859    rs6859     GRCh38.p7
Does anyone know how I can do this please?
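You can do this with dplyr: group by every column except start and end, then take the minimum start and the maximum end within each group.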
library(dplyr)

humcon %>%
  # Group by all columns except start and end
  group_by_at(vars(-start, -end)) %>%
  # Within each group, keep the smallest start and the largest end
  summarise(start = min(start), end = max(end)) %>%
  ungroup()
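One thing to watch for: after the grouped summarise, the rows come back sorted by the grouping columns and start/end end up as the last columns, so the result will not be in quite the layout shown in the question. Here is a rough, self-contained sketch that rebuilds the example data and puts the rows and columns back in their original order. The helper column first_row and the result name merged are just names I made up for illustration. It also assumes that rows sharing the same values in all the other columns really should be merged, even if their ranges are not adjacent.

```r
library(dplyr)

# Example data copied from the question
humcon <- data.frame(
  seqnames      = c(6, 6, 3, 15, 19),
  start         = c(28179560, 28239933, 52833805, 91426560, 45382034),
  end           = c(28239932, 28294888, 52847601, 91426560, 45382034),
  TAS           = c("rs1635", "rs1635", "rs3617", "rs4702", "rs6859"),
  Proxy.start   = c("rs78270345", "rs78270345", "rs3617", "rs4702", "rs6859"),
  Proxy.end     = c("rs4711167", "rs4711167", "rs2071044", "rs4702", "rs6859"),
  Assembly_NCBI = "GRCh38.p7",
  stringsAsFactors = FALSE
)

merged <- humcon %>%
  # first_row (hypothetical helper) remembers where each row first appeared
  mutate(first_row = row_number()) %>%
  # Group by everything except the coordinates and the helper column
  group_by_at(vars(-start, -end, -first_row)) %>%
  # Collapse each group to one row spanning min(start) to max(end)
  summarise(start = min(start), end = max(end),
            first_row = min(first_row), .groups = "drop") %>%
  # Restore the original row order
  arrange(first_row) %>%
  # Restore the original column order and drop the helper column
  select(all_of(names(humcon)))

print(merged)
```

If the order of rows and columns does not matter to you, the shorter version above is enough.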