R: Getting the sequence of numbers in an interval, not working

I have the following data frame. It contains coordinates and their respective values; the intervals can have length 1, 2, 4, 6, 8, ...

chr  start end   meth   cov  
chr1 16136 16136 100.00  1.0 
chr1 16137 16138 100.00  4.0
...
chr2 16139 16142 100.00  4.5
chr2 16243 16246 100.00 10.0
chr2 16247 16250  83.33  6.0
...
chr3 16251 16256  50.0   2.0

What I want to do is split each interval whose length is not 1 or 2 into pieces of length two, keeping the respective information for each piece. For example:

chr1 16136 16136 100.00  1.0
chr1 16137 16138 100.00  4.0
...
chr2 16139 16140 100.00  4.5
chr2 16141 16142 100.00  4.5
chr2 16243 16244 100.00 10.0
chr2 16245 16246 100.00 10.0
chr2 16247 16248  83.33  6.0
chr2 16249 16250  83.33  6.0
...
chr3 16251 16252  50.0   2.0
chr3 16253 16254  50.0   2.0
chr3 16255 16256  50.0   2.0

I've received help, and the following code gets me part of the way, but I get this error when applying seq:

Error in seq.default(start, end + 1, 2): 'from' must be of length 1

Does anyone know why this happens and how to fix it, or have another option?

df %>% filter(end - start > 2) %>% rowwise() %>% mutate(start2 = list(seq(start, end + 1, 2)))

Here is a base R solution. First, we make things easier for ourselves by defining a function seqr() that expands a range of length 2 (a start and an end) into the full sequence between them.

seqr <- function(x) seq(x[[1]], x[[2]])
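To illustrate why a helper that handles one range at a time is useful: seq() is not vectorized over its from and to arguments, which is what the error message in the question reports. The following is a minimal sketch; the vectors starts and ends are made-up values for illustration, and Map() is shown only as one possible way to apply seq() pairwise.

```r
seqr <- function(x) seq(x[[1]], x[[2]])

# A single (start, end) pair expands to the full run of positions.
seqr(c(16139L, 16142L))

# Hypothetical vectors holding several starts and ends at once.
starts <- c(16139L, 16243L)
ends   <- c(16142L, 16246L)

# Passing whole vectors to seq() fails; capture the message instead of stopping.
msg <- tryCatch(seq(starts, ends), error = function(e) conditionMessage(e))
msg

# Map() calls seq() once per (start, end) pair and returns a list of sequences.
seqs <- Map(seq, starts, ends)
seqs
```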

Then, assuming unique columns as in your example, we loop over the rows with 1:nrow(dat), expand each row's start and end into a sequence, and fill the result row-wise into a two-column matrix. We cbind that matrix with the remaining columns, exploiting recycling, and finally rbind the per-row results.
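The reshaping step can be seen in isolation on one row. This is only a sketch: the one-row data frame row and the names s, m and piece are introduced here for illustration and are not part of the original answer.

```r
# One hypothetical row, matching the third row of the example data.
row <- data.frame(chr = "chr2", start = 16139L, end = 16142L,
                  meth = 100, cov = 4.5)

# Expand the interval into every position it covers.
s <- seq(row$start, row$end)

# Filling by row pairs consecutive positions as (start, end).
m <- matrix(s, ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("start", "end")))

# The scalar chr and the single row of meth/cov are recycled to nrow(m).
piece <- cbind(chr = row$chr, m, row[4:5], row.names = NULL)
piece
```

For an interval of length 1, the sequence has a single value, which matrix() recycles into both columns, so the row comes back unchanged.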

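# For each row i: expand start..end, reshape into (start, end) pairs,
# bind the pairs to that row's other columns, then stack all pieces.
res <- do.call(rbind, 
        lapply(1:nrow(dat), function(i)
          cbind(chr=dat[i, 1],
                # consecutive positions fill the matrix row by row
                matrix(seqr(dat[i, 2:3]), ncol=2, byrow=TRUE, 
                       dimnames=list(NULL, names(dat)[2:3])), 
                # the single row of meth/cov is recycled to match the matrix
                dat[i, 4:5], row.names=NULL)))
res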
#     chr start   end   meth  cov
# 1  chr1 16136 16136 100.00  1.0
# 2  chr1 16137 16138 100.00  4.0
# 3  chr2 16139 16140 100.00  4.5
# 4  chr2 16141 16142 100.00  4.5
# 5  chr2 16243 16244 100.00 10.0
# 6  chr2 16245 16246 100.00 10.0
# 7  chr2 16247 16248  83.33  6.0
# 8  chr2 16249 16250  83.33  6.0
# 9  chr3 16251 16252  50.00  2.0
# 10 chr3 16253 16254  50.00  2.0
# 11 chr3 16255 16256  50.00  2.0
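If the data frame is large, the same result can probably be reached without a per-row loop. The following is a sketch of one possible vectorized variant, not the method from the answer above: the function name split_intervals and its width argument are hypothetical, and it assumes integer start and end columns with end >= start.

```r
dat <- data.frame(
  chr   = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr3"),
  start = c(16136L, 16137L, 16139L, 16243L, 16247L, 16251L),
  end   = c(16136L, 16138L, 16142L, 16246L, 16250L, 16256L),
  meth  = c(100, 100, 100, 100, 83.33, 50),
  cov   = c(1, 4, 4.5, 10, 6, 2)
)

# Hypothetical helper: split every interval into pieces of at most `width`.
split_intervals <- function(dat, width = 2L) {
  len <- dat$end - dat$start + 1L          # interval lengths
  n   <- pmax(1, ceiling(len / width))     # number of pieces per row
  idx <- rep(seq_len(nrow(dat)), n)        # repeat each row index n times
  off <- (sequence(n) - 1L) * width        # offset of each piece in its row
  out <- dat[idx, , drop = FALSE]          # copies meth, cov, chr per piece
  out$start <- dat$start[idx] + off
  out$end   <- pmin(out$start + width - 1L, dat$end[idx])
  rownames(out) <- NULL
  out
}

res2 <- split_intervals(dat)
nrow(res2)  # 11 rows for the example data
```

Intervals of length 1 or 2 pass through unchanged, and pmin() keeps the last piece from running past the original end.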

Data

dat <- structure(list(chr = c("chr1", "chr1", "chr2", "chr2", "chr2", 
"chr3"), start = c(16136L, 16137L, 16139L, 16243L, 16247L, 16251L
), end = c(16136L, 16138L, 16142L, 16246L, 16250L, 16256L), meth = c(100, 
100, 100, 100, 83.33, 50), cov = c(1, 4, 4.5, 10, 6, 2)), row.names = c(NA, 
-6L), class = "data.frame")
