How to find three consecutive rows with the same value
I have a data frame as follows:
chr leftPos Sample1 X.DD 3_samples MyStuff
1 324 -1 1 1 1
1 4565 -1 0 0 0
1 6887 -1 1 0 0
1 12098 1 -1 1 1
2 12 -1 1 0 1
2 43 -1 1 1 1
5 1 -1 1 1 0
5 43 0 1 -1 0
5 6554 1 1 1 1
5 7654 -1 0 0 0
5 8765 1 1 1 0
5 9833 1 1 1 -1
6 12 1 1 0 0
6 43 0 0 0 0
6 56 1 0 0 0
6 79 1 0 -1 0
6 767 1 0 -1 0
6 3233 1 0 -1 0
I would like to convert it according to the following rules, applied separately within each chromosome:
a. If there are three or more 1's or -1's consecutively in a column, the values stay as they are.
b. If there are fewer than three 1's or -1's consecutively in a column, those 1's or -1's change to 0.
The rows in a column have to have the same sign (+ or -) to be called consecutive.
The result for the data frame above should be:
chr leftPos Sample1 X.DD 3_samples MyStuff
1 324 -1 0 0 0
1 4565 -1 0 0 0
1 6887 -1 0 0 0
1 12098 0 0 0 0
2 12 0 0 0 0
2 43 0 0 0 0
5 1 0 1 0 0
5 43 0 1 0 0
5 6554 0 1 0 0
5 7654 0 0 0 0
5 8765 0 0 0 0
5 9833 0 0 0 0
6 12 0 0 0 0
6 43 0 0 0 0
6 56 1 0 0 0
6 79 1 0 -1 0
6 767 1 0 -1 0
6 3233 1 0 -1 0
I have managed to do this for two consecutive rows, but I'm not sure how to change it to handle three or more rows.
DAT_list2res <-cbind(DAT_list2[1:2],DAT_list2res)
colnames(DAT_list2res)[1:2]<-c("chr","leftPos")
DAT_list2res$chr<-as.numeric(gsub("chr","",DAT_list2res$chr))
DAT_list2res<-as.data.frame(DAT_list2res)
dx<-DAT_list2res
f0 <- function(colNr, dx)
{
  col <- dx[, colNr]
  n1  <- which(col == 1 | col == -1)   # The `1`-rows.
  d0  <- which(diff(col) == 0)         # Consecutive rows in a column are equal.
  dc0 <- which(diff(dx[, 1]) == 0)     # Same chromosome.
  m   <- intersect(n1 - 1, intersect(d0, dc0))
  return(setdiff(1:nrow(dx), union(m, m + 1)))
}
g <- function(dx)
{
  for (i in 3:ncol(dx)) { dx[f0(i, dx), i] <- 0 }
  return(dx)
}
dx <- g(dx)
Here is one solution using only base R.
First, define a function that replaces any run of fewer than 3 repeated values with zeros:
replace_f <- function(x){
subs <- rle(x)
subs$values[subs$lengths < 3] <- 0
inverse.rle(subs)
}
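As a quick check of what this function does, it can be tried on a single vector. The sketch below repeats the helper so it runs on its own; the vector `x` is a made-up example, not data from the question.

```r
# Same helper as above, repeated so this snippet is self-contained.
replace_f <- function(x) {
  subs <- rle(x)                        # run lengths and the value of each run
  subs$values[subs$lengths < 3] <- 0    # zero out runs shorter than 3
  inverse.rle(subs)                     # expand back to a full-length vector
}

# Hypothetical example: runs of length 3, 2, 1 and 4.
x <- c(-1, -1, -1, 1, 1, 0, 1, 1, 1, 1)

runs <- rle(x)
runs$lengths    # 3 2 1 4
runs$values     # -1 1 0 1

replace_f(x)    # -1 -1 -1 0 0 0 1 1 1 1
```

Because `rle()` starts a new run whenever the value changes, a switch from 1 to -1 breaks the run, which matches the "same sign" requirement.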
Then split your data.frame by chr and apply the function to all columns that you want to change (in this case columns 3 to 6):
df[,3:6] <- do.call("rbind", lapply(split(df[,3:6], df$chr), function(x) apply(x, 2, replace_f)))
Notice that we combine the results together with rbind before replacing the original data. This will give you the desired result:
chr leftPos Sample1 X.DD X3_samples MyStuff
1 1 324 -1 0 0 0
2 1 4565 -1 0 0 0
3 1 6887 -1 0 0 0
4 1 12098 0 0 0 0
5 2 12 0 0 0 0
6 2 43 0 0 0 0
7 5 1 0 1 0 0
8 5 43 0 1 0 0
9 5 6554 0 1 0 0
10 5 7654 0 0 0 0
11 5 8765 0 0 0 0
12 5 9833 0 0 0 0
13 6 12 0 0 0 0
14 6 43 0 0 0 0
15 6 56 1 0 0 0
16 6 79 1 0 -1 0
17 6 767 1 0 -1 0
18 6 3233 1 0 -1 0
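One caveat with the split/rbind approach: `split()` returns the groups ordered by chr, so the assignment back into the data frame lines up only when the rows are already sorted by chr (as they are in the question). A possible variation, offered as a sketch rather than as the answer's method, is `ave()`, which applies a function within groups and returns the values in the original row order. The data frame below is rebuilt from the question, and `cols` is a name chosen here for illustration.

```r
# Helper from the answer, repeated so the snippet is self-contained.
replace_f <- function(x) {
  subs <- rle(x)
  subs$values[subs$lengths < 3] <- 0
  inverse.rle(subs)
}

# The question's data (column 3_samples appears as X3_samples in R).
df <- data.frame(
  chr        = c(1, 1, 1, 1, 2, 2, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6),
  leftPos    = c(324, 4565, 6887, 12098, 12, 43, 1, 43, 6554, 7654, 8765,
                 9833, 12, 43, 56, 79, 767, 3233),
  Sample1    = c(-1, -1, -1, 1, -1, -1, -1, 0, 1, -1, 1, 1, 1, 0, 1, 1, 1, 1),
  X.DD       = c(1, 0, 1, -1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0),
  X3_samples = c(1, 0, 0, 1, 0, 1, 1, -1, 1, 0, 1, 1, 0, 0, 0, -1, -1, -1),
  MyStuff    = c(1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, -1, 0, 0, 0, 0, 0, 0)
)

cols <- 3:6  # columns to transform

# ave() runs replace_f within each chr group and keeps the row order intact.
df[cols] <- lapply(df[cols], function(v) ave(v, df$chr, FUN = replace_f))
df
```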
A data.table solution using rleid would be:
require(data.table)
setDT(dat)
dat[,Sample1 := Sample1 * as.integer(.N>=3), by=.(chr, rleid(Sample1))]
This groups by chr and rleid(Sample1) and uses data.table's helpful .N variable, which holds the number of rows in each group.
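To make the grouping concrete: rleid gives each run of identical consecutive values its own id, and multiplying by `as.integer(.N >= 3)` keeps a value only when its run has at least three rows. The base R sketch below mimics that logic for a single vector, ignoring the chr grouping; the vector `x` and the names `run_id` and `run_len` are made up for illustration.

```r
# Hypothetical example vector: a run of three -1s, two 1s, one 0.
x <- c(-1, -1, -1, 1, 1, 0)

# Run id: increases by one each time the value changes (what rleid computes).
run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))
run_id                                   # 1 1 1 2 2 3

# Size of the run each element belongs to (the role of .N per group).
run_len <- ave(x, run_id, FUN = length)
run_len                                  # 3 3 3 2 2 1

# Keep the value when its run is long enough, otherwise multiply by 0.
result <- x * as.integer(run_len >= 3)
result                                   # -1 -1 -1 0 0 0
```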
To do this for all columns, you could use the eval(parse(text=...)) syntax as follows:
for(i in names(dat)[3:6]){
by_string = paste0("list(chr, rleid(", i, "))")
def_string = paste0(i, "* as.integer(.N>=3)")
dat[,(i) := eval(parse(text=def_string)), by=eval(parse(text=by_string))]
}
This results in:
> dat[]
chr leftPos Sample1 X.DD X3_samples MyStuff
1: 1 324 -1 0 0 0
2: 1 4565 -1 0 0 0
3: 1 6887 -1 0 0 0
4: 1 12098 0 0 0 0
5: 2 12 0 0 0 0
6: 2 43 0 0 0 0
7: 5 1 0 1 0 0
8: 5 43 0 1 0 0
9: 5 6554 0 1 0 0
10: 5 7654 0 0 0 0
11: 5 8765 0 0 0 0
12: 5 9833 0 0 0 0
13: 6 12 0 0 0 0
14: 6 43 0 0 0 0
15: 6 56 1 0 0 0
16: 6 79 1 0 -1 0
17: 6 767 1 0 -1 0
18: 6 3233 1 0 -1 0
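Building code from strings with eval(parse(text=...)) works but can be hard to read and debug. One possible alternative, sketched here under the assumption that the data.table package is installed, computes the run ids with rleid() on the chr column and the value column together and writes the result back with set(). The table is rebuilt from the question; `cols`, `v`, `grp` and `keep` are names chosen for illustration.

```r
library(data.table)

# The question's data (column 3_samples appears as X3_samples).
dat <- data.table(
  chr        = c(1, 1, 1, 1, 2, 2, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6),
  leftPos    = c(324, 4565, 6887, 12098, 12, 43, 1, 43, 6554, 7654, 8765,
                 9833, 12, 43, 56, 79, 767, 3233),
  Sample1    = c(-1, -1, -1, 1, -1, -1, -1, 0, 1, -1, 1, 1, 1, 0, 1, 1, 1, 1),
  X.DD       = c(1, 0, 1, -1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0),
  X3_samples = c(1, 0, 0, 1, 0, 1, 1, -1, 1, 0, 1, 1, 0, 0, 0, -1, -1, -1),
  MyStuff    = c(1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, -1, 0, 0, 0, 0, 0, 0)
)

cols <- names(dat)[3:6]

for (col in cols) {
  v    <- dat[[col]]
  # A new run starts whenever chr or the column value changes.
  grp  <- rleid(dat$chr, v)
  # TRUE where the element belongs to a run of at least three rows.
  keep <- ave(v, grp, FUN = length) >= 3
  # Update the column by reference; values in short runs become 0.
  set(dat, j = col, value = v * keep)
}

dat[]
```

Passing both vectors to rleid() plays the same role as `by = .(chr, rleid(...))` in the loop above: a run never continues across a chromosome boundary.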