
Compare values in two columns and recode in R or awk

I have a file in the following format; a few lines are shown below.

<N2>    AS  12/13:2:-1000.00,-25.73     13/13:2:-272.09,-12.81
<N2>    AS  6/6:2:-1000.00,-19.88   8/8/0:2:-211.51,-5.98
<N0>    AS  4/4:0:2:-218.21,-11.95  4/4:2:-208.55,-11.01
<N0>    AS  0/0:2:-1000.00,-16.68   0/0:2:-294.18,-10.45
<N0>    AS  0/1:2:-1000.00,-16.68   0/1:2:-294.18,-10.45
<N0>    AS  1/1:2:-1000.00,-16.68   1/1:2:-294.18,-10.45

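The first ":"-separated element of $3 needs to be compared with the first ":"-separated element of $4, and both need to be recoded using only the values 0 and 1. The logic for all four possible comparison cases in the sample data is as follows: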

When only one value differs between the two elements, change them to 0/0 and 0/1.
When both values differ between the two elements, change them to 0/0 and 1/1.
When both values are the same and non-zero between the two elements, change them to 1/1 and 1/1.
When the values are already coded as 0 and 1, do not change them.
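As a minimal sketch, the four rules above can be written as a small base R function for a single pair of genotypes. The name `recode_pair` is hypothetical (it is not from the original post). The sketch assumes each element has exactly two values separated by "/" and that values are compared position by position.

```r
# Hypothetical helper (not from the original post): recode one pair of
# genotypes, e.g. "12/13" and "13/13", following the four rules above.
# Assumes exactly two "/"-separated values per element.
recode_pair <- function(a, b) {
  x <- as.numeric(strsplit(a, "/", fixed = TRUE)[[1]])
  y <- as.numeric(strsplit(b, "/", fixed = TRUE)[[1]])
  # Values already coded as 0 and 1 are left unchanged
  if (all(c(x, y) %in% 0:1)) return(c(a, b))
  n_diff <- sum(x != y)  # number of positions where the values differ
  switch(as.character(n_diff),
         "0" = c("1/1", "1/1"),  # both values the same (and not 0/1 coded)
         "1" = c("0/0", "0/1"),  # only one value differs
         "2" = c("0/0", "1/1"))  # both values differ
}

recode_pair("12/13", "13/13")  # "0/0" "0/1"
recode_pair("6/6", "8/8")      # "0/0" "1/1"
recode_pair("4/4", "4/4")      # "1/1" "1/1"
recode_pair("0/1", "0/1")      # "0/1" "0/1"
```

Counting the differing positions keeps each rule on its own line, which makes it easy to check against the expected output.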

Applying the above logic to the sample data, the first element of $3 is compared with that of $4:

12/13 and 13/13 have one value in common, so only one value differs: change them to 0/0 and 0/1.
6/6 and 8/8: both values separated by "/" differ between $3 and $4, so change them to 0/0 and 1/1.
4/4 and 4/4: both values separated by "/" are the same between $3 and $4 and non-zero, so change them to 1/1 and 1/1.

If the values are already coded as 0 and 1, do not change them.

So the output for the above example would look like this:

<N2>    AS  0/0:2:-1000.00,-25.73   0/1:2:-272.09,-12.81
<N2>    AS  0/0:2:-1000.00,-19.88   1/1/0:2:-211.51,-5.98
<N0>    AS  1/1:0:2:-218.21,-11.95  1/1:2:-208.55,-11.01
<N0>    AS  0/0:2:-1000.00,-16.68   0/0:2:-294.18,-10.45
<N0>    AS  0/1:2:-1000.00,-16.68   0/1:2:-294.18,-10.45
<N0>    AS  1/1:2:-1000.00,-16.68   1/1:2:-294.18,-10.45

Is there any possible solution in awk or R?

You can do the following in R.

Data:

# Read the sample as a plain data.frame; nothing needs to be attached at this point
df1 <- as.data.frame(
data.table::fread("<N2>    AS  12/13:2:-1000.00,-25.73     13/13:2:-272.09,-12.81
<N2>    AS  6/6:2:-1000.00,-19.88   8/8/0:2:-211.51,-5.98
                  <N0>    AS  4/4:0:2:-218.21,-11.95  4/4:2:-208.55,-11.01
                  <N0>    AS  0/0:2:-1000.00,-16.68   0/0:2:-294.18,-10.45
                  <N0>    AS  0/1:2:-1000.00,-16.68   0/1:2:-294.18,-10.45
                  <N0>    AS  1/1:2:-1000.00,-16.68   1/1:2:-294.18,-10.45",sep=" ",header=FALSE))

Code: load the libraries and create a function that does the work for you:

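library(magrittr)
library(dplyr)

# Recode the leading "a/b" genotypes of two columns using only 0 and 1
fun1 <- function(df_in) {
    # For each column, extract the leading "a/b" and split it into two numbers
    vals <- lapply(df_in,function(x){sub("(\\d+/\\d+).*","\\1",x,perl=TRUE) %>% strsplit("/") %>% lapply(as.numeric)})
    newvals<-
        mapply(function(x,y){
            # Already coded as 0 and 1: keep both genotypes unchanged
            if(all(c(x,y) %in% 0:1)) list(paste0(x,collapse="/"),paste0(y,collapse="/")) else {
                # u[i] is TRUE where the i-th value differs between the two columns
                u <- abs(x-y)>=1
                return(
                    case_when(
                        identical(u,c(TRUE,FALSE)) ~ list("0/0","0/1"),
                        identical(u,c(FALSE,TRUE)) ~ list("0/0","0/1"),
                        identical(u,c(TRUE,TRUE)) ~ list("0/0","1/1"),
                        identical(u,c(FALSE,FALSE)) ~ list("1/1","1/1"),
                        TRUE    ~ list("Error","Error")
                    )
                )
            } },x=vals[[1]],y=vals[[2]])
    # Paste the recoded genotypes back in front of the untouched remainder of each field
    return(
        list(
            paste0(unlist(newvals[1,]),sub("\\d+/\\d+","",df_in[[1]])),
            paste0(unlist(newvals[2,]),sub("\\d+/\\d+","",df_in[[2]]))
        )
    )
}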
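The function above relies on two regular-expression steps: one pulls out the leading "a/b" genotype, and the other strips it so the rest of the field can be pasted back unchanged. A small base R sketch on a single field from the sample data may make this easier to follow. The variable names `field`, `geno`, `alleles`, and `rest` are illustrative only.

```r
# One field from the sample data
field <- "12/13:2:-1000.00,-25.73"

# Keep only the leading "a/b" genotype, then split it into numbers
geno <- sub("(\\d+/\\d+).*", "\\1", field, perl = TRUE)
alleles <- as.numeric(strsplit(geno, "/")[[1]])

# Strip the leading genotype; the remainder is pasted back after recoding
rest <- sub("\\d+/\\d+", "", field)

geno                 # "12/13"
alleles              # 12 13
rest                 # ":2:-1000.00,-25.73"
paste0("0/0", rest)  # "0/0:2:-1000.00,-25.73"
```

Because only the first "a/b" is matched, anything after it (for example the trailing "/0" in "8/8/0:2:...") stays in the remainder, which is why that field comes out as "1/1/0:2:..." in the result.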

Call the function on the column numbers that need to be changed:

df1[,3:4] %<>% fun1

Result:

#> df1
#    V1 V2                     V3                    V4
#1 <N2> AS  0/0:2:-1000.00,-25.73  0/1:2:-272.09,-12.81
#2 <N2> AS  0/0:2:-1000.00,-19.88 1/1/0:2:-211.51,-5.98
#3 <N0> AS 1/1:0:2:-218.21,-11.95  1/1:2:-208.55,-11.01
#4 <N0> AS  0/0:2:-1000.00,-16.68  0/0:2:-294.18,-10.45
#5 <N0> AS  0/1:2:-1000.00,-16.68  0/1:2:-294.18,-10.45
#6 <N0> AS  1/1:2:-1000.00,-16.68  1/1:2:-294.18,-10.45

