How to remove matching characters between two columns
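I have a data frame df. I want to gsub the values of REF and Effect_allele, removing their matching characters until one or the other is completely removed or differing characters are left.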
df <- structure(list(CHROM = c("chr1", "chr1", "chr1", "chr1", "chr1"
), POS_GRCh38 = c(109655507L, 145830809L, 168201814L, 172359627L,
204533386L), REF = c("CAAA", "CT", "C", "TA", "TTCTGAAACAGGG"
), Effect_allele = c("C", "C", "CA", "T", "TC"), Effect_size = c(0.0266,
0.0126, 0.0718, 0.0655, 0.1345)), row.names = c(234L, 240L, 243L,
244L, 249L), class = "data.frame")
The result I want is:
CHROM POS_GRCh38          REF Effect_allele Effect_size
 chr1  109655507          AAA                    0.0266
 chr1  145830809            T                    0.0126
 chr1  168201814                          A      0.0718
 chr1  172359627            A             T      0.0655
 chr1  204533386 TCTGAAACAGGG             C      0.1345
I can create an index like the one below and then run gsub, but I would like to know whether there is a simpler solution.
max.values <- apply(cbind(nchar(df$REF), nchar(df$Effect_allele)), 1, which.max)
min.values <- apply(cbind(nchar(df$REF), nchar(df$Effect_allele)), 1, which.min)
You can write a small recursive function to do this:
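library(stringr) # for str_remove()
fun <- function(a, b){
  a1 <- substr(a, 1, 1)        # first character of each REF value
  b1 <- substr(b, 1, 1)        # first character of each Effect_allele value
  d <- asplit(cbind(a, b), 1)  # current (a, b) pairs, one list element per row
  # where the first characters match, drop them and recurse; otherwise keep the pair
  ifelse(a1 == b1, Recall(str_remove(a, a1), str_remove(b, b1)), d)
}
# bind the resulting pairs back into a two-column matrix and overwrite both columns
df[c('REF', 'Effect_allele')] <- do.call(rbind, fun(df$REF, df$Effect_allele))
df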
    CHROM POS_GRCh38          REF Effect_allele Effect_size
234  chr1  109655507          AAA                    0.0266
240  chr1  145830809            T                    0.0126
243  chr1  168201814                          A      0.0718
244  chr1  172359627            A                    0.0655
249  chr1  204533386 TCTGAAACAGGG             C      0.1345
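If you would rather avoid recursion and the stringr dependency, here is a minimal base R sketch of the same idea. It assumes that "matching characters" means the shared leading run of the two strings, which is what the recursive function above removes. The names common_prefix_len and strip_common_prefix are made up for illustration, and the vectors ref and alt are just copies of the REF and Effect_allele columns from the question. Unlike the recursive version, it should also cope with two identical strings (both end up empty), though NA values are not handled.

```r
# Example vectors copied from the question's REF and Effect_allele columns
ref <- c("CAAA", "CT", "C", "TA", "TTCTGAAACAGGG")
alt <- c("C", "C", "CA", "T", "TC")

# Hypothetical helper: number of leading characters two strings share
common_prefix_len <- function(a, b) {
  n <- min(nchar(a), nchar(b))
  if (n == 0L) return(0L)
  # compare the first n characters position by position
  same <- strsplit(substr(a, 1, n), "")[[1]] == strsplit(substr(b, 1, n), "")[[1]]
  # cumprod() zeroes everything after the first mismatch,
  # so the sum counts only the leading run of matches
  sum(cumprod(same))
}

# Hypothetical helper: strip the shared prefix from each pair of strings
strip_common_prefix <- function(a, b) {
  k <- mapply(common_prefix_len, a, b, USE.NAMES = FALSE)
  list(a = substring(a, k + 1L), b = substring(b, k + 1L))
}

res <- strip_common_prefix(ref, alt)
res$a  # "AAA" "T" "" "A" "TCTGAAACAGGG"
res$b  # "" "" "A" "" "C"
```

To apply it to the data frame, something like res <- strip_common_prefix(df$REF, df$Effect_allele) followed by df$REF <- res$a and df$Effect_allele <- res$b should work.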