
Extract distinct characters that differ between two strings

I have two strings, a <- "AERRRTX"; b <- "TRRA".

I want to extract the characters in a that are not used in b, i.e. "ERX".

I tried the answer in "Extract characters that differ between two strings", which uses setdiff. It returns "EX" because setdiff works on sets: b contains "R", so all three "R"s in a are eliminated. My aim is to treat each character occurrence as distinct, so only two of the three "R"s in a should be eliminated.
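To make the problem concrete, here is a minimal reproduction of the setdiff behaviour described above (the names a1 and b1 are just illustrative):

```r
a <- "AERRRTX"; b <- "TRRA"
a1 <- strsplit(a, "")[[1]]
b1 <- strsplit(b, "")[[1]]

# setdiff() treats both arguments as sets, so every "R" in a1 is dropped
# as soon as b1 contains a single "R"
res_setdiff <- setdiff(a1, b1)
res_setdiff
#> [1] "E" "X"
```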

Any suggestions on what I can use instead of setdiff, or another approach that produces this output?

A different approach uses pmatch(), which by default (duplicates.ok = FALSE) matches each position in a1 at most once:

a1 <- unlist(strsplit(a, ""))
b1 <- unlist(strsplit(b, "")) 

# keep the positions of a1 that were not claimed by a character of b1
a1[!(seq_along(a1) %in% pmatch(b1, a1))]

 #[1] "E" "R" "X"

Another example,

a <- "Ronak"; b <- "Shah"

a1 <- unlist(strsplit(a, ""))
b1 <- unlist(strsplit(b, ""))
a1[!(seq_along(a1) %in% pmatch(b1, a1))]

# [1] "R" "o" "n" "k"
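The two runs above can be wrapped in a small helper. This is only a sketch: the name multiset_diff is hypothetical and not part of the original answer, and it assumes both inputs are single, non-empty strings.

```r
# Hypothetical helper wrapping the pmatch() idea shown above
multiset_diff <- function(a, b) {
  a1 <- strsplit(a, "")[[1]]
  b1 <- strsplit(b, "")[[1]]
  # with duplicates.ok = FALSE (the default), each position of a1
  # can be matched by at most one element of b1
  used <- pmatch(b1, a1)
  paste(a1[!(seq_along(a1) %in% used)], collapse = "")
}

multiset_diff("AERRRTX", "TRRA")
#> [1] "ERX"
multiset_diff("Ronak", "Shah")
#> [1] "Ronk"
```

Characters of b that do not occur in a give NA in pmatch() and are simply ignored by %in%.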

We can use Reduce() to successively eliminate from a each character found in b:

a <- 'AERRRTX'; b <- 'TRRA'
# for each character of b, drop its first remaining occurrence from a
res <- Reduce(function(as, bc) as[-match(bc, as, nomatch = length(as) + 1L)],
              strsplit(b, '')[[1L]], strsplit(a, '')[[1L]])
paste(res, collapse = '')
## [1] "ERX"

This will preserve the order of the surviving characters in a.
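To see the successive elimination at work, the same reduction can be run with accumulate = TRUE, which keeps every intermediate state. This is an illustrative trace, and drop_first is a name introduced here, not in the answer:

```r
a1 <- strsplit("AERRRTX", "")[[1]]
b1 <- strsplit("TRRA", "")[[1]]

# remove the first occurrence of ch from chars; if ch is absent, the
# nomatch index points one past the end and nothing is removed
drop_first <- function(chars, ch) {
  chars[-match(ch, chars, nomatch = length(chars) + 1L)]
}

steps <- Reduce(drop_first, b1, a1, accumulate = TRUE)
trace <- vapply(steps, paste, character(1), collapse = "")
trace
#> [1] "AERRRTX" "AERRRX"  "AERRX"   "AERX"    "ERX"
```

Each step removes one character of b ("T", "R", "R", "A" in turn), and the last element is the final answer.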


Another approach is to mark each character with its occurrence index in a, do the same for b, and then we can use setdiff():

a <- 'AERRRTX'; b <- 'TRRA'
# label each character with its running occurrence count, e.g. "R1", "R2", "R3"
pasteOccurrence <- function(x) ave(x, x, FUN = function(x) paste0(x, seq_along(x)))
labelled <- setdiff(pasteOccurrence(strsplit(a, '')[[1L]]),
                    pasteOccurrence(strsplit(b, '')[[1L]]))
paste(substr(labelled, 1L, 1L), collapse = '')
## [1] "ERX"
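Looking at the intermediate labels may make it clearer why setdiff() now works. This sketch reuses the pasteOccurrence definition from above; labels_a and labels_b are illustrative names:

```r
pasteOccurrence <- function(x) ave(x, x, FUN = function(x) paste0(x, seq_along(x)))

labels_a <- pasteOccurrence(strsplit("AERRRTX", "")[[1]])
labels_b <- pasteOccurrence(strsplit("TRRA", "")[[1]])

labels_a
#> [1] "A1" "E1" "R1" "R2" "R3" "T1" "X1"
labels_b
#> [1] "T1" "R1" "R2" "A1"

# every label is unique within its vector, so set semantics are now safe
setdiff(labels_a, labels_b)
#> [1] "E1" "R3" "X1"
```

The final substr(..., 1L, 1L) call then strips the occurrence index, which works here because every element is a single character.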

You can use the vsetdiff() function from the vecsets package:

install.packages("vecsets")
library(vecsets)
a <- "AERRRTX"
b <- "TRRA"  
Reduce(vsetdiff, strsplit(c(a, b), split = ""))
## [1] "E" "R" "X"

An alternative using the data.table package:

library(data.table)

x = data.table(table(strsplit(a, '')[[1]]))
y = data.table(table(strsplit(b, '')[[1]]))

dt = y[x, on='V1'][,N:=ifelse(is.na(N),0,N)][N!=i.N,res:=i.N-N][res>0]

rep(dt$V1, dt$res)
#[1] "E" "R" "X"
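The same counting idea can be sketched in base R, for readers without data.table. This is an assumption-laden rewrite rather than the answer's own code: it counts each character in a and b, subtracts, and repeats each character by its surplus. The names lev, count_a, count_b and surplus are illustrative.

```r
a1 <- strsplit("AERRRTX", "")[[1]]
b1 <- strsplit("TRRA", "")[[1]]

lev <- unique(a1)                            # characters of a, in order of first appearance
count_a <- table(factor(a1, levels = lev))
count_b <- table(factor(b1, levels = lev))   # characters of b not in a become NA and are not counted

# how many more times each character occurs in a than in b (never negative)
surplus <- pmax(as.integer(count_a) - as.integer(count_b), 0L)
res_counts <- rep(lev, surplus)
res_counts
#> [1] "E" "R" "X"
```

Like the data.table version, this groups the surviving copies by character instead of keeping their original positions, so for inputs such as a = "RAR" with an empty b the order of the result can differ from the order in a.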
