
Extract all occurrences of characters that differ between two strings

I have used adist to calculate the number of characters that differ between two strings:

a <- "#IvoryCoast TENNIS US OPEN Clément «Un beau combat» entre Simon et Cilic"
b <- "Clément «Un beau combat» entre Simon et Cilic"
adist(a,b) # result 27

Now I would like to extract all the occurrences of the characters that differ. In my example, I would like to get the string "#IvoryCoast TENNIS US OPEN ".

I tried:

paste(Reduce(setdiff, strsplit(c(a, b), split = "")), collapse = "")

But the result is not what I expected:

#IvysTENOP

In this case, because b appears verbatim inside a, you can remove it with gsub:

> a <- "#IvoryCoast TENNIS US OPEN Clément «Un beau combat» entre Simon et Cilic"
> b <- "Clément «Un beau combat» entre Simon et Cilic"
> gsub(b, "", a)
[1] "#IvoryCoast TENNIS US OPEN "
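One caveat worth sketching: gsub and sub treat their first argument as a regular expression by default, so the call above only works because b happens to contain no regex metacharacters. The sketch below passes fixed = TRUE so that b is matched literally. The strings a2 and b2 are hypothetical, made up purely to illustrate the difference.

```r
a <- "#IvoryCoast TENNIS US OPEN Clément «Un beau combat» entre Simon et Cilic"
b <- "Clément «Un beau combat» entre Simon et Cilic"

# fixed = TRUE makes sub() treat b as a literal string rather than a regex
extra <- sub(b, "", a, fixed = TRUE)
extra

# Hypothetical strings containing regex metacharacters (parentheses)
a2 <- "LIVE: score (3-1) at half time"
b2 <- "score (3-1) at half time"

# As a regex, "(3-1)" is a group matching "3-1", so the pattern does not
# match the literal "(3-1)" in a2 and the string comes back unchanged
regex_result <- sub(b2, "", a2)

# As a fixed string, b2 is found and removed
fixed_result <- sub(b2, "", a2, fixed = TRUE)
```

Note that this approach only helps when b occurs as one contiguous substring of a; if it does not, a is returned unchanged.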

Building on your paste/Reduce attempt, you can split on spaces instead of on individual characters:

paste(Reduce(setdiff, strsplit(c(a, b), split = " ")), collapse = " ")
#[1] "#IvoryCoast TENNIS US OPEN"

Or, if you want the differing items as separate elements, use setdiff with strsplit:

setdiff(strsplit(a," ")[[1]],strsplit(b," ")[[1]])
#[1] "#IvoryCoast" "TENNIS"      "US"          "OPEN" 
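It may help to see why the character-level attempt returned "#IvysTENOP", and what the word-level version can still miss. setdiff treats its arguments as sets: it keeps each element of the first argument only once, and drops any element that occurs anywhere in the second, regardless of position. The sketch below illustrates both effects; the strings x and y are hypothetical, chosen only to show the handling of duplicates.

```r
a <- "#IvoryCoast TENNIS US OPEN Clément «Un beau combat» entre Simon et Cilic"
b <- "Clément «Un beau combat» entre Simon et Cilic"

# Character level: letters such as "o", "r", "a", "t" and the space also occur
# somewhere in b, so they are dropped; repeated letters are kept only once
chars_a <- strsplit(a, "")[[1]]
chars_b <- strsplit(b, "")[[1]]
char_diff <- paste(setdiff(chars_a, chars_b), collapse = "")
char_diff
# [1] "#IvysTENOP"

# Word level (hypothetical strings): repeated words are also reported once
x <- "US OPEN final US OPEN"
y <- "final"
word_diff <- setdiff(strsplit(x, " ")[[1]], strsplit(y, " ")[[1]])
word_diff
# [1] "US"   "OPEN"
```

So the word-level solutions work for this example because none of the extra words repeats and none of them also appears in b.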
