Extract all occurrences of characters that differ between two strings
I have used adist to calculate the number of characters that differ between two strings (their edit distance):
a <- "#IvoryCoast TENNIS US OPEN Clément «Un beau combat» entre Simon et Cilic"
b <- "Clément «Un beau combat» entre Simon et Cilic"
adist(a,b) # result 27
Now I would like to extract all occurrences of the characters that differ. In my example, I would like to get the string "#IvoryCoast TENNIS US OPEN ".
I tried:
paste(Reduce(setdiff, strsplit(c(a, b), split = "")), collapse = "")
But the result is not what I expected:
#IvysTENOP
For this case, you could use gsub to remove b from a:
> a <- "#IvoryCoast TENNIS US OPEN Clément «Un beau combat» entre Simon et Cilic"
> b <- "Clément «Un beau combat» entre Simon et Cilic"
> gsub(b, "", a)
[1] "#IvoryCoast TENNIS US OPEN "
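One caveat with the call above: gsub treats its first argument as a regular expression by default, so it only behaves as a literal removal while b contains no regex metacharacters. A minimal sketch of the literal variant follows, assuming that a plain substring removal is what is wanted; the strings x and y are hypothetical and only illustrate the difference.

```r
a <- "#IvoryCoast TENNIS US OPEN Clément «Un beau combat» entre Simon et Cilic"
b <- "Clément «Un beau combat» entre Simon et Cilic"

# fixed = TRUE matches b literally instead of as a regular expression
res_fixed <- gsub(b, "", a, fixed = TRUE)
res_fixed
# [1] "#IvoryCoast TENNIS US OPEN "

# Hypothetical inputs containing regex metacharacters
x <- "price (USD) 10"
y <- " (USD)"
res_regex   <- gsub(y, "", x)               # "(USD)" is read as a group, so nothing matches
res_literal <- gsub(y, "", x, fixed = TRUE) # the literal text " (USD)" is removed
res_regex
# [1] "price (USD) 10"
res_literal
# [1] "price 10"
```

With the strings from the question both forms give the same result, because b happens to contain no metacharacters.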
Building on the paste/Reduce approach, you can split on spaces rather than on individual characters:
paste(Reduce(setdiff, strsplit(c(a, b), split = " ")), collapse = " ")
#[1] "#IvoryCoast TENNIS US OPEN"
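The character-level attempt in the question returns "#IvysTENOP" because setdiff treats its inputs as sets: duplicates are dropped, and any element that appears anywhere in the second vector is removed. The word-level version has the same property. It works here only because none of the leading words is repeated or reappears in b. The sketch below uses hypothetical strings to show the de-duplication, with %in% as one possible way to keep repeats.

```r
# Hypothetical inputs with repeated words
x <- strsplit("US OPEN US OPEN final", " ")[[1]]
y <- strsplit("final", " ")[[1]]

as_set   <- setdiff(x, y)   # set semantics: repeats collapse
keep_dup <- x[!x %in% y]    # logical filter: repeats and order are kept

as_set
# [1] "US"   "OPEN"
keep_dup
# [1] "US"   "OPEN" "US"   "OPEN"
```

Note that the %in% filter still removes every word that occurs anywhere in y, regardless of position.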
Or, if you want the differing words as separate items, use setdiff with strsplit:
setdiff(strsplit(a," ")[[1]],strsplit(b," ")[[1]])
#[1] "#IvoryCoast" "TENNIS" "US" "OPEN"
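If a character-level answer tied to adist itself is preferred, one possible sketch is to ask adist for its edit operations. With counts = TRUE the result carries a "trafos" attribute: a string of M (match), I (insertion), D (deletion) and S (substitution) describing how the first string is turned into the second. This assumes a UTF-8 session so that the accented characters are handled as single characters. The variable names are illustrative. When several alignments have the same cost, the deleted positions that adist reports may not be the ones you expect.

```r
a <- "#IvoryCoast TENNIS US OPEN Clément «Un beau combat» entre Simon et Cilic"
b <- "Clément «Un beau combat» entre Simon et Cilic"

# Edit distance plus the sequence of edit operations from a to b
d   <- adist(a, b, counts = TRUE)
ops <- strsplit(attr(d, "trafos")[1, 1], "")[[1]]

# Insertions consume no character of a, so drop them before lining the
# remaining operations up with the characters of a
ops_a   <- ops[ops != "I"]
chars_a <- strsplit(a, "")[[1]]

# Characters of a that were deleted to reach b
removed <- paste(chars_a[ops_a == "D"], collapse = "")
# Characters of a that were kept unchanged
kept    <- paste(chars_a[ops_a == "M"], collapse = "")
```

Here removed contains 27 characters, which matches the distance reported by adist(a, b).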