
Compare two strings by ignoring certain characters

I wonder if there is an easy way to check whether two strings match while ignoring certain characters in them. See the example below.

I could easily write such a method myself, using a regular expression to find the "wildcard" characters and replace them with a common character, and then comparing the two strings str1 and str2. I am not looking for such implementations; I would like to know whether any .NET Framework class already takes care of this. It seems like a common need, but I couldn't find any such method.

For example:

string str1 = "ABC-EFG";    
string str2 = "ABC*EFG";

These two strings should be treated as equal.

Thanks!

I found myself with the same requirement. The solution I used is based on the String.Compare method:

String.Compare(str1, str2, CultureInfo.InvariantCulture, CompareOptions.IgnoreSymbols)
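
To make the idea behind CompareOptions.IgnoreSymbols concrete, here is a rough Python approximation: drop punctuation, symbols and whitespace from both strings, then compare what is left. This is only a sketch of the concept. The helper names strip_symbols and equal_ignoring_symbols are made up for illustration, and the Unicode categories used here are an assumption that may not match exactly what .NET treats as a symbol.

```python
import unicodedata


def strip_symbols(text):
    """Drop punctuation (P*), symbols (S*), separators (Z*) and other whitespace."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] not in "PSZ" and not ch.isspace()
    )


def equal_ignoring_symbols(s1, s2):
    """Hypothetical helper: compare two strings after removing symbol characters."""
    return strip_symbols(s1) == strip_symbols(s2)


print(equal_ignoring_symbols("ABC-EFG", "ABC*EFG"))  # True
print(equal_ignoring_symbols("ABC-EFG", "ABCEFG"))   # True: the symbol is dropped, not matched
print(equal_ignoring_symbols("ABC-EFG", "ABD-EFG"))  # False
```

Note that in this sketch the ignored characters are removed entirely, so "ABC-EFG" and "ABCEFG" also compare equal, which may or may not be what you want.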

Not sure if this helps:

The Damerau-Levenshtein distance (DLD) is one of several algorithms dealing with fuzzy string searching.

The DLD between "ABC-EFG" and "ABC*EFG" is 1, where the distance is "the minimum number of operations needed to transform one string into the other, where an operation is defined as an insertion, deletion, or substitution of a single character, or a transposition of two characters."

Of course, this algorithm would also return 1 for the two strings "ZBC-EFG" and "ABC-EFG", which is possibly not what you are looking for.

An implementation of the DLD, in Python, from http://paxe.googlecode.com/svn/trunk/paxe/Lib/Installer.py (updated here to Python 3):

def dist(s1, s2):
    d = {}
    lenstr1 = len(s1)
    lenstr2 = len(s2)
    # Border cells at index -1 hold the cost of reaching a prefix from the empty string.
    for i in range(-1, lenstr1 + 1):
        d[(i, -1)] = i + 1
    for j in range(-1, lenstr2 + 1):
        d[(-1, j)] = j + 1

    for i in range(lenstr1):
        for j in range(lenstr2):
            cost = 0 if s1[i] == s2[j] else 1
            d[(i, j)] = min(
                d[(i - 1, j)] + 1,  # deletion
                d[(i, j - 1)] + 1,  # insertion
                d[(i - 1, j - 1)] + cost,  # substitution
            )
            # Use i > 0 and j > 0 (not > 1) so a swap of the first two characters is also detected.
            if i > 0 and j > 0 and s1[i] == s2[j - 1] and s1[i - 1] == s2[j]:
                d[(i, j)] = min(d[(i, j)], d[(i - 2, j - 2)] + cost)  # transposition

    return d[(lenstr1 - 1, lenstr2 - 1)]
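
Because an edit distance of 1 cannot tell a wildcard difference ("-" vs "*") from a letter difference ("Z" vs "A"), a position-wise check may be closer to what the question asks for. The sketch below is one possible way to do that, not a framework feature: the function name equal_ignoring and the default wildcard set "-*" are assumptions chosen for illustration.

```python
def equal_ignoring(s1, s2, ignored="-*"):
    """Hypothetical helper: strings match if they differ only where both
    characters belong to the ignored (wildcard) set."""
    if len(s1) != len(s2):
        return False
    return all(
        a == b or (a in ignored and b in ignored)
        for a, b in zip(s1, s2)
    )


print(equal_ignoring("ABC-EFG", "ABC*EFG"))  # True: only a wildcard position differs
print(equal_ignoring("ZBC-EFG", "ABC-EFG"))  # False: a letter differs
print(equal_ignoring("ABC-EFG", "ABCEFG"))   # False: lengths differ
```

Unlike the distance function, this rejects "ZBC-EFG" vs "ABC-EFG" even though both pairs are one edit apart.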

Sorry, but I think either a regex or replacing the "wildcard" characters with a common character is going to be your best solution. Basically, the answers you said you didn't want to receive.

You can of course test with the regex directly, without substitution:

[a-zA-Z]{3}.[a-zA-Z]{3}

Seems like a common use for regex, so why the avoidance?
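
Both options mentioned in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the wildcard set "-" and "*", the "?" placeholder, and the names normalize and matches_shape are all made up here, and the same idea would carry over to Regex.Replace and Regex.IsMatch in .NET.

```python
import re

# Assumed wildcard set: a hyphen or an asterisk.
WILDCARDS = re.compile(r"[-*]")

# Three letters, any single character, three letters.
SHAPE = re.compile(r"[a-zA-Z]{3}.[a-zA-Z]{3}")


def normalize(text, placeholder="?"):
    """Replace every wildcard character with one common placeholder."""
    return WILDCARDS.sub(placeholder, text)


def matches_shape(text):
    """Check the whole string against the pattern, without any substitution."""
    return SHAPE.fullmatch(text) is not None


print(normalize("ABC-EFG") == normalize("ABC*EFG"))  # True
print(normalize("ABC-EFG") == normalize("ZBC-EFG"))  # False
print(matches_shape("ABC*EFG"))                      # True
print(matches_shape("ABCDEFG"))                      # True: "." accepts any character
```

The pattern alone only validates the shape of one string; it does not confirm that the letters of two strings agree, so the substitution approach is the one that actually compares str1 with str2.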

No, there is nothing in the framework itself that does this.
