
Using C#, how can I replace similar words?

Assume these two strings:

string s1 = "control";
string s2 = "conrol"; // or "ocntrol", "onrtol", "lcontro", etc.

How can I programmatically determine that s2 is similar to s1, and then replace the s2 string with the s1 string?

Thanks.

Jeff

You could check the Levenshtein distance between your two words and, if the distance is within a threshold, replace the word.

The hard part is defining the threshold; for your examples, a threshold of about 2 could work.

(Implementation of Levenshtein distance in C#)
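
The linked implementation is not reproduced here, so the following is only a minimal sketch of the idea, not that author's code. The class name `Fuzzy`, the helper `ReplaceIfSimilar`, and the `maxDistance` parameter are names assumed for illustration.

```csharp
using System;

public static class Fuzzy
{
    // Classic dynamic-programming Levenshtein distance:
    // the minimum number of single-character inserts, deletes and substitutions.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];

        // Turning an empty prefix of a into the first j chars of b takes j inserts.
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // i deletes turn the first i chars of a into an empty string
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                    prev[j - 1] + cost);                    // substitute or match
            }
            (prev, curr) = (curr, prev); // the row just filled becomes the previous row
        }
        return prev[b.Length];
    }

    // Hypothetical helper: returns target when candidate is within maxDistance edits,
    // otherwise returns candidate unchanged.
    public static string ReplaceIfSimilar(string candidate, string target, int maxDistance)
        => Fuzzy.Levenshtein(candidate, target) <= maxDistance ? target : candidate;
}
```

With this definition, "conrol" is 1 edit away from "control", "ocntrol" and "lcontro" are 2 edits away, and "onrtol" is 3 edits away, because a swapped pair of adjacent letters counts as two substitutions. A call such as `Fuzzy.ReplaceIfSimilar(s2, s1, 2)` would therefore fix the first three variants but leave "onrtol" alone unless the threshold is raised to 3.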

You can use the Levenshtein distance, which gives you a score for how close the two words are. You need to decide at which score you do the replacement.

I'll suggest a simpler answer: compare the lengths of the two strings, and also compare the sums of the ASCII values of both strings.
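
One possible reading of this suggestion, sketched in C# (the class and method names are assumptions, and the answer does not say whether the two comparisons should be exact or approximate; this version requires both to be equal):

```csharp
using System;
using System.Linq;

public static class CheapCompare
{
    // True when both strings have the same length and the same sum of character codes.
    public static bool SameLengthAndCharSum(string a, string b)
        => a.Length == b.Length
           && a.Sum(c => (int)c) == b.Sum(c => (int)c);
}
```

Read this way, the check accepts pure reorderings such as "ocntrol" and "lcontro", but rejects "conrol" and "onrtol" because a letter is missing. It can also accept unrelated words that happen to share a length and a sum, for example "ad" and "bc", so it is probably best treated as a cheap pre-filter rather than a similarity measure.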

I'd use MATLAB to run some tests on this. I would do the following:

CONTROL 1111111

OCNTROL 0011111

ONRCTOL 0000011

So I have all 1s for the original word, five 1s in the second case, and two 1s in the third. You can say that 70% is acceptable, and if at least 70% of the positions match, then I will use this word. OCNTROL (5 of 7, about 71%) will be accepted, but ONRCTOL (2 of 7, about 29%) won't.

I say MATLAB because you can easily load a lot of data into vectors and do vector comparisons.
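
Since the question asks for C#, the same position-by-position comparison can be sketched there as well. The names `PositionalMatch` and `Ratio` are assumptions, as is the choice to divide by the longer length when the words differ in size, which the answer does not cover.

```csharp
using System;

public static class PositionalMatch
{
    // Fraction of positions at which both words hold the same character.
    // Dividing by the longer length makes any extra characters count as mismatches.
    public static double Ratio(string a, string b)
    {
        int longer = Math.Max(a.Length, b.Length);
        if (longer == 0) return 1.0; // two empty strings are treated as identical

        int shorter = Math.Min(a.Length, b.Length);
        int matches = 0;
        for (int i = 0; i < shorter; i++)
        {
            if (a[i] == b[i]) matches++;
        }
        return (double)matches / longer;
    }
}
```

`PositionalMatch.Ratio("CONTROL", "OCNTROL")` gives 5/7, which passes a 0.70 cutoff, while "ONRCTOL" scores well below it. Note that a dropped or inserted letter shifts every later position, so a word like "conrol" scores poorly under this measure even though it is only one edit away.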

LINQ method: try storing the characters in two List<char> (or List<string>) collections, and compare the smaller list against the larger one (SequenceEqual, Except).
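
The answer gives no code, so the sketch below is one guess at how `Except` and `SequenceEqual` might be applied; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqCompare
{
    // Distinct characters of the longer word that never occur in the shorter one.
    // Except is a set difference, so repeated letters are not counted.
    public static List<char> MissingChars(string a, string b)
    {
        string longer = a.Length >= b.Length ? a : b;
        string shorter = a.Length >= b.Length ? b : a;
        return longer.ToList().Except(shorter.ToList()).ToList();
    }

    // True when both words contain the same characters the same number of times,
    // checked by sorting the characters and comparing the sequences.
    public static bool SameLetters(string a, string b)
        => a.OrderBy(c => c).SequenceEqual(b.OrderBy(c => c));
}
```

For example, `LinqCompare.MissingChars("control", "conrol")` yields a list containing only 't', and `LinqCompare.SameLetters("control", "ocntrol")` is true. Because `Except` works on sets, a missing duplicate letter (as in "contrl", which still contains one 'o') goes unnoticed, so how many missing characters to tolerate is left as a tuning decision.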
