
Spell checker with a twist

I have a quick question about a spell checker, but with a twist. It is vaguer than a regular spell checker: rather than correcting your words, it judges how correct you are based on how close your guess is to the target word. For instance, if one string differs from another by two characters or fewer, e.g. "hello" and "hallo", it should state "nearly there". Below is the code I attempted.

def spell_checker(correct, guess):
    if guess==correct:
        print("Correct")
    if guess!=correct:
        for g in guess:
            for f in correct:
                if g!=f:
                    print("nearly there")
                else:
                    print("Wrong")

Obviously I realise this is quite a crude attempt, since it does not account for the number of mistakes, but to be honest I could not find a way of incorporating the number of mistakes in a word. Even when I looked at the nltk-based answers, I did not know where to start.

The output when applying the "hello", "hallo" example was as follows:

Wrong almost almost almost almost almost almost almost almost almost almost almost Wrong Wrong almost almost almost Wrong Wrong almost almost almost almost almost Wrong

I believe it is going through each character and stating whether one character is similar to another. I would really appreciate any help with this.

The problem with your code is that you are comparing every character in the first word with every character in the other word. If you want to compare only characters in the same position, a very simple way is to zip the two words and count the mismatched characters:

>>> a, b = "hello", "hallo"
>>> sum(x != y for x, y in zip(a, b))
1

But this ignores any extra characters when the words do not have the same length, since zip stops at the end of the shorter word. It also does not work well with missing or superfluous characters:

>>> a, b = "correct", "corect"
>>> sum(x != y for x, y in zip(a, b))
3
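
As a minimal sketch of the positional approach that at least accounts for differing lengths, one could swap zip for itertools.zip_longest. The helper name positional_mismatches is made up for illustration. Note that a single missing letter still shifts every later character, so the count stays misleadingly high:

```python
from itertools import zip_longest

def positional_mismatches(a, b):
    """Count positions where a and b differ.

    Positions past the end of the shorter word are padded with None,
    which never equals a character, so they count as mismatches.
    """
    return sum(x != y for x, y in zip_longest(a, b))

print(positional_mismatches("hello", "hallo"))     # 1
print(positional_mismatches("correct", "corect"))  # 4
```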

A better approach would be to calculate the edit distance between the two strings. If you do not want to implement the algorithm yourself, you could, e.g., use difflib.ndiff:

>>> import difflib
>>> list(difflib.ndiff(a, b))
['  c', '  o', '- r', '  r', '  e', '  c', '  t']
>>> sum(d[0] != " " for d in difflib.ndiff(a, b))
1
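
For readers who would rather implement the edit distance themselves, here is a rough sketch of the classic Levenshtein dynamic-programming approach. It is not what difflib does internally, and the function name edit_distance is chosen only for this example:

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]

print(edit_distance("hello", "hallo"))     # 1
print(edit_distance("correct", "corect"))  # 1
```

Here a substitution costs 1, so "hello" versus "hallo" counts as a single mistake.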

Note, however, that this will count replacements twice: once for the deleted character, and once for the inserted character. You could fix this by, e.g., not adding 1 if you get a + followed by a - or vice versa, which is left as an exercise to the interested reader.
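
One possible way to approach that exercise, sketched here as an assumption rather than the only solution, is to use difflib.SequenceMatcher.get_opcodes, which reports a replacement as a single "replace" block instead of a separate deletion and insertion. The name count_edits is hypothetical. The result is not guaranteed to equal the true minimal edit distance, because SequenceMatcher does not search for a minimal edit script:

```python
import difflib

def count_edits(a, b):
    """Count changed characters, treating a replacement as one edit."""
    matcher = difflib.SequenceMatcher(None, a, b)
    edits = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # For "replace", the longer side decides the cost; for
            # "delete" or "insert", one of the two spans is empty.
            edits += max(i2 - i1, j2 - j1)
    return edits

print(count_edits("hello", "hallo"))     # 1
print(count_edits("correct", "corect"))  # 1
```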

Anyway, just count the number of mismatched characters, and print "almost" if that number is small enough.

import difflib

def spell_checker(correct, guess):
    if guess == correct:
        print("correct")
    # ndiff marks unchanged characters with a leading space; count the rest
    elif sum(d[0] != " " for d in difflib.ndiff(correct, guess)) <= 2:
        print("almost")
    else:
        print("wrong")
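
To try the idea out, here is a small variant of the function above that returns the label instead of printing it, which makes it easier to check. The name classify_guess and the tolerance parameter are assumptions added for illustration:

```python
import difflib

def classify_guess(correct, guess, tolerance=2):
    """Return "correct", "almost" or "wrong" for a guess."""
    if guess == correct:
        return "correct"
    # Same counting as above: every ndiff entry that is not marked
    # as unchanged (leading space) counts as one changed character.
    changed = sum(d[0] != " " for d in difflib.ndiff(correct, guess))
    return "almost" if changed <= tolerance else "wrong"

print(classify_guess("hello", "hello"))     # correct
print(classify_guess("hello", "hallo"))     # almost
print(classify_guess("correct", "corect"))  # almost
print(classify_guess("hello", "hallu"))     # wrong
```

Because each replacement counts twice with this method, "hallo" sits exactly at the default tolerance of 2, and a guess with two replaced letters such as "hallu" is already classified as "wrong".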
