Spell Checker for Python

I'm fairly new to Python and NLTK. I am working on an application that performs spell checking (it replaces an incorrectly spelled word with the correct one). I'm currently using the Enchant library (via PyEnchant) and the NLTK library on Python 2.7. The code below is a class that handles the correction/replacement.

import enchant
from nltk.metrics import edit_distance

class SpellingReplacer:
    def __init__(self, dict_name='en_GB', max_dist=2):
        self.spell_dict = enchant.Dict(dict_name)
        # Use the caller's threshold instead of a hard-coded 2
        self.max_dist = max_dist

    def replace(self, word):
        if self.spell_dict.check(word):
            return word
        suggestions = self.spell_dict.suggest(word)

        if suggestions and edit_distance(word, suggestions[0]) <= self.max_dist:
            return suggestions[0]
        else:
            return word

I have written a function that takes a list of words, runs replace() on each word, and returns the list of those words spelled correctly.

def spell_check(word_list):
    # Build the replacer once; loading the dictionary for every word is wasteful
    replacer = SpellingReplacer()
    checked_list = []
    for item in word_list:
        checked_list.append(replacer.replace(item))
    return checked_list

>>> word_list = ['car', 'colour']
>>> spell_check(word_list)
['car', 'color']

Now, I don't really like this because it isn't very accurate, and I'm looking for a better way to check and replace misspelled words. I also need something that can pick up spelling mistakes like "caaaar". Are there better ways to perform spelling checks out there? If so, what are they? How does Google do it? Their spelling suggester is very good.

Any suggestions?

You can use the autocorrect library to spell check in Python.
Example usage:

from autocorrect import Speller

spell = Speller(lang='en')

print(spell('caaaar'))
print(spell('mussage'))
print(spell('survice'))
print(spell('hte'))

Result:

caesar
message
service
the

I'd recommend starting by carefully reading this post by Peter Norvig. (I had to do something similar and I found it extremely useful.)

The following function in particular has the ideas you need to make your spell checker more sophisticated: splitting the misspelled word, then deleting, transposing, replacing, and inserting letters to 'correct' it.

# Letters tried for replacements and insertions
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    # Every string that is one edit away from `word`
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
    replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts    = [a + c + b     for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

Note: The above is one snippet from Norvig's spelling corrector.

And the good news is that you can incrementally add to and keep improving your spell checker.

Hope that helps.
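To show how a function like edits1 can drive an actual correction, here is a minimal sketch in the spirit of Norvig's post. The word counts in WORDS are invented for illustration (a real corrector would count words in a large corpus), and the helper names known, edits2 and correction are chosen for this sketch rather than taken from any library.

```python
from collections import Counter

# Toy word counts, invented for this sketch; a real corrector would
# build them from a large text corpus.
WORDS = Counter({"the": 50, "car": 8, "spelling": 5, "message": 4,
                 "care": 3, "happening": 2})

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in ALPHABET if b]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """Yield every string that is two edits away from `word`."""
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))


def known(words):
    """Keep only the strings that appear in the vocabulary."""
    return {w for w in words if w in WORDS}


def correction(word):
    """Pick the most frequent known word, preferring the fewest edits."""
    candidates = (known([word]) or known(edits1(word))
                  or known(edits2(word)) or [word])
    return max(candidates, key=lambda w: WORDS[w])


for typo in ["hte", "speling", "hapenning", "xyzzy"]:
    print(typo, "->", correction(typo))
```

Words with no known neighbour within two edits are returned unchanged, so the quality of the result depends almost entirely on how good the word counts are.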

The best approaches to spell checking in Python are SymSpell, a BK-tree, or Peter Norvig's method.

The fastest one is SymSpell.

Method 1: pyspellchecker

This library is based on Peter Norvig's implementation.

pip install pyspellchecker

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

Method 2: SymSpell Python

pip install -U symspellpy
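The answer above names the BK-tree without showing one, so here is a rough, standard-library-only sketch of the idea. Each node stores a word, children are keyed by their edit distance to the parent, and the triangle inequality lets a search skip whole subtrees. The BKTree class, its method names and the sample word list are all made up for this illustration and do not come from any package.

```python
def levenshtein(a, b):
    """Edit distance between two strings, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class BKTree:
    """Each node is a (word, children) pair; children are keyed by distance."""

    def __init__(self, words):
        words = iter(words)
        self.root = (next(words), {})
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        while True:
            dist = levenshtein(word, node[0])
            if dist == 0:
                return  # word is already stored
            child = node[1].get(dist)
            if child is None:
                node[1][dist] = (word, {})
                return
            node = child

    def search(self, word, max_dist):
        """Return sorted (distance, term) pairs within max_dist of `word`."""
        results = []
        stack = [self.root]
        while stack:
            term, children = stack.pop()
            dist = levenshtein(word, term)
            if dist <= max_dist:
                results.append((dist, term))
            # Triangle inequality: only these edges can lead to a match
            for edge, child in children.items():
                if dist - max_dist <= edge <= dist + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree(["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"])
print(tree.search("bok", 1))
```

The tree is built once and then queried many times, which is where it pays off compared with measuring the distance to every dictionary word on each lookup.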

Maybe it is too late, but I am answering for future searches. To correct spelling mistakes, you first need to deal with words that are exaggerated or slang, such as caaaar or amazzzing, where letters are repeated. English words usually have at most two identical letters in a row (e.g. hello), so we first remove the extra repetitions from each word and only then check its spelling. To remove the extra letters, you can use Python's regular expression module.

Once this is done, use the pyspellchecker library to correct the spelling.

For an implementation, visit this link: https://rustyonrampage.github.io/text-mining/2017/11/28/spelling-correction-with-python-and-nltk.html
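The repeated-letter step described above can be sketched with the standard re module. The function name squeeze_repeats is made up for this example, and the rule of keeping at most two repeats follows the heuristic in this answer rather than any library's behaviour.

```python
import re

# Three or more of the same character in a row
_REPEATS = re.compile(r"(.)\1{2,}")


def squeeze_repeats(word):
    """Collapse runs of three or more identical characters down to two."""
    return _REPEATS.sub(r"\1\1", word)


for raw in ["caaaar", "amazzzing", "hello"]:
    print(raw, "->", squeeze_repeats(raw))
```

The output is not necessarily a real word yet ("caaaar" becomes "caar"), but it is now close enough for an edit-distance-based checker to finish the job.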

Try jamspell. It works pretty well for automatic spelling correction:

import jamspell

corrector = jamspell.TSpellCorrector()
corrector.LoadLangModel('en.bin')

corrector.FixFragment('Some sentnec with error')
# u'Some sentence with error'

corrector.GetCandidates(['Some', 'sentnec', 'with', 'error'], 1)
# ('sentence', 'senate', 'scented', 'sentinel')

In the terminal:

pip install gingerit

In code:

from gingerit.gingerit import GingerIt
text = input("Enter text to be corrected")
result = GingerIt().parse(text)
corrections = result['corrections']
correctText = result['result']

print("Correct Text:", correctText)
print()
print("CORRECTIONS")
for d in corrections:
    print("________________")
    print("Previous:", d['text'])
    print("Correction:", d['correct'])
    print("Definition:", d['definition'])

Spell corrector:

You need to save a corpus (word list) to your desktop; if you store it elsewhere, change the path in the code. I have also added a simple GUI using tkinter. Note that this only tackles non-word errors.

def min_edit_dist(word1, word2):
    len_1 = len(word1)
    len_2 = len(word2)
    # x[i][j] is the edit distance between word1[:i] and word2[:j]
    x = [[0] * (len_2 + 1) for _ in range(len_1 + 1)]
    # Base cases: turning a prefix into the empty string costs its length
    for i in range(len_1 + 1):
        x[i][0] = i
    for j in range(len_2 + 1):
        x[0][j] = j
    for i in range(1, len_1 + 1):
        for j in range(1, len_2 + 1):
            if word1[i-1] == word2[j-1]:
                x[i][j] = x[i-1][j-1]
            else:
                x[i][j] = min(x[i][j-1], x[i-1][j], x[i-1][j-1]) + 1
    # The bottom-right cell holds the distance between the full words
    return x[len_1][len_2]
from tkinter import *  # Python 3 name; on Python 2 the module is Tkinter


def retrieve_text():
    word1 = app_entry.get()
    # Raw string so the backslashes are not read as escape sequences
    path = r"C:\Documents and Settings\Owner\Desktop\Dictionary.txt"
    with open(path, 'r') as ffile:
        # Strip newlines so they do not count as an extra edit
        lines = [line.strip() for line in ffile]
    print("Suggestions coming right up count till 10")
    # Print every dictionary word within edit distance 2 of the input
    for candidate in lines:
        if min_edit_dist(word1, candidate) <= 2:
            print(candidate)
            print(" ")

if __name__ == "__main__":
    app_win = Tk()
    app_win.title("spell")
    app_label = Label(app_win, text="Enter the incorrect word")
    app_label.pack()
    app_entry = Entry(app_win)
    app_entry.pack()
    app_button = Button(app_win, text="Get Suggestions", command=retrieve_text)
    app_button.pack()
    # Initialize GUI loop
    app_win.mainloop()
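The same lookup can be tried without the GUI or the corpus file. The sketch below is only an illustration: the suggest helper and the tiny VOCABULARY list are invented here, and min_edit_dist is a row-by-row variant of the edit-distance function rather than a copy of it.

```python
def min_edit_dist(word1, word2):
    """Levenshtein distance, keeping only the previous row of the matrix."""
    previous = list(range(len(word2) + 1))
    for i, ch1 in enumerate(word1, start=1):
        current = [i]
        for j, ch2 in enumerate(word2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def suggest(word, vocabulary, max_dist=2):
    """Return vocabulary words within max_dist edits, closest first."""
    scored = sorted((min_edit_dist(word, cand), cand) for cand in vocabulary)
    return [cand for dist, cand in scored if dist <= max_dist]


# Stand-in for the dictionary file; these words are made up for the demo
VOCABULARY = ["car", "card", "care", "cat", "scar", "message"]

print(suggest("caar", VOCABULARY))
print(suggest("caaaar", VOCABULARY))
```

With this word list, "caaaar" gets no suggestions at all because it is three edits away from "car", which is why heavily repeated letters need separate handling before an edit-distance search.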

To use from autocorrect import spell, you first need to install the autocorrect package (preferably with Anaconda). It only works on single words, not sentences, so that is a limitation you will face.

from autocorrect import spell
print(spell('intrerpreter'))
# output: interpreter

Spark NLP is another option that I have used, and it works excellently. A simple tutorial can be found here: https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/annotation/english/spell-check-ml-pipeline/Pretrained-SpellCheckML-Pipeline.ipynb

pyspellchecker is one of the best solutions for this problem. The pyspellchecker library is based on Peter Norvig's blog post. It uses a Levenshtein distance algorithm to find permutations within an edit distance of 2 from the original word. There are two ways to install this library. The official documentation highly recommends using the pipenv package.

  • install using pip
pip install pyspellchecker
  • install from source
git clone https://github.com/barrust/pyspellchecker.git
cd pyspellchecker
python setup.py install

The following code is the example provided in the documentation:

from spellchecker import SpellChecker

spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

You can also try:

pip install textblob

from textblob import TextBlob
txt="machne learnig"
b = TextBlob(txt)
print("after spell correction: "+str(b.correct()))

after spell correction: machine learning

pip install scuse

from scuse import scuse

obj = scuse()

checkedspell = obj.wordf("spelling you want to check")

print(checkedspell)
