Ignore certain words when spell checking with Enchant

I am spell checking some files with Python Enchant and want it to ignore proper nouns. The trade-off between correcting misspelled proper nouns and wrongly 'correcting' ones it doesn't know seems too large (although any advice on this is also appreciated!).

This is my code, but at the moment it is still correcting the words in the NNP list.

from enchant.checker import SpellChecker
from nltk import pos_tag, word_tokenize

chkr = SpellChecker("en_GB")

f = open('test_file.txt', 'r', encoding='utf-8')
text = f.read()
tagged = pos_tag(word_tokenize(text))
NNP = [word for word, tag in tagged if tag == 'NNP']
chkr.set_text(text)
for err in chkr:
    if err is word in NNP:
        err.ignore_always()
    else:
        sug = err.suggest()[0]
        err.replace(sug)

corrected = chkr.get_text()
print(NNP)
print(corrected)

In the output, for example, 'Boojum' is changed to 'Boomer' even though it is in the NNP list.

Could someone point me in the right direction? I'm fairly new to Python. Thanks in advance.

I figured this out. I had to tell it that the error words were strings so that it could compare them to the words in the NNP list. New code:

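import os

from enchant.checker import SpellChecker
from nltk import pos_tag, word_tokenize

chkr = SpellChecker("en_GB")

# path is the directory that holds the files to check
for file in os.listdir(path):
    # os.listdir() returns bare file names, so join them with the directory
    with open(os.path.join(path, file), 'r', encoding='utf-8') as f:
        text = f.read()
    tagged = pos_tag(word_tokenize(text))
    NNP = [word for word, tag in tagged if tag == 'NNP']
    chkr.set_text(text)
    for err in chkr:
        if str(err.word) in NNP:
            # tagged as a proper noun: leave it alone
            err.ignore_always()
        else:
            sug = chkr.suggest()
            if sug:  # replace only when Enchant offers a suggestion
                err.replace(sug[0])

    corrected = chkr.get_text()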

I also changed it so that if Enchant doesn't have any suggestions, it leaves the error in place.
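For anyone who wants to see the difference without installing Enchant or NLTK, here is a minimal sketch of the same decision logic in plain Python. `FakeError` and `choose_replacement` are made-up names for illustration only, not part of Enchant; `FakeError` just mimics an object with a `.word` attribute and a `suggest()` method. It also shows why the first version never matched: the loop variable is an object, not a string, and `err is word in NNP` is a chained comparison that also requires `err is word`.

```python
class FakeError:
    """Hypothetical stand-in for the object a spell checker yields per error."""

    def __init__(self, word, suggestions):
        self.word = word
        self._suggestions = suggestions

    def suggest(self):
        return list(self._suggestions)


def choose_replacement(err, proper_nouns):
    """Return the replacement for err.word, or None to leave the word as is."""
    if err.word in proper_nouns:  # compare the string, not the object
        return None
    suggestions = err.suggest()
    if suggestions:  # an empty list means there is nothing to replace with
        return suggestions[0]
    return None


NNP = ["Boojum", "Snark"]
boojum = FakeError("Boojum", ["Boomer"])
word = "Boojum"

# The object itself is not one of the strings in the list
object_in_list = boojum in NNP

# Chained comparison: evaluated as (boojum is word) and (word in NNP)
chained = boojum is word in NNP

# Comparing the word attribute is what actually matches
word_in_list = boojum.word in NNP

print(object_in_list, chained, word_in_list)
print(choose_replacement(boojum, NNP))
print(choose_replacement(FakeError("teh", ["the", "tea"]), NNP))
print(choose_replacement(FakeError("xqzv", []), NNP))
```

If the NNP list gets long, turning it into a set (`set(NNP)`) should make the membership test cheaper, though for a single file it is unlikely to matter.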
