
Ignore certain words when spell checking with Enchant

I am spell checking some files with Python Enchant and want it to ignore proper nouns. The trade-off between correcting misspelled proper nouns and wrongly 'correcting' ones it doesn't know seems too large (although any advice on this is also appreciated!).

This is my code, but at the moment it is still correcting the words in the NNP list.

chkr = SpellChecker("en_GB")

f = open('test_file.txt', 'r', encoding = 'utf-8')
text = f.read()
tagged = pos_tag(word_tokenize(text))
NNP = [(word) for word, tag in tagged if tag == 'NNP']
chkr.set_text(text)
for err in chkr:
    if err is word in NNP:
        err.ignore_always()
else:
    sug = err.suggest()[0]
    err.replace(sug)

corrected = chkr.get_text()
print (NNP)
print (corrected) 

In the output, for example, 'Boojum' is changed to 'Boomer' even though it is in the NNP list.

Could someone point me in the right direction? I'm fairly new to Python. Thanks in advance.

I figured this out. I had to tell it that the error words were strings so that it could compare them to the words in the NNP list. New code:

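import os

from enchant.checker import SpellChecker
from nltk import pos_tag, word_tokenize

chkr = SpellChecker("en_GB")

# path is the directory holding the files to check (defined elsewhere)
for file in os.listdir(path):
    # os.listdir() returns bare file names, so join them with the directory
    with open(os.path.join(path, file), 'r', encoding='utf-8') as f:
        text = f.read()
    # Collect every token tagged as a singular proper noun
    tagged = pos_tag(word_tokenize(text))
    NNP = [word for word, tag in tagged if tag == 'NNP']
    chkr.set_text(text)
    for err in chkr:
        if str(err.word) in NNP:
            # Proper noun: leave it alone here and wherever it recurs
            err.ignore_always()
        else:
            sug = chkr.suggest()
            # Only replace when Enchant actually offers a suggestion
            if sug:
                err.replace(sug[0])

    corrected = chkr.get_text()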
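
One possible reading of why the first version never matched: in Python, `err is word in NNP` is a chained comparison, so it is evaluated as `(err is word) and (word in NNP)`. The error object is never the same object as a string, so the test is always false. The sketch below shows this in plain Python; `FakeError` and the `word` variable are hypothetical stand-ins for illustration, not part of Enchant.

```python
class FakeError:
    """Hypothetical stand-in for the object yielded by the checker loop."""

    def __init__(self, word):
        self.word = word


NNP = ["Boojum", "Snark"]
err = FakeError("Boojum")
word = "Boojum"  # hypothetical variable, standing in for `word` in the first version

# Chained comparison: evaluated as (err is word) and (word in NNP)
original_test = err is word in NNP

# Comparing the string held by the error object against the list
fixed_test = err.word in NNP

print(original_test)  # False
print(fixed_test)     # True
```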

I also changed it so that if Enchant doesn't have any suggestions, it leaves the error in place.
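
The decision made for each flagged word can also be pulled out into a small function, which makes it easy to check without Enchant or NLTK installed. This is only a sketch: `pick_replacement` is a hypothetical helper name, and the suggestion lists are made-up inputs rather than real Enchant output.

```python
def pick_replacement(word, proper_nouns, suggestions):
    """Return the replacement for a flagged word, or None to leave it as is."""
    if word in proper_nouns:
        # Proper noun: never touch it
        return None
    if not suggestions:
        # No suggestions available: leave the error in place
        return None
    return suggestions[0]


# A set gives constant-time membership tests, which helps on long texts
proper_nouns = {"Boojum"}

print(pick_replacement("Boojum", proper_nouns, ["Boomer"]))      # None
print(pick_replacement("speling", proper_nouns, ["spelling"]))   # spelling
print(pick_replacement("zzxqy", proper_nouns, []))               # None
```

Inside the loop, the result would decide between calling `err.replace(...)` and doing nothing.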

