Ignore punctuation when checking if a word is an English word

I am looking for the best way to correct potential misspellings of words in a string without taking the punctuation into account. I do not want to strip the punctuation before doing that evaluation, as this would alter the final edited string. My current approach splits the string on whitespace and then uses py-enchant (the .check() method) on each piece, but this does not ignore punctuation.

misspelled_string = 'This is a (tesl strung.'

Desired output:

corrected_string = 'This is a (test string.'

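Try splitting on anything that is not a word character (letter, digit, or underscore), using re. Note that this discards the punctuation, so the corrected words still have to be mapped back onto the original string: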

import re
misspelled_string = 'This is a (tesl strung.'

res = re.split(r"\W+", misspelled_string)

Output:

>>> res
['This', 'is', 'a', 'tesl', 'strung', '']
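
Since the question asks for the punctuation to survive in the final string, one possible alternative to splitting is to substitute words in place with re.sub and a callback. The following is only a minimal sketch under stated assumptions: correct_words, toy_check and toy_suggest are hypothetical names introduced here for illustration, the toy dictionary stands in for a real spell checker so the example runs on its own, and "a word" is assumed to mean a run of ASCII letters.

```python
import re

# Assumption: a "word" is a run of ASCII letters, optionally with inner
# apostrophes (e.g. "don't"). Everything else (spaces, brackets, periods)
# is never passed to the checker and is left untouched in the output.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def correct_words(text, check, suggest):
    """Replace each misspelled word in place, keeping punctuation as-is.

    check(word) -> bool and suggest(word) -> list of candidates are
    supplied by the caller (hypothetical interface for this sketch).
    """
    def fix(match):
        word = match.group(0)
        if check(word):
            return word
        candidates = suggest(word)
        # Keep the original word when no suggestion is available.
        return candidates[0] if candidates else word

    return WORD_RE.sub(fix, text)


# Toy stand-ins so the sketch runs without a dictionary installed.
KNOWN = {"this", "is", "a", "test", "string"}
TOY_FIXES = {"tesl": ["test"], "strung": ["string"]}


def toy_check(word):
    return word.lower() in KNOWN


def toy_suggest(word):
    return TOY_FIXES.get(word.lower(), [])


corrected_string = correct_words('This is a (tesl strung.', toy_check, toy_suggest)
print(corrected_string)  # This is a (test string.
```

With py-enchant, the check and suggest methods of an enchant.Dict("en_US") object could presumably be passed in place of the toy functions. Two caveats apply: the first suggestion is not guaranteed to be the intended word, and "strung" is itself a valid English word, so a real dictionary would likely leave it unchanged rather than turn it into "string".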
