
Remove synonym words from text using nltk

This might be a bit of an amateur question, but is there a way to remove synonyms from a text (or a list, for that matter) using nltk?
By synonym I also mean the same word written differently, like:
70's and 70s and 70_s
or dog and hound
I would really appreciate some general guidelines or a pointer to a tutorial (I could not find one).
Thanks in advance.

I managed to delete duplicate items by using wordnet.synsets to get the synonyms and then iterating through the list to remove duplicates. I'm sure there are more sophisticated methods than iterating through the list, but it worked just fine for me.
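
The answer does not include its code, so the following is only a rough sketch of what that approach might look like, not the poster's actual implementation. The helper names (`normalize`, `wordnet_synonyms`, `remove_synonyms`) are made up for illustration, and the regex-based normalization for variants like 70's / 70s / 70_s is an assumption added here, since WordNet itself will not link those spellings. The WordNet lookup assumes nltk is installed and the corpus has been fetched with `nltk.download("wordnet")`.

```python
import re


def normalize(token):
    """Collapse spelling variants such as "70's", "70s" and "70_s" into one key."""
    cleaned = re.sub(r"[^a-z0-9]", "", token.lower())
    # Fall back to the raw token if nothing alphanumeric is left.
    return cleaned or token


def wordnet_synonyms(word):
    """Return the lemma names of every WordNet synset of `word`."""
    # Imported lazily; needs nltk plus the corpus from nltk.download("wordnet").
    from nltk.corpus import wordnet

    return {
        lemma.lower()
        for synset in wordnet.synsets(word)
        for lemma in synset.lemma_names()
    }


def remove_synonyms(words, synonyms_of=wordnet_synonyms):
    """Keep the first word of each synonym group, preserving order.

    `synonyms_of` is a hypothetical hook: any callable mapping a word
    to an iterable of its synonyms. It defaults to the WordNet lookup.
    """
    kept = []
    seen = set()  # normalized forms of kept words and of their synonyms
    for word in words:
        key = normalize(word)
        if key in seen:
            continue
        kept.append(word)
        seen.add(key)
        seen.update(normalize(s) for s in synonyms_of(word))
    return kept
```

Calling `remove_synonyms(["70's", "70s", "70_s", "dog", "hound"])` would use WordNet for the lookup. Note that this pools synonyms across all senses of a word, so it can merge words that are only synonyms in an unrelated sense; restricting the lookup to one part of speech or one synset is a possible refinement.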

