I am trying to match words within a string, but I do not want to match words that are part of another word. That is a poor explanation, so here is an example.
Say I have the word "pen" and I want to match that word within a string. "01pennsylvania" should not match, as "pen" is part of the word "pennsylvania". However, "pensforsale" should match, as "pen" isn't part of another word.
I've been looking into NLTK but I can't find what I'm looking for. Can anyone point me in the right direction? I know it would be impossible to do this for all word combinations, but cutting down the noise even marginally would be a great help.
Thanks in advance!
You might find "How to split text without spaces into list of words?" a helpful start: by first trying to split your "pensforsale" into a list of words, you could then check for likely variants, such as plurals.
This is going to be a very slow and error-prone way to go, though.
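To make the idea concrete, here is a minimal sketch of the split-then-check approach. It is not the method from the linked question (which ranks splits by word frequency); it simply looks for the split with the fewest words from a vocabulary. The names `VOCAB`, `segment` and `contains_word` are made up for illustration, and the tiny vocabulary is an assumption: in practice you would load a real word list.

```python
import re

# Hypothetical toy vocabulary; a real one would come from a proper word list.
VOCAB = {"pen", "pens", "for", "sale", "pennsylvania"}


def segment(text, vocab=VOCAB):
    """Split text into the fewest vocabulary words, or return None if no split exists."""
    n = len(text)
    best = [None] * (n + 1)  # best[i] holds the fewest-word split of text[:i]
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in vocab:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


def contains_word(target, text, vocab=VOCAB):
    """True if target (or its naive plural, target + 's') is one of the words in the split."""
    letters = re.sub(r"[^a-z]", "", text.lower())  # drop digits and punctuation
    words = segment(letters, vocab)
    if words is None:
        return False
    variants = {target, target + "s"}
    return any(word in variants for word in words)


print(contains_word("pen", "pensforsale"))
print(contains_word("pen", "01pennsylvania'"))
```

With this vocabulary, "pensforsale" splits into "pens", "for", "sale" and so counts as a match, while "01pennsylvania'" reduces to the single word "pennsylvania" and does not. Preferring the fewest words is what stops "pen" from being peeled off the front of a longer word, but the result is only as good as the vocabulary, which is where the error-proneness comes from.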