
Matching words within words in Python

I am trying to match words within a string, but I do not want to match words that are part of another word. That's a poor explanation, so on to the example!

Say I have the word pen and I want to match that word within a string:

01pennsylvania should not match, as pen is part of the word pennsylvania.

However, pensforsale should match, as pen isn't part of another word. I've been looking into NLTK but I can't find what I'm looking for. Can anyone point me in the right direction? I know it would be impossible to do this for all word combinations, but cutting down the noise even marginally would be a great help.

Thanks in advance!

You might find the question "How to split text without spaces into list of words?" a helpful start. By first trying to split your "pensforsale" into a list of words, you could then check for likely variants, like plurals, etc.

This is going to be a very slow and error-prone way to go, though.
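For what it's worth, here is a rough sketch of that idea, not a tested solution. It assumes you have some word list to segment against; the tiny VOCAB set below is made up for illustration, and segment and contains_word are hypothetical helper names. It also uses a simple "fewest words wins" rule instead of the frequency-based scoring a serious word splitter would want, and it only checks the naive "+s" plural.

```python
import re

# Hypothetical toy vocabulary; a real one would come from a large word list.
VOCAB = {"pen", "pens", "for", "sale", "pennsylvania", "open", "house"}


def segment(text, vocab=VOCAB):
    """Split text into the fewest vocabulary words, or return None if it can't be split."""
    # best[i] holds the shortest known word list covering text[:i].
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if best[start] is not None and piece in vocab:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


def contains_word(text, target, vocab=VOCAB):
    """True if target (or its naive '+s' plural) is one of the segmented words."""
    variants = {target, target + "s"}
    # Only look at runs of letters, so the "01" in "01pennsylvania" is ignored.
    for run in re.findall(r"[a-z]+", text.lower()):
        words = segment(run, vocab)
        if words and variants & set(words):
            return True
    return False


print(contains_word("pensforsale", "pen"))
print(contains_word("01pennsylvania", "pen"))
```

With this toy vocabulary, "pensforsale" splits into pens / for / sale and counts as a match, while "pennsylvania" stays a single word and does not. The quality depends entirely on the word list, and the nested loops are quadratic in the length of each run, which is part of why this gets slow.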
