
Determine word boundaries in a python string

I have file paths in this format:

THISISSOMEMOVIE.mov

Is there some NLP library that can make very educated/statistical guesses about the word boundaries in a string? For example, the above should be parsed as:

THIS IS SOME MOVIE mov

I don't know of a library that does exactly that, but you could use PyEnchant, which tells you whether a word is in the dictionary.

Here's roughly what I'd do:

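    import enchant  # PyEnchant
    title = "THISISSOMEMOVIE".lower()
    d = enchant.Dict("en_US")
    s, words = 0, []
    while s < len(title):
        i = len(title)
        while i > s + 1 and not d.check(title[s:i]):  # shrink from the right
            i -= 1
        words.append(title[s:i])  # a single char if no word matched
        s = i  # next word starts right after this one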
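The question asks for statistical guesses, which a yes/no dictionary check alone can't provide, and a greedy longest match like the one above can paint itself into a corner. One common alternative (a swapped-in technique, not part of this answer's approach) is to score every possible split by unigram word probability and keep the cheapest one with dynamic programming. Here is a minimal sketch. The COUNTS table, MAX_LEN, the unknown-word penalty, and the helper names segment and split_filename are all hypothetical choices for illustration.

```python
import math
import os

# Hypothetical unigram counts, made up for illustration. A real splitter
# would load word frequencies from a large corpus instead.
COUNTS = {"this": 500, "is": 900, "some": 300, "movie": 50, "me": 400,
          "the": 1000, "them": 200, "theme": 20, "mend": 5}
TOTAL = sum(COUNTS.values())
MAX_LEN = 20  # longest candidate word we bother scoring (assumption)


def word_cost(word):
    """Negative log-probability of `word` (lower means more likely)."""
    if word in COUNTS:
        return math.log(TOTAL) - math.log(COUNTS[word])
    # Unknown words are penalised per character, so they are only used
    # when no dictionary split covers that part of the text.
    return math.log(TOTAL) + len(word) * math.log(10)


def segment(text):
    """Split `text` into its most probable word sequence, keeping case.

    Hypothetical helper, not from any library.
    """
    lower = text.lower()
    n = len(text)
    best = [0.0] + [math.inf] * n  # best[i]: lowest cost of text[:i]
    back = [0] * (n + 1)           # back[i]: start of last word in text[:i]
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_LEN), i):
            cost = best[j] + word_cost(lower[j:i])
            if cost < best[i]:
                best[i], back[i] = cost, j
    words, i = [], n
    while i > 0:  # follow the back-pointers from the end of the string
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


def split_filename(path):
    """Hypothetical helper: 'THISISSOMEMOVIE.mov' -> 'THIS IS SOME MOVIE mov'."""
    stem, ext = os.path.splitext(os.path.basename(path))
    return " ".join(segment(stem) + ([ext[1:]] if ext else []))
```

The point of scoring by probability instead of taking the longest match: on "themend" this sketch returns ["the", "mend"], whereas a greedy longest match over the same word list grabs "theme" first and is left with "nd", which it can only split into single letters.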
