I have file paths in this format:
THISISSOMEMOVIE.mov
Is there some NLP library that can make very educated/statistical guesses about the word boundaries in a string? For example, the above should be parsed as:
THIS IS SOME MOVIE mov
I don't know of a library that does exactly that, but you could use PyEnchant, which tells you whether a word is in the dictionary.
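So here's the pseudocode of what I'd do:
s = 0
i = len(title) - 1
while s < len(title):
    check if the substring from s to i (inclusive) is in the dictionary
    if not, i = i - 1 (if i drops below s, no word starts at s, so stop or skip that character)
    if yes, record the word, then s becomes i+1 and i = len(title) - 1 again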
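The pseudocode above could be sketched in Python roughly as follows. This is only a minimal sketch of the greedy longest-match idea: `split_words` and `WORDS` are hypothetical names, and the tiny word set stands in for a real dictionary lookup so the example runs on its own. Any callable that takes a string and returns True or False can be passed as `is_word`.

```python
import os

# Hypothetical mini dictionary, for illustration only.
# A real dictionary lookup would replace this.
WORDS = {"THIS", "IS", "SOME", "MOVIE"}

def split_words(title, is_word):
    """Greedy split: at each position take the longest substring is_word accepts."""
    words = []
    s = 0
    while s < len(title):
        # Try the longest candidate first, then shrink from the right.
        for i in range(len(title), s, -1):
            if is_word(title[s:i]):
                words.append(title[s:i])
                s = i
                break
        else:
            # No dictionary word starts at s: keep the single character and move on.
            words.append(title[s])
            s += 1
    return words

# Split off the extension first so only the name is segmented.
stem, ext = os.path.splitext("THISISSOMEMOVIE.mov")
result = " ".join(split_words(stem, WORDS.__contains__) + [ext.lstrip(".")])
print(result)  # THIS IS SOME MOVIE mov
```

With PyEnchant installed, `enchant.Dict("en_US").check` could be passed as `is_word` instead of the set lookup. Note that this approach never backtracks, so with a large dictionary it can pick a long word that leaves a remainder it cannot split.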