
Searching for a sentence in a large sentence corpus

I am a beginner and I want to know whether there is a way to search for a sentence in a large text corpus (say, 1 million sentences) so that equivalent wordings still match. For example, when a user types:

I shouldn't be there

then it should search for a sequence like this:

I should not be there

Similarly, it should go from this:

I gonna go there.

to

I going to go there.

I have been thinking about this for a couple of days, trying to figure out a solution to the problem.

If you know how to deal with this problem, please provide a solution; even just a hint would be more than enough. Thank you.

I would first go through both the query sentence and the text and replace all contractions with their long forms. After that, use Knuth-Morris-Pratt to search for the normalized sentence in the normalized text.
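Here is a rough sketch of that idea in Python, in case it helps. The `CONTRACTIONS` table is a tiny made-up mapping just for illustration (a real one would need far more entries), and the function names `normalize`, `build_failure`, `kmp_search` and `find_sentence` are my own. I am also assuming that matching on whole lowercase words with punctuation ignored is acceptable, so KMP runs over word tokens instead of characters.

```python
import re

# Hypothetical, deliberately tiny mapping; extend it for real use.
CONTRACTIONS = {
    "shouldn't": "should not",
    "can't": "cannot",
    "won't": "will not",
    "i'm": "i am",
    "gonna": "going to",
}

def normalize(text):
    """Lowercase, drop punctuation, expand known contractions, return word tokens."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    tokens = []
    for word in words:
        # An expansion may be several words, so split it into separate tokens.
        tokens.extend(CONTRACTIONS.get(word, word).split())
    return tokens

def build_failure(pattern):
    """KMP failure table: length of the longest proper prefix that is also a suffix."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure

def kmp_search(haystack, pattern):
    """Return the start index of every occurrence of pattern in haystack."""
    if not pattern:
        return []
    failure = build_failure(pattern)
    matches = []
    k = 0  # number of pattern tokens matched so far
    for i, token in enumerate(haystack):
        while k > 0 and token != pattern[k]:
            k = failure[k - 1]
        if token == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches

def find_sentence(query, corpus_text):
    """Normalize both sides, then search; indices refer to normalized corpus tokens."""
    return kmp_search(normalize(corpus_text), normalize(query))
```

With this, `find_sentence("I shouldn't be there", corpus)` looks for the tokens `i should not be there` in the normalized corpus. The returned positions are token indices in the normalized text, so you would need to keep your own mapping back to the original sentences. If the corpus is searched many times, normalizing it once up front and reusing the token list avoids repeating that work.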

