Search for a sentence in a large corpus of text sentences
I am a beginner, and I want to know if there is a way to search for a sentence in a large collection of text sentences (say, 1 million) so that equivalent wordings also match. For example, when a user types:
I shouldn't be there
then it should search for a sentence like this:
I should not be there
Similarly, it should map this:
I gonna go there.
to
I going to go there.
I have been thinking for a couple of days, trying to figure out a solution to this problem.
If you know anything about how to deal with this problem, please share a solution; even just a hint would be more than enough. Thank you.
I would first go through both the query sentence and the text and replace all contractions with their long forms. After that, use Knuth-Morris-Pratt to search for the normalized sentence in the normalized text.
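A minimal sketch of that approach in Python might look like the following. The `CONTRACTIONS` table, the `normalize` and `search` helpers, and the lowercasing step are assumptions made for illustration; they are not part of the question, and a real contraction table would need to be much larger.

```python
import re

# Hypothetical, deliberately tiny contraction table (an assumption for this sketch).
CONTRACTIONS = {
    "shouldn't": "should not",
    "gonna": "going to",
    "can't": "cannot",
    "won't": "will not",
    "i'm": "i am",
}

_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b"
)


def normalize(text):
    """Lowercase, expand known contractions, and collapse whitespace."""
    text = text.lower().replace("\u2019", "'")  # treat curly apostrophes as straight
    text = _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(1)], text)
    return " ".join(text.split())


def build_failure(pattern):
    """KMP failure table: length of the longest proper prefix that is also a suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    fail = build_failure(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


def search(corpus, query):
    """Return indices of the corpus sentences that contain the normalized query."""
    q = normalize(query)
    return [i for i, s in enumerate(corpus) if kmp_find(normalize(s), q) != -1]


corpus = ["I should not be there", "I going to go there.", "Nothing to see here"]
print(search(corpus, "I shouldn't be there"))  # [0]
print(search(corpus, "I gonna go there."))     # [1]
```

With around a million sentences, it would probably make sense to normalize the corpus once and store the result, rather than re-normalizing every sentence on each query as this sketch does.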