Search for a sentence in a large corpus of text sentences

I am a beginner, and I want to know whether there is a way to search for a sentence in a large collection of text sentences (say, 1 million), so that when a user types:

I shouldn't be there

then it should also find a sentence like this:

I should not be there

Similarly, it should map this:

I gonna go there.

to

I going to go there.

I have been thinking for a couple of days, trying to figure out a solution to this problem.

If you know anything about how to deal with this problem, please provide a solution; even just a hint would be more than enough. Thank you.

I would first go through both the sentence and the text and replace all contractions with their long forms. After that, use Knuth-Morris-Pratt to search for the sentence.
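A minimal Python sketch of that idea, under some assumptions: the `CONTRACTIONS` table is a tiny hypothetical sample (a real one would need many more entries), the helper names `normalize`, `failure_table`, `kmp_find` and `search` are made up for illustration, and matching is done on lowercase word tokens rather than raw characters so that punctuation and case do not get in the way.

```python
import re

# Hypothetical, deliberately small contraction table (an assumption for
# illustration); extend it with whatever contractions the corpus contains.
CONTRACTIONS = {
    "shouldn't": "should not",
    "don't": "do not",
    "can't": "cannot",
    "gonna": "going to",
}

def normalize(text):
    """Lowercase the text, split it into word tokens, expand known contractions."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    expanded = []
    for tok in tokens:
        # An expansion may be several words, so split it back into tokens.
        expanded.extend(CONTRACTIONS.get(tok, tok).split())
    return expanded

def failure_table(pattern):
    """KMP failure function: table[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table

def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.
    Works on any indexable sequences (strings or lists of tokens)."""
    if not pattern:
        return 0
    table = failure_table(pattern)
    k = 0  # number of pattern items matched so far
    for i, item in enumerate(text):
        while k > 0 and item != pattern[k]:
            k = table[k - 1]
        if item == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1

def search(query, corpus):
    """Return every sentence in corpus that contains the normalized query."""
    q = normalize(query)
    return [s for s in corpus if kmp_find(normalize(s), q) != -1]

corpus = ["I should not be there", "I going to go there."]
print(search("I shouldn't be there", corpus))
print(search("I gonna go there.", corpus))
```

For a corpus of a million sentences it would probably be worth normalizing each sentence once and storing the result, rather than re-normalizing on every query as this sketch does.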
