简体   繁体   English

使用mysql和php查找短语

[英]Find phrases using mysql and php

I am working on a project and I need your suggestions in a database query. 我正在一个项目上,需要数据库查询中的建议。 I am using PHP and MySQL. 我正在使用PHP和MySQL。

Context 上下文

  • I have a table named phrases containing a phrases column in which there are phrases stored, each of which consists of one to three words. 我有一个名为phrases的表,其中包含phrases列,其中存储了短语,每个短语由一到三个词组成。
  • I have a text string which contains 500 - 1000 words 我有一个包含500-1000个单词的text字符串

I need to highlight all the phrases in the text string which exist in my phrases database table. 我需要突出显示phrases数据库表中存在的text字符串中的所有短语。

My solution 我的解决方案

I go through every phrase in the phrase list and compare it against the text , but the number of phrases is large (100k) so it takes about 2 min or more to do this matching. 我遍历了短语列表中的每个短语,并将其与text进行比较,但是短语的数量很大(100k),因此大约需要2分钟或更长时间才能完成此匹配。

Is there any more efficient way of doing this? 有没有更有效的方法?

I'm gonna focus on how to do the comparision part with 100K Values. 我将重点介绍如何使用100K值进行比较。 This will require two steps. 这将需要两个步骤。

a) Write a C++ library and link it to PHP using an extension. a)编写一个C ++库,并使用扩展将其链接到PHP。 Google PHP-CPP. Google PHP-CPP。 There is a framework which allows you to do this. 有一个框架可让您执行此操作。

b) Inside C/C++ , you need to create a data structure which has a time complexity of O(n) . b)在C / C ++内部,您需要创建一个时间复杂度为O(n)的数据结构。 n being length of the phrases you're searching for. n是您要搜索的短语的长度。 Normally, this is called a tries data structure. 通常,这称为trys数据结构。 This is conventionally used for words without space[not phrases]. 通常,这用于没有空格的单词(而非短语)。 but, surely you can write your own. 但是,您当然可以编写自己的。

Here is a link, which contains the word implementation. 这是一个链接,其中包含实现一词。 aka dictionary. aka字典。 http://www.geeksforgeeks.org/trie-insert-and-search/ http://www.geeksforgeeks.org/trie-insert-and-search/

This takes quite a bit of Memory since, the number is 100K. 因为数量为100K,所以这需要大量的内存。 fair to say, you need a large system. 公平地说,您需要一个大型系统。 But, when you're looking for better performance, then, Memory tends to be a tradeoff. 但是,当您寻求更好的性能时,内存往往是一个折衷方案。

Alternative Approach Only PHP. 替代方法仅PHP。 Here , extract phrases from your text input. 在这里,从文本输入中提取短语。 Convert them into a Hash. 将它们转换为哈希。 the table data that you contain, should also be stored in a hash. 您所包含的表数据也应存储在哈希中。 [Needs Huge Memory]. [需要巨大的内存]。 The performance here will be rocket fast, per search aka O(1). 每次搜索又称为O(1),此处的性能将很快提高。 so, for a sentence of k words. 因此,对于k个单词的句子。 your time complexity will be O(K-factorial). 您的时间复杂度将为O(K阶乘)。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM