Find phrases using MySQL and PHP

I am working on a project and need suggestions on a database query. I am using PHP and MySQL.

Context

  • I have a table named phrases with a phrases column that stores phrases, each consisting of one to three words.
  • I have a text string that contains 500 to 1000 words.

I need to highlight all the phrases in the text string which exist in my phrases database table.

My solution

I go through every phrase in the phrase list and compare it against the text, but the number of phrases is large (100k), so this matching takes about 2 minutes or more.

Is there any more efficient way of doing this?

I'm going to focus on how to do the comparison part with 100K values. This will require two steps.

a) Write a C++ library and link it to PHP as an extension. Google PHP-CPP; it is a framework that allows you to do this.

b) Inside C/C++, you need to create a data structure whose lookup time is O(n), n being the length of the phrase you're searching for, regardless of how many phrases are stored. This is called a trie. Conventionally it is used for single words without spaces, not phrases, but you can certainly write your own variant.

Here is a link to the word-level implementation, i.e. a dictionary: http://www.geeksforgeeks.org/trie-insert-and-search/

This takes quite a bit of memory, since there are 100K phrases, so you may need a machine with plenty of RAM. But when you're after better performance, memory tends to be the tradeoff.
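To make the trie idea concrete, here is a minimal sketch of a word-level trie, where each edge is a whole word rather than a character. It is written in plain PHP only for illustration (the suggestion above is to do this in C++), and the names tokenize, buildTrie, findPhrases and PHRASE_END are made up for this example. It also assumes ASCII text and case-insensitive matching.

```php
<?php
// Sketch only: all names here are hypothetical, not from any library.

const PHRASE_END = "\0end"; // marker key: some phrase ends at this node

// Lowercase and split into words (assumption: ASCII letters, digits, apostrophes).
function tokenize(string $s): array
{
    return preg_split("/[^a-z0-9']+/", strtolower($s), -1, PREG_SPLIT_NO_EMPTY);
}

// Build a trie of nested arrays: one level per word of a phrase.
function buildTrie(array $phrases): array
{
    $trie = [];
    foreach ($phrases as $phrase) {
        $node = &$trie;
        foreach (tokenize($phrase) as $word) {
            if (!isset($node[$word])) {
                $node[$word] = [];
            }
            $node = &$node[$word];
        }
        $node[PHRASE_END] = true;
        unset($node); // drop the reference before handling the next phrase
    }
    return $trie;
}

// Return [start word index, length in words] for every phrase found in $words.
function findPhrases(array $trie, array $words): array
{
    $matches = [];
    $count = count($words);
    for ($i = 0; $i < $count; $i++) {
        $node = $trie;
        // Walk down the trie for as long as the next word has an edge.
        for ($j = $i; $j < $count && isset($node[$words[$j]]); $j++) {
            $node = $node[$words[$j]];
            if (isset($node[PHRASE_END])) {
                $matches[] = [$i, $j - $i + 1];
            }
        }
    }
    return $matches;
}
```

You would call buildTrie once with the rows from the phrases table, then findPhrases($trie, tokenize($text)) for each text. Since phrases have at most three words, the inner loop runs at most three steps per word of the text, no matter how many phrases are stored.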

Alternative approach: PHP only. Extract the candidate phrases (every run of one to three consecutive words) from your text input and look each one up in a hash. The table data should also be loaded into a hash, which costs memory. Each lookup is very fast, O(1) on average, and a text of k words yields at most 3k candidates, so the total time complexity is O(k).
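A rough sketch of the hash approach, under a few assumptions: highlightPhrases is a made-up name, phrases are stored as words separated by single spaces, matching is ASCII and case-insensitive, and "highlighting" means wrapping the match in a <mark> tag. In real use you would build the hash once and reuse it instead of rebuilding it on every call.

```php
<?php
// Sketch only: highlightPhrases is a hypothetical helper, not a library function.

function highlightPhrases(string $text, array $phrases, int $maxWords = 3): string
{
    // Hash set: lowercased phrase => true, so each lookup is O(1) on average.
    $set = [];
    foreach ($phrases as $p) {
        $set[strtolower($p)] = true;
    }

    // Words with their byte offsets, so matches map back onto the original text.
    preg_match_all("/[A-Za-z0-9']+/", $text, $m, PREG_OFFSET_CAPTURE);
    $tokens = $m[0];
    $n = count($tokens);

    $out = '';
    $pos = 0; // byte offset in $text that has already been copied to $out
    for ($i = 0; $i < $n; ) {
        $matched = 0;
        // Try the longest candidate first, so "new york" wins over "new".
        for ($len = min($maxWords, $n - $i); $len >= 1; $len--) {
            $slice = array_column(array_slice($tokens, $i, $len), 0);
            if (isset($set[strtolower(implode(' ', $slice))])) {
                $matched = $len;
                break;
            }
        }
        if ($matched === 0) {
            $i++;
            continue;
        }
        $start = $tokens[$i][1];
        $last = $tokens[$i + $matched - 1];
        $end = $last[1] + strlen($last[0]);
        $out .= htmlspecialchars(substr($text, $pos, $start - $pos))
              . '<mark>' . htmlspecialchars(substr($text, $start, $end - $start)) . '</mark>';
        $pos = $end;
        $i += $matched; // skip past the matched words; matches do not overlap
    }
    return $out . htmlspecialchars(substr($text, $pos));
}
```

Note that the words of a candidate are joined with a single space before the lookup, so "New, York" in the text would also match the phrase "new york". Whether that is acceptable depends on your data.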
