
Searching for multi word phrases in multiple paragraphs (PHP/MySQL)

The premise of the problem:

I have a table (let's call it the submitted table) which contains, among other related data, a text field called para that holds paragraphs submitted by users.

These paragraphs sometimes contain multi-word phrases that may also appear in a field called name in another table (let's call it the values table).

Both tables are pretty big. The submitted table has over 400,000 rows and the values table has over 1,400,000 rows.


The question:

I want to go through all the para fields and if any phrases (which can be >= 1 word) from the values table occur in any paragraph, link those particular phrases to the name ID from the values table.

The complication is that the number of words in the name field is not fixed, and different name values can begin with the same words (e.g. Tom Clancy's and Tom Clancy's Rainbow Six are two different entries). Also, a phrase can occur anywhere in the para field, and one para can match more than one name.


An example

If one para is:

I've played many games and the best one I've liked so far is Tom Clancy's Rainbow Six.

And another para is:

The best in the series are the original Tom Clancy's and the Tom Clancy's Rainbow Six Rogue Spear.

If the values table is like:

╔═════╦══════════════════════════════════════╗
║ ID  ║                 name                 ║
╠═════╬══════════════════════════════════════╣
║ 101 ║ Tom Harding                          ║
║ 102 ║ Tom Clancy's                         ║
║ 103 ║ Tom Clancy's Rainbow Six             ║
║ 104 ║ Tom Clancy's Rainbow Six Rogue Spear ║
╚═════╩══════════════════════════════════════╝

Then I want the results to look like:

I've played many games and the best one I've liked so far is <a href="www.example.com/name/103">Tom Clancy's Rainbow Six</a>.

And

The best in the series are the original <a href="www.example.com/name/102">Tom Clancy's</a> and the <a href="www.example.com/name/104">Tom Clancy's Rainbow Six Rogue Spear</a>.


What would be the best way to go about this problem? I shouldn't do this via joins, right?

Thank you so much for your inputs!

You can do this with one crazy long query, provided your submitted table has an id column (in my example I gave it a column named sid). Here's the SQLFiddle.

The query joins submitted with values, ordered by sid, LENGTH(name) DESC, because the longest name has to be replaced first. Instead of inserting the link right away, I replace each match with a placeholder such as [103] or [104] (the ID of the name in values), so that once a long name has been replaced, a shorter name that is only a partial match of it can no longer match there, which is what we want. Afterwards I replace the [103], [104] placeholders with the HTML links. It's the same method applied twice. The method generates some running IDs along the way to keep track of the row we want returned, which is always the last row for a given sid, because by that point all matches have been replaced.
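
To make the ordering idea easier to follow outside of SQL, here is a minimal sketch of the same two-pass placeholder technique in Python. It is not the query from the SQLFiddle; the function name link_names, the VALUES dictionary and the base_url parameter are made up for illustration, and the sketch assumes the paragraphs never contain literal text like [103]. The naive loop over every name is only meant to show why longest-first matters, not to scale to tables of the size described in the question.

```python
# Hypothetical sample data mirroring the values table from the question.
VALUES = {
    101: "Tom Harding",
    102: "Tom Clancy's",
    103: "Tom Clancy's Rainbow Six",
    104: "Tom Clancy's Rainbow Six Rogue Spear",
}


def link_names(para, values, base_url="www.example.com/name/"):
    """Link every known name in para, preferring the longest match."""
    # Longest names first, so a short name cannot claim part of a longer one.
    ordered = sorted(values.items(), key=lambda item: len(item[1]), reverse=True)

    # Pass 1: swap each occurrence for an "[id]" placeholder. Once a long
    # name is replaced, its shorter prefixes no longer occur at that spot.
    for name_id, name in ordered:
        para = para.replace(name, f"[{name_id}]")

    # Pass 2: turn each placeholder into the final HTML link.
    for name_id, name in ordered:
        link = f'<a href="{base_url}{name_id}">{name}</a>'
        para = para.replace(f"[{name_id}]", link)

    return para
```

Calling link_names(para, VALUES) on the two example paragraphs from the question should produce the two linked results shown there: the first links only 103, the second links 102 and 104.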


 