
PHP mysql search queries

I'm trying to create a search engine for an inventory-based site. The issue is that some of the information sits inside BB tags and should be weighted differently: in "[b]test[/b] sentence", for example, "test" should be valued at 3, whereas "sentence" should be valued at 1.

Here is an example of an index:
My test sentence, my my (has a SKU of TST-DFS)

The database:

|Product|  word  |relevancy|
|   1   |   my   |    3    |
|   1   |  test  |    1    |
|   1   |sentence|    1    |
|   1   | TST-DFS|    10   |

But how would I match TST-DFS if the user typed in TST DFS? I would like that SKU to have a relevancy of, say, 8 instead of the full 10.

I have heard that the FULLTEXT search feature in MySQL would help, but I can't seem to find a good way to use it. I would like to avoid things like UNIONs and keep the query as optimized as possible.

Any help with coming up with a good system for this would be great.

Thanks, Max

But how would I match TST-DFS if the user typed in TST DFS?
I would like that SKU to have a relevancy of say 8, instead of the full 10..

If I understood the question correctly, the answer is fairly easy, provided you prepare your query a little before sending it to MySQL.

Let's say we have $query and it contains TST-DFS.

Should we match on whole words? I think so, since that is what most search engines do, so:

$ok=preg_match_all('#\w+#',$query,$m);

If the pattern matched, $m[0] now contains the list of words in $query.
This can be fine-tuned to your SKU format, but matching all of the words in an AND fashion is pretty much what users assume is happening (it is how Google and Yahoo behave).

Next we need to build an $expr expression that will be injected into our final query.

if(!$ok) { // the search string contains no word characters
  $expr="false";
} else {   // the words of the search string are now in $m[0]
  $expr='';
  foreach($m[0] as $word) {
    if($expr)
      $expr.=" AND ";  // put an AND in between the "LIKE" subexpressions
    // The s_ prefix marks a value escaped for inclusion in a SQL statement.
    // Note: addslashes() is not a reliable SQL escape; prefer
    // mysqli_real_escape_string() or a prepared statement.
    $s_word=addslashes($word);
    $expr.="word LIKE '%$s_word%'";
  }
}

Now $expr should look like "word LIKE '%TST%' AND word LIKE '%DFS%'".

With that value, we can build the final query:

$s_expr="($expr)";
$s_query=addslashes($query);

// The select list must not be wrapped in parentheses; MySQL rejects that.
$s_fullquery=
"SELECT Product,word,if((word LIKE '$s_query'),relevancy,relevancy-2) as relevancy ".
"FROM some_index ".
"WHERE word LIKE '$s_query' OR $s_expr";

Which reads, for "TST-DFS":

SELECT Product,word,if((word LIKE 'TST-DFS'),relevancy,relevancy-2) as relevancy
FROM some_index
WHERE word LIKE 'TST-DFS' OR (word LIKE '%TST%' AND word LIKE '%DFS%')

As you can see from the first line, the SELECT list, MySQL returns relevancy-2 when the match is only partial.

In the third line, the WHERE clause, if the full match fails, $s_expr, the partial-match expression we built in advance, is tried instead.
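
The same idea can be sketched with bound parameters instead of addslashes. This is only an illustration under stated assumptions: the helper name build_search_query and its $penalty parameter are hypothetical, and it compares with = where the answer uses LIKE without wildcards. It builds the SQL text and the list of values only; actually running it would need a mysqli or PDO prepared statement.

```php
<?php
// Hypothetical helper (not from the answer): returns [sql, params] for the
// "exact match OR all words match" search against some_index.
function build_search_query(string $query, int $penalty = 2): array
{
    // Split the input into word runs, using the same pattern as the answer.
    preg_match_all('#\w+#', $query, $m);
    $words = $m[0];

    // Exact matches keep their relevancy, partial matches lose $penalty.
    $sql = "SELECT Product, word, "
         . "IF(word = ?, relevancy, relevancy - $penalty) AS relevancy "
         . "FROM some_index WHERE word = ?";
    $params = [$query, $query];

    if ($words) {
        // One "word LIKE ?" per word, joined with AND.
        $likes = array_fill(0, count($words), "word LIKE ?");
        $sql .= " OR (" . implode(" AND ", $likes) . ")";
        foreach ($words as $w) {
            // Escape LIKE wildcards so % and _ in user input match literally.
            $params[] = '%' . addcslashes($w, '%_\\') . '%';
        }
    }
    return [$sql, $params];
}

[$sql, $params] = build_search_query("TST DFS");
echo $sql, "\n";
print_r($params);
```

Because the values travel separately from the SQL text, quoting problems in the search string cannot change the shape of the query.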

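I like to lowercase everything and strip out special characters (for phone numbers or credit card numbers, for example, I remove everything that is not a digit).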
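
A minimal sketch of that normalization, assuming ASCII input; the function names normalize_term and digits_only are made up for illustration. The same function would be applied both when indexing a word and when reading the user's search string, so that "TST-DFS" and "tst dfs" end up as the same key.

```php
<?php
// Lowercase a term and drop everything except ASCII letters and digits.
function normalize_term(string $term): string
{
    return preg_replace('/[^a-z0-9]+/', '', strtolower($term));
}

// Keep digits only, e.g. for phone numbers or card numbers.
function digits_only(string $value): string
{
    return preg_replace('/\D+/', '', $value);
}

echo normalize_term("TST-DFS"), "\n";
echo normalize_term("tst dfs"), "\n";
echo digits_only("(555) 010-9999"), "\n";
```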

Rather than trying to create your own full-text search (FTS) solution, you could try to fit the MySQL FTS engine to your requirements. What I've seen done is to create a new table to store your FTS data, with a column for each piece of data that should have a different relevance.

For your SKU field you could store the raw SKU, with spaces, underscores, hyphens and any other special characters intact. Then store a stripped-down version with all of these removed. You may also want to store a version with leading zeros removed, as people often leave those out. You can store all these variations in the same column. Store your product name in another column, and the product description in a third. Create a separate index on each column.

Then, when you do your search, you can search each column individually and multiply the rank of the results by how important you think that column is. So you could multiply SKU results by 10, title results by 5, and leave description results as they are. You may have to experiment a little to get the results you want, but it may ultimately be simpler than creating your own index.
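
As a rough sketch of the SKU variations and the column weighting described in this answer: the helper names sku_variants and weighted_score are hypothetical, the weights 10 and 5 are the ones suggested in the text, and leading zeros are only stripped from the start of the whole SKU, which is a simplification.

```php
<?php
// Hypothetical helper: the SKU spellings to store in the searchable SKU column.
function sku_variants(string $sku): array
{
    $raw = trim($sku);
    // Remove spaces, underscores, hyphens and any other non-alphanumerics.
    $stripped = preg_replace('/[^A-Za-z0-9]+/', '', $raw);
    // Also drop leading zeros, which people often leave out when typing.
    $noZeros = ltrim($stripped, '0');

    // Drop empty strings and duplicates, then reindex from 0.
    $variants = array_filter([$raw, $stripped, $noZeros], 'strlen');
    return array_values(array_unique($variants));
}

// Combine per-column scores: SKU x10, title x5, description as is.
function weighted_score(float $sku, float $title, float $description): float
{
    return $sku * 10 + $title * 5 + $description;
}

print_r(sku_variants("TST-DFS"));
print_r(sku_variants("00-123"));
echo weighted_score(1.0, 0.5, 2.0), "\n";
```

The three score arguments stand in for whatever rank the full-text search returns for each column.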

Create a keywords table. Something along the lines of:

integer keywordId (autoincrement) | varchar keyword | int pointValue

Assign all possible keywords, skus, etc, into this table. Create another table, a post-keywords bridge, (assuming postId is the id you've assigned in your original table) along the lines of:

integer keywordId | integer postId

Once you have this, you can easily add keywords to each post as it is inserted. To calculate the total point value for a given post, a query such as the following should do the trick:

SELECT sum(pointValue) FROM keywordPostsBridge kpb 
JOIN keywords k ON k.keywordId = kpb.keywordId
WHERE kpb.postId = YOUR_INTENDED_POST

I think the solution is quite straightforward, unless I missed something.

Basically, run two searches: one an exact match, the other a LIKE or regex match.

Join the two result sets together, with the LIKE match left-joined to the exact match. Then, for example:

final_relevancy = (IFNULL(like_relevancy, 0) + IFNULL(exact_relevancy, 0) * 3) / 4

I didn't try this myself though. Just an idea.
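
To make the arithmetic concrete, here is a small sketch of that formula in PHP. The function name final_relevancy is an assumption, and null stands for "no row in that result set", mirroring IFNULL(..., 0).

```php
<?php
// Weighted blend of the two scores; the exact match counts three times as much.
function final_relevancy(?float $like, ?float $exact): float
{
    return (($like ?? 0) + ($exact ?? 0) * 3) / 4;
}

echo final_relevancy(8, 10), "\n";   // both searches matched: (8 + 30) / 4
echo final_relevancy(8, null), "\n"; // only the LIKE search matched: 8 / 4
```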

This is the code for a page that displays the query result.

I did not use functions here, although using them would make the work easier.

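 <html>
 <head>
 </head>
 <body>
 <?php
// author S_A_KHAN
// date 10/02/2013
// mysqli is used throughout: the mysql_* functions no longer exist in PHP 7+.
$dbconnect = mysqli_connect("127.0.0.1", "root");
if (!$dbconnect) {
    die('unable to connect: ' . mysqli_connect_error());
}
echo "connected successfully<br>";

if (!mysqli_select_db($dbconnect, "connect")) {
    die('unable to select database: ' . mysqli_error($dbconnect));
}
echo "database selected successfully<br>";

// Bind the id as an integer instead of concatenating $_GET into the SQL.
$search = isset($_GET["search"]) ? (int) $_GET["search"] : 0;
$stmt = mysqli_prepare($dbconnect, "select * from user where id = ?");
mysqli_stmt_bind_param($stmt, "i", $search);
mysqli_stmt_execute($stmt);
$QueryResult = mysqli_stmt_get_result($stmt);

echo "<table width='100%' border='1'>\n";
echo "<tr><th bgcolor=gray>Id</th><th bgcolor=gray>Name</th></tr>\n";
while ($Row = mysqli_fetch_row($QueryResult)) {
    // Escape the values so they cannot inject HTML into the page.
    echo "<tr><td bgcolor=tan>" . htmlspecialchars((string) $Row[0]) . "</td>";
    echo "<td bgcolor=tan>" . htmlspecialchars((string) $Row[1]) . "</td></tr>\n";
}
echo "</table>\n";
?>

 </body>
 </html>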

I would add a column that holds the text stripped of all special characters, with misspellings corrected, and then upper-cased (or create a function that compares on text that has been stripped and upper-cased). That way your relevancy will be consistent.

/*
q and q1 stand in for your table.
This query uses too many resources to run on every search;
turn it into an UPDATE query instead (run it as a scheduled task, or call it on save if you are developing a new system).
*/
SELECT
       CASE
              WHEN word NOT REGEXP "^[a-zA-Z]+$"
                     /* chain as many REPLACE calls as there are junk characters,
                     or create a custom function,
                     or, if you have full database access, install this: https://launchpad.net/mysql-udf-regexp
                     */
              THEN REPLACE(REPLACE( word, '-', ' ' ), '#', ' ')
              ELSE word
       END word ,
       CASE
              WHEN word NOT REGEXP "^[a-zA-Z]+$"
              THEN 8
              ELSE relevancy
       END           relevancy
FROM   ( SELECT 'my' word,
               3     relevancy

       UNION

       SELECT 'test' word,
              1      relevancy

       UNION

       SELECT 'sentence' word,
              1          relevancy

       UNION

       SELECT 'TST-DFS' word,
              10 relevancy
       )
       q

UNION

SELECT *
FROM   ( SELECT 'my' word,
               3     relevancy

       UNION

       SELECT 'test' word,
              1      relevancy

       UNION

       SELECT 'sentence' word,
              1          relevancy

       UNION

       SELECT 'TST-DFS' word,
              10 relevancy
       )
       q1
