
PHP Search for keywords

I have been building a PHP search facility for certain types of posts on the website (for the purpose of this question, please accept that MySQL is not an option).

After a series of steps we get the title and the tags for each post and store them in a variable called $full.

The search terms sit in a variable called $terms

$full = $title . ' ' . $tago[$result->ID];

Both are converted to lower case.

We then want to look for similar words in $full using $terms

I tried this.

$final = strpos($full,$terms);

It works, but not quite as well as I need it to.

  • This matches the search string wherever it appears in the title and tags, but it does not deal with spaces at all. I tried removing spaces and commas from the titles and tags, to no avail.
  • If the user types in someone's name that is made up of two tags rather than one, it will not find any results.
  • It cannot handle more than one word, let alone more than one term, both of which I want it to do.

Here is the complete script if it is of any help

$proto = $_GET['p'];
$terms = $_GET['s'];

$terms = strtolower($terms);
$terms = str_replace(' ', '', $terms);

$ids = array();

if($proto == 'inline') {

    $search = get_posts('post_type=post&post_status=publish');

    foreach($search as $result) {

        $title = get_the_title($result);

        $tags = wp_get_post_tags( $result->ID);

        foreach($tags as $tag){ $tago[$result->ID].= $tag->name;}

        $full = $title . ' ' . $tago[$result->ID];
        $full = strtolower($full);
        $final = strpos($full,$terms);


        if($final != false){ 

            $ids[] = $result->ID;

         }

    }
    if ($ids[0] == '') { 
        echo '<div align="center" style="text-align:center; color:#FFF;">No Results Found</div>';
    return false; } else {
    $args = array( 'post__in' => $ids );

    $srs = get_posts($args);

    foreach($srs as $sr) { 

    echo '<a href="'.$sr->post_slug.'"><img src=""/><b>'.$sr->post_title.'</b>'. $tago[$result->ID].'<span>'.date('dS M Y', strtotime($sr->post_date)).'</span></a>';

     }
    }


}

THE VALUES

$terms contains whatever the user entered for the search, say 'red car'.

$full contains the post title and the tags, so it may say 'The red vaxhaul is not very nice, vehicle, car, horrible, ugly'.

So that post should be found in that case.

There are a couple of ways you could achieve it; I'll try to provide a few:

STRPOS

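This will match red and then stop, but it will also match partial words; for example, car would also match cards.

//$terms must still contain its spaces here, so skip the str_replace(' ', '', $terms) step
$words = explode(' ', $terms);

foreach ($words as $word) 
{
    //strpos() returns false when nothing is found; use !== because a match at position 0 is falsy
    if (false !== strpos($full, $word)) {
        $ids[] = $result->ID;
        break; //one matching word is enough, and this avoids adding the same ID twice
    }
}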

Using array_intersect

//create an array of searched terms
$words = explode(' ', $terms);

//remove everything except lower-case letters, digits and whitespace
$fullClean = preg_replace('/[^a-z\d\s]/', '', $full);

//Create an array of words
$criteria = explode(' ', $fullClean);

//find if any elements of $words exist in $criteria
if (count(array_intersect($words, $criteria))) {
    $ids[] = $result->ID;
}
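
As written, the check above passes as soon as any one search word is present. If you would rather require every word, a minimal sketch along these lines might do. The function name matchesWords() and its $requireAll flag are made up for illustration, and it assumes both strings are already lower case:

```php
<?php
// Hypothetical helper, not part of WordPress or of the original script.
function matchesWords(string $terms, string $full, bool $requireAll = false): bool
{
    // Split the search terms on runs of whitespace and drop duplicates.
    $words = array_unique(preg_split('/\s+/', trim($terms), -1, PREG_SPLIT_NO_EMPTY));
    if (count($words) === 0) {
        return false;
    }

    // Keep only lower-case letters, digits and whitespace, then split into words.
    $fullClean = preg_replace('/[^a-z\d\s]/', '', $full);
    $criteria  = preg_split('/\s+/', trim($fullClean), -1, PREG_SPLIT_NO_EMPTY);

    // Number of distinct search words that occur as whole words in $full.
    $found = count(array_intersect($words, $criteria));

    return $requireAll ? $found === count($words) : $found > 0;
}
```

Inside the loop you would then call something like matchesWords($terms, $full, true) and push $result->ID onto $ids when it returns true.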

A third approach could be to use regular expressions with preg_quote. A plain pattern would most likely have the same partial-word problem as strpos, unless you wrap each word in \b word boundaries.
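
For what it's worth, here is a rough sketch of the regular expression route using \b word boundaries, so that car does not match cards. The function name matchesWholeWords() is hypothetical, and it assumes you want every search word to be present:

```php
<?php
// Hypothetical helper: true when every search word appears in $full as a whole word.
function matchesWholeWords(string $terms, string $full): bool
{
    $words = preg_split('/\s+/', trim($terms), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) === 0) {
        return false;
    }

    foreach ($words as $word) {
        // preg_quote() escapes regex metacharacters in the user's input;
        // \b anchors the match at word boundaries, and the i flag ignores case.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $full) !== 1) {
            return false;
        }
    }

    return true;
}
```

One caveat: \b only works as expected when the search word starts and ends with a letter, digit or underscore, so a term like c++ would need different handling.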

Hope that helps

The way a real search engine would go about this is to build an inverted index, i.e. in its simplest form a lookup table from each word to the set of documents that contain that word, along with how many times it appears (where "documents" simply means the texts being searched). This is pretty simple to do in PHP:

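$invertedIndex = array();
foreach($documents as $docIndex => $documentText) {
    //remove punctuation; extend this list with any other characters you want to strip
    $documentText = str_replace(array(',','.','?','!'),"",$documentText);
    $words = explode(" ",$documentText);
    foreach($words as $word) {
        if($word === '') continue; //skip empty strings left by repeated spaces
        //initialise the counter on first sight to avoid undefined-index notices
        if(!isset($invertedIndex[$word][$docIndex])) $invertedIndex[$word][$docIndex] = 0;
        $invertedIndex[$word][$docIndex]++;
    }
}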

After running that, we have built the inverted index. To use it on your example, where the incoming query is 'red car', split the query into words and look up $invertedIndex['red'] and $invertedIndex['car']. Each lookup returns an array of all the documents containing that word and how many times it appears. To get the documents containing both words, use array_intersect on the keys of these arrays; to get the documents containing either word, use array_merge on the keys (followed by array_unique to drop duplicates):

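$validDocs = array();
foreach($keywords as $count => $keyword) {
    //a keyword missing from the index matches no documents
    $docs = isset($invertedIndex[$keyword]) ? array_keys($invertedIndex[$keyword]) : array();
    //start from the first keyword's documents, then narrow down with each further keyword
    $validDocs = ($count == 0) ? $docs : array_intersect($validDocs, $docs);
}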

Now the document index of every document containing all the keywords is in $validDocs, and if you want to rank them by how many times the words appear in the text, that information is in $invertedIndex too. You do have to build the inverted index ahead of time, but once it exists, a lookup is much faster than scanning the text of every document for each query.
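
To make the ranking idea concrete, here is a minimal end-to-end sketch. The helper names buildInvertedIndex() and searchRanked() and the sample documents are made up for illustration, and the score is just the total number of keyword occurrences, which is one simple choice among many:

```php
<?php
// Hypothetical helper: build word => [document ID => occurrence count].
function buildInvertedIndex(array $documents): array
{
    $index = [];
    foreach ($documents as $docId => $text) {
        // Lower-case the text and keep only letters, digits and whitespace.
        $clean = preg_replace('/[^a-z\d\s]/', '', strtolower($text));
        foreach (preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            $index[$word][$docId] = ($index[$word][$docId] ?? 0) + 1;
        }
    }
    return $index;
}

// Hypothetical helper: IDs of documents containing every keyword, best match first.
function searchRanked(array $index, string $query): array
{
    $keywords = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);
    $scores = null;
    foreach ($keywords as $keyword) {
        $postings = $index[$keyword] ?? [];
        if ($scores === null) {
            $scores = $postings; // the first keyword seeds the candidate set
            continue;
        }
        // Keep only documents that also contain this keyword, and add up the counts.
        $scores = array_intersect_key($scores, $postings);
        foreach ($scores as $docId => $score) {
            $scores[$docId] = $score + $postings[$docId];
        }
    }
    $scores = $scores ?? [];
    arsort($scores); // highest total count first, document IDs kept as keys
    return array_keys($scores);
}

// Made-up sample data: document ID => title plus tags.
$documents = [
    0 => 'The red vaxhaul is not very nice, vehicle, car, horrible, ugly',
    1 => 'Red car, red paint, fast car',
    2 => 'Blue bicycle, bike',
];
$index   = buildInvertedIndex($documents);
$results = searchRanked($index, 'red car');
```

In the WordPress script from the question, the keys of $documents would presumably be post IDs, so the returned array could be passed straight to post__in.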

