I have been building a PHP search facility for certain types of posts on the website (for the purposes of this question, please accept that MySQL is not an option).
After a series of steps we get the title and the tags for each post and store them in a variable called $full. The search terms sit in a variable called $terms.
$full = $title . ' ' . $tago[$result->ID];
Both are converted to lower case.
We then want to look for similar words in $full using $terms.
I tried this:
$final = strpos($full,$terms);
It works, but not quite as well as I need it to.
Here is the complete script if it is of any help
$proto = $_GET['p'];
$terms = $_GET['s'];
$terms = strtolower($terms);
$terms = str_replace(' ', '', $terms);
$ids = array();
if ($proto == 'inline') {
    $search = get_posts('post_type=post&post_status=publish');
    foreach ($search as $result) {
        $title = get_the_title($result);
        $tags = wp_get_post_tags($result->ID);
        foreach ($tags as $tag) {
            $tago[$result->ID] .= $tag->name;
        }
        $full = $title . ' ' . $tago[$result->ID];
        $full = strtolower($full);
        $final = strpos($full, $terms);
        if ($final != false) {
            $ids[] = $result->ID;
        }
    }
    if ($ids[0] == '') {
        echo '<div align="center" style="text-align:center; color:#FFF;">No Results Found</div>';
        return false;
    } else {
        $args = array('post__in' => $ids);
        $srs = get_posts($args);
        foreach ($srs as $sr) {
            echo '<a href="'.$sr->post_slug.'"><img src=""/><b>'.$sr->post_title.'</b>'. $tago[$result->ID].'<span>'.date('dS M Y', strtotime($sr->post_date)).'</span></a>';
        }
    }
}
THE VALUES
$terms may contain whatever the user entered for a search, say 'red car'.
$full contains the post title and the tags, so it may say 'The red vaxhaul is not very nice, vehicle, car, horrible, ugly'.
So that post should be found in that case.
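There are a couple of ways you could achieve it. I'll try to provide a few:
STRPOS
This will match red and then stop, but it will also match partial words; for example, car would also match cards. Note that it needs $terms to keep its spaces, so the str_replace(' ', '', $terms) line in your script has to go.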
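$words = explode(' ', $terms);
foreach ($words as $word)
{
    // strpos returns 0 for a match at the very start, so compare strictly against false
    if ($word !== '' && false !== strpos($full, $word)) {
        $ids[] = $result->ID;
        break; // one matching word is enough; avoids adding the same ID twice
    }
}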
Using array_intersect
// create an array of searched terms
$words = explode(' ', $terms);
// remove everything except lower-case letters, digits and whitespace
$fullClean = preg_replace('/[^a-z\d\s]/', '', $full);
// create an array of words, splitting on runs of whitespace
$criteria = preg_split('/\s+/', $fullClean, -1, PREG_SPLIT_NO_EMPTY);
// check whether any element of $words exists in $criteria
if (count(array_intersect($words, $criteria)) > 0) {
    $ids[] = $result->ID;
}
A third approach could be to use regular expressions and preg_quote. Without word-boundary anchors (\b) it would most likely have the same partial-match problem as strpos.
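For what it's worth, here is a rough sketch of the regular-expression variant with word boundaries, which should avoid the car/cards problem. It assumes $full and $terms are already lower-cased as in the question and that $terms still contains its spaces. The function name matchesAnyWord is made up for illustration.

```php
<?php
// Hypothetical helper: true if any whole word from $terms occurs in $full.
function matchesAnyWord(string $full, string $terms): bool
{
    // Split the query on whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', trim($terms), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // preg_quote escapes regex metacharacters typed by the user;
        // \b anchors the match at word boundaries on both sides.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/';
        if (preg_match($pattern, $full) === 1) {
            return true;
        }
    }
    return false;
}

var_dump(matchesAnyWord('the red vaxhaul is not very nice, vehicle, car', 'red car')); // bool(true)
var_dump(matchesAnyWord('a deck of cards', 'car')); // bool(false)
```

One caveat: \b only behaves as expected for search words that start and end with a letter, digit or underscore, so a term like "c++" would need different handling.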
Hope that helps
The way a real search engine would go about this is to build an inverted index, i.e. in its simplest form a lookup table from each word to the set of documents that contain that word, along with how many times it occurs. (Here "documents" simply means the texts being searched.) It is pretty simple to do in PHP:
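$invertedIndex = array();
foreach ($documents as $docIndex => $documentText) {
    // lower-case the text and remove punctuation and other unwanted characters here
    $documentText = str_replace(array(',', '.', '?', '!'), '', strtolower($documentText));
    $words = preg_split('/\s+/', $documentText, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // count occurrences; isset() avoids an undefined-index notice on first sight
        $invertedIndex[$word][$docIndex] = isset($invertedIndex[$word][$docIndex]) ? $invertedIndex[$word][$docIndex] + 1 : 1;
    }
}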
After running that, we have built the inverted index. To use it on your example, the incoming query is 'red car'. Split that up and look up $invertedIndex['red'] and $invertedIndex['car']; each of these returns an array of all the documents containing that word, with how many times it occurs. To get the documents containing both words, use array_intersect on the keys of these arrays; to get the documents containing either, use array_merge on the keys (followed by array_unique to drop duplicates):
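$validDocs = array();
foreach ($keywords as $count => $keyword) {
    // a word that is not in the index matches no documents
    $docs = isset($invertedIndex[$keyword]) ? array_keys($invertedIndex[$keyword]) : array();
    // start from the first keyword's documents, then narrow down with each further keyword
    $validDocs = ($count == 0) ? $docs : array_intersect($docs, $validDocs);
}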
Now the index of every document containing all the keywords is in $validDocs, and if you want to rank them by how many times the words appear in the text, that information is in $invertedIndex too. You do have to build the inverted index ahead of time, but lookups are then much faster than scanning every document for each search.
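Putting the pieces together, a self-contained sketch might look like the following. The function names (tokenize, buildInvertedIndex, searchAll) and the sample documents are invented for illustration, and it assumes PHP 7 or later for the ?? operator. Ranking by summed word counts is just one simple choice, not the only way to score results.

```php
<?php
// Hypothetical helper: split text into lower-case words, ignoring punctuation.
function tokenize(string $text): array
{
    $clean = preg_replace('/[^a-z\d\s]/', '', strtolower($text));
    return preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
}

// Build the index: word => [document ID => occurrence count].
function buildInvertedIndex(array $documents): array
{
    $index = [];
    foreach ($documents as $docIndex => $text) {
        foreach (tokenize($text) as $word) {
            $index[$word][$docIndex] = ($index[$word][$docIndex] ?? 0) + 1;
        }
    }
    return $index;
}

// Return the IDs of documents containing every query word,
// ordered by total occurrence count, highest first.
function searchAll(array $index, string $query): array
{
    $scores = null;
    foreach (tokenize($query) as $word) {
        $postings = $index[$word] ?? [];
        if ($scores === null) {
            $scores = $postings; // first word: start with its documents
        } else {
            // Keep only documents that also contain this word, summing counts.
            $next = [];
            foreach ($scores as $doc => $count) {
                if (isset($postings[$doc])) {
                    $next[$doc] = $count + $postings[$doc];
                }
            }
            $scores = $next;
        }
    }
    $scores = $scores ?? []; // empty query matches nothing
    arsort($scores);
    return array_keys($scores);
}

// Made-up sample data keyed by post ID.
$documents = [
    10 => 'The red vaxhaul is not very nice, vehicle, car, horrible, ugly',
    11 => 'A deck of red cards',
    12 => 'Blue car, car, car',
];
$index = buildInvertedIndex($documents);
print_r(searchAll($index, 'red car')); // only document 10 contains both words
```

In the WordPress case the $documents array would be built from the title and tags of each post, keyed by post ID, and the resulting index could be cached rather than rebuilt on every request.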