Zend Search Lucene - Searching specific field

I currently have Zend_Search_Lucene set up as the search engine on the project I am working on.

It works well at the default level (i.e. searching all fields); however, I now need to search a specific field.

The reason is that I am trying to add the ability to handle misspellings. I am therefore indexing the soundex code of each word in the document title.

For example:

```php
// $index is an already opened Zend_Search_Lucene index.
$productArray['title'] = 'June Monthly Meat Box';
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::text('product_title', $productArray['title']));

// Split the title into words (limit -1 = no limit), then take the soundex of each word.
$words = array_map('trim', preg_split('/ /', $productArray['title'], -1, PREG_SPLIT_NO_EMPTY));
$soundex = implode(' ', array_map('soundex', $words));
$doc->addField(Zend_Search_Lucene_Field::keyword('soundex', $soundex));
$index->addDocument($doc);
```

This stores 'J500 M534 M300 B200' as the value of the soundex field.
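To make the indexing step concrete, here is a minimal standalone sketch in plain PHP (no Zend classes) of how the soundex string is built from a title. The helper name `titleSoundex()` is hypothetical, and it splits on runs of whitespace (`/\s+/`) rather than on a single space as above; that is an assumption about the desired behaviour, not what the original code does.

```php
<?php
// Hypothetical helper: builds the space-separated soundex string for a title.
function titleSoundex(string $title): string
{
    // Split on runs of whitespace and drop empty pieces (limit -1 = no limit).
    $words = preg_split('/\s+/', trim($title), -1, PREG_SPLIT_NO_EMPTY);
    // soundex() is PHP's built-in function; each word becomes a code like "M300".
    return implode(' ', array_map('soundex', $words));
}

echo titleSoundex('June Monthly Meat Box'), PHP_EOL;
```

Splitting on `/\s+/` means extra spaces or tabs in a title do not produce empty words, so the separate `trim` pass is not needed.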

This is how the search is performed:

```php
$queryString = trim(urldecode($this->_request->getParam('q')));
$words = array_map('trim', preg_split('/ /', $queryString, -1, PREG_SPLIT_NO_EMPTY));

$query = new Zend_Search_Lucene_Search_Query_Boolean();

// Subquery 1: the words as typed, matched against all fields.
$subquery1 = new Zend_Search_Lucene_Search_Query_MultiTerm();
foreach ($words as $word) {
    $subquery1->addTerm(new Zend_Search_Lucene_Index_Term($word));
}

// Subquery 2: the lower-cased soundex code of each word, matched against the 'soundex' field only.
$subquery2 = new Zend_Search_Lucene_Search_Query_MultiTerm();
foreach ($words as $word) {
    $subquery2->addTerm(new Zend_Search_Lucene_Index_Term(strtolower(soundex($word)), 'soundex'));
}

$query->addSubquery($subquery1);
$query->addSubquery($subquery2);
```

The variable $subquery1 holds each word of the original query (this works on its own).
The variable $subquery2 holds the soundex code of each word. The plan is to search the soundex field for these codes, as well as the other fields for each word. That way, if someone misspelt 'meat' as 'maet', the document would still be returned, because both words have the same soundex code, 'M300'.
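A quick plain-PHP check of that reasoning, independent of Lucene: the built-in `soundex()` gives the same code for both spellings, and a hypothetical helper `soundexMatches()` (not part of Zend) reports whether any query word shares a code with any title word. It only models the intended matching; it says nothing about how Lucene scores results.

```php
<?php
// Hypothetical helper: true if any query word "sounds like" any title word.
function soundexMatches(string $query, string $title): bool
{
    // Turn a string into the list of soundex codes of its words.
    $codes = function (string $text): array {
        $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
        return array_map('soundex', $words);
    };
    // A non-empty intersection means at least one shared code.
    return count(array_intersect($codes($query), $codes($title))) > 0;
}

echo soundex('meat'), ' ', soundex('maet'), PHP_EOL;
var_dump(soundexMatches('maet', 'June Monthly Meat Box'));
var_dump(soundexMatches('fish', 'June Monthly Meat Box'));
```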

I am using Luke to view the index and can see the correct terms. When I use Luke to search for a single soundex code (i.e. soundex:M300), it returns no results; however, if I search for the entire field value (i.e. soundex:"J500 M534 M300 B200"), it returns the correct document.

What is going wrong that prevents it from searching within the field?

If I understand Zend_Search_Lucene_Field::keyword correctly (which is what you used for "soundex" above), a keyword field is indexed but not tokenized: the whole value is stored as a single term. It is designed to hold one value at a time, such as a single date or a single URL. That is consistent with what you see in Luke, where only the full string matches.

I think for the "soundex" field you want to use a tokenized field type such as Zend_Search_Lucene_Field::text instead, since it sounds like you want to search on individual tokens in the "soundex" field, not just the whole field value.
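The difference can be illustrated with a toy model in plain PHP. The functions below are hypothetical stand-ins, not Zend classes, and the regexes only approximate the analyzers. One caveat worth checking: as far as I know, ZF1's default analyzer (Zend_Search_Lucene_Analysis_Analyzer_Common_Text_CaseInsensitive) keeps only letters, so with Field::text alone "M300" would be indexed as "m". Calling Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive()) before both indexing and searching should keep the digits, and because that analyzer lower-cases terms, it matches the strtolower() already applied in $subquery2.

```php
<?php
// Toy model of the terms each field type would produce (not Zend code).

// keyword: the whole value becomes one untokenized, case-preserved term.
function keywordTerms(string $value): array
{
    return [$value];
}

// Roughly the default Text_CaseInsensitive analyzer: letters only, lower-cased.
// Digits act as separators and are dropped.
function lettersOnlyTerms(string $value): array
{
    preg_match_all('/[a-zA-Z]+/', $value, $m);
    return array_map('strtolower', $m[0]);
}

// Roughly TextNum_CaseInsensitive: letters and digits, lower-cased.
function lettersAndDigitsTerms(string $value): array
{
    preg_match_all('/[a-zA-Z0-9]+/', $value, $m);
    return array_map('strtolower', $m[0]);
}

$soundex = 'J500 M534 M300 B200';
$variants = [
    'keyword'                 => keywordTerms($soundex),
    'text (letters only)'     => lettersOnlyTerms($soundex),
    'text (letters + digits)' => lettersAndDigitsTerms($soundex),
];
foreach ($variants as $label => $terms) {
    // 'm300' is the term that $subquery2 looks for.
    $found = in_array('m300', $terms, true) ? 'yes' : 'no';
    printf("%-24s m300 found: %-3s terms: %s\n", $label, $found, implode(', ', $terms));
}
```

Only the letters-plus-digits variant yields a standalone m300 term, which is what a single-term lookup in the soundex field needs.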
