
Zend_lucene search with accents

I'm working on a search engine for a French website, using Zend_Search_Lucene as a standalone component. Everything works well on my local web server (WAMP) on Windows, but searching for accented words (like: géographie) doesn't work on my production server (which runs on Unix).

I generated the index on Linux, and the accented words are indexed correctly.

See a screenshot of my generated index here

I tried forcing the encoding through the analyzer's parameters and converting the query string with utf8_encode, but I still can't get it to work.

I call Lucene with these parameters:

Zend_Search_Lucene_Search_QueryParser::setDefaultOperator(Zend_Search_Lucene_Search_QueryParser::B_AND);
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');

$index = Zend_Search_Lucene::open($cheminIndexes);
$resultats = $index->find(Zend_Search_Lucene_Search_QueryParser::parse(utf8_encode($_POST['recherche'])));

This code returns matches for all the non-accented words, but it doesn't return any of my accented words, even though those words are indexed. It's frustrating because I don't understand why it works on Windows. I feel I'm missing a layer of encoding somewhere, but I can't find any information about this on Google.

I have a site set up with exactly the same options as yours (case-insensitive, UTF-8, AND). However, I created the index object via:

$index = new Zend_Search_Lucene('/path/to/index');

and not through the proxy (via Zend_Search_Lucene::open, as in your case), although that should not make any difference.

Also, I just pass the query (after a short sanity check) directly to the index, without parsing it first:

$query = $_GET['q'];
...
$results = $index->find($query);
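
Note that the query above is not passed through utf8_encode. That function always treats its input as ISO-8859-1, so if the form already submits UTF-8, the accented characters get encoded twice and may no longer match the indexed terms. Whether that is what happens on your production server is only a guess. A minimal sketch of a safer conversion, assuming the mbstring extension is available and that any non-UTF-8 input is ISO-8859-1 (toUtf8Query is a hypothetical helper, not part of Zend_Search_Lucene):

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): return $raw as UTF-8.
// Assumption: input that is not valid UTF-8 is ISO-8859-1, which is also
// what utf8_encode() assumes.
function toUtf8Query($raw)
{
    // Already valid UTF-8: converting again would turn "é" (bytes C3 A9)
    // into "Ã©" (bytes C3 83 C2 A9), which is a different term.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    return mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
}

$utf8   = "g\xC3\xA9ographie"; // "géographie" as UTF-8 bytes
$latin1 = "g\xE9ographie";     // "géographie" as ISO-8859-1 bytes

// Both inputs should normalise to the same UTF-8 byte sequence.
var_dump(toUtf8Query($utf8) === $utf8);
var_dump(toUtf8Query($latin1) === $utf8);
```

In the question's code, this would mean passing toUtf8Query($_POST['recherche']) to the parser instead of utf8_encode($_POST['recherche']).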
