
Zend_Search_Lucene: UTF-8 madness

I had some problems with Zend_Search_Lucene and non-English characters such as the German ÄÖÜ. My database returns UTF-8 encoded strings, so I thought everything would work just fine.

After running into serious encoding problems, I searched the web and found that the following lines of code solved the problem for most people:

Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive());

In fact, this did not solve my problem. Today I found a solution that works (note the utf8_decode):

$doc->addField(Zend_Search_Lucene_Field::keyword('division', utf8_decode($contact->division)), 'utf-8');

Well, this is working perfectly fine, but frankly it looks quite odd. Why do I have to convert strings back and forth? Maybe I'm using Lucene wrong? Or is this a bug?

Querying and storing data are two different things. If your query is encoded in UTF-8, your data (the document) must also be UTF-8 encoded so that it matches the query.
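To see why the two sides must agree, compare the bytes of the same word in UTF-8 and in ISO-8859-1, which is the encoding utf8_decode() produces. A keyword field is indexed as a single, untokenized term, so a query only finds it when the term text is identical. This is a minimal plain-PHP sketch; "Müller" is just sample data, and the \u{...} escape assumes PHP 7 or later.

```php
<?php
// The same word in two byte encodings.
$utf8   = "M\u{00FC}ller"; // UTF-8: ü is two bytes (0xC3 0xBC)
$latin1 = "M\xFCller";     // ISO-8859-1: ü is one byte (0xFC)

echo bin2hex($utf8), "\n";   // 4dc3bc6c6c6572
echo bin2hex($latin1), "\n"; // 4dfc6c6c6572

// Different bytes, so an exact term comparison between them can never succeed.
var_dump($utf8 === $latin1); // bool(false)
```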

Lastly

$doc->addField(Zend_Search_Lucene_Field::keyword('division', utf8_decode($contact->division)), 'utf-8');

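should be

$doc->addField(Zend_Search_Lucene_Field::keyword('division', $contact->division, 'utf-8'));

In your version the closing parenthesis is misplaced: 'utf-8' is passed as an extra argument to addField(), which takes only the field and silently ignores anything else, instead of as the third ($encoding) argument of Zend_Search_Lucene_Field::keyword(). The field is therefore created without a declared encoding, which is presumably why the utf8_decode() workaround appeared to help. Drop utf8_decode() and pass the encoding to keyword().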
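A minimal sketch of the two call shapes shows where the 'utf-8' argument ends up. The functions keyword() and addField() below are hypothetical stand-ins, not the real Zend classes. They only mirror the parameter lists of Zend_Search_Lucene_Field::keyword($name, $value, $encoding = '') and Zend_Search_Lucene_Document::addField($field) and record which encoding each field received.

```php
<?php
// Hypothetical stand-in for Zend_Search_Lucene_Field::keyword().
function keyword(string $name, string $value, string $encoding = ''): array
{
    return ['name' => $name, 'value' => $value, 'encoding' => $encoding];
}

// Hypothetical stand-in for Zend_Search_Lucene_Document::addField().
// It declares one parameter; PHP silently ignores extra arguments
// passed to a user-defined function.
function addField(array $field): array
{
    return $field;
}

// Misplaced parenthesis: 'utf-8' goes to addField() and is dropped.
$wrong = addField(keyword('division', 'Müller'), 'utf-8');

// Corrected call: 'utf-8' is the third argument of keyword().
$right = addField(keyword('division', 'Müller', 'utf-8'));

var_dump($wrong['encoding']); // string(0) ""
var_dump($right['encoding']); // string(5) "utf-8"
```

No error or warning is raised for the first call, which is what makes this mistake easy to miss.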
