How to Troubleshoot MeCab Parser Dysfunction

BACKGROUND: I have built a custom search engine that works fine in English but fails in Japanese, even though my hosting provider has confirmed that I installed the Japanese mecab parser correctly. My own checks reveal the following:

1) SHOW CREATE TABLE:

FULLTEXT KEY search_newsletter (letter_title, letter_abstract, letter_body) /*!50100 WITH PARSER mecab */
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1

2) SHOW PLUGINS:

ngram | ACTIVE | FTPARSER | NULL | GPL
mecab | ACTIVE | FTPARSER | libpluginmecab.so | GPL

IMPLEMENTATION

1) MySQL statement:

$sql ="SELECT letter_no, letter_lang, letter_title, letter_abstract, submission_date, revision_date, MATCH (letter_title, letter_abstract, letter_body) AGAINST (? IN NATURAL LANGUAGE MODE) AS letter_score FROM sevengates_letter WHERE MATCH (letter_title, letter_abstract, letter_body) AGAINST (? IN NATURAL LANGUAGE MODE) ORDER BY letter_score DESC";

2) CUSTOM SEARCH ENGINE:

See under Local Search / Newsletters at https://www.grammarcaptive.com/overview.html

3) DOCUMENT SEARCHED:

See under Regular Updates / Newsletter / Archives / Japanese at https://www.grammarcaptive.com/overview.html

COMMENT: Neither PHP nor MySQL complains. Any search for a Japanese word that needs to be parsed simply returns nothing. For example, the word 日本語 can be searched for and found, but it does not require any parsing to be retrieved. A search for any other Japanese word in the newsletter fails.

REQUEST: Any troubleshooting tips would be greatly appreciated.

Roddy

A couple of things you can check:

Does MeCab work on the command line?

You should be able to do something like this, assuming a Linux-like system:

echo "日本語ですよ" | mecab

Output should be roughly like this (details will probably differ):

日本    名詞,固有名詞,地名,国,*,*,ニッポン,日本,日本,ニッポン,日本,ニッポン,固,*,*,*,*
語      名詞,普通名詞,一般,*,*,*,ゴ,語,語,ゴ,語,ゴ,漢,*,*,*,*
です    助動詞,*,*,*,助動詞-デス,終止形-一般,デス,です,です,デス,です,デス,和,*,*,*,*
よ      助詞,終助詞,*,*,*,*,ヨ,よ,よ,ヨ,よ,ヨ,和,*,*,*,*

On some platforms MeCab is statically linked into MySQL, so you don't need a system installation, but the docs indicate that's not always the case.
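Because the command-line tool and the server-side plugin can be configured separately, it may also be worth asking the server itself how the plugin is set up. This is a minimal sketch, assuming MySQL 5.7 or later with the plugin loaded under the name mecab (as in the SHOW PLUGINS output in the question). The mecab_rc_file variable is the server setting that points the plugin at its mecabrc configuration file.

```sql
-- Confirm the plugin is loaded and see which shared library provides it.
SELECT PLUGIN_NAME, PLUGIN_STATUS, PLUGIN_TYPE, PLUGIN_LIBRARY
FROM information_schema.PLUGINS
WHERE PLUGIN_NAME = 'mecab';

-- Show which mecabrc file the server was started with.
-- The dicdir entry in that file decides which dictionary the parser loads.
SHOW VARIABLES LIKE 'mecab_rc_file';
```

An empty or unexpected path would suggest that the plugin is not reading the same dictionary you tested on the command line.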

Are your encoding settings correct?

The default character set of your table is latin1, which won't work with Japanese text. I would suggest using utf8, and you'll need to check that your mecab installation supports that.
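To compare both sides of the encoding question, something like the following may help. It is only a sketch: it assumes the table name sevengates_letter and the column names from the question, and that the current default database is the one holding the table. mecab_charset is the status variable through which the plugin reports the character set of the dictionary it loaded.

```sql
-- Character set of the dictionary the MeCab plugin is using.
SHOW STATUS LIKE 'mecab_charset';

-- Table-level default collation (the character set is its prefix).
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'sevengates_letter';

-- Character set of each column covered by the full-text index.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'sevengates_letter'
  AND COLUMN_NAME IN ('letter_title', 'letter_abstract', 'letter_body');
```

If the dictionary reports utf8 while the table or its columns report latin1, that mismatch is a likely culprit.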

Hope that helps.

It turns out that the entire table must use a character set that supports Japanese, not just the columns. At least, this was the one significant change I made when I rebuilt the table.
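For anyone who would rather convert an existing table than rebuild it, here is a rough sketch. It is not the exact procedure used above. It assumes the table and index names from the question and picks utf8mb4 as the target character set. Try it on a copy first, because text that was saved as UTF-8 bytes in latin1 columns can come out double-encoded after conversion.

```sql
-- Drop the full-text index so it can be rebuilt against the new character set.
ALTER TABLE sevengates_letter DROP INDEX search_newsletter;

-- Change the table default and every character column in one step.
ALTER TABLE sevengates_letter CONVERT TO CHARACTER SET utf8mb4;

-- Recreate the index with the MeCab parser.
ALTER TABLE sevengates_letter
  ADD FULLTEXT INDEX search_newsletter (letter_title, letter_abstract, letter_body)
  WITH PARSER mecab;
```

Afterwards, SHOW CREATE TABLE should report the new default character set alongside the WITH PARSER mecab clause.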

In any case, the parser does not appear in the phpMyAdmin table section where parsers are apparently supposed to appear. This is perhaps due to the way the parser appears in the table's SHOW CREATE TABLE statement. Either way, it is a small shortcoming compared with the parser's overall functionality.

Roddy
