Apache Solr Tokenizer

Question

I am using Apache Solr as my semantic search engine. Users can type anything, and I have to retrieve relevant results based on the words in their query.

I want to split a string into tokens.

Example: "actorsfrommumbai" -> "actors from mumbai"

How can I achieve this in Solr?

Answer 1

It looks like you are looking for decompounding: https://wiki.apache.org/solr/LanguageAnalysis#Decompounding. This lets you search for the parts of a compound word.

Answer 2

Solr can be configured with an analyzer that decompounds tokens based on a dictionary you provide. You will have to configure the analyzer something like this:

 <analyzer>
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
           dictionary="abc.txt"/>
 </analyzer>

Here, abc.txt is the dictionary file.

Note that the analyzer is applied at both index time and query time.
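
To make the idea of dictionary-based decompounding concrete, here is a minimal Python sketch. It is not Solr's or Lucene's code, only a simplified illustration of the behaviour described above: keep the original token and add every dictionary word found inside it. The function name `decompound` and the parameters `min_subword` and `max_subword` are hypothetical names chosen for this example.

```python
def decompound(token, dictionary, min_subword=2, max_subword=15):
    """Return the original token followed by every dictionary word found inside it.

    Simplified illustration only; the size limits are assumptions for this sketch.
    """
    lowered = token.lower()
    found = []
    # Try every start position and every allowed subword length.
    for start in range(len(lowered)):
        for length in range(min_subword, max_subword + 1):
            end = start + length
            if end > len(lowered):
                break
            piece = lowered[start:end]
            if piece in dictionary:
                found.append(piece)
    return [token] + found


# Hypothetical dictionary contents, standing in for abc.txt.
words = {"actors", "from", "mumbai"}
print(decompound("actorsfrommumbai", words))
# ['actorsfrommumbai', 'actors', 'from', 'mumbai']
```

The result depends entirely on the dictionary: a word that is missing from it is never split out, and a dictionary that also contained "actor" would produce that subword as well.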

Answer 3

You can try the NGram and EdgeNGram filters and tokenizers available in Solr. Since the input is a single word and there is no delimiter to split on, it can only be split with these two.
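
As a rough illustration of what n-gram and edge n-gram analysis produce for such a token, here is a small Python sketch. It does not call Solr; the helper names `ngrams` and `edge_ngrams` and the gram sizes are assumptions made for this example, and the order in which Solr emits grams may differ.

```python
def ngrams(token, min_gram, max_gram):
    """All substrings of token whose length is between min_gram and max_gram."""
    return [
        token[i:i + n]
        for i in range(len(token))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(token)
    ]


def edge_ngrams(token, min_gram, max_gram):
    """Prefixes of token whose length is between min_gram and max_gram."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


token = "actorsfrommumbai"
print(edge_ngrams(token, 3, 6))
# ['act', 'acto', 'actor', 'actors']

grams = ngrams(token, 4, 6)
print("from" in grams, "mumbai" in grams)
# True True
```

In this sketch, edge n-grams only cover prefixes, so "actors" can match but "from" and "mumbai" cannot. Full n-grams cover every substring, which lets any embedded word match but also produces many fragments that are not words.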

Answer 1
0 2016-08-08 12:13:49

Answer 2
0 2016-09-20 05:50:41

Answer 3
0 2021-12-08 16:32:15
