Apache Solr Search API default result filters
I'm using Solr with Apache Nutch to index a website.
My JSON result looks like this:
"response": {
"numFound": 0,
"start": 0,
"docs": [
{
"id": "http://mysite.pl/cl-BR/link/link",
"url": "http://mysite.pl/cl-BR/link/link",
"content": [
"content"
],
"_version_": 0000
},
{
"id": "http://mysite.pl/ru-RU/link/link",
"url": "http://mysite.pl/ru-RU/link/link",
"content": [
"content"
],
"_version_": 0000
},
{
"id": "http://mysite.pl/en-EN/link/link",
"url": "http://mysite.pl/en-EN/link/link",
"content": [
"content"
],
"_version_": 0000
},
I would like to add a parameter to my query that carries the language in a format such as en-EN, and then return only the search results whose url contains that parameter.
For example, when my query is:
/solr/CoreName/select?q=you&fl=id,ul,content&urlContains=en-EN
My result should be:
"response": {
"numFound": 0,
"start": 0,
"docs": [
{
"id": "http://mysite.pl/en-EN/link/link",
"url": "http://mysite.pl/en-EN/link/link",
"content": [
"content"
],
"_version_": 0000
},
And when my query is:
/solr/CoreName/select?q=you&fl=id,ul,content&urlContains=ru-RU
My result should be:
"response": {
"numFound": 0,
"start": 0,
"docs": [
{
"id": "http://mysite.pl/ru-RU/link/link",
"url": "http://mysite.pl/ru-RU/link/link",
"content": [
"content"
],
"_version_": 0000
},
How can I do this?
The cleanest implementation would be to add a custom field to your schema, and then use copyField to copy the content of url into a url_tokenized field:
<copyField source="url" dest="url_tokenized" />
By using a PatternTokenizer you can tell Solr to split tokens on /, so that you get ru-RU as a token in the url_tokenized field:
<analyzer>
<tokenizer class="solr.PatternTokenizerFactory" pattern="/"/>
</analyzer>
Put together, the field type should look something like this:
<fieldType name="url_tokenized" class="solr.TextField">
<analyzer>
<tokenizer class="solr.PatternTokenizerFactory" pattern="/"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
By adding the LowerCaseFilterFactory we make sure that ru-RU and ru-ru are both found, regardless of the casing used.
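To see which tokens end up in the field, here is a rough Python approximation of that analyzer chain. It is not Solr's own code: the function name url_tokens is made up for illustration, and it assumes that the empty pieces produced by the double slash in http:// are dropped.

```python
import re


def url_tokens(url: str) -> list[str]:
    """Approximate the analyzer: split on "/", then lowercase every token."""
    parts = re.split("/", url)
    # Assumption: empty pieces (e.g. between the two slashes of "http://")
    # are not kept as tokens.
    return [part.lower() for part in parts if part]


tokens = url_tokens("http://mysite.pl/ru-RU/link/link")
print(tokens)  # ['http:', 'mysite.pl', 'ru-ru', 'link', 'link']
```

The language code becomes a token of its own, which is why it can be matched exactly instead of with a substring search.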
Querying would then be done by applying a filter query (fq) to the query string:
...&fq=url_tokenized:ru-ru
This will limit the response to documents whose URL contains ru-ru, in any casing, as a path segment, for example "/ru-RU/".
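If the request is built in client code, the language value can be turned into the fq parameter along these lines. This is only a sketch: build_select_url is a hypothetical helper, the core path /solr/CoreName is taken from the question, and the fl list id,url,content is an assumption about which fields you want back.

```python
from urllib.parse import urlencode


def build_select_url(base: str, query: str, language: str) -> str:
    """Build a select URL that filters on the url_tokenized field."""
    params = [
        ("q", query),
        ("fl", "id,url,content"),  # assumed field list
        # Lowercasing here is optional if the field lowercases at query time.
        ("fq", f"url_tokenized:{language.lower()}"),
    ]
    # urlencode percent-encodes "," and ":" so the values survive transport.
    return f"{base}/select?{urlencode(params)}"


print(build_select_url("/solr/CoreName", "you", "en-EN"))
# /solr/CoreName/select?q=you&fl=id%2Curl%2Ccontent&fq=url_tokenized%3Aen-en
```

Keeping the language restriction in fq rather than in q means it does not affect relevance scoring.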