How can we create custom Solr and Lucene indexing to search documents in Alfresco?

I have attended some interviews on Alfresco, and I am commonly asked how to create custom indexing and how to re-index documents.

I have googled it but didn't find any helpful answer, and I still don't understand exactly what it is or how to do it.

Can anyone please help me understand this, including the configuration required for custom indexing?

Thanks in advance.

1/ Let's say you have created your own model and want to customize how its fields are indexed

My explanation is based on this page: http://docs.alfresco.com/5.0/concepts/search-fts-config.html and more particularly this part:

Data dictionary options

The indexing behavior of each property can be set in the content model. By default, they are indexed atomically. The property value is not stored in the index, and the property is tokenized when it is indexed. The following example shows how indexing can be controlled.

Enabled="false" If this is false, there will be no entry for this property in the index.

Atomic="true" If this is true, the property is indexed in the transaction; if not, the property is indexed in the background.

facetable="true" If true, the property will be used for faceting, and if false, you cannot use it for faceting.

Tokenised="true" If "true", the string value of the property is tokenized before indexing. If "false", it is indexed "as is" as a single string. If "both", then both forms are in the index.
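To make these options concrete, here is a minimal sketch of what such a property definition can look like in a content model, wrapped in a small Python script that reads the options back. The property name my:colour is hypothetical, and the option values are only one possible combination; check the documentation page above for the exact syntax supported by your version.

```python
import xml.etree.ElementTree as ET

# Hypothetical property from a custom content model.
# "my:colour" is an invented name used only for illustration.
PROPERTY_XML = """
<property name="my:colour">
    <type>d:text</type>
    <index enabled="true">
        <atomic>true</atomic>
        <stored>false</stored>
        <tokenised>both</tokenised>
        <facetable>true</facetable>
    </index>
</property>
"""


def read_index_options(xml_text):
    """Return the indexing options of a single <property> element as a dict."""
    prop = ET.fromstring(xml_text)
    index = prop.find("index")
    # "enabled" is an attribute; the other options are child elements.
    options = {"enabled": index.get("enabled")}
    for child in index:
        options[child.tag] = child.text.strip()
    return options


if __name__ == "__main__":
    for name, value in read_index_options(PROPERTY_XML).items():
        print(f"{name} = {value}")
```

The script only parses the fragment; in a real project the index element lives inside your model XML file and is read by Alfresco itself.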

Basically, if enabled is true, the field will be searchable.

If tokenised is true, it means (in a nutshell) that the field you are indexing will be returned as a result even if you query only a part of it:

The field with the value "Blue cat" will be returned if

  • it is tokenized and the word "cat" or "blue" is queried
  • it is not tokenized and the exact string "Blue cat" is queried.

Generally, document content is tokenized, which is why you can find a document from just a few words of its content.
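The "Blue cat" behaviour can be sketched with a toy matcher in Python. This is not how Alfresco, Lucene or Solr analyse text internally; splitting on whitespace and lowercasing is an assumed stand-in for a real analyser, and the function name matches is invented for this example.

```python
def matches(query, value, tokenised):
    """Toy check of whether a single-term query hits an indexed value.

    tokenised is "true", "false" or "both", mirroring the model option.
    Phrase queries and real analysers are deliberately not modelled.
    """
    hit = False
    if tokenised in ("true", "both"):
        # Tokenised form: each lowercased word is a separate term.
        words = {word.lower() for word in value.split()}
        hit = query.lower() in words
    if tokenised in ("false", "both"):
        # Untokenised form: the whole value is one term, matched exactly.
        hit = hit or query == value
    return hit


if __name__ == "__main__":
    for mode in ("true", "false", "both"):
        for query in ("cat", "blue", "Blue cat"):
            print(mode, repr(query), matches(query, "Blue cat", mode))
```

With "both", the single words and the exact string all match, which is the reason that setting exists: you pay for two index entries to get both kinds of lookup.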

2/ Let's say you want to change your data type analyser:

For each data type, an analyser is chosen to process the corresponding field. You can have a look at the configuration files here: https://github.com/Alfresco/community-edition/tree/master/projects/system-build-test/config/alfresco/model

In the default configuration file (dataTypeAnalyzers.properties), you can see (for example) that the text field is processed by the AlfrescoStandardAnalyser. Now, since I configured my Alfresco with a French locale, my Alfresco overrides this behaviour with the dataTypeAnalyzers_fr.properties file, so the text field is processed by the FrenchAnalyzer. This analyser is better for me since it handles some French particularities. You can override this analyser with a Snowball one if needed (it behaves differently).
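The locale override can be pictured as a key-by-key lookup that tries the locale-specific file first and falls back to the default one. The sketch below assumes that fallback behaviour, and the keys (d_text.analyzer, d_int.analyzer) and the ExampleIntegerAnalyser value are shortened, hypothetical placeholders; the real files use longer keys and fully qualified class names, so look at the linked repository for the actual content.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Hypothetical, shortened stand-in for dataTypeAnalyzers.properties
DEFAULT_FILE = """
d_text.analyzer=AlfrescoStandardAnalyser
d_int.analyzer=ExampleIntegerAnalyser
"""

# Hypothetical, shortened stand-in for dataTypeAnalyzers_fr.properties
FRENCH_FILE = """
d_text.analyzer=FrenchAnalyzer
"""


def resolve_analyzer(data_type, locale_props, default_props):
    """Prefer the locale-specific entry, otherwise use the default one."""
    key = data_type + ".analyzer"
    return locale_props.get(key, default_props.get(key))


if __name__ == "__main__":
    default_props = parse_properties(DEFAULT_FILE)
    french_props = parse_properties(FRENCH_FILE)
    print(resolve_analyzer("d_text", french_props, default_props))
    print(resolve_analyzer("d_int", french_props, default_props))
```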

3/ Let's say you have a technical need and want to customize your Solr configuration

My explanation is still based on this page: http://docs.alfresco.com/5.0/concepts/search-fts-config.html

Solr 4 index properties

solr.host=localhost The host name where the Solr instance is located.

solr.port=8080 The port number on which the Solr instance is running.

solr.port.ssl=8443 The port number on which the Solr SSL support is running.

solr.solrUser=solr The Solr user name.

solr.solrPassword=solr The Solr password.

solr.secureComms=https The HTTPS connection.

solr.solrConnectTimeout=5000 The Solr connection timeout in ms.

solr.solrPingCronExpression=0 0/5 * * * ? * The cron expression defining how often the Solr Admin client (used by JMX) pings Solr 4 if it goes away.
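As a rough illustration of how these properties fit together, the Python sketch below parses a subset of them and derives the base URL a client would use. The /solr4 context path is an assumption that does not come from the list above, and the helper names are invented; Alfresco performs this wiring internally.

```python
# Subset of the properties listed above, as they might appear in a properties file.
SOLR_PROPERTIES = """
solr.host=localhost
solr.port=8080
solr.port.ssl=8443
solr.secureComms=https
solr.solrConnectTimeout=5000
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def solr_base_url(props, context_path="/solr4"):
    """Build the Solr base URL; context_path is an assumed default."""
    secure = props.get("solr.secureComms") == "https"
    scheme = "https" if secure else "http"
    # The SSL port is only relevant when secure communication is enabled.
    port = props["solr.port.ssl"] if secure else props["solr.port"]
    return f"{scheme}://{props['solr.host']}:{port}{context_path}"


def connect_timeout_seconds(props):
    """Convert the millisecond timeout into seconds."""
    return int(props["solr.solrConnectTimeout"]) / 1000


if __name__ == "__main__":
    props = parse_properties(SOLR_PROPERTIES)
    print(solr_base_url(props))
    print(connect_timeout_seconds(props))
```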

You can change some Solr parameters if you need to. I won't go any further since I feel this is not what you are looking for.

4/ For the reindex part, I won't be very specific since Gagravarr already gave you the documentation link: http://docs.alfresco.com/5.1/tasks/solr-reindex.html Just keep in mind that indexes can be rebuilt, so you can start a reindex simply by deleting the index folder.

I will just finish by saying that I've covered only a small part of the indexing subject. Since it is a huge field, we would really need you to specify your need in order to give you the right answer.
