
How can we create custom Solr indexing and custom Lucene indexing to search documents in Alfresco?

I have attended some interviews on Alfresco, and I am commonly asked how to create custom indexing and how to re-index documents.

I have googled it but didn't find any helpful answer, and I am still not able to understand what exactly it is and how to do it.

Can anyone please help me understand this, along with the configuration required for custom indexing?

Thanks in advance

1/ Suppose you have created your own model and want to choose how your fields are indexed.

My explanation is based on this page: http://docs.alfresco.com/5.0/concepts/search-fts-config.html and more particularly on this part:

Data dictionary options

The indexing behavior of each property can be set in the content model. By default, they are indexed atomically. The property value is not stored in the index, and the property is tokenized when it is indexed. The following example shows how indexing can be controlled.

enabled="false" If this is false, there will be no entry for this property in the index.

atomic="true" If this is true, the property is indexed in the transaction; if not, the property is indexed in the background.

facetable="true" If true, the property can be used for faceting; if false, you cannot use it for faceting.

tokenised="true" If "true", the string value of the property is tokenized before indexing. If "false", it is indexed "as is" as a single string. If "both", both forms are in the index.
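To make the effect of `enabled` and `tokenised` concrete, here is a minimal Python sketch of which terms a single property value would contribute to the index. It is a toy model, not Alfresco code: the `analyse` helper is a hypothetical stand-in (lowercase word splitting) for a real analyser, and `index_terms` is an illustrative name.

```python
import re
from typing import List


def analyse(value: str) -> List[str]:
    """Hypothetical stand-in for a real analyser: lowercase word tokens.
    Alfresco's analysers do much more (stemming, stop words, locale rules)."""
    return [t.lower() for t in re.findall(r"\w+", value)]


def index_terms(value: str, enabled: bool = True, tokenised: str = "true") -> List[str]:
    """Terms a property value contributes to the index under the
    (simplified) data dictionary options; the default mirrors the
    documented default of tokenizing the property."""
    if not enabled:
        # enabled="false": no entry for this property in the index at all
        return []
    if tokenised == "true":
        # value is split into tokens before indexing
        return analyse(value)
    if tokenised == "false":
        # value is indexed "as is" as a single string
        return [value]
    if tokenised == "both":
        # both the tokenized and the untokenized forms are indexed
        return analyse(value) + [value]
    raise ValueError(f"unsupported tokenised value: {tokenised!r}")
```

With this sketch, `index_terms("Blue cat")` gives `['blue', 'cat']`, `tokenised="false"` gives `['Blue cat']`, `"both"` gives all three entries, and `enabled=False` gives nothing.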

Basically, if enabled is true, it means that the field will be searchable.

If tokenized is true, it means (in a nutshell) that the field will be returned as a result even if you search for only part of its value:

The field with the value "Blue cat" will be returned if:

  • it is tokenized and the words "cat" or "blue" are queried
  • it is not tokenized and the exact phrase "Blue cat" is queried.

Document content is generally tokenized, which is why you can find a document by its content with only a few words.
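The "Blue cat" example can be sketched as a tiny in-memory search. This is an illustration only, assuming a simplified analyser (lowercase word tokens) and the choice that every query term must appear; `tokens`, `matches` and `search` are hypothetical names, not Alfresco or Solr APIs.

```python
import re
from typing import Dict, List


def tokens(text: str) -> List[str]:
    # Simplified stand-in for an analyser: lowercase word tokens only
    return [t.lower() for t in re.findall(r"\w+", text)]


def matches(query: str, value: str, tokenised: bool) -> bool:
    """Does `query` hit a property holding `value`?"""
    if tokenised:
        # Tokenised field: every query term must be one of the value's tokens
        value_tokens = set(tokens(value))
        query_tokens = tokens(query)
        return bool(query_tokens) and all(t in value_tokens for t in query_tokens)
    # Untokenised field: only the exact, whole string matches
    return query == value


def search(query: str, docs: Dict[str, str], tokenised: bool) -> List[str]:
    """Names of the documents whose field value matches the query."""
    return [name for name, value in docs.items() if matches(query, value, tokenised)]


if __name__ == "__main__":
    docs = {"doc1": "Blue cat", "doc2": "Red dog"}
    print(search("cat", docs, tokenised=True))        # ['doc1']
    print(search("cat", docs, tokenised=False))       # []
    print(search("Blue cat", docs, tokenised=False))  # ['doc1']
```

Note that in this sketch the untokenised match is also case-sensitive, so "blue cat" would not find "Blue cat"; a real untokenised field may behave differently depending on its configuration.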

2/ Suppose you want to change the analyser used for a data type:

For each data type, an analyser is chosen to process the corresponding field. You can look at the configuration files here: https://github.com/Alfresco/community-edition/tree/master/projects/system-build-test/config/alfresco/model

In the default configuration file (dataTypeAnalyzers.properties) you can see, for example, that the text data type is processed by the AlfrescoStandardAnalyser. Since I configured my Alfresco with a French locale, Alfresco overrides this behaviour with the dataTypeAnalyzers_fr.properties file, so text fields are processed by the FrenchAnalyzer. This analyser suits me better because it handles some French particularities. You can also replace it with a Snowball analyser if needed (which behaves differently).
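The locale override can be pictured as a most-specific-first lookup across the properties files. Below is a minimal Python sketch assuming a simplified Java ResourceBundle-style fallback (`_fr_FR`, then `_fr`, then the base file). The in-memory `BUNDLES` dict, the `d_text` key and the short analyser names are hypothetical placeholders; the real files use their own keys and fully qualified class names.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-ins for the two .properties files named above.
# Keys and values are illustrative only; check the real files for exact entries.
BUNDLES: Dict[str, Dict[str, str]] = {
    "dataTypeAnalyzers": {"d_text": "AlfrescoStandardAnalyser"},
    "dataTypeAnalyzers_fr": {"d_text": "FrenchAnalyzer"},
}


def candidate_names(base: str, locale: str) -> List[str]:
    """File names to try, most specific first: base_fr_FR, base_fr, base."""
    parts = locale.split("_") if locale else []
    names = [base + "_" + "_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    return names + [base]


def analyser_for(data_type: str, locale: str, base: str = "dataTypeAnalyzers") -> Optional[str]:
    """Return the analyser configured for a data type, or None if unset."""
    for name in candidate_names(base, locale):
        bundle = BUNDLES.get(name)
        if bundle and data_type in bundle:
            # The first (most specific) file defining the key wins
            return bundle[data_type]
    return None
```

Here `analyser_for("d_text", "fr_FR")` resolves to the French entry, while an `en_GB` locale finds no locale-specific file and falls back to the default one.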

3/ Suppose you have a technical need to customize your Solr configuration.

My explanation is still based on this page: http://docs.alfresco.com/5.0/concepts/search-fts-config.html

Solr 4 index properties

solr.host=localhost The host name where the Solr instance is located.

solr.port=8080 The port number on which the Solr instance is running.

solr.port.ssl=8443 The port number on which the Solr SSL support is running.

solr.solrUser=solr The Solr user name.

solr.solrPassword=solr The Solr password.

solr.secureComms=https Whether Alfresco talks to Solr over HTTPS (https) or plain HTTP (none).

solr.solrConnectTimeout=5000 The Solr connection timeout in milliseconds.

solr.solrPingCronExpression=0 0/5 * * * ? * The cron expression defining how often the Solr Admin client (used by JMX) pings Solr 4 if it goes away.
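As a rough illustration of how these settings fit together, here is a minimal Python sketch that reads `solr.*` entries from alfresco-global.properties-style text and works out the scheme, host and port a client would connect to. Assumptions: the reader is simplified (no escapes or line continuations), any `solr.secureComms` value other than `https` is treated as plain HTTP, the defaults are the values listed above, and the Solr webapp path is not covered. `parse_properties` and `solr_endpoint` are hypothetical helper names.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal .properties reader: key=value lines, '#' or '!' comments.
    Ignores escapes and line continuations (a simplification)."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so values may contain '=' or spaces
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def solr_endpoint(props: Dict[str, str]) -> str:
    """scheme://host:port chosen from the solr.* settings (sketch)."""
    host = props.get("solr.host", "localhost")
    if props.get("solr.secureComms", "https") == "https":
        # HTTPS traffic goes to the SSL port
        return f"https://{host}:{props.get('solr.port.ssl', '8443')}"
    # Anything else (e.g. "none") is treated as plain HTTP in this sketch
    return f"http://{host}:{props.get('solr.port', '8080')}"
```

With the values listed above this yields `https://localhost:8443`; switching `solr.secureComms` to `none` yields `http://localhost:8080`. The seven-field ping expression keeps its spaces because only the first `=` is used as the separator.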

You can change some Solr parameters if you need it. I won't go any further since I feel this is not what you are looking for.

4/ For the reindexing part, I won't be very specific since Gagravarr already gave you the documentation link (http://docs.alfresco.com/5.1/tasks/solr-reindex.html). Just keep in mind that indexes can be rebuilt, so you can trigger a full reindex simply by deleting the index folder.

I'll finish by saying that I've covered only a small part of the indexing subject. Since it is a huge field, you would really need to specify your needs for us to give you the right answer.
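The "delete the index folder" approach can be scripted. This is a minimal Python sketch, assuming Solr is stopped first and assuming a Solr 4 layout where each core's index lives under the data root (the `data.dir.root` setting in solrcore.properties) at `workspace/SpacesStore` and `archive/SpacesStore`. That layout is an assumption to verify against your own installation, and `wipe_index` is a hypothetical helper, not an Alfresco tool.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# Assumed core index locations relative to the Solr data root.
# Verify these against your own solrcore.properties before deleting anything.
DEFAULT_CORES = ("workspace/SpacesStore", "archive/SpacesStore")


def wipe_index(data_root: Path, cores: Iterable[str] = DEFAULT_CORES) -> List[Path]:
    """Delete each core's index directory so Solr rebuilds it on restart.
    Run only while Solr is stopped. Returns the directories removed."""
    removed: List[Path] = []
    for core in cores:
        target = data_root / core
        if target.is_dir():
            # Remove the whole core index directory tree
            shutil.rmtree(target)
            removed.append(target)
    return removed
```

Missing directories are skipped rather than treated as errors, so the function can be re-run safely; after restarting Solr, it should start tracking the repository again and rebuild the deleted indexes.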
