SOLR managed-schema, how to use it?

Question

I got Solr working, and it works decently, but I have no clue what exactly managed-schema is. I used the default version and added a few lines that I needed for my case:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="name" type="text_general" indexed="true" stored="true" default="" />
<field name="brand_id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="brand_name" type="text_general" indexed="true" stored="true" default="" />
<field name="type" type="string" indexed="true" stored="true" required="true" default="0"  />

I cannot include the full file because it is about 700 lines, but the full XML is here: http://pastebin.com/Z9nc36QD

Do I have to keep everything from the default example? I have no clue. Do you have an example of a typical schema file?

Answer 1

The managed schema is supposed to be manipulated through the Schema API, not by editing the file by hand (the file itself includes a warning about doing so). The schema.xml file is read only once, at first startup, to create the initial schema; any changes after that have to be made through the Schema API.
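
As a rough sketch of what that looks like for the fields in the question, the script below builds one Schema API request that adds them. The collection name gettingstarted, the SOLR_SCHEMA_URL variable and the post_schema helper are assumptions made for illustration, and id is left out on the assumption that the default schema already defines it (adding a field that already exists is rejected).

```shell
#!/bin/sh
# Assumption: a core or collection named "gettingstarted" on a local Solr.
SOLR_SCHEMA_URL="${SOLR_SCHEMA_URL:-http://localhost:8983/solr/gettingstarted/schema}"

# Field definitions copied from the question. "id" is omitted because the
# default schema normally defines it already.
payload='{
  "add-field": [
    {"name": "name",       "type": "text_general", "indexed": true, "stored": true, "default": ""},
    {"name": "brand_id",   "type": "string",       "indexed": true, "stored": true, "required": true, "multiValued": false},
    {"name": "brand_name", "type": "text_general", "indexed": true, "stored": true, "default": ""},
    {"name": "type",       "type": "string",       "indexed": true, "stored": true, "required": true, "default": "0"}
  ]
}'

# Hypothetical helper: sends one Schema API request.
# Solr rewrites the managed-schema file itself.
post_schema() {
  curl -X POST -H 'Content-type:application/json' --data-binary "$1" "$SOLR_SCHEMA_URL"
}

# Print the payload for review. To apply it, run: post_schema "$payload"
printf '%s\n' "$payload"
```

The script only prints the JSON so it can be checked first; nothing is sent to Solr until post_schema is called.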

If you want to use a hand-edited schema.xml file, as older Solr versions did, without any Schema API support, you can configure the ClassicIndexSchemaFactory in your solrconfig.xml file. See the Schema Factory Definition:

<schemaFactory class="ClassicIndexSchemaFactory"/>

An alternative to using a managed schema is to explicitly configure a ClassicIndexSchemaFactory. ClassicIndexSchemaFactory requires the use of a schema.xml configuration file, and disallows any programmatic changes to the schema at run time. The schema.xml file must be edited manually and is loaded only when the collection is loaded.

You only need to keep the parts of the schema that you actually use, and the example schema (depending on which schema a user starts out with) will usually have many, many fields and fieldtypes that you don't need. These can be removed until they are needed, and the field types can be tweaked to enable the features that you want.
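
With a managed schema, that pruning can also go through the Schema API. The following is a minimal sketch, assuming a collection named gettingstarted; unused_field is a made-up field name, and the helper functions are only illustrative. Solr will refuse to delete a field that is still referenced elsewhere in the schema, for example by a copyField rule.

```shell
#!/bin/sh
# Assumption: a core or collection named "gettingstarted" on a local Solr.
SOLR_SCHEMA_URL="${SOLR_SCHEMA_URL:-http://localhost:8983/solr/gettingstarted/schema}"

# List the fields currently defined, to help decide what is unused.
list_fields() {
  curl "$SOLR_SCHEMA_URL/fields"
}

# Build the JSON body of a delete-field command for one field name.
delete_field_payload() {
  printf '{"delete-field": {"name": "%s"}}' "$1"
}

# Send the delete-field command to the Schema API.
delete_field() {
  curl -X POST -H 'Content-type:application/json' \
       --data-binary "$(delete_field_payload "$1")" "$SOLR_SCHEMA_URL"
}

# Print the request body only. "unused_field" is a hypothetical name.
# To apply it, run: delete_field "unused_field"
delete_field_payload "unused_field"
echo
```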

Do remember, however, that a change to the schema requires the content to be reindexed before the change becomes visible when searching.

Exact schema design is something you'll have to work with and experiment with, so that you're able to get the query profile and features for matching that you need.

Answer 2

You are supposed to use Solr's Schema API. More information can be found here: https://lucene.apache.org/solr/guide/7_2/schema-api.html

In practice, this means you issue a curl -X POST request (to localhost, for example) from a shell, and Solr updates the managed-schema file for you.

Example:

curl -X POST -H 'Content-type:application/json' --data-binary '{
 "add-field-type" : {
 "name":"myNewTxtField",
 "class":"solr.TextField",
 "positionIncrementGap":"100",
 "analyzer" : {
    "charFilters":[{
       "class":"solr.PatternReplaceCharFilterFactory",
       "replacement":"$1$1",
       "pattern":"([a-zA-Z])\\\\1+" }],
    "tokenizer":{
       "class":"solr.WhitespaceTokenizerFactory" },
    "filters":[{
       "class":"solr.WordDelimiterFilterFactory",
       "preserveOriginal":"0" }]}}
}' http://localhost:8983/solr/gettingstarted/schema

Personal commentary

It's 2018; there really should be a web interface in the existing admin console to build and issue these localhost commands. I get that things can become tricky if there's a ZooKeeper, but basic exploration on a single server should be trivial, and currently it is not. Such an interface could show the formatted curl command, so it would train new developers on proper usage.

Developers have to translate the XML from documentation like this into correct JSON for the POST:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> 
  <analyzer type="index"> 
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
      synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" 
      ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" 
      synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
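
For what it's worth, here is my attempt at that translation, as a sketch rather than a verified recipe. The Schema API uses indexAnalyzer and queryAnalyzer in place of the two analyzer elements. The type name my_text_general is made up (text_general usually exists already in the default schema), the collection name gettingstarted and the post_schema helper are assumptions, and stopwords.txt and synonyms.txt must already be present in the config set.

```shell
#!/bin/sh
# Assumption: a core or collection named "gettingstarted" on a local Solr.
SOLR_SCHEMA_URL="${SOLR_SCHEMA_URL:-http://localhost:8983/solr/gettingstarted/schema}"

# JSON form of the XML fieldType above. "my_text_general" is a hypothetical
# name chosen to avoid clashing with an existing text_general type.
payload='{
  "add-field-type": {
    "name": "my_text_general",
    "class": "solr.TextField",
    "positionIncrementGap": "100",
    "indexAnalyzer": {
      "tokenizer": {"class": "solr.StandardTokenizerFactory"},
      "filters": [
        {"class": "solr.StopFilterFactory", "ignoreCase": "true", "words": "stopwords.txt"},
        {"class": "solr.LowerCaseFilterFactory"}
      ]
    },
    "queryAnalyzer": {
      "tokenizer": {"class": "solr.StandardTokenizerFactory"},
      "filters": [
        {"class": "solr.StopFilterFactory", "ignoreCase": "true", "words": "stopwords.txt"},
        {"class": "solr.SynonymFilterFactory", "synonyms": "synonyms.txt", "ignoreCase": "true", "expand": "true"},
        {"class": "solr.LowerCaseFilterFactory"}
      ]
    }
  }
}'

# Hypothetical helper: sends one Schema API request.
post_schema() {
  curl -X POST -H 'Content-type:application/json' --data-binary "$1" "$SOLR_SCHEMA_URL"
}

# Print the payload for review. To apply it, run: post_schema "$payload"
printf '%s\n' "$payload"
```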

2 answers

solution1
4 2016-10-12 02:47:06

solution2
2 2018-03-05 22:50:53
