
How to define a document-level and data-level field for a Solr schema

I have a simple file called test.csv with the following data:

 id,author,title
 1,sanjay,ABC
 2,vijay,XYZ

I want to index this file in Solr and pass it a unique id, id=1, so that I can query this document later and get back all of its values (the equivalent of select * from table-name). Similarly, I want to index many such files with document ids like id=2, id=3, etc.

In my schema.xml, id is defined as a field:

 <field name="id" type="string" indexed="true" stored="true" />

and

 <!-- Field to use to determine and enforce document uniqueness.
  Unless this field is marked with required="false", it will be a required field
 -->
 <uniqueKey>id</uniqueKey>

In cases where id doesn't exist in the file, but I still want to pass id as a request parameter for document-level uniqueness, Solr returns the following error:

 [root@****ltest1 garyTestDocs]# curl  http://localhost:8983/solr/update/csv?id='SL1' --data-binary @sample.csv -H    'Content-type:text/plain; charset=utf-8'
 <html>
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
 <title>Error 400 [doc=null] missing required field: ref</title> 
 </head>
 <body><h2>HTTP ERROR 400</h2> 
 <p>Problem accessing /solr/update/csv. Reason:
 <pre>    [doc=null] missing required field: id</pre></p><hr /><i><small>Powered by  Jetty://</small></i><br/>                                                
 </body>
 </html>

So, in essence, there are two scenarios: indexing the sample file above when it has an id column, and indexing it when it does not have an id column. In both scenarios, I need to pass a document-level unique id, i.e. id='1' or id='2'.

Could you please explain your answer for both scenarios, including the curl syntax and schema.xml (just the needed fields)?

In Solr, think of schema.xml as a DB table. To maintain the uniqueness of rows, a table has a primary key column, usually an id column holding unique values. When you index documents in Solr, for example a CSV file with columns as in my case, the id column must be unique and cannot have empty values. There are many ways to create unique strings; just as an example, I used the format file_name_1, file_name_2, ... (a running series 1, 2, 3, ...).

This is the only way to specify the uniqueness of records in Solr: you can't have document-level uniqueness, meaning you can't provide a unique key at indexing time. So the uniqueKey tag in schema.xml simply names the column in your documents that is going to be unique.
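In the scenario where the file has no id column, the file_name_1, file_name_2, ... scheme above can be generated before indexing. The following is a minimal bash sketch. It assumes a simple CSV with one header line and no quoted fields containing commas or newlines. add_id_column is a hypothetical helper name, not part of Solr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend an "id" column to a CSV that lacks one, using
# "<file name without .csv>_<row number>" as the id (e.g. sample_1, sample_2).
# Assumes a single header line and no quoted fields with embedded commas.
add_id_column() {
  local in_file="$1"
  local prefix
  prefix="$(basename "$in_file" .csv)"     # "sample.csv" -> "sample"
  awk -v prefix="$prefix" '
    NR == 1 { print "id," $0; next }       # extend the header row
    { print prefix "_" (NR - 1) "," $0 }   # data rows get sample_1, sample_2, ...
  ' "$in_file"
}
```

For example, running add_id_column sample.csv > sample_with_id.csv writes a copy of the file with ids sample_1, sample_2, .... That copy can then be posted to /solr/update/csv in place of the original file. Because the prefix comes from the file name, the ids stay unique across files only as long as the file names differ.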

The query for indexing a CSV file is as follows:

 curl http://:8983/solr/update/csv --data-binary @Sample.csv -H 'Content-type:text/plain; charset=utf-8'

schema.xml will have an id column:

 <field name="id" type="string" indexed="true" stored="true" />

plus fields for some of the other columns in my documents:

 <field name="author" type="text" indexed="true" stored="true" />
 <field name="title" type="text" indexed="true" stored="true" />


 <uniqueKey>id</uniqueKey>
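Because id is the uniqueKey, a file that already has an id column (the first scenario) needs a non-empty, distinct value on every row. As far as I know, a row whose id is empty is rejected with the "missing required field" error. A repeated id replaces the document indexed earlier with the same id. The following minimal bash sketch checks for both problems before uploading. It assumes id is the first column and that no quoted field contains a comma. check_ids is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report CSV rows whose id (assumed to be the first
# column) is empty or repeats an earlier row's id. Returns non-zero if any
# problem is found. Assumes no quoted fields containing commas.
check_ids() {
  awk -F',' '
    NR == 1 { next }                       # skip the header row
    $1 == "" {                             # empty id: likely rejected by Solr
      printf "row %d: empty id\n", NR - 1
      bad = 1
      next
    }
    seen[$1]++ {                           # true only from the 2nd occurrence on
      printf "row %d: duplicate id %s\n", NR - 1, $1
      bad = 1
    }
    END { exit bad }                       # bad is 0 (unset) when all rows pass
  ' "$1"
}
```

Run it as check_ids test.csv before the curl upload. A zero exit status means every data row has a non-empty id that no other row repeats.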

I didn't pass a document-level unique id at indexing time. So I hope I have answered my own question!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any questions, please contact: yoyou2525@163.com.

 