
Index XML files in Apache Solr as plain text

Is there any way to dump the entire contents of an XML file into a single content field?

schema.xml

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="content" type="text_general" indexed="true" stored="true" multiValued="false" termVectors="true" termPositions="true" termOffsets="true"/>

Code used for indexing:

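// Requires java.net.URL and java.net.HttpURLConnection
HttpURLConnection solrHttpURLConnection = (HttpURLConnection) new URL("http://localhost:7892/solr/myCore/update/extract?literal.id=1234&commit=true").openConnection();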
solrHttpURLConnection.setDoOutput(true);
solrHttpURLConnection.setDoInput(true);
solrHttpURLConnection.setUseCaches(false);
solrHttpURLConnection.setAllowUserInteraction(false);
solrHttpURLConnection.setRequestProperty("Content-type", type);
solrHttpURLConnection.connect(); 

I take the output stream from this connection and write the file contents to it, reading them from an input stream on dataServer.

NOTE:

  1. The above code works for all file formats except XML, CSV and JSON.
  2. Solr returns no error message.

Sample XML File

<?xml version="1.0" encoding="UTF-8"?>
<content>just a test
</content>

  1. Set the content type to "text/xml".
  2. Add the following lines to your code:
     OutputStreamWriter writer = new OutputStreamWriter(solrHttpURLConnection.getOutputStream());
     writer.write(your_xml_file);
     writer.flush();
  3. Execute the request with this URL: http://localhost:7892/solr/myCore/update?literal.id=1234&commit=true
     For JSON files use /update/json/docs.
  4. Also check the documentation on uploading data with index handlers: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-XMLUpdateCommands
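
One caveat about the steps above: the XML handler at /update expects Solr's own update message format (an <add> element containing <doc> and <field> elements), so an arbitrary XML file such as the sample posted as-is is unlikely to end up in the content field. A minimal sketch of one possible workaround follows, assuming the goal is to store the raw file text in the "content" field from the schema above: escape the file's text and wrap it in an <add><doc> message before posting. The class and method names (SolrXmlAsText, escapeXml, buildAddDocument, post) are hypothetical and not part of any Solr API; the URL and id are taken from the question.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr or SolrJ.
class SolrXmlAsText {

    // Escape the characters that are special in XML text, so the raw file
    // can be embedded as an ordinary field value.
    static String escapeXml(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build a Solr XML update message with the whole file as one field.
    // Field names "id" and "content" match the schema.xml in the question.
    static String buildAddDocument(String id, String rawXml) {
        return "<add><doc>"
                + "<field name=\"id\">" + escapeXml(id) + "</field>"
                + "<field name=\"content\">" + escapeXml(rawXml) + "</field>"
                + "</doc></add>";
    }

    // POST the message to an update URL and return the HTTP status code.
    static int post(String updateUrl, String body) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(updateUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        String sample = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<content>just a test\n</content>";
        String message = buildAddDocument("1234", sample);
        System.out.println(message);
        // Only contact Solr when an update URL is passed explicitly, e.g.
        // http://localhost:7892/solr/myCore/update?commit=true
        if (args.length > 0) {
            System.out.println("HTTP status: " + post(args[0], message));
        }
    }
}
```

Because the id travels inside the message, the literal.id parameter is not needed in this variant. Checking the returned status code also helps when Solr appears to report no error.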

