Indexing XML files as plain text in Apache Solr

Question

Is there a way to dump the entire contents of an XML file into a single content field?

In schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="content" type="text_general" indexed="true" stored="true" multiValued="false" termVectors="true" termPositions="true" termOffsets="true"/>

Code used for indexing:

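// Requires: import java.net.HttpURLConnection; import java.net.URL;
HttpURLConnection solrHttpURLConnection = (HttpURLConnection) new URL("http://localhost:7892/solr/myCore/update/extract?literal.id=1234&commit=true").openConnection();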
solrHttpURLConnection.setDoOutput(true);
solrHttpURLConnection.setDoInput(true);
solrHttpURLConnection.setUseCaches(false);
solrHttpURLConnection.setAllowUserInteraction(false);
solrHttpURLConnection.setRequestProperty("Content-type", type);
solrHttpURLConnection.connect();

I get the output stream from this connection and write the content to it by reading from an input stream obtained from dataServer.
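
The streaming step described above could look roughly like the following sketch. The class name StreamCopy, the method name copy, and the 8 KB buffer size are assumptions made for illustration; they are not the asker's actual code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of the original question.
final class StreamCopy {
    private StreamCopy() {}

    /** Copies every byte from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // assumed buffer size
        long total = 0;
        int read;
        // read() returns -1 once the source stream is exhausted
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }
}
```

It would be called as StreamCopy.copy(dataServerInputStream, solrHttpURLConnection.getOutputStream()), where dataServerInputStream stands for whatever stream dataServer returns. Calling solrHttpURLConnection.getResponseCode() afterwards is worth doing, since reading the response is how an HTTP error status from Solr becomes visible to the client.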

Notes:

The code above works for all file formats except xml, csv, and json.
Solr returns no error message.

Sample XML file:

<?xml version="1.0" encoding="UTF-8"?>
<content>just a test
</content>

Answer 1

Set the content type to "text/xml".
Add the following lines to your code: OutputStreamWriter writer = new OutputStreamWriter(solrHttpURLConnection.getOutputStream()); writer.write(your_xml_file); writer.flush();
Execute the request with this URL: http://localhost:7892/solr/myCore/update?literal.id=1234&commit=true. For JSON files, use /update/json/docs.
See also this documentation on uploading data with index handlers: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-XMLUpdateCommands
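
One caveat when following these steps: the XML update handler at /update expects Solr's own update format (an add element containing doc and field elements), not arbitrary XML. A minimal sketch of one way to get the whole file into the content field is to escape the raw XML and wrap it in such a command. The class and method names below are hypothetical, and the approach itself is an assumption about what the asker wants rather than something stated in the answer.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: wraps raw XML text in a Solr <add> command.
final class SolrXmlUpdate {
    private SolrXmlUpdate() {}

    /** Escapes the characters that are special inside XML text content. */
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds an add command that stores the raw XML as escaped text in "content". */
    static String buildAddCommand(String id, String rawXml) {
        return "<add><doc>"
                + "<field name=\"id\">" + escapeXml(id) + "</field>"
                + "<field name=\"content\">" + escapeXml(rawXml) + "</field>"
                + "</doc></add>";
    }

    /** Posts the command to the given update URL and returns the HTTP status code. */
    static int post(String updateUrl, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(updateUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        // Reading the status code completes the request and exposes any error status.
        return conn.getResponseCode();
    }
}
```

For example, SolrXmlUpdate.post("http://localhost:7892/solr/myCore/update?commit=true", SolrXmlUpdate.buildAddCommand("1234", fileText)) would send the document, with fileText standing for the XML file read into a string. The id travels inside the document here, so the literal.id parameter is not needed in this variant. The element names of the original file end up indexed as ordinary words in content, which may or may not be desirable.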

Answer 1 posted 2015-11-18 13:11:19