How to convert a text file with delimited fields into Solr documents
I have a text file which consists of the following data:
andy~1234;M~64365113~2P3VWU3H10~~
mike~4152;M~64365113~2P3VWU3H10~0.6~MG
lesa~4512;F,PM~~N/A~16~MG
riky~7845;M,PM2~~N/A~3.99~MG
I wish to convert it into Solr documents in the following manner:
- Each row is considered as one <doc> document in Solr.
- '~' is a delimiter separating the fields (<field> elements) of the document.
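To make that row-to-document mapping concrete, here is a small sketch in plain JavaScript (runnable with Node.js) that turns each line into a Solr XML <doc> with one <field> per '~'-separated value. The field names (name, code, ref, batch, amount, unit) are hypothetical placeholders, since the file itself carries no names; substitute the ones defined in your schema. Empty values such as the trailing "~~" are skipped in this sketch rather than indexed as empty strings.

```javascript
// Hypothetical field names: the file only contains values, so these are placeholders.
const FIELD_NAMES = ["name", "code", "ref", "batch", "amount", "unit"];

// Escape the characters that are special inside XML text content.
function escapeXml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One line -> one <doc>; each '~'-separated value -> one <field>.
// Empty values and values beyond the known field names are skipped.
function lineToDoc(line) {
  const fields = line
    .split("~")
    .map((value, i) => ({ name: FIELD_NAMES[i], value }))
    .filter((f) => f.name !== undefined && f.value !== "")
    .map((f) => `    <field name="${f.name}">${escapeXml(f.value)}</field>`);
  return ["  <doc>", ...fields, "  </doc>"].join("\n");
}

// Whole file -> one <add> element containing a <doc> per non-blank line.
function toSolrXml(text) {
  const docs = text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map(lineToDoc);
  return ["<add>", ...docs, "</add>"].join("\n");
}

const sample = [
  "andy~1234;M~64365113~2P3VWU3H10~~",
  "mike~4152;M~64365113~2P3VWU3H10~0.6~MG",
].join("\n");

console.log(toSolrXml(sample));
```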
Do I need to use a DataImportHandler to handle this kind of file? If so, which kind of DataImportHandler configuration is useful? I've gone through LineEntityProcessor, but I didn't understand how I can use it for my problem.
Assuming that you know the field names (the lines contain only the values), here's an example of how you can do this using FileDataSource + LineEntityProcessor + ScriptTransformer:
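<dataConfig>
  <dataSource encoding="UTF-8" type="FileDataSource" name="file-datasource"/>
  <script><![CDATA[
    // Hypothetical field names, in the order they appear on each line;
    // replace them with the fields defined in your schema.
    var FIELD_NAMES = ["name", "code", "ref", "batch", "amount", "unit"];

    function parse(row) {
      // LineEntityProcessor puts each line of the file into "rawLine".
      // String(...) turns the Java string into a JavaScript string so that
      // split() and the comparison below behave as plain JavaScript.
      var values = String(row.get("rawLine")).split("~");
      for (var i = 0; i < FIELD_NAMES.length && i < values.length; i++) {
        // Skip empty values (e.g. "~~") instead of indexing empty strings.
        if (values[i] !== "") {
          row.put(FIELD_NAMES[i], values[i]);
        }
      }
      // rawLine was only needed for splitting; drop it from the row.
      row.remove("rawLine");
      return row;
    }
  ]]></script>
  <document>
    <!-- FileDataSource reads "url" as a file path, not a file:// URL -->
    <entity name="jc"
            processor="LineEntityProcessor"
            url="/path/to/your/file.txt"
            dataSource="file-datasource"
            transformer="script:parse"/>
  </document>
</dataConfig>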
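A split-and-put body for the parse function can be tried outside Solr before wiring it into the data config. The sketch below runs under Node.js; makeRow is a hypothetical stand-in for the java.util.Map row that ScriptTransformer passes in, mimicking only get, put and remove. The field names are the same hypothetical placeholders as before.

```javascript
// Hypothetical stand-in for the java.util.Map row handed to the script;
// only get/put/remove are mimicked, plus toObject() for inspection.
function makeRow(entries) {
  const m = new Map(Object.entries(entries));
  return {
    get: (k) => (m.has(k) ? m.get(k) : null),
    put: (k, v) => { m.set(k, v); },
    remove: (k) => { m.delete(k); },
    toObject: () => Object.fromEntries(m),
  };
}

// Hypothetical field names; replace with the fields of your schema.
var FIELD_NAMES = ["name", "code", "ref", "batch", "amount", "unit"];

// Split rawLine on '~' and store each non-empty value under the
// field name at the same position, then drop rawLine.
function parse(row) {
  var values = String(row.get("rawLine")).split("~");
  for (var i = 0; i < FIELD_NAMES.length && i < values.length; i++) {
    if (values[i] !== "") {
      row.put(FIELD_NAMES[i], values[i]);
    }
  }
  row.remove("rawLine");
  return row;
}

const row = parse(makeRow({ rawLine: "mike~4152;M~64365113~2P3VWU3H10~0.6~MG" }));
console.log(row.toObject());
```

Because only get, put and remove are used, the same function body should carry over to the ScriptTransformer script unchanged.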