How to convert a delimited text file into Solr documents with fields

Question

I have a text file that contains the following data:

andy~1234;M~64365113~2P3VWU3H10~~
mike~4152;M~64365113~2P3VWU3H10~0.6~MG
lesa~4512;F,PM~~N/A~16~MG
riky~7845;M,PM2~~N/A~3.99~MG

I want to convert it into Solr documents as follows:

Each row is treated as one <doc> document in Solr.

'~' is the delimiter that separates the fields (<field>) of a document.

Do I need to use a DataImportHandler to handle this kind of file? If so, which kind of DataImportHandler is useful? I have looked at LineEntityProcessor, but I didn't understand how to use it for my problem.

Answer 1

Assuming that you know the field names (the lines contain only the values), here's an example of how you can do that using a FileDataSource + LineEntityProcessor + ScriptTransformer:

<dataConfig>  
    <dataSource encoding="UTF-8" type="FileDataSource" name="file-datasource"/>
    <script><![CDATA[
        function parse(row)    
        {
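            // LineEntityProcessor puts each line of the file in the "rawLine" column.
            var rawLine = row.get("rawLine");

            // Split the line on the '~' delimiter ("" + ... yields a JavaScript string).
            var values = ("" + rawLine).split("~");

            // Add one field per value. "name" and "code" are placeholder
            // names; replace them with the field names from your schema.
            row.put("name", values[0]);
            row.put("code", values[1]);
            // ... and so on for the remaining values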

            return row;
        }
    ]]></script>        
    <document>
        <entity name="jc"
            processor="LineEntityProcessor"
            url="file:///your.path.file.here"
            dataSource="file-datasource"
            transformer="script:parse"/>
    </document>
</dataConfig>
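
The splitting step inside parse() can be tried outside Solr first. The following is only a minimal sketch in plain JavaScript: the field names in FIELD_NAMES are hypothetical (the question does not name the columns), and skipping empty values is an assumption about how blank columns such as the trailing `~~` should be handled.

```javascript
// Hypothetical field names; the question does not say what the columns are called.
const FIELD_NAMES = ["name", "code", "id", "serial", "amount", "unit"];

// Turn one '~'-delimited line into a plain object of field name -> value.
function parseLine(rawLine) {
  // JavaScript's split keeps empty strings, so "a~~b" yields ["a", "", "b"].
  const values = rawLine.split("~");
  const doc = {};
  FIELD_NAMES.forEach((fieldName, i) => {
    const value = values[i];
    // Assumption: an empty column means "no value", so the field is left out.
    if (value !== undefined && value !== "") {
      doc[fieldName] = value;
    }
  });
  return doc;
}

console.log(parseLine("mike~4152;M~64365113~2P3VWU3H10~0.6~MG"));
```

Inside the ScriptTransformer, the same loop would call row.put(fieldName, value) instead of filling a plain object.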

Answer 1
2, accepted, 2013-08-21 05:29:22