Problem while uploading the CSV file in Solr admin

I have an application with a search feature, and I am using Solr for searching. I am trying to upload the data as a CSV file, but the data is not being uploaded to the Solr core properly.

Here is the curl command I am using:

curl 'http://localhost:8983/solr/test_import/update/csv?commit=true&separator=%09&escape=%5c&encapsulator=%22' --data-binary @/tmp/college_data_20180809164959.csv -H 'Content-type:application/csv'

This gives me an error:

java.io.IOException: (line 0) invalid char between encapsulated token end delimiter\\n\\tat org.apache.solr.internal.csv.CSVParser.encapsulatedTokenLexer

If I remove encapsulator=%22, the file uploads, but not in the right format.

This is how it got uploaded:

{
        "id":"8adb5378-aa58-427d-8ff4-fca4f31c96e6",
        "ID_College_Name_State_City_Address":["43387,,,,"],
        "_version_":1608318488833687552,
        "ID_College_Name_State_City_Address_str":["43387,,,,"]},
      {
        "id":"e29a0435-95c5-4d3c-bddf-eacef22f6859",
        "ID_College_Name_State_City_Address":["43388,apsce,,,"],
        "_version_":1608318488835784704,
        "ID_College_Name_State_City_Address_str":["43388,apsce,,,"]}

This is my CSV file structure:

"ID","College_Name","State","City","Address"
"43387","","","",""
"43388","apsce","","",""

Please help me resolve this issue. Let me know if you need any further information about this problem.

Your CSV file should parse perfectly fine with the default values for CSV parsing. Drop all the parsing parameters you're giving.
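As a rough sketch of what that request could look like, the snippet below reuses the host, core name, and file path from the question and keeps only commit=true. The variable names and the upload_csv function are hypothetical helpers introduced here for illustration; they are not part of Solr or curl.

```shell
# Values copied from the question; adjust them to your own setup.
SOLR_URL='http://localhost:8983/solr/test_import/update/csv?commit=true'
CSV_FILE='/tmp/college_data_20180809164959.csv'

# Hypothetical helper: posts the CSV file without separator, escape,
# or encapsulator parameters, so Solr's default CSV parsing applies.
upload_csv() {
  curl "$SOLR_URL" --data-binary "@$CSV_FILE" -H 'Content-type:application/csv'
}
```

Running upload_csv then sends the file; it needs a running Solr instance and the CSV file to exist at that path.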

The error message appears because you've given the separator parameter as %09, which is the TAB character. Your values are not separated by a TAB character, but by the standard comma (,).

separator=%09 # separated by TAB (wrong)
escape=%5c # escaped by \ (default)
encapsulator=%22 # encapsulated by " (default)

Since the parser is looking for values separated by <TAB>, having multiple " between separators indicates a parse error (which happens because , isn't given as a separator).
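The separator mismatch can be seen locally without Solr. The sketch below splits the header line from the question with awk, once on TAB and once on a comma. It is only a naive split that ignores quoting, so it illustrates the field count and does not reproduce how Solr's CSV parser works.

```shell
# Header line from the question's CSV file.
header='"ID","College_Name","State","City","Address"'

# Number of fields when splitting on TAB (what separator=%09 asks for).
tab_fields=$(printf '%s\n' "$header" | awk -F'\t' '{ print NF }')

# Number of fields when splitting on a comma.
comma_fields=$(printf '%s\n' "$header" | awk -F',' '{ print NF }')

echo "TAB-separated fields:   $tab_fields"
echo "comma-separated fields: $comma_fields"
```

With TAB as the separator the whole line counts as one field, while a comma yields the five columns the file actually contains.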
