
Apache Solr unable to index the JSON file

I am new to Apache Solr. I did some research and learned how to do indexing, but I am currently facing an issue with indexing a JSON file.

I am unable to index data in the JSON format shown below. After removing the "documents" array I am able to index it, and I am not sure why this happens.

I haven't added any configuration to the schema.xml file. I tried some of the samples that ship with Apache Solr, and those I am able to index.

Also, what is the use of id in the schema.xml file? If my JSON contains employid, can I use employid instead of "id"?

[{
  "employid": "E64492",
  "employGroup": "ABC ABC GROUP",
  "ssn": "BE0003565737",
  "country": {
      "countryId": "56",
      "countryName": "india"
  },
  "sector": {
      "sId": "40",
      "sName": "name"
  },
  "documents": [{
      "language": "EN",
      "fileName": "Helloworld.pdf",
      "fileExists": true,
      "employid": "E64492"
  }],
}]

Can someone please help?

Exception details:

"org.apache.solr.common.SolrException"],"msg":"Error parsing JSON field value. Unexpected OBJECT_START at [227], field=documents","code":400}}

The issue is explained in the Solr Reference Guide section on indexing with JSON, but it is a bit hard to spot among all the text.

There are basically two ways to deal with JSON:

  1. Solr input format, where you specify the fields and recursive structures directly using Solr conventions. In this format you can feed multiple JSON objects to the parser, because you are explicit about each object's structure.
  2. Generic JSON format, which gets mapped to a Solr document following the rules you specify (or the rules specified by default in the solrconfig.xml for your collection).

The array syntax you used belongs to the first option, the Solr input format. However, that format does not support nested documents the way the rest of your object is structured; it needs a _childDocuments_ array instead.
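As a rough illustration of the first option, the sketch below reshapes the record from the question so that the "documents" array becomes a _childDocuments_ array. The helper name to_solr_nested is hypothetical, and two choices in it are assumptions rather than anything the thread confirms: the uniqueKey is taken to be the default id field, and child ids are derived from the parent id plus a counter. Flattening country and sector into dotted names is likewise just one possible convention.

```python
import json

def to_solr_nested(record, child_key, id_key):
    """Reshape one record into Solr's JSON input format with child documents.

    Hypothetical helper: assumes the collection's uniqueKey is `id`.
    """
    parent = {"id": record[id_key]}
    children = []
    for key, value in record.items():
        if key == child_key:
            children = value
        elif isinstance(value, dict):
            # Assumption: plain nested objects become dotted top-level fields.
            for sub_key, sub_value in value.items():
                parent[f"{key}.{sub_key}"] = sub_value
        else:
            parent[key] = value
    # Assumption: each child gets an id built from the parent id and a counter.
    parent["_childDocuments_"] = [
        {**child, "id": f"{record[id_key]}-doc{i}"}
        for i, child in enumerate(children, start=1)
    ]
    return parent

record = {
    "employid": "E64492",
    "employGroup": "ABC ABC GROUP",
    "ssn": "BE0003565737",
    "country": {"countryId": "56", "countryName": "india"},
    "sector": {"sId": "40", "sName": "name"},
    "documents": [{
        "language": "EN",
        "fileName": "Helloworld.pdf",
        "fileExists": True,
        "employid": "E64492",
    }],
}

solr_doc = to_solr_nested(record, child_key="documents", id_key="employid")
# The Solr input format takes an array of documents.
print(json.dumps([solr_doc], indent=2))
```

The printed array is what you would post to the collection's update handler; whether the resulting field names fit your schema is a separate decision.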

The generic JSON parser, on the other hand, can only take one object.

So you are at a crossroads and need to decide what you want to do. Most likely this means thinking about the schema you want to end up with, and whether you want to define it explicitly or via the mapping rules.
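If you go with the generic JSON route, the mapping rules are passed as request parameters to the /update/json/docs handler. The sketch below only builds such a URL and sends nothing. The collection name employees and the helper build_generic_json_url are hypothetical, and the particular split and f values are one plausible choice for this record, not a recommendation from the reference guide; the exact field names Solr produces should be checked against your version.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def build_generic_json_url(base_url, collection, split_path, field_mappings):
    """Build a URL for Solr's generic JSON handler (hypothetical helper)."""
    params = [("split", split_path)]
    # Each mapping rule is sent as its own `f` parameter.
    params += [("f", mapping) for mapping in field_mappings]
    params.append(("commit", "true"))
    return f"{base_url}/{collection}/update/json/docs?{urlencode(params)}"

url = build_generic_json_url(
    "http://localhost:8983/solr",   # default local Solr address
    "employees",                    # assumed collection name
    split_path="/documents",        # emit one Solr document per "documents" entry
    field_mappings=["$FQN:/**"],    # name each field by its full JSON path
)

print(url)
# Decode the query string again to show the parameters in readable form.
print(parse_qs(urlsplit(url).query))
```

You would then POST a single JSON object (not the surrounding array) to that URL with a Content-Type of application/json.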

You have to define a schema that corresponds to the document you are trying to insert.
Also, you have an extra comma after documents:

"documents": [{
  "language": "EN",
  "fileName": "Helloworld.pdf",
  "fileExists": true,
  "employid": "E64492"
}],
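A quick way to catch this kind of mistake before posting to Solr is to run the file through a strict JSON parser. The sketch below uses Python's standard json module on a shortened version of the snippet; the helper name is_strict_json is made up for illustration. Note that this only shows what a strict parser does: the thread does not establish whether Solr's own parser rejects the trailing comma, and the error in the question points at the "documents" field rather than at the comma.

```python
import json

def is_strict_json(text):
    """Return True if `text` parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Shortened versions of the snippet, with and without the trailing comma.
snippet_with_comma = '{"documents": [{"language": "EN"}],}'
snippet_fixed = '{"documents": [{"language": "EN"}]}'

print(is_strict_json(snippet_with_comma))
print(is_strict_json(snippet_fixed))
```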

Regarding the id field, you can rename it to employe_id, but remember to also change the tag <uniqueKey>id</uniqueKey> so that it names employe_id.
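The rename only works if the uniqueKey element and a field declaration agree on the name. Here is a small sanity check along those lines, a sketch only: the schema fragment is invented for illustration (it uses employid, as spelled in the question's JSON), and unique_key_is_declared is a hypothetical helper, not part of Solr.

```python
import xml.etree.ElementTree as ET

# Invented schema fragment for illustration; a real schema has more entries.
SCHEMA_FRAGMENT = """
<schema name="example" version="1.6">
  <field name="employid" type="string" indexed="true" stored="true"
         required="true" multiValued="false"/>
  <uniqueKey>employid</uniqueKey>
</schema>
"""

def unique_key_is_declared(schema_xml):
    """Check that <uniqueKey> names a field declared in the same schema."""
    root = ET.fromstring(schema_xml)
    unique_key = root.findtext("uniqueKey")
    declared = {field.get("name") for field in root.iter("field")}
    return unique_key is not None and unique_key.strip() in declared

print(unique_key_is_declared(SCHEMA_FRAGMENT))
# Renaming the field but leaving <uniqueKey>id</uniqueKey> behind fails the check.
print(unique_key_is_declared(
    SCHEMA_FRAGMENT.replace("<uniqueKey>employid", "<uniqueKey>id")
))
```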

You can also have a schema without a uniqueKey. Check this for more information on unique keys.

Maddy, what you are trying to index is a nested JSON object! Solr only allows JSON data to be indexed in flat format. By that I mean the country and sector objects cannot be indexed the way you are trying to. You have to flatten them into separate fields: country.countryId must be one field and country.countryName another. Similarly, sector.sId and sector.sName must each be a separate field. The fields inside the documents object should be declared the same way employid is: remove the documents object and put each of its fields at the top level. I hope you get the point. This will work 100%. I repeat, you cannot index nested JSON like this; you need to flatten it to the simplest form. Let me know if that helps :)

To understand the point better, take this JSON and try to index it in the Documents section of the Solr admin screen while keeping the Network tab open in Chrome or another browser (press F12). You will see the same error that you are getting in the console. That is why, even if you keep the country and sector objects as they are, you need to remove the documents object and declare the fields inside it at the top level.
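The flattening described above can be sketched as a small recursive function. This is one possible convention, not something Solr prescribes: nested keys are joined with dots, and entries of an array of objects are merged into shared field names, so a name that occurs more than once becomes a list (which would need a multi-valued field in the schema). The function name flatten is arbitrary.

```python
def flatten(value, prefix="", out=None):
    """Flatten nested dicts/lists into a single dict with dotted field names."""
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(value, list):
        # Array entries share the parent's prefix.
        for item in value:
            flatten(item, prefix, out)
    elif prefix in out:
        # Repeated field name: collect the values in a list.
        existing = out[prefix]
        if isinstance(existing, list):
            existing.append(value)
        else:
            out[prefix] = [existing, value]
    else:
        out[prefix] = value
    return out

record = {
    "employid": "E64492",
    "country": {"countryId": "56", "countryName": "india"},
    "sector": {"sId": "40", "sName": "name"},
    "documents": [{
        "language": "EN",
        "fileName": "Helloworld.pdf",
        "fileExists": True,
        "employid": "E64492",
    }],
}

flat = flatten(record)
for name, field_value in flat.items():
    print(name, "=", field_value)
```

With a single entry in "documents" every flattened value stays a scalar; a second entry would turn fields such as documents.fileName into lists.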

Finally, I was able to index the file after adding the schema definition below:

    <field name="buyLimit" type="tdoubles"/>
    <field name="country.countryId" type="tlongs"/>
    <field name="country.countryName" type="strings"/>
    <field name="creationDate" type="tlongs"/>
    <field name="currency" type="string" indexed="true" stored="true"/>
    <field name="documents.fileExists" type="booleans"/>
    <field name="documents.fileName" type="strings"/>
    <field name="documents.language" type="strings"/>
    <field name="documents.researchId" type="strings"/>
    <field name="opinion.opinion" type="strings"/>
    <field name="opinion.opinionId" type="strings"/>
    <field name="employeId" type="string" multiValued="false" indexed="true" stored="true"/>
    <field name="s.sId" type="tlongs"/>
    <field name="s.sName" type="strings"/>
    <field name="type" type="string" indexed="true" stored="true"/>

Thanks, all, for your comments, which helped me understand Solr better.
