How to treat JSON objects as separate documents while indexing using Lucene

I have a few JSON files that look like the one below. I want to treat each JSON object in each file as one document (with "user_id" as a unique identifier). My code treats the entire JSON file as one document. How can I fix this?

[
{
"user_id": "john_doeee",
"lon": 204.0,
"lat": 101.0,
"stored" : true,
"hashtag" : "ucriverside"
},
{
"user_id": "carlos_baby",
"lon": 204.0,
"lat": 101.0,
"stored" : true,
"hashtag" : "UCR"
},
{
"user_id": "emmanuel_",
"lon": 204.0,
"lat": 101.0,
"stored" : false,
"hashtag": "riverside"
}
]

I think it has something to do with how I create the Document. Here's what I have:

static void indexDoc(IndexWriter writer, Path file, long lastModified) throws IOException
{
    try (InputStream stream = Files.newInputStream(file))
    {
        //Create lucene Document
        Document doc = new Document();

        doc.add(new StringField("path", file.toString(), Field.Store.YES));
        doc.add(new LongPoint("modified", lastModified));
        doc.add(new TextField("contents", new String(Files.readAllBytes(file)), Store.YES));

        writer.updateDocument(new Term("path", file.toString()), doc);
    }
}

No, it has nothing to do with the Document class. Lucene has no built-in way of knowing that this is a JSON file that should be split into several Lucene documents. You need to do the splitting yourself, using a Java JSON library.

One of many possibilities is to use the https://github.com/stleary/JSON-java library with code like this:

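JSONArray arr = new JSONArray(" .... ");
for (int i = 0; i < arr.length(); i++) {
    JSONObject obj = arr.getJSONObject(i);
    // Build one Lucene Document per JSON object, keyed by its user_id
    Document doc = new Document();
    doc.add(new StringField("user_id", obj.getString("user_id"), Field.Store.YES));
    doc.add(new TextField("contents", obj.toString(), Field.Store.YES));
    writer.updateDocument(new Term("user_id", obj.getString("user_id")), doc);
}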

Of course, you're free to use any other JSON library, such as Jackson or GSON.
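
If adding a dependency is not an option, the splitting step on its own can be sketched with the standard library. This is not the approach recommended above and not a real JSON parser: JsonArraySplitter and splitTopLevelObjects are hypothetical names, and the sketch assumes well-formed input whose top level is a flat array of objects, as in the question. It only tracks brace depth (ignoring braces inside string values) and returns the raw text of each object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene or of any JSON library).
final class JsonArraySplitter {

    // Returns the raw text of every top-level {...} object in a JSON array.
    // Assumes well-formed JSON whose top level is a flat array of objects.
    static List<String> splitTopLevelObjects(String json) {
        List<String> objects = new ArrayList<>();
        int depth = 0;            // current nesting depth of { }
        int start = -1;           // index where the current top-level object began
        boolean inString = false; // true while scanning inside a "..." literal
        boolean escaped = false;  // true right after a backslash inside a string

        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                // Braces inside string values must not affect the depth.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0 && start >= 0) {
                    objects.add(json.substring(start, i + 1));
                    start = -1;
                }
            }
        }
        return objects;
    }

    public static void main(String[] args) {
        String json = "[{\"user_id\":\"john_doeee\",\"hashtag\":\"ucriverside\"},"
                    + "{\"user_id\":\"carlos_baby\",\"hashtag\":\"UCR\"}]";
        // Prints one JSON object per line.
        for (String object : splitTopLevelObjects(json)) {
            System.out.println(object);
        }
    }
}
```

Each returned string can then be added to its own Lucene Document and written with the IndexWriter. Extracting individual fields such as "user_id" still calls for a proper JSON library, which is why the library route is usually the better choice.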
