
How do I use Lucene to index and search structured text files?

This is the first time I have used Lucene, and I have a text file like this:

id,name,address,hobby
1,namm1,address1,football
2,namm2,address2,football
3,namm3,address3,football
4,namm4,address4,football
5,namm5,address5,football
6,namm6,address6,basketball
7,namm7,address7,basketball
8,namm8,address1,football
9,namm9,address8,swimming
...

The file above is a text file containing 1,000,000 lines. Now I want to find the records whose address is address1 and whose hobby is football, and write them to another file like this:

1,namm1,address1,football
8,namm8,address1,football
...

The first file is extremely large, so scanning it record by record would be very slow. I want to build an index (on address and hobby) for the first file with Lucene. Then I can quickly find the records whose address is address1 and whose hobby is football, and put them in a new file. I have never programmed with Lucene. Can anyone give me a similar example?

This is pretty simple. When you index a file with Lucene, you can define your own "Analyzer". In a nutshell, the analyzer extracts information from a source and puts it into "fields" of a Lucene "document".

When you search, you can specify which fields Lucene should consider.

So the solution in your case is to write an analyzer which puts each column into a field. Use a MultiFieldQueryParser and, in your query, specify the field names. For your example, the query would be:

address:address1 AND hobby:football

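I don't think he needs to write an analyzer at all. He can use one of the many built-in analyzers, and use Java code to parse each line and put each value into the corresponding field.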
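The approach suggested in the comment above could look roughly like the sketch below. It assumes Lucene 8.x or 9.x with lucene-core on the classpath; the file names `records.txt`, `records-index`, and `matches.txt` and the class name are hypothetical. Instead of a custom analyzer, it splits each line in plain Java and indexes `address` and `hobby` as `StringField`s, which Lucene stores as exact, untokenized (and case-sensitive) terms. The query is built programmatically with two required clauses rather than parsed from a query string.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class CsvLuceneSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical paths; adjust to your environment.
        Path csv = Paths.get("records.txt");
        Path indexDir = Paths.get("records-index");
        Path out = Paths.get("matches.txt");

        try (Directory dir = FSDirectory.open(indexDir)) {
            // 1. Build the index: one Lucene document per line.
            //    OpenMode.CREATE replaces any existing index so reruns do not duplicate records.
            IndexWriterConfig config =
                    new IndexWriterConfig().setOpenMode(IndexWriterConfig.OpenMode.CREATE);
            try (IndexWriter writer = new IndexWriter(dir, config);
                 BufferedReader in = Files.newBufferedReader(csv)) {
                in.readLine(); // skip the "id,name,address,hobby" header
                String line;
                while ((line = in.readLine()) != null) {
                    String[] cols = line.split(",", -1);
                    if (cols.length < 4) {
                        continue; // ignore malformed lines
                    }
                    Document doc = new Document();
                    // Indexed as exact terms, not stored.
                    doc.add(new StringField("address", cols[2], Field.Store.NO));
                    doc.add(new StringField("hobby", cols[3], Field.Store.NO));
                    // Stored only, so the original line can be written back out.
                    doc.add(new StoredField("line", line));
                    writer.addDocument(doc);
                }
            }

            // 2. Search: both clauses are required (logical AND).
            BooleanQuery query = new BooleanQuery.Builder()
                    .add(new TermQuery(new Term("address", "address1")), BooleanClause.Occur.MUST)
                    .add(new TermQuery(new Term("hobby", "football")), BooleanClause.Occur.MUST)
                    .build();

            try (DirectoryReader reader = DirectoryReader.open(dir);
                 BufferedWriter w = Files.newBufferedWriter(out)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // Ask for up to every document so no match is dropped.
                int limit = Math.max(1, reader.maxDoc());
                for (ScoreDoc hit : searcher.search(query, limit).scoreDocs) {
                    w.write(searcher.doc(hit.doc).get("line"));
                    w.newLine();
                }
            }
        }
    }
}
```

Hits come back in relevance order, so the lines in the output file are not guaranteed to follow the order of the input file. In practice the index would be built once and reused for later queries; indexing and searching are shown together here only to keep the sketch self-contained.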
