
Lucene 6.2.1: How to get all field names or search across all fields without knowing their names

I'm new to Lucene and I would like to know whether there is a way to search through all possible fields in multiple documents without knowing their names or, as another approach, to get all field names (version 6.2.1).

  1. For instance: how can I get all the names for the 'fields' array instead of filling it in by hand, as in the example below?

     Analyzer analyzer = new StandardAnalyzer();
     String querystr = "test";
     String[] fields = {"title", "isbn", "desc", "name", "surname", "description"};
     BooleanClause.Occur[] flags = new BooleanClause.Occur[fields.length];
     Arrays.fill(flags, BooleanClause.Occur.SHOULD);
     Query query = MultiFieldQueryParser.parse(querystr, fields, flags, analyzer);

    I have already checked these topics:

    a) How to search across all the fields?

    We have implemented this answer:

    1) Index-time approach: Use a catch-all field. This is nothing but appending all the text from all the fields (the total text of your input doc) and placing the resulting large text in a single field. You have to add an additional field while indexing to act as the catch-all field.

    but we would like to change it if that is possible

    b) https://www.programcreek.com/java-api-examples/index.php?api=org.apache.lucene.queryParser.MultiFieldQueryParser

    c) IndexReader.getFieldNames Lucene 4

    but those solutions are not present in Lucene version 6.2.1:

    IndexReader.getFieldNames() (v. 3.3.0)

    final AtomicReader reader = searcher.getAtomicReader();
    final FieldInfos infos = reader.getFieldInfos(); (v. 4.2.1)

  2. ...or is there a method (not necessarily MultiFieldQueryParser) that searches through all fields without needing their names (v. 6.2.1)?

Based on your question, I take it you just want to search for some terms, and the fields in which those values are actually indexed are not really important to know?

In this case the best approach would be to implement a normal full-text search, modeled on how Elasticsearch and Solr handle this:

  • Add a dedicated "fulltext" TextField to each document (TextField is used for full-text searches)
  • Fill the fulltext field with all the information from the other fields, separated by spaces
  • Search for your term against the fulltext field

This is how full-text search can be implemented in an easy way. There is no need to know the field names and iterate over them.
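The second bullet above can be sketched without any Lucene dependency. This is only an illustration: the class name CatchAllText, the method build, and the sample field values are hypothetical, and the sketch assumes each document is available as a map from field name to text value. With Lucene on the classpath, the resulting string would be what gets indexed in the dedicated "fulltext" TextField.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CatchAllText {

    /** Joins all non-blank field values with a single space, in map order. */
    public static String build(Map<String, String> fieldValues) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String value : fieldValues.values()) {
            // Skip missing or blank values so no stray separators appear.
            if (value != null && !value.trim().isEmpty()) {
                joiner.add(value.trim());
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Hypothetical document; the field names are taken from the question.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Example Title");
        doc.put("isbn", "0000000000");
        doc.put("desc", "a short description");

        // This text would be added to the document as the "fulltext" field.
        System.out.println(build(doc));
        // prints "Example Title 0000000000 a short description"
    }
}
```

Searching then only ever targets the one "fulltext" field, so the query code never needs the list of original field names.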

If you have already implemented the solution of putting all the text you wish to search into one catch-all field, why do you want to change it? If you want to change it because it seems like a hack, let me assure you, that is the correct, best solution to this problem. It is a pattern recommended in the documentation of both Solr and Elasticsearch.

Generating a list of fields and creating a big, complicated query against all of them is the hack. You should definitely stick with the solution you have already implemented.


If you are one of the poor, unfortunate souls who just can't reindex to add a new field with all the stuff you need to search, and you really need a way to get a list of all the fields and query against them, here you go. You can get the list of fields in a LeafReader simply enough, and a DirectoryReader (from DirectoryReader.open, for example) contains a list of LeafReaderContexts. So iterate through the LeafReaders, and get and merge the list of fields from each, to get a full list of the fields in the index:

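import java.nio.file.Paths;
import java.util.HashSet;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.store.FSDirectory;

// DirectoryReader.open takes a Directory, so wrap the index path in an FSDirectory.
DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/my/index")));
HashSet<String> fieldnames = new HashSet<String>();
for (LeafReaderContext subReader : reader.leaves()) {
    Fields fields = subReader.reader().fields();
    for (String fieldname : fields) {
        fieldnames.add(fieldname);
    }
}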

You could do that on application start, or when you reopen your reader, rather than every time you query. Now you have the list of field names, which you could pass into MultiFieldQueryParser, or use to put a bunch of TermQueries into a BooleanQuery or a DisjunctionMaxQuery, or some such.
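As a dependency-free illustration of fanning one term out over the collected field names, the sketch below builds a string in the classic query parser syntax (field:term clauses joined with OR), which is a different route from the MultiFieldQueryParser and BooleanQuery options named above. The class name AllFieldsQueryString and the method build are hypothetical, and the sketch assumes that both the term and the field names are plain words with no query-syntax characters that would need escaping.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;
import java.util.TreeSet;

public class AllFieldsQueryString {

    /**
     * Builds "f1:term OR f2:term ..." with the fields in sorted order.
     * Assumption: term and field names contain no characters that the
     * query parser treats specially, so nothing is escaped here.
     */
    public static String build(Set<String> fieldNames, String term) {
        StringJoiner joiner = new StringJoiner(" OR ");
        // Sort so the generated string is stable across runs.
        for (String field : new TreeSet<>(fieldNames)) {
            joiner.add(field + ":" + term);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the field names collected from the index readers.
        Set<String> fieldnames = new HashSet<>(Arrays.asList("title", "isbn", "desc"));
        System.out.println(build(fieldnames, "test"));
        // prints "desc:test OR isbn:test OR title:test"
    }
}
```

Each OR clause is optional on its own, so a document matches if the term occurs in any one of the fields.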
