Lucene.Net Query with two MUST Clauses Returning Incorrect Results

I've created a query with two MUST clauses (and zero SHOULD clauses) that is returning results that satisfy only one of the clauses. As far as I can tell, this is incorrect behavior.

An example of such a query, printed just before searching, is:

{+(Text:wba) +(Attribute:10)}

The incorrect results being returned have 'wba' as a term in the 'Text' field, but do not have '10' as a term in the 'Attribute' field.

When I look at my index in Luke, go to the Search tab, and run this search

+Text:wba +Attribute:10

I get no results, as I would expect.

Here's a slightly simplified version of the code to run the search:

public static ScoreDoc[] Search( string searchPhrase, int maxResults, IEnumerable<string> attributes ) {

    var topQuery = new BooleanQuery();

    var textQuery = new BooleanQuery();
    using( var ngAnalyzer = new NGramAnalyzer( Version.LUCENE_30, 3, 9 ) ) {
        using( var stAnalyzer = new StandardAnalyzer( Version.LUCENE_30, new HashSet<string>() ) ) {

            var ngParser = new QueryParser( Version.LUCENE_30, IndexManager.TextFieldName, ngAnalyzer );
            var stParser = new QueryParser( Version.LUCENE_30, IndexManager.TextFieldName, stAnalyzer );

            var terms = AutoCompleter.QueryToTerms( searchPhrase );

            foreach( var word in terms ) {
                if( string.IsNullOrWhiteSpace( word ) ) {
                    continue;
                }

                if( word.Length < 3 ) {
                    textQuery.Add( stParser.Parse( word ), Occur.MUST );
                } else {
                    var parsed = ngParser.Parse( word );

                    var extractedTerms = new HashSet<Term>();
                    parsed.ExtractTerms( extractedTerms );
                    foreach( var term in extractedTerms ) {
                        textQuery.Add( new TermQuery( term ), Occur.SHOULD );
                    }
                }
            }
        }
    }
    topQuery.Add( textQuery, Occur.MUST );

    if( attributes != null && attributes.Any() ) {
        var attrQuery = new BooleanQuery();
        foreach( var attr in attributes ) {
            attrQuery.Add( new TermQuery( new Term( IndexManager.AttributeFieldName, attr ) ), Occur.SHOULD );
        }
        topQuery.Add( attrQuery, Occur.MUST );
    }

    // Actually conduct the search
    var searcher = AutoCompleter.IndexManager.GetOrCreateSearcher( AutoCompleter.TableId );

    var resultDocs = searcher.Search( textQuery, maxResults ).ScoreDocs;

    return resultDocs;
}

Here's an excerpt from the code that produces the index:

// Add the new document
var doc = new Document();
var field = new Field( IndexManager.TextFieldName, term.Text, Field.Store.YES, Field.Index.ANALYZED );
doc.Add( field );
if( !String.IsNullOrWhiteSpace( term.Id ) ) {
    field = new Field( IndexManager.IdFieldName, term.Id, Field.Store.YES, Field.Index.NO );
    doc.Add( field );
}
foreach( var attr in term.Attributes ) {
    if( !String.IsNullOrWhiteSpace( attr ) ) {
        field = new Field( IndexManager.AttributeFieldName, attr, Field.Store.YES, Field.Index.NOT_ANALYZED );
        doc.Add( field );
    }
}
writer.AddDocument( doc );

So, to be clear, I'm expecting only results that match the text clause inside textQuery and at least one of the attribute clauses held in attrQuery. Why isn't this working the way I expect?

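The method builds topQuery but then passes textQuery to the searcher, so the attribute clause never takes part in the search. This line is wrong: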

var resultDocs = searcher.Search( textQuery, maxResults ).ScoreDocs;

Should be:

var resultDocs = searcher.Search( topQuery, maxResults ).ScoreDocs;

Whoops.
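To see the difference in isolation, here is a minimal sketch against Lucene.Net 3.0.x using an in-memory index. The class name, field names, and sample documents are made up for illustration and are not from the original code. It indexes two documents that share the same text but differ in attribute, then runs both the inner text query and the combined top-level query.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class MustClauseDemo {
    // Hypothetical field names standing in for IndexManager.TextFieldName
    // and IndexManager.AttributeFieldName.
    const string TextField = "Text";
    const string AttributeField = "Attribute";

    static Document MakeDoc( string text, string attr ) {
        var doc = new Document();
        // NOT_ANALYZED keeps each value as a single term, which is enough for this demo.
        doc.Add( new Field( TextField, text, Field.Store.YES, Field.Index.NOT_ANALYZED ) );
        doc.Add( new Field( AttributeField, attr, Field.Store.YES, Field.Index.NOT_ANALYZED ) );
        return doc;
    }

    public static void Main() {
        var dir = new RAMDirectory();

        // Two documents with the same text but different attributes.
        using( var analyzer = new KeywordAnalyzer() )
        using( var writer = new IndexWriter( dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED ) ) {
            writer.AddDocument( MakeDoc( "wba", "10" ) );
            writer.AddDocument( MakeDoc( "wba", "20" ) );
            writer.Commit();
        }

        // Inner text clause, as in the question.
        var textQuery = new BooleanQuery();
        textQuery.Add( new TermQuery( new Term( TextField, "wba" ) ), Occur.MUST );

        // Inner attribute clause: at least one listed attribute must match.
        var attrQuery = new BooleanQuery();
        attrQuery.Add( new TermQuery( new Term( AttributeField, "10" ) ), Occur.SHOULD );

        // Top-level query requiring both inner queries.
        var topQuery = new BooleanQuery();
        topQuery.Add( textQuery, Occur.MUST );
        topQuery.Add( attrQuery, Occur.MUST );

        using( var searcher = new IndexSearcher( dir, true ) ) {
            // Searching the inner query ignores the attribute clause: both documents match.
            Console.WriteLine( "textQuery hits: " + searcher.Search( textQuery, 10 ).TotalHits );
            // Searching the combined query applies both MUST clauses: only one document matches.
            Console.WriteLine( "topQuery hits:  " + searcher.Search( topQuery, 10 ).TotalHits );
        }
    }
}
```

With these two sample documents, the text-only query should report 2 hits and the combined query 1. Printing the query that is actually passed to Search (rather than the one you intended to pass) is a quick way to catch this kind of mix-up.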
