
Lucene 6 - How to influence ranking with numeric value?

I am new to Lucene, so apologies for any unclear wording. I am working on an author search engine. The search query is the author name. The default search results are good - they return the names that match best. However, we also want to rank the results by author popularity, using a blend of the default similarity and a numeric value representing the circulation of their titles. The problem with the default results is that they include authors nobody is interested in, and while I can rank by circulation alone, the top result is then generally not a great match in terms of name. I have been looking for a solution to this for days.

This is how I am building my index:

    IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get(INDEX_LOCATION)),
        new IndexWriterConfig(new StandardAnalyzer()));
    writer.deleteAll();
    for (Contributor contributor : contributors) {
        Document doc = new Document();
        doc.add(new TextField("name", contributor.getName(), Field.Store.YES));
        doc.add(new StoredField("contribId", contributor.getContribId()));
        doc.add(new NumericDocValuesField("sum", sum));
        writer.addDocument(doc);
    }
    writer.close();

The name is the field we want to search on, and the sum is the field we want to weight our search results with (while still taking the best match for the author name into account). I'm not sure if adding the sum to the document is the correct thing to do in this situation. I know that some experimentation will be needed to figure out how best to blend the weighting of the two factors, but my problem is that I don't know how to do it in the first place.

Any examples I've been able to find are either pre-Lucene 4 or don't seem to work. I thought this was what I was looking for, but it doesn't seem to work. Help appreciated!

As demonstrated in the blog post you linked, you could use a CustomScoreQuery; this would give you a lot of flexibility and influence over the scoring process, but it is also a bit overkill. Another possibility is to use a FunctionScoreQuery; since they behave differently, I will explain both.

Using a FunctionScoreQuery

A FunctionScoreQuery can modify a score based on a field.

Let's say you are usually performing a search like this:

Query q = .... // pass the user input to the QueryParser or similar
TopDocs hits = searcher.search(q, 10); // Get 10 results

Then you can modify the query in between like this:

Query q = .....

// Note that a Float field would work better.
DoubleValuesSource boostByField = DoubleValuesSource.fromLongField("sum");

// Create a query, based on the old query and the boost
FunctionScoreQuery modifiedQuery = new FunctionScoreQuery(q, boostByField);

// Search as usual, but with the modified query instead of the original one
TopDocs hits = searcher.search(modifiedQuery, 10);

This will modify the score based on the value of the field. Sadly, however, there isn't a possibility to control the influence of the DoubleValuesSource (besides scaling the values during indexing) - at least none that I know of.
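As a rough illustration of the "scaling the values during indexing" idea mentioned above, the raw circulation count could be compressed before it is written to the index. This is only a sketch: the class `BoostScaling` and the method `scale` are hypothetical names, and the use of logarithmic damping is an assumption, not something prescribed by Lucene.

```java
// Hypothetical helper, not part of Lucene: compresses raw circulation
// counts before they are written to the index.
final class BoostScaling {

    // Assumption: logarithmic damping, so that an author with 1,000,000
    // circulations does not score 1,000 times higher than one with 1,000.
    static double scale(long circulation) {
        if (circulation < 0) {
            throw new IllegalArgumentException("circulation must be >= 0");
        }
        // log1p(x) = ln(1 + x), which is well defined for x = 0.
        return Math.log1p(circulation);
    }

    public static void main(String[] args) {
        long[] samples = {0, 10, 1_000, 1_000_000};
        for (long c : samples) {
            System.out.printf("%d -> %.3f%n", c, scale(c));
        }
    }
}
```

The scaled value would then be indexed in place of the raw count, ideally in a float-valued field as the comment in the snippet above suggests. How strongly to compress the values is something you will have to experiment with.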

To have more control, consider using the CustomScoreQuery.

Using a CustomScoreQuery

Using this kind of query will allow you to modify the score of each result any way you like. In this context we will use it to alter the score based on a field in the index. First, you will have to store your value during indexing:

doc.add(new StoredField("sum", sum)); 

Then we will have to create our very own query class:

private static class MyScoreQuery extends CustomScoreQuery {
    public MyScoreQuery(Query subQuery) {
        super(subQuery);
    }

    // The CustomScoreProvider is what actually alters the score
    private class MyScoreProvider extends CustomScoreProvider {

        private LeafReader reader;
        private Set<String> fieldsToLoad;

        public MyScoreProvider(LeafReaderContext context) {
            super(context);
            reader = context.reader();

            // We create a HashSet which contains the name of the field
            // which we need. This allows us to retrieve the document 
            // with only this field loaded, which is a lot faster.
            fieldsToLoad = new HashSet<>();
            fieldsToLoad.add("sum");
        }

        @Override
        public float customScore(int doc_id, float currentScore, float valSrcScore) throws IOException {
            // Get the result document from the index
            Document doc = reader.document(doc_id, fieldsToLoad);

            // Get boost value from index               
            IndexableField field = doc.getField("sum");
            Number number = field.numericValue();

            // This is just an example on how to alter the current score
            // based on the value of "sum". You will have to experiment
            // here.
            float influence = 0.01f;
            float boost = number.floatValue() * influence;

            // Return the new score for this result, based on the 
            // original lucene score.
            return currentScore + boost;
        }           
    }

    // Make sure that our CustomScoreProvider is being used.
    @Override
    public CustomScoreProvider getCustomScoreProvider(LeafReaderContext context) {
        return new MyScoreProvider(context);
    }       
}

Now you can use your new Query class to modify an existing query, similar to the FunctionScoreQuery:

Query q = .....

// Create a query, based on the old query and the boost
MyScoreQuery modifiedQuery = new MyScoreQuery(q);

// Search as usual, but with the modified query instead of the original one
TopDocs hits = searcher.search(modifiedQuery, 10);
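To get a feel for the `influence` constant before touching the index, the additive formula from `customScore` above can be tried on a few made-up hits in plain Java. This is a sketch only: `BlendDemo`, `Candidate`, `sampleHits` and all the scores and sums below are hypothetical and do not come from a real index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class BlendDemo {

    // Hypothetical stand-in for a search hit: the text score Lucene
    // computed for the name match, plus the stored "sum" value.
    static final class Candidate {
        final String name;
        final float textScore;
        final long sum;

        Candidate(String name, float textScore, long sum) {
            this.name = name;
            this.textScore = textScore;
            this.sum = sum;
        }
    }

    // Same formula as in customScore: original score plus sum * influence.
    static float blend(float textScore, long sum, float influence) {
        return textScore + sum * influence;
    }

    // Returns a copy of the hits, sorted by blended score, best first.
    static List<Candidate> rank(List<Candidate> hits, float influence) {
        List<Candidate> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(
                (Candidate c) -> blend(c.textScore, c.sum, influence)).reversed());
        return sorted;
    }

    // Made-up data: a close name match with low circulation and a weaker
    // name match with high circulation.
    static List<Candidate> sampleHits() {
        return Arrays.asList(
                new Candidate("John Smith", 4.0f, 50L),
                new Candidate("John Smithers", 3.0f, 500L));
    }

    public static void main(String[] args) {
        for (float influence : new float[] {0f, 0.001f, 0.01f}) {
            Candidate top = rank(sampleHits(), influence).get(0);
            System.out.println("influence=" + influence + " -> top hit: " + top.name);
        }
    }
}
```

With these invented numbers, a small `influence` keeps the best name match on top, while a larger one lets circulation dominate. The right value depends entirely on the range of your real scores and sums.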

Final remarks

Using a CustomScoreQuery, you can influence the scoring process in all kinds of ways. Remember, however, that the method customScore is called for each search result, so don't perform any expensive computations there, as this would severely slow down the search process.

I've created a small gist with a full working example of the CustomScoreQuery here: https://gist.github.com/philippludwig/14e0d9b527a6522511ae79823adef73a
