简体   繁体   中英

Lucene 2.9.2: How to show results in random order?

By default, Lucene returns the query results in the order of relevance (score). You can pass a sort field (or multiple), then the results get sorted by that field.

I am looking now for a nice solution to get the search results in random order .

The bad approach:
Of course I could take ALL results and then shuffle the collection, but in case of 5 Mio search results, that's not performing well.

The elegant paged approach:
With this approach you would be able to tell Lucene the following:
a) Give me results 1 to 10 out of 5Mio results in random order
b) Then give me 11 to 20 (based on the same random sequence used in a).
c) Just to clarify: If you call a) twice you get the same random elements.

How can you implement this approach??


Update Jul27 2012: Be aware that the solution described here for Lucene 2.9.x is not working properly. Using the RandomOrderScoreDocComparator will result in having certain results twice in the resulting list.

You could write a custom FieldComparator :

public class RandomOrderFieldComparator extends FieldComparator<Integer> {

    private final Random random = new Random();

    @Override
    public int compare(int slot1, int slot2) {
        return random.nextInt();
    }

    @Override
    public int compareBottom(int doc) throws IOException {
        return random.nextInt();
    }

    @Override
    public void copy(int slot, int doc) throws IOException {
    }

    @Override
    public void setBottom(int bottom) {
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
    }

    @Override
    public Integer value(int slot) {
        return random.nextInt();
    }

}

This doesn't consume any I/O when shuffling the results. Here is my sample program that demonstrates how you use this:

public static void main(String... args) throws Exception {
    RAMDirectory directory = new RAMDirectory();

    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_33);

    IndexWriter writer = new IndexWriter(
            directory,
            new IndexWriterConfig(Version.LUCENE_33, analyzer).setOpenMode(OpenMode.CREATE_OR_APPEND)
        );

    Document alice = new Document();
    alice.add( new Field("name", "Alice", Field.Store.YES, Field.Index.ANALYZED) );
    writer.addDocument( alice );

    Document bob = new Document();
    bob.add( new Field("name", "Bob", Field.Store.YES, Field.Index.ANALYZED) );
    writer.addDocument( bob );

    Document chris = new Document();
    chris.add( new Field("name", "Chris", Field.Store.YES, Field.Index.ANALYZED) );
    writer.addDocument( chris );

    writer.close();


    IndexSearcher searcher = new IndexSearcher( directory );



    for (int pass = 1; pass <= 10; pass++) {
        Query query = new MatchAllDocsQuery();

        Sort sort = new Sort(
                new SortField(
                        "",
                        new FieldComparatorSource() {

                            @Override
                            public FieldComparator<Integer> newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
                                return new RandomOrderFieldComparator();
                            }

                        }
                    )
            );

        TopFieldDocs topFieldDocs = searcher.search( query, 10, sort );

        System.out.print("Pass #" + pass + ":");
        for (int i = 0; i < topFieldDocs.totalHits; i++) {
            System.out.print( " " + topFieldDocs.scoreDocs[i].doc );
        }
        System.out.println();
    }
}

It yields up this output:

Pass #1: 1 0 2
Pass #2: 1 0 2
Pass #3: 0 1 2
Pass #4: 0 1 2
Pass #5: 0 1 2
Pass #6: 1 0 2
Pass #7: 0 2 1
Pass #8: 1 2 0
Pass #9: 2 0 1
Pass #10: 0 2 1

Bonus! For those of you trapped in Lucene 2

public class RandomOrderScoreDocComparator implements ScoreDocComparator {

    private final Random random = new Random();

    public int compare(ScoreDoc i, ScoreDoc j) {
        return random.nextInt();
    }

    public Comparable<?> sortValue(ScoreDoc i) {
        return Integer.valueOf( random.nextInt() );
    }

    public int sortType() {
        return SortField.CUSTOM;
    }

}

All you have to change is the Sort object:

Sort sort = new Sort(
    new SortField(
        "",
        new SortComparatorSource() {
            public ScoreDocComparator newComparator(IndexReader reader, String fieldName) throws IOException {
                return new RandomOrderScoreDocComparator();
            }
        }
    )
);

Here's my solution that so far has proven to avoid duplicate results. It's in C# (for Lucene.Net) but I've started it from the Java sample so it should be easily converted back.

The trick I used was a unique ID per search that stays unchanged while the user clicks on pagination. I already had this unique ID that I used for reporting purposes. I initialize it when the user clicks the search button and then I pass it encrypted in the query string. It's finally passed as the seed parameter in RandomOrderFieldComparatorSource (actually it's ID.GetHashCode()).

What this means is I use the same series of random numbers for same search, so each docId gets the same sorting index even when users navigate to other result pages.

One more note, slots could probably be a vector equal to pagesize.

public class RandomOrderFieldComparator : FieldComparator
{
    Random random;
    List<int> slots = new List<int>();
    Dictionary<int, int> docSortIndeces = new Dictionary<int, int>();

    public RandomOrderFieldComparator(int? seed)
    {
        random = seed == null ? new Random() : new Random(seed.Value);
    }

    private int bottom; // Value of bottom of queue

    public override int Compare(int slot1, int slot2)
    {
        // TODO: there are sneaky non-branch ways to compute
        // -1/+1/0 sign
        // Cannot return values[slot1] - values[slot2] because that
        // may overflow
        int v1 = slots[slot1];
        int v2 = slots[slot2];
        if (v1 > v2) {
            return 1;
        } else if (v1 < v2) {
            return -1;
        } else {
            return 0;
        }
    }

    public override int CompareBottom(int doc)
    {
        // TODO: there are sneaky non-branch ways to compute
        // -1/+1/0 sign
        // Cannot return bottom - values[slot2] because that
        // may overflow
        int v2 = GetDocSortingIndex(doc);
        if (bottom > v2) {
            return 1;
        } else if (bottom < v2) {
            return -1;
        } else {
            return 0;
        }
    }

    public override void Copy(int slot, int doc)
    {
        if (slots.Count > slot) {
            slots[slot] = GetDocSortingIndex(doc);
        } else {
            slots.Add(GetDocSortingIndex(doc));
        }
        //slots[slot] = GetDocSortingIndex(doc);
    }

    public override void SetBottom(int bottom)
    {
        this.bottom = slots[bottom];
    }

    public override System.IComparable Value(int slot)
    {
        return (System.Int32)slots[slot];
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
    }

    int GetDocSortingIndex(int docId)
    {
        if (!docSortIndeces.ContainsKey(docId))
            docSortIndeces[docId] = random.Next(int.MinValue, int.MaxValue);

        return docSortIndeces[docId];
    }
}

Here is an updated version for Lucene.Net 4.8.

class RandomFieldComparatorSource : FieldComparerSource
    {
        public override FieldComparer NewComparer(string fieldname, int numHits, int sortPos, bool reversed)
        {
            return new RandomOrderFieldComparator(fieldname, null);
        }
    }

    class RandomOrderFieldComparator : NumericComparer<int>
    {
        static Random random = new Random();

        public RandomOrderFieldComparator(string field, int? missingValue) : base(field, missingValue)
        {
        }

        public override int Compare(int slot1, int slot2)
        {
            return random.Next(-1, 2);
        }
        public override int CompareBottom(int doc)
        {
            return random.Next(-1, 2);
        }
        public override void Copy(int slot, int doc)
        {
        }

        public override void SetBottom(int slot)
        {

        }

        public override void SetTopValue(object value)
        {

        }

        public override int CompareTop(int doc)
        {
            return random.Next(-1,2);
        }

        public override IComparable this[int slot] => random.Next();
    }

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM