One-to-Many Geospatial Search Index Design in Solr

I'm hoping to get some advice on the best way to design a Solr index where each document has multiple tags as well as multiple lat/lng pairs.

The JSON representation of an example document:

{
    "id": 123,
    "name": "Sample Doc",
    "tags": [
        {"tag": "example1", "weight": 0.5},
        {"tag": "example2", "weight": 1.0},
        {"tag": "example3", "weight": 1.5}
    ],
    "locations": [
        {"lat": 1.234, "lng": 5.678},
        {"lat": 9.876, "lng": 5.432}
    ]
}

Tags need to be assigned various weights at indexing time (weights do not change between queries). A search against the index consists of a text search against the name and the tags of all the documents within a specific distance from a lat/lng pair. For example, a search for: "Sample example3" within 5000 meters of 9.876/5.432.

In such a search, documents with more tag matches and matches against the name should rank higher (I'm not sure whether Solr does this by default), while still taking tag weights into account. A single heavily weighted tag may therefore push a document very high in the results.

I've used Solr in the past to perform fulltext search and I've played around with its geospatial features. I'm coming from a Sphinx background but I think Solr is a more robust product for most of my needs. I just need some help designing an index that can do a fulltext + weighted + geospatial search efficiently. Any advice is greatly appreciated!

The multi-valued geospatial data is handled easily by the location_rpt field type in Solr's out-of-the-box schema.
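
As a rough illustration of the geospatial part, here is a minimal Python sketch that flattens the example document's locations into "lat,lng" strings for a multi-valued location_rpt field and builds a geofilt filter for the "within 5000 meters" search. The field name `locations` and the helper names `to_solr_doc` and `geo_filter_params` are assumptions made for this example, and it assumes the field uses Solr's default distance unit of kilometers.

```python
def to_solr_doc(doc):
    """Flatten the example document into fields Solr can index.

    Each location becomes one "lat,lng" string, which is the point
    format a multi-valued location_rpt field accepts.
    """
    return {
        "id": str(doc["id"]),
        "name": doc["name"],
        # Hypothetical field name; it must be declared as a
        # multiValued location_rpt field in the schema.
        "locations": [f'{loc["lat"]},{loc["lng"]}' for loc in doc["locations"]],
    }


def geo_filter_params(lat, lng, meters, field="locations"):
    """Build request parameters that keep only documents with at least
    one location within `meters` of the given point."""
    km = meters / 1000.0  # geofilt's d is in kilometers by default
    return {
        "q": "*:*",
        "fq": f"{{!geofilt sfield={field} pt={lat},{lng} d={km}}}",
    }


example = {
    "id": 123,
    "name": "Sample Doc",
    "locations": [{"lat": 1.234, "lng": 5.678}, {"lat": 9.876, "lng": 5.432}],
}
solr_doc = to_solr_doc(example)
params = geo_filter_params(9.876, 5.432, 5000)
```

Putting the distance restriction in fq rather than q keeps it out of the relevance score, so it only narrows the candidate set that the text query then ranks.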

The trickier part here is the weighted tags. As a first cut, I'd index three fields, tags05, tags10 and tags15, each with its own query-time boost (via edismax's qf param) of 0.5, 1.0 and 1.5 respectively. This is a discretization approach in which you lose some of the weight fidelity depending on how many buckets you have (three shown here). If you can, avoid Solr 4 JOIN queries; they are often quite slow. The IDF scores would be a little skewed because the tag data is split across fields, so you might want to try a different similarity implementation for these fields, perhaps one that doesn't consider IDF.
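
The bucketing idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the `BUCKETS` mapping and the helpers `bucket_tags` and `edismax_params` are hypothetical names, and snapping each weight to the nearest bucket is one possible discretization rule, not something the approach prescribes.

```python
# Bucket weight -> field name, matching the three fields named above.
BUCKETS = {0.5: "tags05", 1.0: "tags10", 1.5: "tags15"}


def bucket_tags(tags):
    """Assign each tag to the field whose bucket weight is closest
    to the tag's own weight (assumed rounding rule)."""
    fields = {name: [] for name in BUCKETS.values()}
    for t in tags:
        nearest = min(BUCKETS, key=lambda w: abs(w - t["weight"]))
        fields[BUCKETS[nearest]].append(t["tag"])
    return fields


def edismax_params(text):
    """Build edismax parameters that search the name field and the
    tag fields, boosting each tag field by its bucket weight."""
    qf = "name " + " ".join(f"{field}^{w}" for w, field in BUCKETS.items())
    return {"defType": "edismax", "q": text, "qf": qf}


tag_fields = bucket_tags([
    {"tag": "example1", "weight": 0.5},
    {"tag": "example2", "weight": 1.0},
    {"tag": "example3", "weight": 1.5},
])
query = edismax_params("Sample example3")
```

A tag with an in-between weight such as 1.3 would land in tags15 under this rule, which is exactly the loss of fidelity mentioned above; adding more buckets narrows that gap at the cost of more fields.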
