
Elasticsearch: better to have more values or more fields?

Suppose you have an index with documents describing vehicles.

The index needs to deal with two different types of vehicles: motorcycles and cars.

Which of the following mappings is better from a performance point of view? (nested is required for my purposes)

    "vehicle": {
        "type": "nested",
        "properties": {
            "car": {
                "properties": {
                    "model": {
                        "type": "string"
                    },
                    "cost": {
                        "type": "integer"
                    }
                }
            },
            "motorcycle": {
                "properties": {
                    "model": {
                        "type": "string"
                    },
                    "cost": {
                        "type": "integer"
                    }
                }
            }
        }
    }

or this one:

    "vehicle": {
        "type": "nested",
        "properties": {
            "model": {
                "type": "string"
            },
            "cost": {
                "type": "integer"
            },
            "vehicle_type": {
                "type": "string"     ### "car", "motorcycle"
            }
        }
    }

The second one is more readable and more compact.

But the drawback is that when I want my queries to focus only on "car", I need to add this condition to the query.

If I use the first mapping, I can access the relevant field directly, without adding overhead to the query.
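
To make the difference concrete, here is a minimal sketch of the two query bodies, written as Python dicts that serialize to the JSON request body. The field names come from the two mappings above; the helper names and the searched value "golf" are made up for illustration, and the exact clauses you need may differ with your Elasticsearch version.

```python
import json


def query_first_mapping(vehicle_kind, model):
    """Nested query for the first mapping: the kind is part of the field path."""
    return {
        "query": {
            "nested": {
                "path": "vehicle",
                "query": {"match": {f"vehicle.{vehicle_kind}.model": model}},
            }
        }
    }


def query_second_mapping(vehicle_kind, model):
    """Nested query for the second mapping: the kind is an extra clause."""
    return {
        "query": {
            "nested": {
                "path": "vehicle",
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"vehicle.model": model}},
                            # Extra condition needed only with the second mapping.
                            {"match": {"vehicle.vehicle_type": vehicle_kind}},
                        ]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # "golf" is a hypothetical model name used only for this example.
    print(json.dumps(query_first_mapping("car", "golf"), indent=2))
    print(json.dumps(query_second_mapping("car", "golf"), indent=2))
```

The first body has a single clause on vehicle.car.model, while the second needs a bool query combining the model clause with the vehicle_type clause.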

The first mapping, where cars and motorcycles are isolated in different fields, is more likely to be faster. The reason is that, as you already noted, there is one less filter to apply, and the queries are more selective (e.g. fewer documents match a given value of vehicle.car.model than of vehicle.model).

Another option would be to create two distinct indexes, car and motorcycle, possibly with the same index template.

In Elasticsearch, a query is processed by a single thread per shard. That means that if you split your index in two and query both in a single request, the two searches are executed in parallel.

So when you need to query only cars or only motorcycles, it is faster simply because each index is smaller. And when you query both cars and motorcycles, it could also be faster because more threads are used.
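
As a rough sketch of what querying both indexes in a single request looks like: Elasticsearch accepts a comma-separated list of index names in the search URL. The helper below only builds the request path and body and does not talk to a cluster. The index names car and motorcycle follow the suggestion above; the helper name, the searched value, and the assumption that each index keeps a nested vehicle object with a model field are illustrative.

```python
import json


def build_search_request(indexes, model):
    """Return (path, body) for a search over one or more indexes."""
    # Several index names are joined with commas in the URL path.
    path = "/" + ",".join(indexes) + "/_search"
    body = {
        "query": {
            "nested": {
                "path": "vehicle",
                # Assumes each index maps vehicle.model, without vehicle_type.
                "query": {"match": {"vehicle.model": model}},
            }
        }
    }
    return path, body


if __name__ == "__main__":
    # One index only, then both in a single request.
    for names in (["car"], ["car", "motorcycle"]):
        path, body = build_search_request(names, "golf")
        print("GET", path)
        print(json.dumps(body))
```

With this layout no vehicle_type clause is needed: choosing the index plays the role of that filter.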

EDIT: one drawback of the latter option you should know about: the internal Lucene term dictionary will be duplicated, and if the values in cars and motorcycles are nearly identical, the list of indexed terms doubles.
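
The duplication can be pictured with a toy model. This is only an illustration under simplifying assumptions: it treats a term dictionary as a plain set of unique terms and ignores how Lucene actually stores terms per field and per segment. The function names and sample terms are hypothetical.

```python
def unique_terms_single_index(car_terms, motorcycle_terms):
    """Terms stored once when both kinds share one index and one field."""
    return len(set(car_terms) | set(motorcycle_terms))


def unique_terms_split_indexes(car_terms, motorcycle_terms):
    """Each index keeps its own dictionary, so shared terms are stored twice."""
    return len(set(car_terms)) + len(set(motorcycle_terms))


if __name__ == "__main__":
    # Hypothetical, fully overlapping vocabularies.
    cars = ["red", "blue", "sport"]
    motorcycles = ["red", "blue", "sport"]
    print(unique_terms_single_index(cars, motorcycles))   # 3
    print(unique_terms_split_indexes(cars, motorcycles))  # 6
```

When the two vocabularies barely overlap, the two counts are close and the drawback mostly disappears.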
