
How to index a document to a specific ElasticSearch shard?

I want to index a document to a specific ElasticSearch shard.

I know I can configure ES to look at a field, and send it to a specific shard based on that field.

I don't want to do that. I simply want to say: "OK, this week I want to import all documents into shard 1, just because I feel like it."

I know there's a way to send a query to a specific shard, but what about an import?

How can I do this?

If you want complete control over shards, you should use multiple indices with a single shard each instead of a single index with multiple shards. This way you can decide which index (and, since each index has only one shard, which shard) your data goes to. You can also create an alias that combines all such indices, so you don't have to list every index when searching.

From a performance perspective, there is very little difference between searching a single index with 10 shards and searching 10 indices with a single shard each: in both cases you are searching 10 shards. One thing you should watch out for in this scenario is keeping the mappings compatible. You probably don't want a field indexed as a string in one index and as an integer in another.
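A minimal Java sketch of that layout, assuming hypothetical index names docs-0 to docs-9 behind a hypothetical alias docs. To match the "this week" idea from the question, it picks the target index by ISO week number; that policy is only an illustration, the point being that your code, not Elasticsearch, chooses the index. It only builds the JSON bodies for PUT /docs-N and POST /_aliases and does not send them.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

// Sketch: several single-shard indices ("slots") behind one alias.
// The index names "docs-<n>" and the alias "docs" are hypothetical.
class SingleShardIndices {
    static final String PREFIX = "docs-";
    static final String ALIAS = "docs";

    // Body for PUT /docs-<n>: exactly one primary shard, so picking the index picks the shard.
    static String createIndexBody() {
        return "{\"settings\": {\"number_of_shards\": 1}}";
    }

    // Example policy (hypothetical): everything imported in the same ISO week goes to one index.
    static String indexForWeek(LocalDate date, int slots) {
        int week = date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        return PREFIX + Math.floorMod(week, slots);
    }

    // Body for POST /_aliases: add every slot index to one alias, so searches can target "docs".
    static String aliasActionsBody(int slots) {
        StringBuilder sb = new StringBuilder("{\"actions\": [");
        for (int i = 0; i < slots; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append("{\"add\": {\"index\": \"").append(PREFIX).append(i)
              .append("\", \"alias\": \"").append(ALIAS).append("\"}}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        int slots = 10;
        System.out.println("PUT /" + PREFIX + "0 " + createIndexBody());
        System.out.println("POST /_aliases " + aliasActionsBody(slots));
        System.out.println("This week's documents go to: " + indexForWeek(LocalDate.now(), slots));
    }
}
```

Because every slot index is created from the same body, this is also the natural place to put a shared mapping so the indices stay compatible.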

I am sure you have already solved your problem or found another solution, but I had a similar issue in a project and want to share what we did to index a document to a specific shard.

You can achieve this with Elasticsearch's _routing field, by calculating the shard number with the formula Elasticsearch gives for it:

shard_num = hash(_routing) % num_primary_shards
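That formula hides two details that matter when you go looking for routing values. As far as I can tell from the Elasticsearch source (treat this as an assumption and check it against your version), hash is 32-bit MurmurHash3 (x86 variant, seed 0) computed over the routing string's UTF-16 little-endian bytes, not Java's String.hashCode(); and since 7.0 the modulus is index.number_of_routing_shards, with the result divided by the routing factor number_of_routing_shards / number_of_shards. A minimal Java sketch of that calculation follows. The class and method names are made up here and are not Elasticsearch APIs; passing the same value for both shard counts reduces it to the formula above.

```java
// Sketch of how Elasticsearch appears to pick a shard for a routing value.
// ASSUMPTION: mirrors my reading of the 7.x+ source (Murmur3HashFunction and
// OperationRouting); verify against your cluster with GET /<index>/_search_shards?routing=...
class EsShardRouting {

    // 32-bit MurmurHash3, x86 variant.
    static int murmur3x86_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int blocks = data.length / 4;
        for (int i = 0; i < blocks; i++) {
            int j = i * 4;
            // Read a little-endian 4-byte block.
            int k1 = (data[j] & 0xff) | (data[j + 1] & 0xff) << 8
                    | (data[j + 2] & 0xff) << 16 | (data[j + 3] & 0xff) << 24;
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }
        // Mix in the remaining 1 to 3 bytes.
        int tail = blocks * 4;
        int k1 = 0;
        switch (data.length & 3) {
            case 3:
                k1 ^= (data[tail + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 ^= (data[tail + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 ^= data[tail] & 0xff;
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
                break;
            default:
                break;
        }
        // Finalization mix.
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Routing hash: the string's chars as UTF-16 little-endian bytes, seed 0.
    static int routingHash(String routing) {
        byte[] bytes = new byte[routing.length() * 2];
        for (int i = 0; i < routing.length(); i++) {
            char c = routing.charAt(i);
            bytes[2 * i] = (byte) c;
            bytes[2 * i + 1] = (byte) (c >>> 8);
        }
        return murmur3x86_32(bytes, 0);
    }

    // shard_num = floorMod(hash, routingShards) / (routingShards / primaryShards).
    // With routingShards == primaryShards this is the plain hash % num_primary_shards.
    static int shardFor(String routing, int primaryShards, int routingShards) {
        int routingFactor = routingShards / primaryShards;
        return Math.floorMod(routingHash(routing), routingShards) / routingFactor;
    }

    // ASSUMPTION: 7.x default for index.number_of_routing_shards, i.e.
    // primaryShards * 2^n with the largest n that stays <= 1024 (but n >= 1).
    static int defaultRoutingShards(int primaryShards) {
        int ceilLog2 = 32 - Integer.numberOfLeadingZeros(primaryShards - 1);
        return primaryShards << Math.max(1, 10 - ceilLog2);
    }

    public static void main(String[] args) {
        int primaryShards = 30;
        int routingShards = defaultRoutingShards(primaryShards); // 960 for 30 primaries
        for (int i = 0; i < 5; i++) {
            String routing = "tenant" + i;
            System.out.println(routing + " -> shard " + shardFor(routing, primaryShards, routingShards));
        }
    }
}
```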

Let's say you want a document to land on shard number 2 of an index with 30 primary shards. You then need a routing value whose hash, modulo the number of shards, equals 2, and the simplest way to find one is to try candidate values. To illustrate, here is a Java example that prints the shard number for a few routing values:

public class RoutingShards {
    public static void main(String[] args) {
        final int numberOfShards = 30;
        for (int i = 0; i < 5; i++) {
            String routing = "tenant" + i;
            // String.hashCode() is only a stand-in; Elasticsearch hashes routing values with Murmur3.
            // Math.floorMod keeps the shard in [0, numberOfShards) even for negative hashes.
            final int shard = Math.floorMod(routing.hashCode(), numberOfShards);
            System.out.println("Routing: " + routing + " - shard number: " + shard);
        }
    }
}

Output:

Routing: tenant0 - shard number: 28
Routing: tenant1 - shard number: 29
Routing: tenant2 - shard number: 0
Routing: tenant3 - shard number: 1
Routing: tenant4 - shard number: 2

You have to generate a string whose hash, modulo the number of shards, gives your desired shard number. In the output above, the routing value tenant4 leads to shard number 2. Because a real cluster uses Elasticsearch's own hash, confirm the mapping against your index, for example with GET /<index>/_search_shards?routing=tenant4, which lists the shard a request with that routing value would hit.
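Rather than reading the output by eye, the search can be looped. A Java sketch with two hypothetical helpers: findRouting tries prefix0, prefix1, and so on until a candidate lands on the target shard, and routingPerShard collects one distinct routing value per shard, which is what dedicating a shard to each tenant requires. The shard function is a parameter; main plugs in the same String.hashCode() stand-in as the snippet above (through Math.floorMod, so it is never negative), so for a real cluster you would pass in Elasticsearch's own hash calculation instead.

```java
import java.util.function.ToIntFunction;

// Sketch: find routing values that land on chosen shards.
// findRouting and routingPerShard are hypothetical helpers; shardOf must return 0..numShards-1.
class RoutingFinder {

    // Try prefix0, prefix1, ... until shardOf maps a candidate to targetShard.
    static String findRouting(String prefix, int targetShard, ToIntFunction<String> shardOf, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            String candidate = prefix + i;
            if (shardOf.applyAsInt(candidate) == targetShard) {
                return candidate;
            }
        }
        throw new IllegalStateException("no routing value found for shard " + targetShard);
    }

    // One routing value per shard (array index = shard number), e.g. one shard per tenant.
    static String[] routingPerShard(String prefix, int numShards, ToIntFunction<String> shardOf, int maxAttempts) {
        String[] result = new String[numShards];
        int found = 0;
        for (int i = 0; i < maxAttempts && found < numShards; i++) {
            String candidate = prefix + i;
            int shard = shardOf.applyAsInt(candidate);
            if (result[shard] == null) {
                result[shard] = candidate;
                found++;
            }
        }
        if (found < numShards) {
            throw new IllegalStateException("only " + found + " of " + numShards + " shards covered");
        }
        return result;
    }

    public static void main(String[] args) {
        final int numberOfShards = 30;
        // Stand-in shard function (NOT Elasticsearch's hash); replace it for a real cluster.
        ToIntFunction<String> shardOf = r -> Math.floorMod(r.hashCode(), numberOfShards);
        System.out.println("shard 2 <- " + findRouting("tenant", 2, shardOf, 100_000));
        String[] perShard = routingPerShard("tenant", numberOfShards, shardOf, 100_000);
        for (int shard = 0; shard < numberOfShards; shard++) {
            System.out.println("shard " + shard + " <- " + perShard[shard]);
        }
    }
}
```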

As a full example, here is how to index a document with a routing value.

Let's say we create a "course" index and set routing to required:

PUT http://localhost:9200/course
{
    "settings": {
        "number_of_shards": 30
    },
    "mappings": {
        "_routing": {
           "required": true 
        }
    }
}

Then you index a document like this:

PUT http://localhost:9200/course/_doc/1?routing=tenant0&refresh=true
{
    "id": 1,
    "title": "Data Security course in Lidl",
    "description": "The course teaches our core Data Security measures here in Lidl. As new regulations are out, ....",
    "text": "Text of the course goes here",
    "created_date": 152625632,
    "last_date": 152625632,
    "expiration_date": null,
    "domain_id": 10,
    "language_id": 2
}
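If you would rather send these requests from Java than from a REST tool, here is a sketch using the JDK's built-in java.net.http.HttpClient (Java 11 or later). It assumes an unsecured cluster at http://localhost:9200, targets the course index, and shortens the document body; buildRequests only builds the requests so they can be inspected, and main sends them.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch: create the index with required routing, then index one document with a routing value.
// ASSUMPTION: an unsecured cluster at http://localhost:9200; the document body is shortened.
class RoutedIndexing {
    static final String BASE = "http://localhost:9200";

    // Builds a PUT request with a JSON body.
    static HttpRequest put(String pathAndQuery, String json) {
        return HttpRequest.newBuilder(URI.create(BASE + pathAndQuery))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    static List<HttpRequest> buildRequests(String routing) {
        String createIndex = "{\"settings\": {\"number_of_shards\": 30},"
                + " \"mappings\": {\"_routing\": {\"required\": true}}}";
        String doc = "{\"id\": 1, \"title\": \"Data Security course in Lidl\"}";
        String query = "?routing=" + URLEncoder.encode(routing, StandardCharsets.UTF_8) + "&refresh=true";
        return List.of(
                put("/course", createIndex),
                // Leaving out ?routing=... here makes Elasticsearch reject the request
                // with routing_missing_exception, because routing is required.
                put("/course/_doc/1" + query, doc));
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (HttpRequest request : buildRequests("tenant0")) {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(request.method() + " " + request.uri() + " -> " + response.statusCode());
        }
    }
}
```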

In our case, we have multi-tenant software where about 100 tenants (organizations) share the same Elasticsearch index, and we had to guarantee that one tenant can never see another tenant's data. Our solution was to create one index for all tenants with 100 shards and to dedicate one shard to each tenant by finding a routing value for each tenant. As you can see in the index mapping example above, routing is set to "required", so whenever you send CRUD operations to Elasticsearch you have to specify a routing value; otherwise Elasticsearch throws a routing_missing_exception.
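One way to make sure no CRUD call goes out without a routing value is to build every request path through a per-tenant lookup. This Java sketch uses a hypothetical TenantRouting class with made-up tenant IDs (acme, globex): it fails on the client side for an unknown tenant, before Elasticsearch would answer with routing_missing_exception, and it also adds the routing value to searches so they only query the shard that value maps to.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Sketch: every request path for a tenant carries that tenant's routing value.
// TenantRouting, the tenant IDs and the routing values are hypothetical.
class TenantRouting {
    private final String index;
    private final Map<String, String> routingByTenant;

    TenantRouting(String index, Map<String, String> routingByTenant) {
        this.index = index;
        this.routingByTenant = Map.copyOf(routingByTenant);
    }

    // Fail early on the client instead of waiting for routing_missing_exception.
    private String routingFor(String tenantId) {
        String routing = routingByTenant.get(tenantId);
        if (routing == null) {
            throw new IllegalArgumentException("no routing value for tenant " + tenantId);
        }
        return URLEncoder.encode(routing, StandardCharsets.UTF_8);
    }

    // Path for index/get/update/delete of one document.
    String docPath(String tenantId, String docId) {
        return "/" + index + "/_doc/" + docId + "?routing=" + routingFor(tenantId);
    }

    // With routing, the search only queries the shard this routing value maps to
    // (one tenant per shard in the design described above).
    String searchPath(String tenantId) {
        return "/" + index + "/_search?routing=" + routingFor(tenantId);
    }

    public static void main(String[] args) {
        TenantRouting routing = new TenantRouting("course", Map.of("acme", "tenant0", "globex", "tenant4"));
        System.out.println(routing.docPath("acme", "1"));
        System.out.println(routing.searchPath("globex"));
    }
}
```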
