
How to iterate through indexed field to add field from another index

I'm rather new to Elasticsearch, so I'm coming here in the hope of finding some advice. I have two indices in Elasticsearch, built from two different CSV files.

The index_1 has this mapping:

    {'settings': {
        'number_of_shards': 3},
     'mappings': {
        'properties': {
            'place': {'type': 'keyword'},
            'address': {'type': 'keyword'}}}}

The file is about 400,000 documents long. index_2, built from a much smaller file (about 50 documents), has this mapping:

    {'settings': {
        'number_of_shards': 1},
     'mappings': {
        'properties': {
            'place': {'type': 'text'},
            'address': {'type': 'keyword'}}}}

The field "place" in index_2 contains all of the unique values of the field "place" in index_1. In both indices the "address" fields are postcodes of datatype keyword with the structure 0000AZ.

For each document in index_1, I want to look up its "place" value in index_2 and assign the "address" of the matching index_2 document.

I have tried using the pandas library, but the index_1 file is too large. I have also tried creating modules based on pandas and elasticsearch, quite unsuccessfully, although I believe this is a promising direction. A good solution would stay within the elasticsearch library as much as possible, as these indices will later be used for further analysis.

If I understand correctly, it sounds like you want to use updateByQuery.

The request body should look a little like this:

    {
       'query': {'term': {'place': "placeToMatch"}},
       'script': 'ctx._source.address = "updatedZipCode"'
    }

This will update the address field of all documents with the matched place.
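
In Python the same body can be built as a plain dict. The following is only a sketch: `build_update_body` is a hypothetical helper name (not part of the elasticsearch library), and it assumes the long form of the script with `params`, so that the postcode is passed as a parameter rather than pasted into the script source.

```python
def build_update_body(place, address):
    """Build an update-by-query body that sets `address` on every document whose `place` matches."""
    return {
        # `place` is a keyword field in index_1, so an exact term match is used.
        "query": {"term": {"place": place}},
        "script": {
            "lang": "painless",
            # The new value comes from params instead of string interpolation.
            "source": "ctx._source.address = params.address",
            "params": {"address": address},
        },
    }


body = build_update_body("placeToMatch", "0000AZ")
```

Passing the value through `params` also lets Elasticsearch reuse the compiled script across requests, since the script source stays identical for every place.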


So what we want to do is use updateByQuery while iterating over all the documents in index_2.

First step: get all the documents from index_2. We'll just do this using the basic search feature:

    {
       "index": 'index_2',
       "size": 100, // enough to get all ~50 documents; once size is over 10,000 you'll have to paginate
       "body": {"query": {"match_all": {}}}
    }

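With the Python client, this first step might look like the sketch below. It assumes `client` is an `elasticsearch.Elasticsearch` instance whose `search` method accepts `index` and `body` (newer client versions prefer separate keyword arguments), and `fetch_place_to_address` is a hypothetical helper name.

```python
def fetch_place_to_address(client, index="index_2", size=100):
    """Return a {place: address} dict for every document in the small lookup index."""
    response = client.search(
        index=index,
        # size=100 comfortably covers the ~50 lookup documents in a single request.
        body={"query": {"match_all": {}}, "size": size},
    )
    return {
        hit["_source"]["place"]: hit["_source"]["address"]
        for hit in response["hits"]["hits"]
    }
```

Because the lookup index is tiny, holding the whole mapping in memory is cheap; only the 400,000-document index needs to be updated server-side.
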
Now we iterate over all the results and use updateByQuery for each of the results:

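// pseudocode
doc = response.hits.hits[i]

// update-by-query request: match on place, write the address from index_2
{
  index: 'index_1',
  body: {
    'query': {'term': {'place': doc._source.place}},
    'script': {
      'source': 'ctx._source.address = params.address',
      'params': {'address': doc._source.address}
    }
  }
}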
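
Putting both steps together in Python could look roughly like this. It is a sketch under assumptions: `client` is an `elasticsearch.Elasticsearch` instance exposing `search` and `update_by_query` with `index` and `body` arguments, `copy_addresses` is a hypothetical function name, and the direction follows the question (match on `place`, write `address` into index_1).

```python
def copy_addresses(client, source_index="index_2", target_index="index_1", size=100):
    """For each lookup document, set `address` on all target documents with the same `place`."""
    lookup = client.search(
        index=source_index,
        body={"query": {"match_all": {}}, "size": size},
    )
    updated = {}
    for hit in lookup["hits"]["hits"]:
        doc = hit["_source"]
        # One update-by-query request per unique place (about 50 in total).
        result = client.update_by_query(
            index=target_index,
            body={
                "query": {"term": {"place": doc["place"]}},
                "script": {
                    "lang": "painless",
                    "source": "ctx._source.address = params.address",
                    "params": {"address": doc["address"]},
                },
            },
        )
        # The response reports how many documents were rewritten.
        updated[doc["place"]] = result["updated"]
    return updated
```

Calling `copy_addresses(client)` returns the number of updated documents per place, which gives a quick sanity check that every place in index_2 matched something in index_1. All of the heavy lifting happens inside Elasticsearch, so the 400,000 documents never have to be loaded into pandas.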
