
How to iterate through indexed field to add field from another index

I'm rather new to Elasticsearch, so I'm coming here in the hope of finding some advice. I have two indices in Elasticsearch, built from two different CSV files.

index_1 has this mapping:

{
    'settings': {
        'number_of_shards': 3
    },
    'mappings': {
        'properties': {
            'place': {'type': 'keyword'},
            'address': {'type': 'keyword'}
        }
    }
}

The file is about 400,000 documents long. index_2, built from a much smaller file (about 50 documents), has this mapping:

{
    'settings': {
        'number_of_shards': 1
    },
    'mappings': {
        'properties': {
            'place': {'type': 'text'},
            'address': {'type': 'keyword'}
        }
    }
}

The "place" field in index_2 holds all of the unique values of the "place" field in index_1. In both indices the "address" fields are postcodes of datatype keyword with the structure 0000AZ.

Based on the "place" keyword in index_1, I want to assign the matching "address" value from index_2.

I have tried using the pandas library, but the index_1 file is too large. I have also tried creating modules based on pandas and elasticsearch, quite unsuccessfully, although I believe this is a promising direction. A good solution would stay within the elasticsearch library as much as possible, as these indices will later be used for further analysis.

If I understand correctly, it sounds like you want to use updateByQuery.

The request body should look a little like this:

{
   'query': {'term': {'place': "placeToMatch"}},
   'script': 'ctx._source.address = "updatedZipCode"'
}

This will update the address field of all documents with the matched place.
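For a Python version of the same request, here is a minimal sketch. It assumes `es` is an `elasticsearch.Elasticsearch` client instance created elsewhere; the helper names `build_update_request` and `update_address_for_place` are made up for illustration. The new value is passed through script `params` rather than pasted into the script source, which avoids quoting problems when a value contains a quote character.

```python
def build_update_request(place, address):
    """Build an update-by-query body: match `place`, overwrite `address`."""
    return {
        # `term` does an exact match, which suits the keyword-typed `place` field.
        "query": {"term": {"place": place}},
        "script": {
            "lang": "painless",
            # The value arrives via params, not via string concatenation.
            "source": "ctx._source.address = params.address",
            "params": {"address": address},
        },
    }


def update_address_for_place(es, index, place, address):
    """Send the request. `es` is assumed to be an elasticsearch.Elasticsearch client."""
    return es.update_by_query(index=index, body=build_update_request(place, address))
```

Something like `update_address_for_place(es, "index_1", "placeToMatch", "updatedZipCode")` would then be the equivalent of the request above. Newer client versions may prefer `query=` and `script=` keyword arguments over `body=`, so check the version you have installed.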

EDIT:

So what we want to do is use updateByQuery while iterating over all the documents in index_2.

First step: get all the documents from index_2. We will just do this using the basic search feature:

{
   "index": 'index_2',
   "size": 100, // get all documents; once size is over 10,000 you'll have to paginate.
   "body": {"query": {"match_all": {}}}
}
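In Python, this first step could look roughly like the sketch below. It assumes `es` is an `elasticsearch.Elasticsearch` client and that every document in the small index has both a "place" and an "address" in its `_source`; the function name `fetch_place_to_address` is hypothetical. It turns the search hits into a plain dict so the lookup is easy to iterate over afterwards.

```python
def fetch_place_to_address(es, index="index_2", size=100):
    """Return {place: address} for every document in the small lookup index.

    `es` is assumed to be an elasticsearch.Elasticsearch client. `size` must be
    at least the number of documents in the index (about 50 here).
    """
    response = es.search(
        index=index,
        body={"size": size, "query": {"match_all": {}}},
    )
    lookup = {}
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        lookup[source["place"]] = source["address"]
    return lookup
```

With only about 50 documents a single search is enough; a larger lookup index would need pagination instead of a bigger `size`.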

Now we iterate over all the results and use updateByQuery for each of the results:

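// pseudocode: the i-th hit returned by the search above
doc = response[i]

// update by query request: match on place, copy the address over.
{
  index: 'index_1',
  body: {
    query: {term: {place: doc._source.place}},
    script: {
      source: 'ctx._source.address = params.address',
      params: {address: doc._source.address}
    }
  }
}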
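Putting both steps together in Python, a rough sketch might look like this. It assumes `es` is an `elasticsearch.Elasticsearch` client, that the indices are named index_1 and index_2 as in the question, and that the update-by-query response carries an "updated" count (the default when the call waits for completion); `copy_addresses` is a made-up name. It matches on "place" and copies "address" over, which is the direction the question asks for.

```python
def copy_addresses(es, source_index="index_2", target_index="index_1", size=100):
    """For each document in the lookup index, set `address` on all matching
    documents in the target index. Returns the total number of updated documents.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    """
    response = es.search(
        index=source_index,
        body={"size": size, "query": {"match_all": {}}},
    )
    total_updated = 0
    for hit in response["hits"]["hits"]:
        doc = hit["_source"]
        result = es.update_by_query(
            index=target_index,
            body={
                # Exact match on the keyword-typed `place` field of the target index.
                "query": {"term": {"place": doc["place"]}},
                "script": {
                    "source": "ctx._source.address = params.address",
                    "params": {"address": doc["address"]},
                },
            },
        )
        total_updated += result["updated"]
    return total_updated
```

This issues one update-by-query call per lookup document, so about 50 requests in total, and the 400,000 documents never have to be loaded into Python or pandas.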
