
Bulk Update for elasticsearch documents using Python

I have Elasticsearch documents like the ones below, and I need to correct the age value, which should be derived from creationtime and currentdate as

age = currentdate - creationtime

Sample documents:

hits = [
   {
      "_id":"CrRvuvcC_uqfwo-WSwLi",
      "creationtime":"2018-05-20T20:57:02",
      "currentdate":"2021-02-05 00:00:00",
      "age":"60 months"
   },
   {
      "_id":"CrRvuvcC_uqfwo-WSwLi",
      "creationtime":"2013-07-20T20:57:02",
      "currentdate":"2021-02-05 00:00:00",
      "age":"60 months"
   },
   {
      "_id":"CrRvuvcC_uqfwo-WSwLi",
      "creationtime":"2014-08-20T20:57:02",
      "currentdate":"2021-02-05 00:00:00",
      "age":"60 months"
   },
   {
      "_id":"CrRvuvcC_uqfwo-WSwLi",
      "creationtime":"2015-09-20T20:57:02",
      "currentdate":"2021-02-05 00:00:00",
      "age":"60 months"
   }
]

I want to do a bulk update based on each document ID. The problem is that I need to correct 6 months of data, and the data size (doc count of the index) is almost 535329. I want to efficiently bulk-update age based on _id, for each day, on all documents, using Python.

Is there a way to do this without looping through every document? All the examples I came across that use Pandas dataframes for the update rely on a known value, but here I only get the _id when the code runs.

The logic I had written fetched all docs, stored their _id values, and then updated the age for each _id. That is not efficient if I want to update all documents in bulk for each day of the 6 months.

Can anyone give me some ideas for this or point me in the right direction?

As mentioned in the comments, fetching the IDs won't be necessary. You don't even need to fetch the documents themselves!

A single _update_by_query call will be enough. You can use ChronoUnit to get the difference after you've parsed the dates:

POST your-index-name/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": """
      def created =  LocalDateTime.parse(ctx._source.creationtime, DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss"));

      def currentdate = LocalDateTime.parse(ctx._source.currentdate, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
    
      def months = ChronoUnit.MONTHS.between(created, currentdate);
      ctx._source.age = months + ' month' + (months > 1 ? 's' : '');
    """,
    "lang": "painless"
  }
}

The official Python client exposes this call too, as the update_by_query method.
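As a rough sketch of what that could look like from Python: the snippet below builds the same request and passes it to update_by_query. It assumes a recent elasticsearch-py client (7.15+ or 8.x, where query and script are accepted as keyword arguments). The names build_update_request and run_update, and the host URL in the usage note, are made up for illustration. The script writes to the age field named in the question.

```python
# Painless script from the request above, kept as a plain Python string.
PAINLESS_SOURCE = """
def created = LocalDateTime.parse(ctx._source.creationtime, DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss"));
def currentdate = LocalDateTime.parse(ctx._source.currentdate, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
def months = ChronoUnit.MONTHS.between(created, currentdate);
ctx._source.age = months + ' month' + (months > 1 ? 's' : '');
"""


def build_update_request(query=None):
    """Return the query and script parts of the _update_by_query request.

    Hypothetical helper: defaults to match_all when no query is given.
    """
    return {
        "query": query or {"match_all": {}},
        "script": {"source": PAINLESS_SOURCE, "lang": "painless"},
    }


def run_update(index_name, hosts, query=None):
    """Send the update to the cluster (hypothetical wrapper).

    The client is imported here so the request builder above can be
    inspected without the elasticsearch package installed.
    """
    from elasticsearch import Elasticsearch

    es = Elasticsearch(hosts)
    request = build_update_request(query)
    # Assumes a client version that accepts query/script as keyword arguments.
    return es.update_by_query(
        index=index_name,
        query=request["query"],
        script=request["script"],
    )
```

Calling it might look like run_update("your-index-name", "http://localhost:9200"), where the URL is a placeholder for your own cluster address.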

Try running this update script on a small subset of your documents before letting it loose on your whole index, by swapping the match_all I put there for a narrower query.
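When spot-checking that subset, it may help to compute the expected age independently and compare it with what the script wrote. Here is a minimal sketch in plain Python that is intended to mirror how ChronoUnit.MONTHS.between counts whole months for two LocalDateTime values. The function names are made up for illustration, and the mirroring is my reading of the Java behaviour, so treat it as a sanity check rather than a reference.

```python
from datetime import datetime, timedelta


def months_between(start, end):
    """Count whole months from start to end, truncating toward zero."""
    end_date = end.date()
    # A partial final day does not count: adjust the end date by one day
    # when the time-of-day has not "caught up" with the start yet.
    if end_date > start.date() and end.time() < start.time():
        end_date -= timedelta(days=1)
    elif end_date < start.date() and end.time() > start.time():
        end_date += timedelta(days=1)
    # Pack (year, month, day) so that a difference of 32 equals one month.
    packed_start = (start.year * 12 + start.month - 1) * 32 + start.day
    packed_end = (end_date.year * 12 + end_date.month - 1) * 32 + end_date.day
    diff = packed_end - packed_start
    months = abs(diff) // 32
    return months if diff >= 0 else -months


def expected_age(creationtime, currentdate):
    """Build the same 'N month(s)' string the update script produces."""
    created = datetime.strptime(creationtime, "%Y-%m-%dT%H:%M:%S")
    current = datetime.strptime(currentdate, "%Y-%m-%d %H:%M:%S")
    months = months_between(created, current)
    return f"{months} month{'s' if months > 1 else ''}"


# First sample document from the question.
print(expected_age("2018-05-20T20:57:02", "2021-02-05 00:00:00"))  # 32 months
```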


It's worth mentioning that unless you search on this age field, it doesn't need to be stored in your index because it can be calculated at query time.

You see, if your index mapping's dates are properly defined like so:

{
  "mappings": {
    "properties": {
      "creationtime": {
        "type": "date",
        "format": "yyyy-MM-dd'T'HH:mm:ss"
      },
      "currentdate": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      ...
    }
  }
}

the age can be calculated as a script field:

POST ttimes/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "age_calculated": {
      "script": {
        "source": """
          def months = ChronoUnit.MONTHS.between(
                          doc['creationtime'].value,
                          doc['currentdate'].value );
          return months + ' month' + (months > 1 ? 's' : '');
        """
      }
    }
  }
}

The only caveat is that the value won't be inside _source but in its own group called fields (which implies that several script fields can be requested at once).

"hits" : [
  {
    ...
    "_id" : "FFfPuncBly0XYOUcdIs5",
    "fields" : {
      "age_calculated" : [ "32 months" ]   <--
    }
  },
  ...
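
If the search is issued from Python, the script field has to be read from fields rather than _source. The sketch below shows one way to pull it out of a response dictionary. It assumes the usual hits.hits nesting of a search response, and extract_ages is a made-up helper name. Script field values come back as lists, so the first element is taken.

```python
def extract_ages(response, field="age_calculated"):
    """Map each hit's _id to its first script-field value (or None)."""
    ages = {}
    for hit in response["hits"]["hits"]:
        # Script fields live under "fields" and are always lists.
        values = hit.get("fields", {}).get(field, [])
        ages[hit["_id"]] = values[0] if values else None
    return ages


# Trimmed-down response shaped like the output shown above.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "FFfPuncBly0XYOUcdIs5",
                "fields": {"age_calculated": ["32 months"]},
            }
        ]
    }
}

print(extract_ages(sample_response))
# {'FFfPuncBly0XYOUcdIs5': '32 months'}
```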
