简体   繁体   中英

Bulk delete elasticsearch

i am using elastic search 2.2.

here is the count of documents

curl 'xxxxxxxxx:9200/_cat/indices?v'

yellow open   app                 5   1   28019178         5073     11.4gb         11.4gb

In the "app" index we have two types of document.

  1. "log"
  2. "syslog"

Now i want to delete all the documents under type "syslog".

Hence, i tried using the following command

 curl -XDELETE "http://xxxxxx:9200/app/syslog"

But am getting the following error

No handler found for uri [/app/syslog]

i have installed delete-by-query plugin as well. Is there any way i can do a bulk delete operation ?

For now , i am deleting records by fetching the id.

curl -XDELETE "http://xxxxxx:9200/app/syslog/A121312"

it took around 5 mins for me to delete 10000 records. i have more than 1000000 docs which needs to be deleted. please help.

[EDIT -1]

i ran the below query to delete syslog type docs

curl -XDELETE 'http://xxxxxx:9200/app/syslog/_query' -d'
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  }
}'

And result is below

{"found":false,"_index":"app","_type":"syslog","_id":"_query","_version":1,"_shards":{"total":2,"successful":1,"failed":0}}

i used to query to get this message from index

 {
      "_index" : "app",
      "_type" : "syslog",
      "_id" : "AVckPMQnKYIebrQhF556",
      "_score" : 1.0,
      "_source" : {
        "message" : "some test message",
        "@version" : "1",
        "@timestamp" : "2016-09-13T15:49:04.562Z",
        "type" : "syslog",
        "host" : "1.2.3.4",
        "priority" : 0,
        "severity" : 0,
        "facility" : 0,
        "facility_label" : "kernel",
        "severity_label" : "Emergency"
}

[EDIT 2]

Delete by query listed as plugin

sudo /usr/share/elasticsearch/bin/plugin list
Installed plugins in /usr/share/elasticsearch/plugins/node1:
    - delete-by-query

I had similar problem, after filling elasticsearch with 77 millions of unwanted documents in last couple of days. Setting timeout in query is your friend. As mentioned here . Curl has parameter to increase too (-m 3600)

curl --request DELETE \
  --url 'http://127.0.0.1:9200/nadhled/tree/_query?timeout=60m' \
  --header 'content-type: application/json' \
  -m 3600 \
  --data '{"query":{
            "filtered":{
              "filter":{
                "range":{
                  "timestamp":{
                    "lt":1564826247
                   },
                  "timestamp":{
                    "gt":1564527660
                  }
                }
              }
            }
          }
        }'

I know this is not your bulk delete, but I've found this page during my research so I post it here. Hope it helps you too.

I would suggest that you should rather create a new index and reindex the documents you want to keep

But if you wanna use delete by query you should use this,

curl -XDELETE 'http://xxxxxx:9200/app/syslog/_query'

{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  }
}

but then you'll be left with mapping.

In latest Elasticsearch(5.2), you could use _delete_by_query

curl -XPOST "http://localhost:9200/index/type/_delete_by_query" -d'
{
    "query":{
        "match_all":{}
    }
}'

The delete-by-query API is new and should still be considered experimental. The API may change in ways that are not backwards compatible

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM