
Elasticsearch: delete documents using Logstash and a CSV file

Is there any way to delete documents from Elasticsearch using Logstash and a CSV file? I read the Logstash documentation and found nothing. I tried a few configs using action "delete", but nothing happened:

output {
    elasticsearch{
        action => "delete"
        host => "localhost"
        index => "index_name"
        document_id => "%{id}"
    }
} 

Has anyone tried this? Is there anything special that I should add to the input and filter sections of the config? I used the file plugin for input and the csv plugin for the filter.

It is definitely possible to do what you suggest. However, if you're using Logstash 1.5, you need to use the transport protocol, because there is a bug in Logstash 1.5 when doing deletes over the HTTP protocol (see issue #195).

So if your delete.csv CSV file is formatted like this:

id
12345
12346
12347

And your delete.conf Logstash config looks like this:

input {
    file {
        path => "/path/to/your/delete.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter {
    csv {
        columns => ["id"]
    }
}
output {
    elasticsearch{
        action => "delete"
        host => "localhost"
        port => 9300                         # make sure you have this
        protocol => "transport"              # make sure you have this
        index => "your_index"                # replace this
        document_type => "your_doc_type"     # replace this
        document_id => "%{id}"
    }
}

Then, when you run bin/logstash -f delete.conf, Logstash will delete every document whose id is listed in your CSV file.
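One thing to watch for: the file input reads the CSV line by line, so the header line "id" is parsed like any other row and would yield a delete request for a document whose id is literally "id". A minimal sketch of one way to skip it, assuming the header text is exactly the column name, is to drop that event in the filter section:

```
filter {
    csv {
        columns => ["id"]
    }
    # Assumption: the first line of delete.csv is the literal header "id".
    # Drop that event so no delete is issued for a document named "id".
    if [id] == "id" {
        drop { }
    }
}
```

This replaces the filter section of delete.conf above; the input and output sections stay as they are.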

In addition to Val's answer, I would add that if a single input contains a mix of rows to delete and rows to upsert, you can handle both, provided there is a flag that identifies the rows to delete. The action parameter of the elasticsearch output can be a "field reference," meaning that its value can be taken from a field of each row. Even better, you can store that value in a metadata field, so that it can be used in a field reference without being indexed.

For example, in your filter section:

filter {
    # [deleted] is the name of your flag field
    if [deleted] {
        mutate {
            add_field => {
                "[@metadata][elasticsearch_action]" => "delete"
            }
        }
    } else {
        mutate {
            add_field => {
                "[@metadata][elasticsearch_action]" => "index"
            }
        }
    }
    # The flag is no longer needed once the action is set, so remove it in both cases
    mutate {
        remove_field => [ "deleted" ]
    }
}

Then, in your output section, reference the metadata field:

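output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "myindex"
        action => "%{[@metadata][elasticsearch_action]}"
        document_type => "mytype"
        # A delete needs the id of the document to remove.
        # Assumption: each row carries it in a field named "id".
        document_id => "%{id}"
    }
}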
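Because a wrong action here deletes data, it may be worth checking the computed action before pointing the pipeline at a real index. The following is a sketch of a dry run, not part of the original answer: it reads rows from stdin and prints each event, including its @metadata, instead of sending anything to Elasticsearch. The column names "id" and "deleted", and the convention that the flag is the literal string "true", are assumptions made for illustration.

```
input {
    stdin { }
}
filter {
    csv {
        # Assumed layout: an id column followed by a deleted flag
        columns => ["id", "deleted"]
    }
    # The csv filter produces strings, so compare against "true" explicitly
    if [deleted] == "true" {
        mutate { add_field => { "[@metadata][elasticsearch_action]" => "delete" } }
    } else {
        mutate { add_field => { "[@metadata][elasticsearch_action]" => "index" } }
    }
    mutate { remove_field => [ "deleted" ] }
}
output {
    # metadata => true makes rubydebug print the @metadata fields as well
    stdout { codec => rubydebug { metadata => true } }
}
```

Saved as, say, dryrun.conf (a hypothetical file name), it can be fed a sample row with echo "12345,true" | bin/logstash -f dryrun.conf. Once the printed elasticsearch_action looks right for both kinds of rows, swap the stdout output for the elasticsearch output.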
