
Elasticsearch: delete documents using Logstash and CSV

Is there any way to delete documents from Elasticsearch using Logstash and a CSV file? I read the Logstash documentation and found nothing. I also tried a few configs, but nothing happened when using the action "delete":

output {
    elasticsearch{
        action => "delete"
        host => "localhost"
        index => "index_name"
        document_id => "%{id}"
    }
} 

Has anyone tried this? Is there anything special I should add to the input and filter sections of the config? I used the file plugin for input and the csv plugin for filter.

It is definitely possible to do what you suggest. However, if you're using Logstash 1.5, you need to use the transport protocol, because Logstash 1.5 has a bug when doing deletes over the HTTP protocol (see issue #195).

So if your delete.csv CSV file is formatted like this:

id
12345
12346
12347

And your delete.conf Logstash config looks like this:

input {
    file {
        path => "/path/to/your/delete.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter {
    csv {
        columns => ["id"]
    }
}
output {
    elasticsearch {
        action => "delete"
        host => "localhost"
        port => 9300                         # make sure you have this
        protocol => "transport"              # make sure you have this
        index => "your_index"                # replace this
        document_type => "your_doc_type"     # replace this
        document_id => "%{id}"               # the "id" column parsed by the csv filter
    }
}

Then running bin/logstash -f delete.conf will delete every document in your_index whose id is listed in your CSV file.
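To make it easier to see what such a run targets, here is a minimal Python sketch that turns the same delete.csv rows into newline-delimited Elasticsearch Bulk API lines, one `delete` action per id. This is only an illustration of the request shape, not how Logstash itself is implemented. The function name `csv_to_bulk_deletes` is hypothetical, and `your_index`/`your_doc_type` are the same placeholders used in the config.

```python
import csv
import io
import json


def csv_to_bulk_deletes(csv_text, index, doc_type=None):
    """Build an NDJSON Bulk API body with one delete action per CSV row.

    Assumes a header row with an "id" column, like delete.csv.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        meta = {"_index": index, "_id": row["id"]}
        if doc_type is not None:
            # Mapping types only exist in older Elasticsearch versions.
            meta["_type"] = doc_type
        # A delete action carries no document source line after it.
        lines.append(json.dumps({"delete": meta}))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = "id\n12345\n12346\n12347\n"
    print(csv_to_bulk_deletes(sample, "your_index", "your_doc_type"), end="")
```

A body like this could be sent directly to the `_bulk` endpoint with the `application/x-ndjson` content type. Doing so bypasses Logstash entirely, so treat it as a way to check which ids a run will hit rather than as a replacement for the pipeline.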

In addition to Val's answer: if a single input contains a mix of rows to delete and rows to upsert, you can handle both in one pipeline, provided each row has a flag identifying the ones to delete. The action parameter of the elasticsearch output accepts a field reference, so its value can come from a field on each row. Even better, you can store that value in a metadata field, which can be used in a field reference without being indexed.

For example, in your filter section:

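filter {
    # [deleted] is the name of your flag field.
    # Note: "if [deleted]" is true for any non-empty string, including "false".
    # With CSV input, consider testing the value explicitly: if [deleted] == "true"
    if [deleted] {
        mutate {
            add_field => {
                "[@metadata][elasticsearch_action]" => "delete"
            }
        }
    } else {
        mutate {
            add_field => {
                "[@metadata][elasticsearch_action]" => "index"
            }
        }
    }
    # Drop the flag in both cases so it is not indexed with the document
    mutate {
        remove_field => [ "deleted" ]
    }
}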

Then, in your output section, reference the metadata field:

output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "myindex"
        action => "%{[@metadata][elasticsearch_action]}"
        document_type => "mytype"
        # Deletes need a document id; this assumes each row has an "id" field
        document_id => "%{id}"
    }
}
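The per-row routing done by the filter and output above can be sketched in Python as a function that emits Bulk API lines: flagged rows become `delete` actions, and all other rows become `index` actions followed by their source. This is a hypothetical illustration, not Logstash code. It assumes every row has an `id` field and that the `deleted` flag holds the string "true" when set, as it would after CSV parsing. The flag is compared against "true" explicitly because a non-empty string such as "false" would otherwise count as set.

```python
import json


def to_bulk_actions(rows, index):
    """Turn rows into Bulk API lines, choosing the action per row.

    Rows flagged as deleted become "delete" actions; all others become
    "index" actions. Assumes (hypothetically) an "id" field on every row
    and a "deleted" flag that is the string "true" when set.
    """
    lines = []
    for row in rows:
        doc = dict(row)
        # Pop the flag so it never ends up in the indexed document,
        # like remove_field => [ "deleted" ] in the filter.
        flagged = str(doc.pop("deleted", "")).strip().lower() == "true"
        meta = {"_index": index, "_id": doc["id"]}
        if flagged:
            lines.append(json.dumps({"delete": meta}))
        else:
            # An index action is followed by the document source and
            # replaces any existing document with the same _id.
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    rows = [
        {"id": "1", "name": "kept", "deleted": "false"},
        {"id": "2", "name": "gone", "deleted": "true"},
    ]
    print(to_bulk_actions(rows, "myindex"), end="")
```

Using `index` for the non-deleted rows gives upsert-like behavior only because each row supplies a stable `_id`. Without `document_id` in the output, Elasticsearch would generate a new id for every event and create duplicates instead.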

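Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.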

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM