Logstash: elasticsearch output and unstructured data

Filebeat.yml file:

filebeat.inputs:
- type: log
  paths:
    - C:\Program Files\Filebeat\test_logs\*\*\*\*.txt
  exclude_lines: ['^Infobase.+']
output.logstash:
  hosts: ["localhost:5044"]
  worker: 1

Filebeat collects logs from a folder structure like this:

C:\Program Files\Filebeat\test_logs\*\*\*\*.txt

There are a lot of folders here, and each one holds at least a few logs at the end of the path.

Example of a log file (the times can coincide across several log files, because the logs come from different users):

"03.08.2020 10:56:38","Event LClick","Type Menu","t=0","beg"
"03.08.2020 10:56:38","Event LClick","Type Menu","Detail SomeDetail","t=109","end"
"03.08.2020 10:56:40","Event LClick","t=1981","beg"
"03.08.2020 10:56:40","Event LClick","t=2090","end"
"03.08.2020 10:56:41","Event LClick","Type ToolBar","t=3026","beg"
"03.08.2020 10:56:43","Event LClick","Type ToolBar","Detail User_Desktop","t=4477","end"
"03.08.2020 10:56:44","Event FormActivate","Name Form_Name:IsaA","t=5444"
"03.08.2020 10:56:51","Event LClick","t=12543","beg"
"03.08.2020 10:56:51","Event LClick","t=12605","end"
"03.08.2020 10:56:52","Event LClick","Form ","Type Label","Name Application.for.training","t=13853","beg"
"03.08.2020 10:57:54","Event LClick","Form Application.for.training","Type Label","Name Application.for.training","t=75442","end"
"03.08.2020 10:57:54","Event FormActivate","Name List.form","t=75785"
"03.08.2020 10:58:04","Event Wheel","Form List.form","Type FormTable","Name Список","t=85769","beg"
"03.08.2020 10:58:04","Event Wheel","Form List.form","Type FormTable","Name Список","t=85769","end"
"03.08.2020 10:58:04","Event Wheel","Form List.form","Type FormTable","Name Список","t=85847","beg"
"03.08.2020 10:58:04","Event Wheel","Form List.form","Type FormTable","Name Список","t=85847","end"
"03.08.2020 10:58:04","Event Wheel","Form List.form","Type FormTable","Name Список","t=85879","beg"
"03.08.2020 10:58:04","Event Wheel","Form List.form","Type FormTable","Name Список","t=85879","end"
"03.08.2020 10:58:04","Event Wheel","Form List.form","Type FormTable","Name Список","t=85925","beg"
"03.08.2020 10:58:04","Event Wheel","Form List.form","Type FormTable","Name Список","t=85925","end"
"03.08.2020 10:58:08","Event LClick","Form List.form","Type FormTable","Name Список","t=89373","beg"
"03.08.2020 10:58:08","Event LClick","Form List.form","Type FormTable","Name Список","Detail Data","t=89451","end"
"03.08.2020 10:58:15","Event LClick","Form List.form","Type FormTable","Name Список","t=96580","beg"
"03.08.2020 10:58:15","Event LClick","Form List.form","Type FormTable","Name Список","Detail Data","t=96643","end"

Logstash config file:

input {
    beats {
        port => '5044'
    }
}
filter {
    grok {
        patterns_dir => ['./patterns']
        match => { 'message' => '%{TIME:timestamp}(","Event\s)(?<Event>([^"]+))(","Form\s)?(?<Form>([^"]+))?(","ParentType\s)?(?<parent_type>([^"]+))?(","ParentName\s)?(?<parent_name>([^"]+))?(","Type\s)?(?<type>([^"]+))?(","Name\s)?(?<Name_of_form>([^"]+))?(","Detail\s)?(?<Detail>([^"]+))?(","t=)?(?<t>([\d]+))?(",")?(?<Status>(end|beg))?' }
        add_tag => [ '%{Status}' ]
    }
    dissect {
        mapping => {
            '[log][file][path]' => 'C:\Program Files\Filebeat\test_logs\%{somethingtoo}\%{something}\%{User_Name}\%{filename}.txt'
        }
    }
    date {
        match => [ 'timestamp', 'dd.MM.yyyy HH:mm:ss' ]
    }
    elapsed {
        unique_id_field => 'Event'
        start_tag => 'beg'
        end_tag => 'end'
        new_event_on_match => false
    }

    if 'elapsed' in [tags] {
        aggregate {
            task_id => '%{Event}'
            code => 'map["duration"] = [(event.get("elapsed_time")*1000).to_i]'
            map_action => 'create'
        }
    }
    mutate {
        remove_field => ['timestamp', 'ecs', 'log', 'tags', 'message', '@version', 'something', 'somethingtoo', 'filename', 'input', 'host', 'agent', 't', 'parent_type', 'parent_name', 'type']
        rename => {'elapsed_time' => 'Event_duration'}
    }
}
output {
    elasticsearch {
        hosts => ['localhost:9200']
        index => 'test'
    }
}

In my logstash.conf I use the aggregate filter, so I set a single worker (-w 1) for it to work correctly.

While I was testing and tuning the configuration with just one log file, I set -w 1 and everything worked fine. But when I started collecting all the logs from every directory, the problems began: the data ends up in elasticsearch incorrectly (this is visible from the strange numbers in the aggregation results).

I also tried setting worker: 1 in the logstash output section of filebeat.yml, but it still didn't help.
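(For reference, a single pipeline worker in Logstash can be set either on the command line or persistently in logstash.yml; a minimal sketch, assuming a default install:

bin/logstash -f logstash.conf -w 1

# or, equivalently, in logstash.yml:
pipeline.workers: 1

)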

Questions:

  1. Maybe you know how to solve this problem? It is strange that everything works fine with one log file, or with several log files at the end of one directory, and then everything suddenly breaks as soon as more directories are added.
  2. If I understand the theory correctly, elasticsearch has indices and types. Each log has a time and the name of the user it belongs to, so perhaps I should put the data into indices by log time and into types by user name, so that logs from different users at the same time do not overlap. How should I implement this? I tried to find information and only found material about document_type, which is deprecated. (See the sketch after this list.)
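Since document_type is deprecated (and mapping types were removed in Elasticsearch 7.x), the usual alternative is to put the distinguishing information into the index name itself. A minimal sketch of a date-based index in the elasticsearch output; note that each log line becomes its own document, so same-time events from different users cannot overwrite each other, and User_Name remains a searchable field on every document:

output {
    elasticsearch {
        hosts => ['localhost:9200']
        # the date part is taken from the event's @timestamp
        index => 'test-%{+YYYY.MM.dd}'
    }
}

If the user name had to be part of the index name, it would need to be lowercased first, since index names cannot contain uppercase letters.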

You are using the elapsed and aggregate filters with a field that is not unique: the Event field can have the same value in different files, which can make the elapsed filter pair a start event from one file with an end event from another file.

This happens because filebeat harvests the files in parallel and sends them to logstash in bulk. The worker option in your configuration is of no use in your case: it relates to the number of workers shipping the data, not harvesting it.

You can try to limit the number of parallel harvesters with the option harvester_limit: 1, but this will slow down your data processing, and there is no guarantee that it won't mix up your filters. Also, Filebeat does not guarantee the order of events, only at-least-once delivery.
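A minimal sketch of that option added to the input from the question (harvester_limit applies per input):

filebeat.inputs:
- type: log
  paths:
    - C:\Program Files\Filebeat\test_logs\*\*\*\*.txt
  exclude_lines: ['^Infobase.+']
  harvester_limit: 1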

The best solution is to create a unique field by concatenating the Event field with the filename field; this way events from different files will not get mixed up.

You can do that by adding a mutate filter before the elapsed filter:

mutate {
  add_field => { "uniqueEvent" => "%{Event}_%{filename}" }
}

This will create a field named uniqueEvent with a value like LClick_filename; you then use this new field in the elapsed and aggregate filters, as in the sketch below.
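Applied to the configuration from the question, the relevant filters would then look something like this (only the field references change):

elapsed {
    unique_id_field => 'uniqueEvent'
    start_tag => 'beg'
    end_tag => 'end'
    new_event_on_match => false
}

if 'elapsed' in [tags] {
    aggregate {
        task_id => '%{uniqueEvent}'
        code => 'map["duration"] = [(event.get("elapsed_time")*1000).to_i]'
        map_action => 'create'
    }
}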

If files in different folders can have the same name, you will need to use another field from the path until the value of uniqueEvent becomes unique, as in the sketch below.
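For example, since the dissect filter in the question already extracts User_Name from the path, that field could be folded into the key as well; a minimal sketch:

mutate {
  add_field => { "uniqueEvent" => "%{Event}_%{User_Name}_%{filename}" }
}

Note that this mutate must run after the dissect filter (so that User_Name and filename exist) and before the elapsed filter.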

