
logstash: how to include input file line number

I am trying to create a way to navigate my log files and the main features I need are:

  1. search for strings inside a log file (and return the lines where they occur).
  2. pagination from line x to line y.

Now I was checking Logstash, and it looks great for my first feature (searching), but not so much for the second one. My idea was that I could somehow index the file line number along with the log information of each record, but I can't seem to find a way to do it.

Is there a Logstash filter to do this, or a Filebeat processor? I can't make it work.

I was thinking that maybe I could create a way for all my processes to log into a database with the processed information, but that's also kind of impossible (or very difficult), because the log handler also doesn't know the current log line.

In the end, what I could do to serve a way to paginate my log file (through a service) would be to actually open it, navigate to a specific line and show it from the service, which is not very optimal, as the file could be very big, and I am already indexing it into Elasticsearch (with Logstash).
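
Just to make that fallback concrete, it would look roughly like the minimal sketch below (the read_lines helper and its 1-based parameters are only for illustration):

from itertools import islice


def read_lines(path, start, end):
    """Return lines start..end (1-based, inclusive) by scanning the file from the top."""
    with open(path) as f:
        return list(islice(f, start - 1, end))


# e.g. page 3 with 100 lines per page
page = read_lines("/path/of/logs/example.log", 201, 300)

Every page request re-reads the file up to line x, so for a very big file this gets slow, and it duplicates work that Elasticsearch is already doing.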

My current configuration is very simple:

Filebeat

filebeat.prospectors:
- type: log
  paths:
    - /path/of/logs/*.log
output.logstash:
  hosts: ["localhost:5044"]

Logstash

input {
    beats {
        port => "5044"
    }
}
output {
  elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}

Right now, for example, I am getting an item like:

    {
      "beat": {
        "hostname": "my.local",
        "name": "my.local",
        "version": "6.2.2"
      },
      "@timestamp": "2018-02-26T04:25:16.832Z",
      "host": "my.local",
      "tags": [
        "beats_input_codec_plain_applied",
      ],
      "prospector": {
        "type": "log"
      },
      "@version": "1",
      "message": "2018-02-25 22:37:55 [mylibrary] INFO: this is an example log line",
      "source": "/path/of/logs/example.log",
      "offset": 1124
    }

If I could somehow include a field like line_number: 1 in that item, it would be great, as I could use Elasticsearch filters to actually navigate through the whole logs.


If you have ideas for different ways to store (and navigate) my logs, please also let me know.

Are the log files generated by you, or can you change the log structure? Then you can add a counter as a prefix and parse it out with Logstash.

For example, for

12345 2018-02-25 22:37:55 [mylibrary] INFO: this is an example log line

your filter must look like this:

filter {
   grok {
     match => { "message" => "%{INT:count} %{GREEDYDATA:message}" }
     overwrite => ["message"]
   }
}

New field "count" will be created. 将创建新的字段“ count”。 You can then possibly use it for your purposes. 然后,您可以将其用于您的目的。

At this moment, I don't think there are any solutions here. Logstash, Beats and Kibana are all built around the idea of events over time, and that's basically how things are ordered. Line numbers are more of a text-editor kind of functionality.

To a certain degree Kibana can show you the events in a file. It won't give you a page-by-page kind of list where you can actually click on a page number, but using time frames you could theoretically look at an entire file.

There are similar enhancement requests open for Beats and Logstash.

First let me give what is probably the main reason why Filebeat doesn't already have a line number field. When Filebeat resumes reading a file (such as after a restart) it does an fseek to resume from the last recorded offset. If it had to report line numbers, it would either need to store this state in its registry or re-read the file and count newlines up to the offset.

If you want to offer a service that allows you to paginate through the logs backed by Elasticsearch, you can use the scroll API with a query for the file. You must sort the results by @timestamp and then by offset. Your service would use a scroll query to get the first page of results.

POST /filebeat-*/_search?scroll=1m
{
  "size": 10,
  "query": {
    "match": {
      "source": "/var/log/messages"
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "asc"
      }
    },
    {
      "offset": "asc"
    }
  ]
}

Then, to get all subsequent pages, you use the scroll_id returned from the first query.

POST  /_search/scroll
{
    "scroll" : "1m",
    "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBwAAAAAAPXDOFk12OEYw="
}

This will give you all log data for a given file name, and will even track it across rotations. If line numbers are critical, you could produce them synthetically by counting events starting with the first event that has offset == 0, but I avoid this because it's very error-prone, especially if you ever add any filtering or multiline grouping.
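
Putting that together, such a pagination service could look roughly like the sketch below. It assumes the Python elasticsearch client (7.x-style calls) and the field names from the Filebeat event shown in the question; the synthetic line_number counter is only meaningful if no events are dropped by filters:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query = {
    "size": 10,  # page size
    "query": {"match": {"source": "/var/log/messages"}},
    "sort": [
        {"@timestamp": {"order": "asc"}},
        {"offset": "asc"},
    ],
}

line_number = 0
resp = es.search(index="filebeat-*", scroll="1m", body=query)
while resp["hits"]["hits"]:
    for hit in resp["hits"]["hits"]:
        line_number += 1  # synthetic numbering: counts events in file order
        print(line_number, hit["_source"]["message"])
    # fetch the next page with the scroll id from the previous response
    resp = es.scroll(scroll_id=resp["_scroll_id"], scroll="1m")

es.clear_scroll(scroll_id=resp["_scroll_id"])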
