How to Query Logstash via curl and return only specific fields
Right now I am using the "match_all" query to get the data that Logstash is handling. The output that I get is every single field that is part of the event, as it should be. Here is my query:
{
  "query": {
    "match_all" : { }
  },
  "size": 1,
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ]
}
As you can see, I am also sorting my results so that I always get the most recent event that has been outputted.
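Sent with curl, the request above looks like this (a sketch: the host, port, and index pattern are assumptions, so adjust them for your cluster):

```shell
# Write the query body (same as above) to a file, then POST it.
cat > query.json <<'EOF'
{
  "query": { "match_all": {} },
  "size": 1,
  "sort": [ { "@timestamp": { "order": "desc" } } ]
}
EOF

# Hypothetical endpoint and index pattern; requires a reachable Elasticsearch node.
if command -v curl >/dev/null; then
  curl -s -XPOST 'http://localhost:9200/filebeat-*/_search?pretty' \
    -H 'Content-Type: application/json' --data-binary @query.json \
    || echo 'request failed (is Elasticsearch reachable?)'
fi
```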
Here is an example of my output:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 15768,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "filebeat-2017.02.24",
        "_type" : "bro",
        "_id" : "AVpx-pFtiEtl3Zqhg8tF",
        "_score" : null,
        "_source" : {
          "resp_pkts" : 0,
          "source" : "/usr/local/bro/logs/current/conn.log",
          "type" : "bro",
          "id_orig_p" : 56058,
          "duration" : 848.388112,
          "local_resp" : true,
          "uid" : "CPndOf4NNf9CzTILFi",
          "id_orig_h" : "192.168.137.130",
          "conn_state" : "OTH",
          "@version" : "1",
          "beat" : {
            "hostname" : "localhost.localdomain",
            "name" : "localhost.localdomain",
            "version" : "5.2.0"
          },
          "host" : "localhost.localdomain",
          "id_resp_h" : "192.168.137.141",
          "id_resp_p" : 22,
          "resp_ip_bytes" : 0,
          "offset" : 115612,
          "orig_bytes" : 32052,
          "local_orig" : true,
          "input_type" : "log",
          "orig_ip_bytes" : 102980,
          "orig_pkts" : 1364,
          "missed_bytes" : 0,
          "history" : "DcA",
          "tunnel_parents" : [ ],
          "message" : "{\"ts\":1487969779.653504,\"uid\":\"CPndOf4NNf9CzTILFi\",\"id_orig_h\":\"192.168.137.130\",\"id_orig_p\":56058,\"id_resp_h\":\"192.168.137.141\",\"id_resp_p\":22,\"proto\":\"tcp\",\"duration\":848.388112,\"orig_bytes\":32052,\"resp_bytes\":0,\"conn_state\":\"OTH\",\"local_orig\":true,\"local_resp\":true,\"missed_bytes\":0,\"history\":\"DcA\",\"orig_pkts\":1364,\"orig_ip_bytes\":102980,\"resp_pkts\":0,\"resp_ip_bytes\":0,\"tunnel_parents\":[]}",
          "tags" : [
            "beats_input_codec_plain_applied"
          ],
          "@timestamp" : "2017-02-24T21:15:29.414Z",
          "resp_bytes" : 0,
          "proto" : "tcp",
          "fields" : {
            "sensorType" : "networksensor"
          },
          "ts" : 1.487969779653504E9
        },
        "sort" : [
          1487970929414
        ]
      }
    ]
  }
}
As you can see, that is a lot of output to handle in an outside application (written in C#, so garbage collection on all these strings is massive) that I just don't need.
My question is, how can I set up my query so that I only grab the fields that I need?
For 5.x there was a change that allows you to do _source filtering. The documentation for that is here; it would look like this:
{
  "query": {
    "match_all" : { }
  },
  "size": 1,
  "_source": ["a","b"],
  ...
And the result looks like:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "xxx",
        "_type" : "xxx",
        "_id" : "xxx",
        "_score" : 1.0,
        "_source" : {
          "a" : 1,
          "b" : "2"
        }
      }
    ]
  }
}
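As a complete curl request, the _source-filtered query can be sketched like this (the endpoint and index name are assumptions, and "a"/"b" are the example field names from above):

```shell
# _source filtering (5.x+): only the listed fields come back in _source.
cat > source_query.json <<'EOF'
{
  "query": { "match_all": {} },
  "size": 1,
  "_source": ["a", "b"]
}
EOF

# Hypothetical endpoint and index; requires a reachable Elasticsearch node.
if command -v curl >/dev/null; then
  curl -s -XPOST 'http://localhost:9200/myindex/_search?pretty' \
    -H 'Content-Type: application/json' --data-binary @source_query.json \
    || echo 'request failed (is Elasticsearch reachable?)'
fi
```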
For versions prior to 5, you can do it with a fields parameter:
Your query can pass a "fields": ["field1","field2"...] at the root level. The format that it comes back in will be different, but it will work.
{
  "query": {
    "match_all" : { }
  },
  "size": 1,
  "fields": ["a","b"],
  ...
That will produce output like this:
{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2077,
    "max_score": 1,
    "hits": [
      {
        "_index": "xxx",
        "_type": "xxx",
        "_id": "xxxx",
        "_score": 1,
        "fields": {
          "a": [
            0
          ],
          "b": [
            "xyz"
          ]
        }
      }
    ]
  }
}
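The pre-5.x variant can likewise be sketched as a curl request (again, the endpoint and index name are assumptions):

```shell
# fields parameter (pre-5.x): values come back wrapped in arrays.
cat > fields_query.json <<'EOF'
{
  "query": { "match_all": {} },
  "size": 1,
  "fields": ["a", "b"]
}
EOF

# Hypothetical endpoint and index; requires a reachable Elasticsearch node.
if command -v curl >/dev/null; then
  curl -s -XPOST 'http://localhost:9200/myindex/_search?pretty' \
    -H 'Content-Type: application/json' --data-binary @fields_query.json \
    || echo 'request failed (is Elasticsearch reachable?)'
fi
```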
The fields are always arrays (since the 1.0 API), and there isn't any way to change that, because Elasticsearch is inherently multi-value aware.