
Logstash / Elasticsearch JDBC document_id vs document_type?

So I'm trying to wrap my head around document_type vs document_id when using the JDBC importer from Logstash and exporting to Elasticsearch.

I finally wrapped my head around indexes. But let's pretend I'm pulling from a table of sensor data (like temp/humidity/etc...) that has sensor IDs and weather-related readings (temps, humidity) recorded with timestamps. (So it's a big table.)

And I want to keep polling the database on some interval X.

What would document_type vs document_id be in this instance? All of this is going to be stored (or whatever you want to call it) against one index.

The document_type vs document_id distinction confuses me, especially with regard to the JDBC importer.

If I set document_id to, say, my primary key, won't it get overwritten each time? So I'd just have one document of data each time? (Which seems pointless.)

The jdbc plugin will create a JSON document with one field for each column. So to keep consistent with your example, if you had that data it would be imported as a document that looks like this:

{
    "sensor_id": 567,
    "temp": 90,
    "humidity": 6,
    "timestamp": "{time}",
    "@timestamp": "{time}" // auto-created field, the time Logstash received the document
}

You were right when you said that if you set document_id to your primary key, it would get overwritten. You can disregard document_id unless you want to update existing documents in Elasticsearch, which I don't imagine you would want to do with this type of data. Let Elasticsearch generate the document ID for you.
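For illustration, here is a minimal sketch of the matching Elasticsearch output; the host, index name, and the %{sensor_id} field reference are assumptions for this example, not something from the question:

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumed local cluster
    index => "sensor_data"        # hypothetical index name
    # With no document_id set, Elasticsearch generates a unique id per
    # event. Uncommenting the next line would instead rewrite the same
    # document on every poll -- the overwriting behavior described above:
    # document_id => "%{sensor_id}"
  }
}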

Now let's talk about document_type. If you want to set the document type, you need to set the type field in Logstash to some value (which will propagate into Elasticsearch). The type field in Elasticsearch is used to group similar documents. If all of the documents in the table you're importing with the jdbc plugin are of the same type (they should be!), you can set type in the jdbc input like this...

input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "mysql"
    schedule => "* * * * *"                      # poll every minute
    # the table name below is a placeholder for your sensor table
    statement => "SELECT * FROM sensor_readings"
    ...
    type => "weather"
  }
}
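As a side note on the polling concern: the jdbc input can track the last value it saw so each scheduled run only pulls new rows, instead of re-importing the whole table. A minimal sketch, assuming a hypothetical sensor_readings table with a recorded_at timestamp column:

input {
  jdbc {
    ...
    schedule => "*/5 * * * *"            # run every 5 minutes
    use_column_value => true             # track a column's value between runs
    tracking_column => "recorded_at"     # hypothetical timestamp column
    tracking_column_type => "timestamp"
    # :sql_last_value is replaced with the last tracked value, so only
    # rows newer than the previous poll get imported:
    statement => "SELECT * FROM sensor_readings WHERE recorded_at > :sql_last_value"
    type => "weather"
  }
}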

Now, in Elasticsearch you can take advantage of the type field by setting a mapping for that type. For example you might want:

PUT my_index 
{
  "mappings": {
    "weather": { 
      "_all":       { "enabled": false  }, 
      "properties": { 
        "sensor_id":      { "type": "integer"  }, 
        "temp":           { "type": "integer"  }, 
        "humidity":       { "type": "integer" },
        "timestamp":      { "type": "date" }  
      }
    }
  }
}
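With that mapping in place, the type also gives you a convenient way to scope searches to just these documents (this assumes an Elasticsearch version that still supports mapping types, as the mapping above does):

GET my_index/weather/_search
{
  "query": {
    "range": { "temp": { "gte": 85 } }
  }
}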

Hope this helps! :)
