
Logstash / Elasticsearch JDBC document_id vs document_type?

So I'm trying to wrap my head around document_type vs document_id when using the JDBC importer from Logstash and exporting to Elasticsearch.

I finally wrapped my head around indexes. But let's pretend I'm pulling from a table of sensor data (like temp/humidity/etc...) that has sensor IDs and weather-related readings (temps, humidity) recorded with timestamps. (So it's a big table.)

And I want to keep polling the database on some interval X.

What would document_type vs document_id be in this instance? All of this is going to be stored (or whatever you want to call it) against one index.

The document_type vs document_id distinction confuses me, especially with regard to the JDBC importer.

If I set document_id to, say, my primary key, won't it get overwritten each time? So I'd just have one document of data each time? (Which seems pointless.)

The jdbc plugin will create a JSON document with one field for each column. So to keep consistent with your example, if you had that data it would be imported as a document that looks like this:

{
    "sensor_id": 567,
    "temp": 90,
    "humidity": 6,
    "timestamp": "{time}",
    "@timestamp": "{time}" // auto-created field, the time Logstash received the document
}

You were right when you said that if you set document_id to your primary key, it would get overwritten. You can disregard document_id unless you want to update existing documents in Elasticsearch, which I don't imagine you would want to do with this type of data. Let Elasticsearch generate the document ID for you.
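For illustration, here is a minimal sketch of the matching Elasticsearch output; the host, index name, and the %{sensor_id} field reference are assumptions for this example, not something from the question:

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumed local cluster
    index => "sensor_data"        # hypothetical index name
    # With no document_id set, Elasticsearch generates a unique id per
    # event. Uncommenting the next line would instead rewrite the same
    # document on every poll -- the overwriting behavior described above:
    # document_id => "%{sensor_id}"
  }
}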

Now let's talk about document_type. If you want to set the document type, you need to set the type field in Logstash to some value (which will propagate into Elasticsearch). The type field in Elasticsearch is used to group similar documents. If all of the documents in the table you're importing with the jdbc plugin are of the same type (they should be!), you can set type in the jdbc input like this...

input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "mysql"
    schedule => "* * * * *"                      # poll every minute
    # the table name below is a placeholder for your sensor table
    statement => "SELECT * FROM sensor_readings"
    ...
    type => "weather"
  }
}
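As a side note on the polling concern: the jdbc input can track the last value it saw so each scheduled run only pulls new rows, instead of re-importing the whole table. A minimal sketch, assuming a hypothetical sensor_readings table with a recorded_at timestamp column:

input {
  jdbc {
    ...
    schedule => "*/5 * * * *"            # run every 5 minutes
    use_column_value => true             # track a column's value between runs
    tracking_column => "recorded_at"     # hypothetical timestamp column
    tracking_column_type => "timestamp"
    # :sql_last_value is replaced with the last tracked value, so only
    # rows newer than the previous poll get imported:
    statement => "SELECT * FROM sensor_readings WHERE recorded_at > :sql_last_value"
    type => "weather"
  }
}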

Now, in Elasticsearch you can take advantage of the type field by setting a mapping for that type. For example you might want:

PUT my_index 
{
  "mappings": {
    "weather": { 
      "_all":       { "enabled": false  }, 
      "properties": { 
        "sensor_id":      { "type": "integer"  }, 
        "temp":           { "type": "integer"  }, 
        "humidity":       { "type": "integer" },
        "timestamp":      { "type": "date" }  
      }
    }
  }
}
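With that mapping in place, the type also gives you a convenient way to scope searches to just these documents (this assumes an Elasticsearch version that still supports mapping types, as the mapping above does):

GET my_index/weather/_search
{
  "query": {
    "range": { "temp": { "gte": 85 } }
  }
}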

Hope this helps! :)
