
Shipping metrics from Telegraf to BigQuery

I'd like to ship metrics from Telegraf to GCP's BigQuery.

How should I do this? What are the pitfalls?

Having researched this, I think the best way to approach this problem is with Google's Dataflow. Here's the general idea:

  1. Use Telegraf's cloud_pubsub output plugin to ship JSON-formatted metrics to Google's Pub/Sub service (a minimal config sketch follows this list)
  2. Use Google's Dataflow service to ingest from Pub/Sub and write to Google's BigQuery
  3. Profit!
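
A minimal sketch of step 1's Telegraf side, with placeholder project/topic names; the cloud_pubsub output authenticates via Application Default Credentials (e.g. the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing at the key file mentioned below):

```toml
[[outputs.cloud_pubsub]]
  ## GCP project that owns the Pub/Sub topic.
  project = "my-gcp-project"
  ## Pub/Sub topic to publish metrics to (must already exist).
  topic = "telegraf-metrics"
  ## Serialize metrics as JSON so the Dataflow job can map them
  ## onto the BigQuery schema.
  data_format = "json"
```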

This may seem easy, but it requires quite a bit of knowledge about these tools. Here are some of the things you need to keep in mind:

  1. Start by figuring out what your metrics look like - and create a matching schema for them. BigQuery is schema-based and you MUST define the schema in advance or else loading the data will fail. I found that defining an additional 'file' output for my Telegraf client was helpful for seeing exactly what JSON metrics I was shipping into BigQuery (see the file-output sketch after this list). Note that you can also remove or reshape metrics using Telegraf's metric filtering options (namepass, fieldpass, and friends).
  2. Use Telegraf's cloud_pubsub plugin. As I'm writing this answer (Feb 2019), it's very new and only included in the master branch. I believe it's intended for Telegraf version 1.10.
  3. Create a Pub/Sub topic, a BigQuery table (with a schema), and a Google Storage bucket - you'll need all three items to create a Dataflow job (sample CLI commands for all of these follow this list). You'll also need a credentials file, which you generate in GCP and then point your Telegraf client at.
  4. Once done, create a Dataflow job in GCP (Google provides a stock Pub/Sub-to-BigQuery template for exactly this pipeline; see the sketch after this list). Jobs cannot be edited once created and, annoyingly, they cannot be deleted (but they can be stopped). So you'll probably end up creating multiple jobs until you get it right. I found that almost everything I did required a new job (for example, deleting and then recreating a BigQuery table with the same name required a new job).
  5. If you're seeing errors in Dataflow, you should be able to ascertain whether this is a schema issue from the convoluted stack trace in the logs sub-window in Dataflow. Your metrics MUST EXACTLY match the schema or else they will not load into the table.
  6. I wasn't able to load metrics in Influx line-protocol format, but I'm told it should be possible. I used the 'json' format, and that worked.
  7. In case you have nested JSON fields, e.g. {"person":{"first-name":"john", ...}}, define the enclosing field as a RECORD type in the BigQuery schema, with the nested fields declared inside it (see the schema sketch after this list). It's not very obvious unless you already know BigQuery.
  8. I discovered that editing the BigQuery schema in the GCP console is very limited - especially for nested JSON fields. I had to re-create the table from scratch a few times because the UI won't allow editing nested fields after the initial table configuration.
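
To make item 1 concrete, here's a sketch of the debugging setup: a secondary file output that serializes metrics exactly the way the cloud_pubsub output does, so you can inspect what BigQuery will receive. The path is a placeholder:

```toml
# Secondary output for debugging: writes the same JSON-serialized
# metrics to a local file so you can inspect them before they
# reach Pub/Sub and BigQuery.
[[outputs.file]]
  files = ["/tmp/metrics.json"]
  data_format = "json"
```

With the 'json' data format, each metric comes out shaped roughly like this (measurement, tag, and field names are whatever your inputs produce; the values here are illustrative):

```json
{
  "name": "cpu",
  "tags": {"cpu": "cpu-total", "host": "web-01"},
  "fields": {"usage_idle": 97.3, "usage_user": 1.8},
  "timestamp": 1550576400
}
```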
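For item 3, all three prerequisite resources, plus the credentials file, can be created from the CLI rather than the console. All names below are placeholders:

```sh
# Pub/Sub topic that Telegraf publishes to
gcloud pubsub topics create telegraf-metrics

# BigQuery dataset and table, using a JSON schema file
# (see the schema sketch below)
bq mk my_dataset
bq mk --table my_dataset.telegraf_metrics ./schema.json

# Storage bucket that Dataflow uses for temp/staging files
gsutil mb gs://my-dataflow-temp-bucket

# Service-account key file to point the Telegraf client at
gcloud iam service-accounts keys create telegraf-key.json \
  --iam-account=telegraf@my-gcp-project.iam.gserviceaccount.com
```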
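For item 4, the Dataflow job can also be launched from the CLI. A sketch, assuming Google's classic PubSub_to_BigQuery template and placeholder names throughout:

```sh
# Launch a streaming Dataflow job from the stock template that
# reads the Pub/Sub topic and writes rows into the BigQuery table.
gcloud dataflow jobs run telegraf-to-bq \
  --gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
  --staging-location gs://my-dataflow-temp-bucket/temp \
  --parameters \
inputTopic=projects/my-gcp-project/topics/telegraf-metrics,\
outputTableSpec=my-gcp-project:my_dataset.telegraf_metrics
```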
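For item 7, this is what the nesting looks like in bq's JSON schema format, matching the sample metric shown earlier: the tags and fields objects each become a RECORD column with their members declared inside. One caveat: BigQuery field names may only contain letters, numbers, and underscores, so a JSON key like "first-name" would have to be renamed (e.g. to first_name) before it can load.

```json
[
  {"name": "name",      "type": "STRING"},
  {"name": "timestamp", "type": "TIMESTAMP"},
  {"name": "tags", "type": "RECORD", "fields": [
    {"name": "cpu",  "type": "STRING"},
    {"name": "host", "type": "STRING"}
  ]},
  {"name": "fields", "type": "RECORD", "fields": [
    {"name": "usage_idle", "type": "FLOAT"},
    {"name": "usage_user", "type": "FLOAT"}
  ]}
]
```

Telegraf's Unix-epoch timestamp loads into a TIMESTAMP column; an INTEGER column also works if you'd rather convert on the query side.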

As of version 1.18.0, Telegraf ships with an official BigQuery output plugin.
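
With the official plugin, the whole Pub/Sub + Dataflow pipeline above collapses into a single output section. A minimal sketch, with placeholder project and dataset names:

```toml
[[outputs.bigquery]]
  ## Optional explicit service-account key; otherwise Application
  ## Default Credentials are used.
  # credentials_file = "/path/to/service/account/key.json"

  ## GCP project and existing BigQuery dataset to write into.
  project = "my-gcp-project"
  dataset = "telegraf"
```

BigQuery is still schema-based, so you'll still need tables whose schemas match your metrics, just as with the Dataflow route.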
